One of the easiest and most reliable ways of getting data into R is to use CSV files.
The CSV file (Comma Separated Values file) is a widely supported file format used to store tabular data. It uses commas to separate the different values in a line, where each line is a row of data.
R's built-in CSV functions make it easy to read, write, and process data from CSV files.
Read a CSV File
Suppose you have the following CSV file.
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
You can read the file's contents by passing its name to the read.csv() function, which returns the data as a data frame.
# Read entire CSV file into a data frame
mydata <- read.csv("mydata.csv")
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
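To try this without creating the file by hand, the sketch below writes the sample rows to a temporary file and inspects the result. The temporary path from tempfile() is an assumption standing in for mydata.csv; any location works.

```r
# Write the sample data to a temporary file (stand-in for mydata.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           csv_path)

# Read it into a data frame
mydata <- read.csv(csv_path)

# Inspect the result
dim(mydata)        # number of rows and columns
names(mydata)      # column names taken from the header line
mean(mydata$age)   # individual columns are accessed with $
```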
Specify a File
When you specify only the filename, R looks for the file in the current working directory. If the file is somewhere else, specify its full path.
Remember! In an R string, a backslash starts an escape sequence (such as \n, \r or \t), so a Windows path typed with single backslashes is not read the way you intend.
You can avoid this by:
- Changing the backslashes to forward slashes like:
"C:/data/myfile.csv"
- Doubling the backslashes like:
"C:\\data\\myfile.csv"
# Specify absolute path like this
mydata <- read.csv("C:/data/mydata.csv")
# or like this
mydata <- read.csv("C:\\data\\mydata.csv")
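Another way to sidestep the backslash problem is to build the path from its parts with file.path(), which joins them with forward slashes. The sketch below reuses the C:/data location from the example above and checks that the file exists before reading it; on a machine without that file it only prints a message.

```r
# Build a path from its parts; file.path() joins them with "/"
csv_path <- file.path("C:", "data", "mydata.csv")
csv_path

# Check that the file exists before trying to read it
if (file.exists(csv_path)) {
  mydata <- read.csv(csv_path)
} else {
  message("No file at ", csv_path, " (working directory: ", getwd(), ")")
}
```

Calling getwd() shows the folder that R searches when you give only a filename.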
If you want to read CSV data from the web, substitute a URL for the file name. The read.csv() function reads directly from the remote server.
# Read CSV file from Web
mydata <- read.csv("http://www.example.com/download/mydata.csv")
R can also read data from FTP servers, not just HTTP servers.
# Read CSV file from FTP server
mydata <- read.csv("ftp://ftp.example.com/download/mydata.csv")
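Remote reads can fail when the server or the network is unavailable. One possible way to keep a script running in that case is to wrap the call in tryCatch(), as in the sketch below. The helper name read_csv_or_null is hypothetical, and the demonstration uses a local temporary file so that it runs offline; a URL string is passed the same way.

```r
# Hypothetical helper: read a CSV from a path or URL, returning NULL on failure
read_csv_or_null <- function(location) {
  tryCatch(
    read.csv(location),
    error = function(e) {
      message("Could not read ", location, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Demonstrated on a local temporary file so the example runs offline
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Bob,25"), csv_path)
ok <- read_csv_or_null(csv_path)

# A location that cannot be opened yields NULL instead of stopping the script
absent <- suppressWarnings(
  read_csv_or_null(file.path(tempdir(), "no-such-file.csv"))
)
is.null(absent)
```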
Set Column Names
The read.csv() function assumes that the first line of your file is a header line and takes the column names for the data frame from it.
# By default first row is used to name columns
mydata <- read.csv("mydata.csv")
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
If your file does not contain a header, like the file below, specify header = FALSE so that R creates column names for you (V1, V2, V3 and V4 in this case).
Bob,25,Manager,Seattle
Sam,30,Developer,New York
# If your file doesn't contain a header, set header to FALSE
mydata <- read.csv("mydata.csv",
header = FALSE)
mydata
V1 V2 V3 V4
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
However, if you want to set the column names manually, specify the col.names argument.
# Manually set the column names
mydata <- read.csv("mydata.csv",
header = FALSE,
col.names = c("name", "age", "job", "city"))
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
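If the file already has a header but you would prefer different names, one option is to rename the columns after reading by assigning to names(). The replacement names below are purely illustrative.

```r
# Sample file with a header line (temporary stand-in for mydata.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           csv_path)
mydata <- read.csv(csv_path)

# Replace all column names at once (illustrative names)
names(mydata) <- c("employee", "years", "role", "location")

# Rename a single column by matching its current name
names(mydata)[names(mydata) == "years"] <- "age"
names(mydata)
```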
Import the Data as is
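In versions of R before 4.0.0, the read.csv() function automatically coerced character data into a factor (categorical variable). Since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay as strings unless you ask for factors. You can see the result by inspecting the structure of your data frame.
# Default before R 4.0.0; since R 4.0.0 it must be requested explicitly
mydata <- read.csv("mydata.csv",
stringsAsFactors = TRUE)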
str(mydata)
'data.frame': 2 obs. of 4 variables:
$ name: Factor w/ 2 levels "Bob","Sam": 1 2
$ age : int 25 30
$ job : Factor w/ 2 levels "Developer","Manager": 2 1
$ city: Factor w/ 2 levels "New York","Seattle": 2 1
If you want your data read as strings rather than factors, set the as.is parameter to TRUE.
# Set as.is parameter to TRUE to interpret the data as is
mydata <- read.csv("mydata.csv",
as.is = TRUE)
str(mydata)
'data.frame': 2 obs. of 4 variables:
$ name: chr "Bob" "Sam"
$ age : int 25 30
$ job : chr "Manager" "Developer"
$ city: chr "Seattle" "New York"
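The same choice can be made with the stringsAsFactors argument, whose default changed from TRUE to FALSE in R 4.0.0. Setting it explicitly, as in the sketch below, gives the same result on any version. The temporary file is an assumption standing in for mydata.csv.

```r
# Sample file (temporary stand-in for mydata.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           csv_path)

# Keep text columns as character vectors
as_strings <- read.csv(csv_path, stringsAsFactors = FALSE)
class(as_strings$job)

# Convert text columns to factors
as_factors <- read.csv(csv_path, stringsAsFactors = TRUE)
class(as_factors$job)
levels(as_factors$job)
```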
Set the Classes of the Columns
You can manually set the classes of the columns using the colClasses argument.
mydata <- read.csv("mydata.csv",
colClasses = c("character", "integer", "factor", "character"))
str(mydata)
'data.frame': 2 obs. of 4 variables:
$ name: chr "Bob" "Sam"
$ age : int 25 30
$ job : Factor w/ 2 levels "Developer","Manager": 2 1
$ city: chr "Seattle" "New York"
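colClasses also accepts a named vector, which sets the class of selected columns only and leaves the rest to be detected automatically. A typical use is an identifier with leading zeros that would otherwise be read as a number. The zip column and its values below are made up for illustration.

```r
# Hypothetical file with a zip code column that has a leading zero
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,zip",
             "Bob,02134",
             "Sam,10012"),
           csv_path)

# Automatic detection reads zip as an integer and drops the leading zero
guessed <- read.csv(csv_path)
guessed$zip

# Naming the column in colClasses keeps it as text
fixed <- read.csv(csv_path, colClasses = c(zip = "character"))
fixed$zip
```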
Limit the Number of Rows Read
If you want to limit the number of rows read, specify the nrows argument.
# Read only one record from CSV
mydata <- read.csv("mydata.csv",
nrows = 1)
mydata
name age job city
1 Bob 25 Manager Seattle
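nrows can be combined with the skip argument, which discards a number of lines at the top of the file before reading starts. The sketch below reads only the second record. Because the header line is skipped as well, the column names are taken from a separate one-row read. The temporary file is an assumption standing in for mydata.csv.

```r
# Sample file (temporary stand-in for mydata.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           csv_path)

# Take the column names from the header with a one-row read
col_names <- names(read.csv(csv_path, nrows = 1))

# Skip the header and the first record, then read a single row
second <- read.csv(csv_path,
                   header = FALSE,
                   skip = 2,
                   nrows = 1,
                   col.names = col_names)
second
```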
Handle Commas Within Data
Sometimes a CSV file contains fields, such as an address, that themselves contain a comma. Left unquoted, such a comma would be read as a field separator.
To handle a comma within a field, wrap the field in quotes. R treats a comma inside a quoted string as an ordinary character.
You can specify the quoting character using the quote argument (a double quote is the default for read.csv()).
name,age,address
Bob,25,"113 Cherry St, Seattle, WA 98104, USA"
Sam,30,"150 Greene St, New York, NY 10012, USA"
mydata <- read.csv("mydata.csv",
quote = '"')
mydata
name age address
1 Bob 25 113 Cherry St, Seattle, WA 98104, USA
2 Sam 30 150 Greene St, New York, NY 10012, USA
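A quoted field can also contain the quote character itself if it is doubled inside the field. The sketch below checks both cases on a temporary file. The second address is invented for illustration.

```r
# Fields containing commas are quoted; a literal quote is written as ""
csv_path <- tempfile(fileext = ".csv")
writeLines(c('name,age,address',
             'Bob,25,"113 Cherry St, Seattle, WA 98104, USA"',
             'Amy,20,"Apt ""B"", 5 Main St, Houston, TX"'),
           csv_path)

mydata <- read.csv(csv_path, quote = '"')

# Still three columns: commas inside quotes did not split the field
ncol(mydata)
as.character(mydata$address)
```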
Write a CSV File
To write data to a CSV file, use the write.csv() function and pass the data as a matrix or data frame. The file is created if it does not already exist.
# Write a CSV File from a data frame
df
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
write.csv(df, "mydata.csv")
"","name","age","job","city"
"1","Bob",25,"Manager","Seattle"
"2","Sam",30,"Developer","New York"
Notice that the write.csv() function prepends each row with a row name by default. If you don't want row labels in your CSV file, set row.names to FALSE.
# Remove row labels while writing a CSV File
write.csv(df, "mydata.csv",
row.names = FALSE)
"name","age","job","city"
"Bob",25,"Manager","Seattle"
"Sam",30,"Developer","New York"
Notice that the column names and character values are surrounded by double quotes by default, while numbers are not. Set quote = FALSE to write everything without quotes.
# Write a CSV file without quotes
write.csv(df, "mydata.csv",
row.names = FALSE,
quote = FALSE)
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
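A quick way to check what was written is a round trip: write the data frame, look at the raw lines, and read the file back. The sketch below builds the df shown above by hand and uses a temporary path, both of which are assumptions for illustration.

```r
# Rebuild the example data frame
df <- data.frame(name = c("Bob", "Sam"),
                 age  = c(25L, 30L),
                 job  = c("Manager", "Developer"),
                 city = c("Seattle", "New York"))

# Write it without row names to a temporary file
out_path <- tempfile(fileext = ".csv")
write.csv(df, out_path, row.names = FALSE)

# Look at the raw text that was written
readLines(out_path)

# Read it back and compare with the original
df_back <- read.csv(out_path)
all.equal(df, df_back)
```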
Append Data to a CSV File
By default, the write.csv() function overwrites the entire file content. To append data to a CSV file, use the write.table() function instead and set append = TRUE.
df
name age job city
1 Amy 20 Developer Houston
write.table(df, "mydata.csv",
append = TRUE,
sep = ",",
col.names = FALSE,
row.names = FALSE,
quote = FALSE)
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston
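When the same code may run against a new or an existing file, the header should be written only the first time. One possible approach is sketched below. The helper name append_csv is hypothetical, and it keeps quoting switched on (with doubled embedded quotes, as write.csv() does) so that fields containing commas stay intact.

```r
# Hypothetical helper: create the file with a header, or append rows to it
append_csv <- function(df, path) {
  new_file <- !file.exists(path)
  write.table(df, path,
              append = !new_file,     # append only if the file already exists
              sep = ",",
              col.names = new_file,   # write the header only for a new file
              row.names = FALSE,
              qmethod = "double")     # escape embedded quotes by doubling them
}

# The first call creates the file, the second appends to it
log_path <- tempfile(fileext = ".csv")
append_csv(data.frame(name = "Bob", age = 25L), log_path)
append_csv(data.frame(name = "Amy", age = 20L), log_path)

result <- read.csv(log_path)
result
```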