Read and Write CSV Files in R

One of the easiest and most reliable ways of getting data into R is to use CSV files.

The CSV file (Comma Separated Values file) is a widely supported file format used to store tabular data.

It uses commas to separate the different values in a line, where each line is a row of data.

R’s built-in CSV parser makes it easy to read, write, and process data from CSV files.

Read a CSV File

Assume you have the following CSV file.

mydata.csv

name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York

You can read the contents of a file by passing its name to the read.csv() function.

The read.csv() function reads the data into a data frame.

Example: Read entire CSV file into a data frame

> mydata <- read.csv("mydata.csv")
> mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
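
To try this without creating a file first, one option is the text argument of read.csv(), which parses a character string as if it were the file contents. A minimal sketch; the variable name csv_text is only illustrative:

```r
# CSV contents held in a string instead of a file (same data as mydata.csv)
csv_text <- "name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York"

# text = makes read.csv() parse the string directly
mydata <- read.csv(text = csv_text)

nrow(mydata)    # number of data rows
names(mydata)   # column names taken from the header line
```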

Specify a File

When you specify only the file name, R looks for the file in the current working directory.

You can also specify the exact path that the file is located at.

Remember: in an R string, the backslash starts an escape sequence (\n, \r, \t and so on), so backslashes in a Windows path are not read literally.

You can avoid the problem by:

  • Changing the backslashes to forward slashes, like: "C:/data/myfile.csv"
  • Doubling each backslash, like: "C:\\data\\myfile.csv"
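
Two other options, sketched here with placeholder folder and file names: file.path() joins path components with forward slashes, and R 4.0.0 or later accepts raw strings, in which backslashes are taken literally.

```r
# Build a path from components; file.path() inserts "/" between them
csv_path <- file.path("C:", "data", "mydata.csv")
csv_path

# Raw string syntax (R >= 4.0.0): backslashes are not treated as escapes
raw_path <- r"(C:\data\mydata.csv)"
raw_path == "C:\\data\\mydata.csv"   # same string as the doubled-backslash form

# Check that the file exists before trying to read it
if (file.exists(csv_path)) {
  mydata <- read.csv(csv_path)
}
```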

Example: Specifying absolute path

> mydata <- read.csv("C:/data/mydata.csv")

-OR-

> mydata <- read.csv("C:\\data\\mydata.csv")

If you want to read CSV data from the Web, substitute a URL for the file name. The read.csv() function will read directly from the remote server.

Example: Read CSV file from Web

> mydata <- read.csv("http://www.example.com/download/mydata.csv")

R can also read data from FTP servers, not just HTTP servers.

Example: Read CSV file from FTP server

> mydata <- read.csv("ftp://ftp.example.com/download/mydata.csv")

Set Column Names

The read.csv() function assumes that the first line of your file is a header line.

It takes the column names from the header line for the data frame.

Example: By default first row is used to name columns

File Contents:

name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York

Code:

> mydata <- read.csv("mydata.csv")
> mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York

If your file does not contain a header, specify header = FALSE so that R creates column names for you (V1, V2, V3 and V4 in this case).

Example: If your file doesn’t contain a header, set header to FALSE

File Contents:

Bob,25,Manager,Seattle
Sam,30,Developer,New York

Code:

> mydata <- read.csv("mydata.csv",
+                    header = FALSE)
> mydata
   V1 V2        V3       V4
1 Bob 25   Manager  Seattle
2 Sam 30 Developer New York
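
It can help to see what happens when header = FALSE is forgotten for such a file. The sketch below uses the text argument so that it runs without a file on disk; the variable names are illustrative.

```r
# Header-less contents, as in the file above
csv_text <- "Bob,25,Manager,Seattle
Sam,30,Developer,New York"

# Without header = FALSE the first record is consumed as column names
wrong <- read.csv(text = csv_text)
names(wrong)   # "Bob" "X25" "Manager" "Seattle"
nrow(wrong)    # 1

# With header = FALSE both records are kept and named V1..V4
right <- read.csv(text = csv_text, header = FALSE)
names(right)   # "V1" "V2" "V3" "V4"
nrow(right)    # 2
```

The name X25 appears because read.csv() turns the value 25 into a syntactically valid column name.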

However, if you want to set the column names manually, specify the col.names argument.

Example: Manually set the column names

> mydata <- read.csv("mydata.csv",
+                    header = FALSE,
+                    col.names = c("name", "age", "job", "city"))
> mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
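
Column names can also be assigned or changed after the data has been read, using names(). A short sketch; the replacement name "role" is purely illustrative:

```r
csv_text <- "Bob,25,Manager,Seattle
Sam,30,Developer,New York"

mydata <- read.csv(text = csv_text, header = FALSE)

# Replace the automatic names V1..V4 in one step
names(mydata) <- c("name", "age", "job", "city")

# Rename a single column by matching its current name
names(mydata)[names(mydata) == "job"] <- "role"
names(mydata)
```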

Import the Data as is

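Before R 4.0.0, the read.csv() function automatically coerced non-numeric data into a factor (categorical variable). From R 4.0.0 onward, strings stay as character vectors unless you set stringsAsFactors = TRUE.

You can see the coercion by inspecting the structure of your data frame.

Example: With stringsAsFactors = TRUE, non-numeric data is coerced into a factor

> mydata <- read.csv("mydata.csv",
+                    stringsAsFactors = TRUE)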
> str(mydata)
'data.frame':	2 obs. of  4 variables:
 $ name: Factor w/ 2 levels "Bob","Sam": 1 2
 $ age : int  25 30
 $ job : Factor w/ 2 levels "Developer","Manager": 2 1
 $ city: Factor w/ 2 levels "New York","Seattle": 2 1

If you want your data interpreted as strings rather than factors, set the as.is parameter to TRUE (in R 4.0.0 and later this is already the default behavior).

Example: Set as.is parameter to TRUE to interpret the data as is

> mydata <- read.csv("mydata.csv",
+                    as.is = TRUE)
> str(mydata)
'data.frame':	2 obs. of  4 variables:
 $ name: chr  "Bob" "Sam"
 $ age : int  25 30
 $ job : chr  "Manager" "Developer"
 $ city: chr  "Seattle" "New York"
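
Stating the string handling explicitly keeps a script independent of the R version it runs on. The sketch below reads everything as character and then converts only one column to a factor; which column to convert is an assumption made for illustration.

```r
csv_text <- "name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York"

# Keep all strings as character vectors, regardless of R version
mydata <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert just the column that is really categorical
mydata$job <- factor(mydata$job)
str(mydata)
```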

Set the Classes of the Columns

You can manually set the classes of the columns using the colClasses argument.

Example:

> mydata <- read.csv("mydata.csv",
+                    colClasses = c("character", "integer", "factor", "character"))
> str(mydata)
'data.frame':	2 obs. of  4 variables:
 $ name: chr  "Bob" "Sam"
 $ age : int  25 30
 $ job : Factor w/ 2 levels "Developer","Manager": 2 1
 $ city: chr  "Seattle" "New York"
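
The colClasses argument has two further uses worth sketching: the string "NULL" skips a column entirely, and a named vector sets the class of selected columns only. The variable names below are illustrative.

```r
csv_text <- "name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York"

# "NULL" (as a string) skips a column; here the job column is dropped
no_job <- read.csv(text = csv_text,
                   colClasses = c("character", "integer", "NULL", "character"))
names(no_job)

# A named vector sets only the listed columns; the rest are detected as usual
num_age <- read.csv(text = csv_text,
                    colClasses = c(age = "numeric"))
class(num_age$age)
```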

Limit the Number of Rows Read

If you want to limit the number of rows to read in, specify the nrows argument.

Example: Read only one record from CSV

> mydata <- read.csv("mydata.csv",
+                    nrows = 1)
> mydata
  name age     job    city
1  Bob  25 Manager Seattle
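
nrows can be combined with the skip argument to read a block of rows from the middle of a file. A sketch with one extra, made-up record added to the sample data; because skip also skips the header line, the column names are passed in separately.

```r
csv_text <- "name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston"

# Read the header and the first record only
first <- read.csv(text = csv_text, nrows = 1)

# Skip the header line and the first record, then read the next two records;
# header = FALSE because the header line is among the skipped lines
rest <- read.csv(text = csv_text, skip = 2, nrows = 2,
                 header = FALSE, col.names = names(first))
rest
```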

Handle Commas Within Data

To include a comma within a value, wrap the value in quotes.

R treats the specified delimiter in a quoted string as an ordinary character.

You can specify the character to be used for quoting using the quote argument.

Example:

File Contents:

name,age,address
Bob,25,"113 Cherry St, Seattle, WA 98104, USA"
Sam,30,"150 Greene St, New York, NY 10012, USA"

Code:

> mydata <- read.csv("mydata.csv")
> mydata
  name age                                address
1  Bob  25  113 Cherry St, Seattle, WA 98104, USA
2  Sam  30 150 Greene St, New York, NY 10012, USA
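
When a file quotes its fields with a different character, the quote argument tells read.csv() which one to expect. A minimal sketch, assuming a file that uses single quotes around the address field:

```r
# Same data as above, but quoted with single quotes (hypothetical file contents)
csv_text <- "name,age,address
Bob,25,'113 Cherry St, Seattle, WA 98104, USA'
Sam,30,'150 Greene St, New York, NY 10012, USA'"

# Tell read.csv() that ' is the quoting character
mydata <- read.csv(text = csv_text, quote = "'")

ncol(mydata)          # the commas inside the addresses do not create columns
mydata$address[1]
```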

Write a CSV File

To write data to a CSV file, use the write.csv() function and pass the data as a matrix or data frame. The file is created if it does not exist.

Example: Write a CSV File from a data frame

Code:

> df
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York

> write.csv(df, "mydata.csv")

New File Contents:

"","name","age","job","city"
"1","Bob",25,"Manager","Seattle"
"2","Sam",30,"Developer","New York"

Notice that the write.csv() function prepends each row with a row name by default.
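
If such a file is read back as is, the row names arrive as an extra first column. Passing row.names = 1 to read.csv() treats that column as row names instead. A sketch using a temporary file; the data frame mirrors the one above.

```r
df <- data.frame(name = c("Bob", "Sam"),
                 age  = c(25L, 30L),
                 job  = c("Manager", "Developer"),
                 city = c("Seattle", "New York"))

out_file <- tempfile(fileext = ".csv")   # temporary path, for illustration
write.csv(df, out_file)                  # row names become the first column

plain <- read.csv(out_file)
names(plain)                             # the unnamed first column is called "X"

back <- read.csv(out_file, row.names = 1)
names(back)                              # only the original columns remain
```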

If you don’t want row labels in your CSV file, set row.names to FALSE.

Example: Remove row labels while writing a CSV File

Code:

> write.csv(df, "mydata.csv",
+           row.names = FALSE)

New File Contents:

"name","age","job","city"
"Bob",25,"Manager","Seattle"
"Sam",30,"Developer","New York"

Notice that the column names and all character values are surrounded by double quotes by default, while numbers are not. Set quote = FALSE to change that.

Example: Write a CSV file without quotes

Code:

> write.csv(df, "mydata.csv",
+           row.names = FALSE,
+           quote = FALSE)

New File Contents:

name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
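
Turning quoting off is only safe when no value contains a comma. The sketch below, with made-up address data, writes the same data frame with and without quotes and uses count.fields() to count the fields on each line of the result.

```r
df <- data.frame(name    = c("Bob", "Sam"),
                 address = c("113 Cherry St, Seattle", "150 Greene St, New York"),
                 stringsAsFactors = FALSE)

unquoted <- tempfile(fileext = ".csv")
quoted   <- tempfile(fileext = ".csv")
write.csv(df, unquoted, row.names = FALSE, quote = FALSE)
write.csv(df, quoted,   row.names = FALSE)

# Fields per line: the header has 2, but the unquoted data lines split into 3
count.fields(unquoted, sep = ",")   # 2 3 3
count.fields(quoted,   sep = ",")   # 2 2 2
```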

Append Data to a CSV File

The write.csv() function always overwrites the entire file content; it ignores the append argument.

To append data to a CSV file, use the write.table() function instead and set append = TRUE.

Example:

Old File Contents:

name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York

Code:

> df
  name age       job    city
1  Amy  20 Developer Houston

> write.table(df, "mydata.csv",
+             append = TRUE,
+             sep = ",",
+             col.names = FALSE,
+             row.names = FALSE,
+             quote = FALSE)

New File Contents:

name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston
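
When the same code may run against a new or an existing file, the header should be written only once. One way to sketch this is a small wrapper around write.table(); append_csv is a hypothetical helper name, not a built-in function.

```r
# Hypothetical helper: append rows, writing the header only for a new file
append_csv <- function(df, path) {
  is_new <- !file.exists(path)
  write.table(df, path,
              append    = !is_new,    # append only if the file already exists
              sep       = ",",
              col.names = is_new,     # header line only for a new file
              row.names = FALSE,
              qmethod   = "double")   # escape embedded quotes as write.csv() does
}

log_file <- tempfile(fileext = ".csv")   # temporary path, for illustration
append_csv(data.frame(name = "Bob", age = 25L), log_file)
append_csv(data.frame(name = "Amy", age = 20L), log_file)
read.csv(log_file)
```

Quoting is left at its default here so that values containing commas survive the round trip.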