Read and Write Excel Files in R

Excel is the most popular spreadsheet software used to store tabular data.

So, it’s important to be able to efficiently import and export data from these files.

R’s xlsx package makes it easy to read, write, and format excel files.

The xlsx Package

The xlsx package provides necessary tools to interact with both .xls or .xlsx format files from R.

In order to get started you first need to install and load the package.

Example: Install and load xlsx package

> install.packages("xlsx")
> library("xlsx")

Read an Excel file

Assume you have the following Excel file.

mydata.xlsx

rexcel01

You can read the contents of an Excel worksheet using the read.xlsx() or read.xlsx2() function.

The read.xlsx() function reads the data and creates a data frame.

Example: Read the first excel worksheet

> library(xlsx)
> mydata <- read.xlsx("mydata.xlsx", sheetIndex=1)
> mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
3  Amy  20 Developer  Houston

read.xlsx() vs read.xlsx2()

Both the functions work exactly the same except, read.xlsx() is slow for large data sets (worksheet with more than 100 000 cells).

On the contrary, read.xlsx2() is faster on big files.

Specify a File Name

When you specify the filename only, it is assumed that the file is located in the current folder.

You can also specify the exact path that the file is located at.

Remember! While specifying the exact path, characters prefaced by \ (like \n \r \t etc.) are interpreted as special characters.

You can escape them using:

  • Changing the backslashes to forward slashes like: "C:/data/myfile.xlsx"
  • Using the double backslashes like: "C:\\data\\myfile.xlsx"

Example: Specifying absolute path

> mydata <- read.csv("C:/data/mydata.xlsx")

-OR-

> mydata <- read.csv("C:\\data\\mydata.xlsx")

Specify Worksheet

When you use read.xlsx() function, along with a filename you also need to specify the worksheet that you want to import data from.

To specify the worksheet, you can pass either an integer indicating the position of the worksheet (for example, sheetIndex=1) or the name of the worksheet (for example, sheetName="Sheet1" )

The following two lines do exactly the same thing; they both import the data in the first worksheet (called Sheet1):

Example:

> mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)
> mydata <- read.xlsx("mydata.xlsx", sheetIndex = "Sheet1")

Import the Data as is

The read.xlsx() function automatically coerces character data into a factor (categorical variable).

You can see that by inspecting the structure of your data frame.

Example: By default, character data is coerced into a factor

> mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)
> str(mydata)
'data.frame':	3 obs. of  4 variables:
 $ name: Factor w/ 3 levels "Amy","Bob","Sam": 2 3 1
 $ age : num  25 30 20
 $ job : Factor w/ 2 levels "Developer","Manager": 2 1 1
 $ city: Factor w/ 3 levels "Houston","New York",..: 3 2 1

If you want your data interpreted as string rather than a factor, set the stringsAsFactors parameter to FALSE.

Example: Set stringsAsFactors parameter to TRUE to interpret the data as is

> mydata <- read.xlsx("mydata.xlsx",
+                     sheetIndex = 1,
+                     stringsAsFactors = FALSE)
> str(mydata)
'data.frame':	3 obs. of  4 variables:
 $ name: chr  "Bob" "Sam" "Amy"
 $ age : num  25 30 20
 $ job : chr  "Manager" "Developer" "Developer"
 $ city: chr  "Seattle" "New York" "Houston"

Read Specific Range

If you want to read a range of rows, specify the rowIndex argument.

Example: Read first three lines of a file

> mydata <- read.xlsx("mydata.xlsx",
+                     sheetIndex = 1,
+                     rowIndex = 1:3)
> mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York

If you want to read a range of columns, specify the colIndex argument.

Example: Read first two columns of a file

> mydata <- read.xlsx("mydata.xlsx",
+                     sheetIndex = 1,
+                     colIndex = 1:2)
> mydata
  name age
1  Bob  25
2  Sam  30
3  Amy  20

Specify Starting Row

Sometimes the excel file may contain notes, comments, headers, etc. at the beginning which you may not want to include.

To start reading data from a specified row in the excel worksheet, pass startRow argument.

Example: Read excel file from third row

File Contents:

rexcel02

Code:

> mydata <- read.xlsx("mydata.xlsx",
+                     sheetIndex = 1,
+                     startRow = 3)
> mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
3  Amy  20 Developer  Houston

Write Data to an Excel File

To write to an existing file, use write.xlsx() method and pass the data in the form of matrix or data frame.

Example: Export data from R to an excel workbook

Code:

> df
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
3  Amy  20 Developer  Houston

> write.xlsx(df, file = "mydata.xlsx")

New File Contents:

rexcel03

Notice that the write.xlsx() function prepends each row with a row name by default.

If you don’t want row labels in your excel file, set row.names to FALSE.

Example: Remove row labels while writing an excel File

Code:

> write.xlsx(df, file="mydata.xlsx",
+            row.names = FALSE)

New File Contents:

rexcel01

To set the name of the current worksheet, specify sheetName argument.

Example: Rename current worksheet

Code:

> write.xlsx(df, file="mydata.xlsx",
+            row.names = FALSE,
+            sheetName = "Records")

New File Contents:

rexcel04

Add Multiple Datasets at once

To add multiple data sets in the same Excel workbook, you have to set the append argument to TRUE.

Example:

Code:

> # Write the first data set
> write.xlsx(iris, file = "mydata.xlsx",
+            sheetName = "IRIS", append = FALSE)

> # Add a second data set
> write.xlsx(mtcars, file = "mydata.xlsx",
+            sheetName = "CARS", append = TRUE)

> # Add a third data set
> write.xlsx(Titanic, file = "mydata.xlsx",
+            sheetName = "TITANIC", append = TRUE)

New File Contents:

rexcel05

Create and Format an Excel Workbook

Sometimes you may wish to create a .xlsx file with some formatting.

With the help of xlsx package, you can edit titles, borders, column width, format data table, add plot and much more.

The following example shows how to do so:

Step 1. Create a new excel workbook

You can create a new workbook using the createWorkbook() function.

Example:

> # create new workbook
> wb <- createWorkbook()

Step 2. Define cell styles for formatting the workbook

In R, using the CellStyle() function you can create your own cell styles to change the appearance of, for example:

  • The sheet title
  • The row and column names
  • Text alignment for the columns
  • Cell borders around the columns

Example:

> # define style for title
> title_style <- CellStyle(wb) +
+   Font(wb, heightInPoints = 16,
+        isBold = TRUE)

> # define style for row and column names
> rowname_style <- CellStyle(wb) +
+   Font(wb, isBold = TRUE)
> colname_style <- CellStyle(wb) +
+   Font(wb, isBold = TRUE) +
+   Alignment(wrapText = TRUE, horizontal = "ALIGN_CENTER") +
+   Border(color = "black",
+          position =c("TOP", "BOTTOM"),
+          pen =c("BORDER_THIN", "BORDER_THIN"))

Step 3. Create worksheet and add title

Before you add data, you have to create an empty worksheet in the workbook. You can do this by using the creatSheet() function.

Example:

> # create a worksheet named 'Data'
> ws <- createSheet(wb, sheetName = "Data")

Step 4. Add sheet title

Here’s how you can add a title.

Example:

> # create a new row 
> rows <- createRow(ws, rowIndex = 1)

> # create a cell in the row to contain the title.
> sheetTitle <- createCell(rows, colIndex = 1)

> # set the cell value
> setCellValue(sheetTitle[[1,1]], "Vapor Pressure of Mercury")

> # set the cell style
> setCellStyle(sheetTitle[[1,1]], title_style)

Step 5. Add a table into a worksheet

With the addDataframe() function, you can add the data table in the newly created worksheet.

Below example adds built-in pressure dataset on row #3.

Example:

> # add data table to worksheet
> addDataFrame(pressure, sheet = ws, startRow = 3, startColumn = 1,
+              colnamesStyle = colname_style,
+              rownamesStyle = rowname_style,
+              row.names = FALSE)

Step 6. Add a plot into a worksheet

You can add a plot in the worksheet using the addPicture() function.

Example:

> # create a png plot
> png("plot.png", height=900, width=1600, res=250, pointsize=8)
> plot(pressure, xlab = "Temperature (deg C)",
+      ylab = "Pressure (mm of Hg)",
+      main = "pressure data: Vapor Pressure of Mercury",
+      col="red", pch=19, type="b")
> dev.off()

> # Create a new sheet to contain the plot
> sheet <-createSheet(wb, sheetName = "plot")

> # Add the plot created previously
> addPicture("plot.png", sheet, scale = 1, startRow = 2,
+            startColumn = 1)

> # Remove the plot from the disk
> res<-file.remove("plot.png")

Step 7. Change column width

Now change the column width to fit the contents.

Example:

> # change column width of first 2 columns
> setColumnWidth(sheet = ws, colIndex = 1:2, colWidth = 15)

Step 8. Save the workbook

Finally, save the workbook with the saveWorkbook() function.

Example:

> # save workbook
> saveWorkbook(wb, file = "mydata.xlsx")

Step 9. View the result

r excel format1
r excel format2