R Box-whisker Plot – ggplot2

A box-whisker plot (or box plot) is a quick and easy way to compare the distributions of several samples side by side.

[Figure: a typical box-whisker plot]

A box plot is a good way to get an overall picture of the data set in a compact manner.

Create a Box-Whisker Plot

To get started, you need a set of data to work with.

Let’s consider the built-in ToothGrowth data set as an example data set.

Here are the first six observations of the data set.

Example: First six observations of the ‘ToothGrowth’ data set

> head(ToothGrowth)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

ToothGrowth data set

The ToothGrowth data set contains observations on the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC).
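Before plotting, it can help to confirm the layout described above. The following is a minimal sketch that uses only base R; the variable name design is an illustrative choice.

```r
# Load the built-in data set (it ships with R's datasets package)
data(ToothGrowth)

# Column types: len and dose are numeric, supp is a factor (OJ, VC)
str(ToothGrowth)

# Cross-tabulate delivery method by dose to see the study design
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Each of the six supplement and dose combinations holds 10 animals, which accounts for the 60 observations.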

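To create a box plot, load the ggplot2 package, then use ggplot() with geom_boxplot() and specify which variables you want on the x and y axes. Because dose is stored as a number, wrap it in factor() so that each dose level gets its own box.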

Example: Create a basic box plot with ggplot

> library(ggplot2)
> ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
+   geom_boxplot()

Plot:
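If you want the numbers behind the boxes, one option is layer_data(), which returns the data frame ggplot2 computed for a layer. This is a small sketch; the names p and box_stats are illustrative.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot()

# For geom_boxplot() the computed data include the whisker ends (ymin, ymax),
# the box edges (lower, upper) and the median (middle), one row per box
box_stats <- layer_data(p)[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
print(box_stats)
```

The middle column should match the per-dose medians you would get from tapply(ToothGrowth$len, ToothGrowth$dose, median).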

Coloring a Box Plot

Often you want to apply different colors to the boxes in your graph.

By default, box plots use white for the boxes. You can give each box its own color by mapping a variable to the fill aesthetic.

Example: Change the colors of individual boxes (default fill colors)

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot()

Plot:

If the default colors aren’t to your liking, you can set the colors manually by adding scale_fill_manual().

Example: Manually set fill colors for the boxes

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   scale_fill_manual(values=c("orange1", "dodgerblue1", "olivedrab2"))

Plot:
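scale_fill_manual() also accepts a named vector, which ties each color to a specific level instead of relying on the order of the levels. The sketch below assumes the names match the levels of factor(dose), that is "0.5", "1" and "2"; the name dose_colors is illustrative.

```r
library(ggplot2)

# Names must match the levels of factor(dose)
dose_colors <- c("0.5" = "orange1", "1" = "dodgerblue1", "2" = "olivedrab2")

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot() +
  scale_fill_manual(values = dose_colors)
print(p)
```

With named values, the colors stay attached to the same dose levels even if the order of the boxes changes later.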

You can also apply preset ColorBrewer color schemes with scale_fill_brewer().

The following palettes are available for use with these scales:

[Figure: ColorBrewer color schemes]
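To list the palette names from within R, one option is the RColorBrewer package. This sketch assumes RColorBrewer is installed; the variable name sequential is illustrative.

```r
# Assumption: the RColorBrewer package is installed
library(RColorBrewer)

# brewer.pal.info has one row per palette: maximum number of colors,
# category (seq, div or qual) and whether it is colorblind friendly
print(brewer.pal.info)

# Names of the sequential palettes, which include "Oranges"
sequential <- rownames(brewer.pal.info)[brewer.pal.info$category == "seq"]
print(sequential)
```

Calling display.brewer.all() draws every palette in the current graphics device.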

Example: Use preset color schemes for the boxes

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   scale_fill_brewer(palette="Oranges")

Plot:

Shades of gray come out well in print as well as photocopying.

To display graphs only in gray scale, use scale_fill_grey().

Example: Use grayscale pattern for the boxes

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   scale_fill_grey()

Plot:

By default, the boxes are filled white and have a dark outline. You can change both with the fill and color arguments of geom_boxplot().

Example: Change the box color and outline

> ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
+   geom_boxplot(fill="lightblue1", color="dodgerblue")

Plot:

When a plot has many outliers, you can reduce overplotting by changing the size, shape and color of the outlier points with the outlier.size, outlier.shape and outlier.color arguments.

In recent ggplot2 releases (2.0 and later), the outlier points default to size 1.5 and shape 19, and they take the color of the box outline unless outlier.color is set. Older releases used size 2, shape 16 and black.

Example: Change the size, shape and color of the outlier points

> ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
+   geom_boxplot(outlier.size=3, outlier.shape=18, outlier.color="red")

Plot:
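By default, geom_boxplot() treats a point as an outlier when it lies more than 1.5 times the interquartile range beyond the box edges. The following sketch reproduces that rule in base R; is_outlier is a hypothetical helper written for illustration, not a ggplot2 function.

```r
# Flag values more than coef * IQR below the first or above the third quartile
# (hypothetical helper, not part of ggplot2)
is_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

# Outlying tooth lengths within each dose group
outliers <- lapply(split(ToothGrowth$len, ToothGrowth$dose),
                   function(len) len[is_outlier(len)])
print(outliers)
```

The result is a list with one element per dose level, holding the values that would be drawn as outlier points.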

Change Theme

The ggplot2 package provides some premade themes to change the overall plot appearance.

With themes you can easily customize some commonly used properties, like background color, panel background color and grid lines.

Example: Change the ggplot theme to ‘Minimal’

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   theme_minimal()

Plot:

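The following complete themes ship with ggplot2: theme_gray() (the default), theme_bw(), theme_linedraw(), theme_light(), theme_dark(), theme_minimal(), theme_classic() and theme_void().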
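If you want the same theme on every plot in a session, one option is theme_set(), which changes the default theme. This is a minimal sketch; old_theme and p are illustrative names.

```r
library(ggplot2)

# theme_set() replaces the session-wide default theme and
# invisibly returns the previous one so it can be restored later
old_theme <- theme_set(theme_minimal())

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot()
print(p)  # drawn with theme_minimal() without adding it to the plot

# Restore the previous default theme
theme_set(old_theme)
```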

Adding Titles and Axis Labels

You can easily add your own title and axis labels with the following functions.

Function     Description
ggtitle()    Main plot title
xlab()       x-axis label
ylab()       y-axis label

Example: Add the title and axis labels to your plot

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   ggtitle("Tooth Growth in Guinea Pigs") +
+   xlab("Vitamin C dose (mg/day)") +
+   ylab("Length of odontoblasts")

Plot:
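As an alternative, labs() sets the title and axis labels in one call, and it can also rename the legend, which otherwise shows the expression factor(dose). In this sketch the legend title "Dose" is an illustrative choice.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot() +
  labs(
    title = "Tooth Growth in Guinea Pigs",
    x = "Vitamin C dose (mg/day)",
    y = "Length of odontoblasts",
    fill = "Dose"  # legend title (illustrative)
  )
print(p)
```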

Horizontal Box Plot

You can draw the box plot horizontally by adding the coord_flip() function, which flips the x and y coordinates.

Example: Make the box plot horizontal

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   coord_flip()

Plot:
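In ggplot2 3.3.0 and later, another way to get horizontal boxes is to swap the aesthetics, putting the numeric variable on x and the factor on y. This sketch assumes such a version is installed.

```r
library(ggplot2)

# Assumption: ggplot2 >= 3.3.0, which infers the box orientation
# from which axis carries the discrete variable
p <- ggplot(ToothGrowth, aes(x = len, y = factor(dose), fill = factor(dose))) +
  geom_boxplot()
print(p)
```

With this approach the axis labels follow the mapping directly, so there is no need to remember that x and y have been flipped when adding labels or scales.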

Notched Box Plot

A notched box plot helps you judge whether the medians of two groups differ.

Each notch marks an approximate 95% confidence interval around the median. If the notches of two boxes do not overlap, this suggests that their medians differ.

You add notches to a box plot by setting the notch argument to TRUE in geom_boxplot().

Example: Add notches to a box plot

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot(notch=TRUE)

Plot:
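According to the ggplot2 documentation, the notches extend 1.58 * IQR / sqrt(n) on either side of the median. The sketch below computes those limits in base R; notch_interval is a hypothetical helper written for illustration.

```r
# Notch limits: median +/- 1.58 * IQR / sqrt(n)
# (hypothetical helper, not part of ggplot2)
notch_interval <- function(x) {
  x <- x[!is.na(x)]
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

# One interval per dose level (rows: lower and upper; columns: dose)
notches <- sapply(split(ToothGrowth$len, ToothGrowth$dose), notch_interval)
print(round(notches, 2))
```

Comparing the columns shows whether any two intervals overlap, which is the same check you make by eye on the plot.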

Add Means to a Box Plot

The horizontal line in the middle of each box is the median, not the mean.

The median alone does not show whether a distribution is skewed. When the mean sits well above or below the median, the data are likely skewed rather than symmetric.

To make this comparison, add mean markers to the box plot.

Example: Add mean markers on a box plot

> # Use fun= here; the older fun.y= argument was deprecated in ggplot2 3.3.0
> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   stat_summary(fun=mean, geom="point", shape=18, size=3, color="white")

Plot:
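To see the values that the markers and the median lines represent, you can compute both statistics per dose level in base R. This is a minimal sketch; the names means, medians and centers are illustrative.

```r
# Mean and median tooth length for each dose level
means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
medians <- tapply(ToothGrowth$len, ToothGrowth$dose, median)

# Side-by-side table, one row per dose level
centers <- cbind(mean = means, median = medians)
print(round(centers, 2))
```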

Grouped Box Plot

You can also easily group box plots by the levels of a categorical variable.

There are two ways to create a grouped box plot:

In the Same Plot

In order to plot the two supplement levels in the same plot, you need to map the categorical variable “supp” to fill.

Example: Plot the two supplement levels in the same plot

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=supp)) +
+   geom_boxplot()

Plot:

In Panel Plot

To produce a panel plot by supplement type, add facet_grid(. ~ supp) to the plot. The formula places one panel per level of supp side by side in columns.

Example: Plot the two supplement levels in separate (panel) plots

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   facet_grid(. ~ supp)

Plot:
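When you facet by a single variable, facet_wrap() is a common alternative to facet_grid(). A minimal sketch:

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot() +
  facet_wrap(~ supp)  # one panel per supplement type
print(p)
```

For two panels the result looks much the same; facet_wrap() becomes more useful when a variable has many levels, because it wraps the panels onto several rows.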

Change the Order of Items

The order of items on a categorical axis can be changed by specifying limits in scale_x_discrete() or scale_y_discrete().

Simply pass a vector of the levels in the desired order.

Example: Change the order of items on a categorical axis

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   scale_x_discrete(limits=c("2", "1", "0.5"))

Plot:

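You can also omit items by leaving them out of this vector. ggplot2 drops the rows for any level that is not listed in limits and warns that rows were removed.

Example: Omit the 1 mg/day dose level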

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   scale_x_discrete(limits=c("2", "0.5"))

Plot:
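Another way to control the order is to set the factor levels in the data before plotting, which also reorders the legend. The sketch below works on a copy of the data; tg and tg_no_1 are illustrative names.

```r
library(ggplot2)

# Work on a copy so the built-in data set is left untouched
tg <- ToothGrowth
tg$dose <- factor(tg$dose, levels = c(2, 1, 0.5))  # desired order

p <- ggplot(tg, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot()
print(p)

# To leave out a level without a warning, drop its rows before plotting
tg_no_1 <- subset(ToothGrowth, dose != 1)
```

Filtering first, as in the last line, keeps the 0.5 and 2 mg/day groups (40 rows) and avoids the removed-rows warning.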

Box Plot with Dots

Overlaying a symmetrical dot density plot on a box plot has the potential to give the benefits of both plots.

You can achieve this by adding the geom_dotplot() function.

Example: Overlay a symmetrical dot density plot on a box plot

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot() +
+   geom_dotplot(binaxis='y', stackdir='center', dotsize=1)

Plot:

Box Plot with Jittered Dots

Sometimes you may want the additional insight that you get from the raw data points.

For example, overlaying all of the data points for that group on each box plot will give you an idea of the sample size of the group.

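You can achieve this by adding the geom_jitter() function. Set outlier.shape=NA in geom_boxplot() so that outliers are not drawn twice, once by the box plot and once by the jitter layer.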

Example: Add jitter over box plot

> ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
+   geom_boxplot(outlier.shape=NA) +
+   geom_jitter(position=position_jitter(0.2))

Plot:
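By default, position_jitter() adds a small amount of random noise in both directions, so the points shift slightly on the y axis and land in different places on every run. If that matters, a sketch like the following restricts the jitter to the x axis and fixes the random seed; the seed value 1 is arbitrary.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot(outlier.shape = NA) +
  # width: horizontal spread; height = 0 keeps each point at its true len;
  # seed makes the random offsets repeatable
  geom_jitter(position = position_jitter(width = 0.2, height = 0, seed = 1))
print(p)
```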