R Box-whisker Plot – Base Graph

The box-whisker plot (or box plot) is a quick and easy way to visualize the distribution of numeric data, especially when you want to compare multiple samples.

Figure: A typical box-whisker plot

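A box plot gives a compact overall picture of a data set. The box spans the middle half of the data (roughly from the first to the third quartile), the line inside it marks the median, the whiskers extend to the most extreme values lying within 1.5 times the box length, and any points beyond the whiskers are drawn individually as outliers.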
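R computes the numbers behind each box with boxplot.stats(). The sketch below uses a small made-up vector (the values in x are purely illustrative) to show the five statistics a box is drawn from and how a value far beyond the box is flagged as an outlier.

```r
# A small illustrative vector with one unusually large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# boxplot.stats() computes what boxplot() draws (default coef = 1.5)
s <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
s$stats  # 1.0 3.0 5.5 8.0 9.0

# Values more than 1.5 * (upper hinge - lower hinge) beyond the hinges
s$out    # 30

# The upper whisker stops at 9 and 30 is drawn as a separate point
boxplot(x)
```

Here the hinges are 3 and 8, so any value above 8 + 1.5 * 5 = 15.5 is treated as an outlier.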

The boxplot() function

You can use the boxplot() function to create box-whisker plots.

It has many arguments for controlling the appearance of the plot, such as making it horizontal and adding labels, titles and colors.

Syntax

The syntax for the boxplot() function is:

boxplot(x, names, xlab, ylab, border, col, notch, horizontal, add, ...)

Parameters

Parameter     Description
x             A numeric vector (or a list of numeric vectors) from which the box plots are produced
names         Group labels printed under each box plot
xlab          The label for the x-axis
ylab          The label for the y-axis
border        A vector of colors for the outlines of the box plots
col           A vector of fill colors for the bodies of the box plots
notch         If TRUE, a notch is drawn on each side of the boxes
horizontal    Set it to TRUE to draw the box plot horizontally
add           Set it to TRUE to add the box plot to the current plot
...           Other graphical parameters

Create a Box-Whisker Plot

To get started, you need a data set to plot.

Let's use the built-in ToothGrowth data set as an example.

Here are its first six observations.

Example: First six observations of the ‘ToothGrowth’ data set

> head(ToothGrowth)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

ToothGrowth data set

The ToothGrowth data set contains observations on the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC). The len column records the length of odontoblasts, the cells responsible for tooth growth.
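To check the design described above, you can look at the structure of the data and count the animals in each supplement/dose combination. Note that dose is stored as a numeric column rather than a factor; the formula examples later on still use it as a grouping variable.

```r
# Structure: 60 rows; len (numeric), supp (factor: OJ, VC), dose (numeric)
str(ToothGrowth)

# Number of guinea pigs in each supplement/dose combination
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
counts
```

Every cell of the table is 10, so each of the six combinations has its own group of ten animals.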

To create a box plot, pass a numeric variable from the data set to the boxplot() function.

Example: Create a box-whisker plot of length

> boxplot(ToothGrowth$len)


Horizontal Box Plot

You can also draw the box plot horizontally by setting the horizontal argument to TRUE.

Example: Make the box plot horizontal

> boxplot(ToothGrowth$len,
+         horizontal = TRUE)


Notched Box Plot

A notched box plot helps you judge whether the medians of two or more groups differ.

If the notches of two boxes do not overlap, this is strong evidence (at roughly 95% confidence) that their medians differ.

You add notches to a box plot by setting the notch argument to TRUE.

Example: Add notches to a box plot

> boxplot(ToothGrowth$len,
+         notch = TRUE)

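A notch on a single box has nothing to be compared against; notches become useful when two or more boxes are drawn side by side. The sketch below compares the two delivery methods (supp) and recomputes the notch limits by hand, using the formula documented for boxplot.stats(): median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n), where n is the number of observations in the group. The names bp and notch_by_hand are illustrative.

```r
# Notched boxes for the two delivery methods
boxplot(len ~ supp, data = ToothGrowth, notch = TRUE)

# Same statistics without drawing; conf holds the notch limits (one column per box)
bp <- boxplot(len ~ supp, data = ToothGrowth, plot = FALSE)
bp$conf

# Recompute the notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
notch_by_hand <- sapply(split(ToothGrowth$len, ToothGrowth$supp), function(v) {
  s <- boxplot.stats(v)$stats
  s[3] + c(-1.58, 1.58) * (s[4] - s[2]) / sqrt(length(v))
})
notch_by_hand
```

Because the notch width shrinks with sqrt(n), small groups get wide notches, and R may warn that a notch extends past the box.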

Side-by-Side Box Plots

Often your data set contains a numeric variable (quantitative variable) and a factor (categorical variable).

You often want to know whether the numeric variable changes with the level of the factor.

Plotting one box per factor level, side by side, shows this at a glance.

In R, you can do this by using the boxplot() function with a formula:

boxplot(x ~ f)

Here, x is the numeric variable and f is the factor.

Example: Create one box plot for each dose level

> boxplot(len ~ dose, data = ToothGrowth)

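The formula interface groups the numeric variable by the factor and draws one box per group. As a sketch of that idea (not the exact internal code of boxplot()), the same plot can be built from a list created with split(); comparing the returned statistics shows that both routes agree. The names groups, by_formula and by_list are illustrative.

```r
# Formula interface: one box per dose level (statistics only, no drawing)
by_formula <- boxplot(len ~ dose, data = ToothGrowth, plot = FALSE)

# List interface: split len into one vector per dose level, then plot the list
groups <- split(ToothGrowth$len, ToothGrowth$dose)
names(groups)  # "0.5" "1" "2"
boxplot(groups)
by_list <- boxplot(groups, plot = FALSE)

# Both routes produce the same box statistics
all.equal(by_formula$stats, by_list$stats)
```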

Grouped Box Plot

A grouped box plot is used when a numeric variable is split by groups and subgroups.

You can create one by putting the interaction of two categorical variables on the x-axis and the numeric variable on the y-axis.

In the formula, list both grouping variables on the right-hand side of the ~, here separated by an asterisk (*); boxplot() then draws one box for each combination of their levels.

Example: Box plot of length based on interaction of two variables (supplement and dose)

> boxplot(len ~ supp*dose, data = ToothGrowth)

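Another way to lay out grouped boxes, adapted from the examples on R's own ?boxplot help page, is to draw the boxes for one supplement first and then overlay the other with add = TRUE, shifting each set sideways with at and narrowing the boxes with boxwex. The 0.2 shift, the colors, the axis limits and the legend position are illustrative choices.

```r
# First layer: VC boxes, shifted left of each dose position
vc <- boxplot(len ~ dose, data = ToothGrowth, subset = supp == "VC",
              at = 1:3 - 0.2, boxwex = 0.25, col = "dodgerblue1",
              xlim = c(0.5, 3.5), ylim = c(0, 35),
              xlab = "Vitamin C dose (mg/day)", ylab = "Tooth length")

# Second layer: OJ boxes, shifted right and added to the current plot
oj <- boxplot(len ~ dose, data = ToothGrowth, subset = supp == "OJ",
              at = 1:3 + 0.2, boxwex = 0.25, col = "orange1",
              add = TRUE)

# Explain the two fill colors
legend("bottomright", legend = c("Ascorbic acid (VC)", "Orange juice (OJ)"),
       fill = c("dodgerblue1", "orange1"))
```

Compared with len ~ supp*dose, this keeps the two boxes for each dose next to each other under a single dose position.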

Change Group Names

To change the labels printed under the boxes, use the names argument.

Example: Change names for group of boxes

> boxplot(len ~ dose, data = ToothGrowth,
+         names=c("0.5 mg","1 mg","2 mg"))

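Instead of passing names on every call, you can store the labels in the data as factor levels, which boxplot() then uses as box names. In this sketch the data frame tg and the column dose_f are hypothetical names introduced only for the example.

```r
# Copy the data and add a labelled factor version of dose
# (levels are the sorted dose values 0.5, 1, 2)
tg <- ToothGrowth
tg$dose_f <- factor(tg$dose, labels = c("0.5 mg", "1 mg", "2 mg"))

# The factor labels are used as box names automatically
bp <- boxplot(len ~ dose_f, data = tg)
bp$names
```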

Change Colors

Use the col argument to change the fill color of the boxes.

Example: Change the box color

> boxplot(len ~ dose, data = ToothGrowth,
+         col = "dodgerblue1")


You can change the colors of individual boxes by passing a vector of colors to the col argument.

Example: Change the colors of individual boxes

> boxplot(len ~ dose, data = ToothGrowth,
+         col = c("orange1", "dodgerblue1", "olivedrab2"))

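When there are more boxes than colors, boxplot() recycles the col vector. In the grouped plot of supp and dose, the boxes alternate OJ, VC within each dose, so a two-color vector colors the two supplements consistently. The colors and legend placement below are illustrative.

```r
# Boxes are ordered OJ.0.5, VC.0.5, OJ.1, VC.1, OJ.2, VC.2,
# so the two colors are recycled as OJ = orange, VC = blue
bp <- boxplot(len ~ supp * dose, data = ToothGrowth,
              col = c("orange1", "dodgerblue1"))
legend("topleft", legend = c("OJ", "VC"),
       fill = c("orange1", "dodgerblue1"))
bp$names
```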

Use the border argument to change the color of the box outlines.

Example: Change the color used for the border of the boxes

> boxplot(len ~ dose, data = ToothGrowth,
+         col="lightblue1",
+         border="dodgerblue3")


Adding Titles and Axis Labels

You can add your own title and axis labels by specifying the following arguments.

Argument    Description
main        Main plot title
xlab        x-axis label
ylab        y-axis label

Example: Add the title and axis labels to your plot

> boxplot(len ~ dose, data = ToothGrowth,
+         main="Tooth Growth in Guinea Pigs",
+         xlab="Vitamin C dose (mg/day)",
+         ylab="Length of odontoblasts")


Add Means to a Box Plot

The thick horizontal line inside each box marks the median, not the mean.

The median alone does not show whether a distribution is symmetric; if the mean sits well above or below the median, the data are probably skewed.

So it is often useful to add mean markers to your box plot.

Example: Add mean markers on a box plot

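> boxplot(len ~ dose, data = ToothGrowth,
+         col = "dodgerblue1")
> # Mean length for each dose level, in the same order as the boxes
> meanval <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
> # The boxes sit at x = 1, 2, 3, so draw each mean at its box's position
> points(seq_along(meanval), meanval, col = "white", pch = 18)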

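To back up the visual comparison with numbers, you can list each group's mean next to its median. This is a small sketch; the data frame name summary_by_dose and its column names are illustrative.

```r
# Mean and median of len for each dose level (tapply sorts groups by dose)
summary_by_dose <- data.frame(
  dose   = sort(unique(ToothGrowth$dose)),
  mean   = as.vector(tapply(ToothGrowth$len, ToothGrowth$dose, mean)),
  median = as.vector(tapply(ToothGrowth$len, ToothGrowth$dose, median))
)

# A positive difference means the mean lies above the median (a hint of right skew)
summary_by_dose$mean_minus_median <- summary_by_dose$mean - summary_by_dose$median
summary_by_dose
```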