A box-whisker plot (or boxplot) is a quick and easy way to visualize complex data where you have multiple samples.
A box plot is a good way to get an overall picture of the data set in a compact manner.
The boxplot() function
You can use the boxplot() function to create box-whisker plots.
It has many arguments that control things such as drawing the plot horizontally and adding labels, titles and colors.
The most commonly used arguments of the boxplot() function are:
|x||A vector of values from which the boxplots are to be produced|
|names||Group labels to be printed under each boxplot|
|xlab||The label for the x axis|
|ylab||The label for the y axis|
|border||A vector of colors for the outlines of the boxplots|
|col||The fill color(s) of the boxes|
|notch||If TRUE, a notch is drawn in each side of the boxes|
|horizontal||Set it to TRUE to draw the box-plot horizontally|
|add||Set it to TRUE to add boxplot to current plot|
|…||other graphical parameters|
Create a Box-Whisker Plot
To get started, you need a data set to work with. Let’s use the built-in ToothGrowth data set as an example.
Here are the first six observations of the data set.
# First six observations of the 'ToothGrowth' data set
head(ToothGrowth)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5
ToothGrowth data set
The ToothGrowth data set contains observations on the effect of vitamin C on tooth growth in 60 guinea pigs, where each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC).
To create a box plot, just pass any numeric variable of the data set to the boxplot() function.
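A minimal sketch of the simplest case, assuming the numeric column len of ToothGrowth is the variable of interest. The variable name bp is only illustrative; boxplot() invisibly returns the statistics it draws, which can be handy for checking the plot.

```r
# Draw a single box plot of tooth length (all doses and supplements pooled)
bp <- boxplot(ToothGrowth$len)

# bp$stats holds the five values that are drawn:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats

# bp$n is the number of observations behind the box
bp$n
```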
Horizontal Box Plot
You can also draw the box-plot horizontally by setting the horizontal argument to TRUE.
boxplot(ToothGrowth$len, horizontal = TRUE)
Notched Box Plot
The notched box plot allows you to assess whether the medians of two samples differ. If the notches of two boxes do not overlap, there is strong evidence (roughly 95% confidence) that their medians differ.
You add notches to a box plot by setting the notch argument to TRUE.
# Add notches to a box plot
boxplot(ToothGrowth$len, notch = TRUE)
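Notches are most informative when several boxes are shown together. The sketch below assumes the three dose groups of ToothGrowth are the samples to compare (the formula len ~ dose draws one box per dose level) and then inspects the notch limits that boxplot() returns in its conf component. The names bp and overlap_12 are illustrative only.

```r
# Notched box plots, one per dose level
bp <- boxplot(len ~ dose, data = ToothGrowth, notch = TRUE)

# bp$conf has one column per box: row 1 is the lower notch limit,
# row 2 is the upper notch limit
bp$conf

# Two notches overlap if each lower limit lies below the other's upper limit;
# here the first two dose groups are compared
overlap_12 <- bp$conf[1, 1] <= bp$conf[2, 2] && bp$conf[1, 2] <= bp$conf[2, 1]
overlap_12
```

The same comparison can be repeated for any other pair of columns in bp$conf.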
Side-by-Side Box Plots
Often your data set contains a numeric variable (quantitative variable) and a factor (categorical variable). It can be quite tedious to find whether the numeric variable changes according to the level of the factor.
Information of that nature can be gained by plotting box plots side by side.
In R, you can do this by using the boxplot() function with a formula:
boxplot(x ~ f)
Here, x is the numeric variable and f is the factor.
# Creating one box plot for each factor level (dose)
boxplot(len ~ dose, data = ToothGrowth)
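To make the formula notation more concrete, the sketch below builds the same plot by splitting the numeric variable by hand. It relies on the fact that boxplot() also accepts a list of numeric vectors; the names by_dose, bp_list and bp_formula are illustrative only.

```r
# Split tooth length into a list with one numeric vector per dose level
by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Passing the list draws one box per list element ...
bp_list <- boxplot(by_dose)

# ... with the same statistics as the formula interface
bp_formula <- boxplot(len ~ dose, data = ToothGrowth)
all.equal(bp_list$stats, bp_formula$stats)
```

The formula form is usually more convenient because the grouping and the data set are named in a single call.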
Grouped Box Plot
A grouped box plot is used when you have a numerical variable, several groups and subgroups.
You can create a grouped box plot by putting the interaction of two categorical variables on the x-axis and a numeric variable on the y-axis.
The interaction of two variables is indicated by separating their names with an asterisk (*).
# Box plot of length based on interaction of two variables (supplement and dose)
boxplot(len ~ supp*dose, data = ToothGrowth)
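Because the boxes of a grouped plot are drawn in a fixed order, it can help to check the generated group names and to color the boxes by subgroup. The following is only a sketch: the two colors and the legend placement are arbitrary choices, and bp is an illustrative variable name.

```r
# Grouped box plot; the col vector is recycled across the six boxes,
# so the boxes alternate between the two supplement types
bp <- boxplot(len ~ supp * dose, data = ToothGrowth,
              col = c("orange1", "dodgerblue1"))

# Group names are built from the factor levels, with supp varying fastest
bp$names

# Explain the colors in a legend
legend("topleft", legend = c("OJ", "VC"),
       fill = c("orange1", "dodgerblue1"))
```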
Change Group Names
To change the names shown under the boxes, use the names argument.
boxplot(len ~ dose, data = ToothGrowth, names=c("0.5 mg","1 mg","2 mg"))
Use the col argument to change the fill color of the boxes.
boxplot(len ~ dose, data = ToothGrowth, col = "dodgerblue1")
You can change the colors of individual boxes by passing a vector of colors to the col argument.
boxplot(len ~ dose, data = ToothGrowth, col = c("orange1", "dodgerblue1", "olivedrab2"))
By using the border argument, you can even change the color used for the border of the boxes.
boxplot(len ~ dose, data = ToothGrowth, col="lightblue1", border="dodgerblue3")
Adding Titles and Axis Labels
You can add your own title and axis labels easily by specifying the following arguments.
|main||Main plot title|
|xlab||The label for the x axis|
|ylab||The label for the y axis|
boxplot(len ~ dose, data = ToothGrowth, main="Tooth Growth in Guinea Pigs", xlab="Vitamin C dose (mg/day)", ylab="Length of odontoblasts")
Add Means to a Box Plot
The horizontal line in the middle of a box plot is the median, not the mean.
The median alone does not tell you whether the data is symmetrically distributed. Adding mean markers to the box plot helps: a mean that sits far from the median suggests a skewed group.
boxplot(len ~ dose, data=ToothGrowth, col="dodgerblue1")
meanval <- by(ToothGrowth$len, ToothGrowth$dose, mean)
points(meanval, col="white", pch=18)
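To see numerically what the mean markers add, a small sketch can compare the group means with the group medians. tapply() is used here as an alternative to by(), and the names group_mean and group_median are illustrative.

```r
# Mean and median of tooth length for each dose level
group_mean   <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
group_median <- tapply(ToothGrowth$len, ToothGrowth$dose, median)

# A large gap between mean and median hints at a skewed group
round(group_mean - group_median, 2)
```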