A box-and-whisker plot (or box plot) is a quick and easy way to compare the distributions of several samples.

It summarizes each sample's median, quartiles and outliers, giving an overall picture of the data set in a compact form.

## Create a Box-Whisker Plot

To get started, you need a set of data to work with. Let’s use the built-in ToothGrowth data set as an example.

Here are the first six observations of the data set.

```
# First six observations of the ‘ToothGrowth’ data set
head(ToothGrowth)
#>    len supp dose
#> 1  4.2   VC  0.5
#> 2 11.5   VC  0.5
#> 3  7.3   VC  0.5
#> 4  5.8   VC  0.5
#> 5  6.4   VC  0.5
#> 6 10.0   VC  0.5
```

**ToothGrowth data set**

The ToothGrowth data set contains observations on the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC).
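Before plotting, it can help to confirm the structure described above. The following is a minimal sketch that uses only base R and assumes the built-in `ToothGrowth` data frame is available in your session; the name `counts` is introduced here purely for illustration.

```r
# Inspect the structure of the built-in ToothGrowth data frame
str(ToothGrowth)

# dose is stored as a number, which is why the plots in this article wrap it in factor()
is.numeric(ToothGrowth$dose)

# Count the animals in each supplement/dose combination
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(counts)
```

Each supplement/dose combination should contain the same number of animals, so every box in the plots that follow is based on an equally sized group.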

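To create a box plot, use `ggplot()` with `geom_boxplot()` and specify which variables you want on the x and y axes. Because `dose` is stored as a number, wrap it in `factor()` so that each dose level gets its own box.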

```
# Load ggplot2 (needed once per session for all of the examples that follow)
library(ggplot2)
# Create a basic box plot with ggplot
ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
geom_boxplot()
```
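To see the numbers behind each box, you can compute the quartiles yourself. The sketch below uses only base R; the names `probs`, `by_dose` and `box_stats` are illustrative. The box spans the first to the third quartile and the line inside it marks the median. Note that the whiskers stop at the most extreme points within 1.5 times the interquartile range of the box, so the `min` and `max` columns match the whisker ends only for groups without outliers.

```r
# Quantiles of tooth length for each dose level (base R only)
probs <- c(0, 0.25, 0.5, 0.75, 1)
by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, quantile, probs = probs)

# Bind the per-dose vectors into a matrix: one row per dose, one column per quantile
box_stats <- do.call(rbind, by_dose)
colnames(box_stats) <- c("min", "q1", "median", "q3", "max")
print(box_stats)
```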

## Coloring a Box Plot

Often you want to apply different colors to the boxes in your graph.

By default, box plots are filled with white. To give each box its own color, map a variable to the `fill` aesthetic.

```
# Change the colors of individual boxes (default fill colors)
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot()
```

If the default colors aren’t to your liking, you can set the colors manually by adding `scale_fill_manual()`.

```
# Manually set fill colors for the boxes
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
scale_fill_manual(values=c("orange1", "dodgerblue1", "olivedrab2"))
```

You can also use preset color schemes with `scale_fill_brewer()`, which applies ColorBrewer palettes such as "Oranges". The `palette` argument selects a palette by name.

```
# Use preset color schemes for the boxes
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
scale_fill_brewer(palette="Oranges")
```

Shades of gray reproduce well in print and in photocopies. To display a graph in grayscale only, use `scale_fill_grey()`.

```
# Use grayscale pattern for the boxes
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
scale_fill_grey()
```

By default, the boxes are filled with white and have a dark outline. You can change both for all boxes at once by setting the `fill` and `color` arguments of `geom_boxplot()`.

```
# Change the box color and outline
ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
geom_boxplot(fill="lightblue1", color="dodgerblue")
```

When a plot has many outliers, you can make them easier to tell apart by changing the size, shape and color of the outlier points with the `outlier.size`, `outlier.shape` and `outlier.color` arguments.

The default values differ between ggplot2 versions (older releases used size 2, shape 16 and black), so check `?geom_boxplot` for your installation.

```
# Change the size, shape and color of the outlier points
ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
geom_boxplot(outlier.size=3, outlier.shape=18, outlier.color="red")
```
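To find out which observations are drawn as outlier points, you can apply the usual rule yourself: a point is flagged when it lies more than 1.5 times the interquartile range beyond the first or third quartile of its group. This is a rough sketch in base R; `is_outlier`, `flag` and `outliers` are hypothetical names chosen for this example.

```r
# Flag points beyond 1.5 * IQR from the quartiles of a numeric vector
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# ave() applies the function within each dose group and returns a vector
# aligned with the original rows (0/1 here, because len is numeric)
flag <- ave(ToothGrowth$len, ToothGrowth$dose, FUN = is_outlier)

# Keep only the rows that were flagged
outliers <- ToothGrowth[flag == 1, ]
print(outliers)
```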

## Change Theme

The ggplot2 package provides some premade themes to change the overall plot appearance.

With themes you can easily customize some commonly used properties, like background color, panel background color and grid lines.

```
# Change the ggplot theme to ‘Minimal’
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
theme_minimal()
```

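ggplot2 includes the following complete themes: `theme_gray()` (the default), `theme_bw()`, `theme_linedraw()`, `theme_light()`, `theme_dark()`, `theme_minimal()`, `theme_classic()` and `theme_void()`.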

## Adding Titles and Axis Labels

You can add your own title and axis labels with the following functions.

| Function | Description |
|----------|-------------|
| `ggtitle()` | Main plot title |
| `xlab()` | x-axis label |
| `ylab()` | y-axis label |

```
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
ggtitle("Tooth Growth in Guinea Pigs") +
xlab("Vitamin C dose (mg/day)") +
ylab("Length of odontoblasts")
```
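As an alternative to the three separate functions, ggplot2 also provides `labs()`, which sets several labels in one call. The sketch below assumes ggplot2 is installed; the legend title text and the variable name `p` are illustrative choices, not part of the original example.

```r
library(ggplot2)

# labs() sets the title, both axis labels and the legend title in one call
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot() +
  labs(title = "Tooth Growth in Guinea Pigs",
       x = "Vitamin C dose (mg/day)",
       y = "Length of odontoblasts",
       fill = "Dose (mg/day)")
p
```

Naming the `fill` label replaces the automatically generated legend title `factor(dose)` with something more readable.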

## Horizontal Box Plot

You can draw the box plot horizontally by adding `coord_flip()`, which flips the x and y coordinates.

```
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
coord_flip()
```

## Notched Box Plot

A notched box plot helps you assess whether medians differ. The notch marks an approximate 95% confidence interval around the median, so if the notches of two boxes do not overlap, there is strong evidence that their medians differ.

You add notches to a box plot by setting the `notch` argument to `TRUE` in `geom_boxplot()`.

```
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot(notch=TRUE)
```
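The notch limits can be approximated by hand, which makes the overlap check explicit. The ggplot2 documentation describes the notches as extending 1.58 * IQR / sqrt(n) on either side of the median; the sketch below applies that formula in base R. The names `notch_limits`, `notches` and `no_overlap` are hypothetical.

```r
# Approximate notch limits: median +/- 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width,
    median = median(x),
    upper = median(x) + half_width)
}

# One row of limits per dose level
notches <- do.call(rbind, tapply(ToothGrowth$len, ToothGrowth$dose, notch_limits))
print(notches)

# TRUE when the notch intervals of neighbouring dose levels do not overlap
no_overlap <- notches[-nrow(notches), "upper"] < notches[-1, "lower"]
print(no_overlap)
```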

## Add Means to a Box Plot

The horizontal line in the middle of a box plot is the median, not the mean.

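The median alone does not tell you whether the data are symmetric or skewed. Comparing it with the mean gives a rough indication: in a symmetric distribution the two lie close together. To make that comparison, add mean markers to your box plot.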

```
# Mark the group means; 'fun' replaces 'fun.y', which is deprecated since ggplot2 3.3.0
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
stat_summary(fun=mean, geom="point", shape=18, size=3, color="white")
```
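The values behind the mean markers and the median lines can also be tabulated directly. This is a small base R sketch; the names `means`, `medians` and `center` are illustrative.

```r
# Mean and median tooth length for each dose level
means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
medians <- tapply(ToothGrowth$len, ToothGrowth$dose, median)

# Collect both in one data frame, along with their difference
center <- data.frame(dose = names(means),
                     mean = as.vector(means),
                     median = as.vector(medians))
center$difference <- center$mean - center$median
print(center)
```

A difference close to zero suggests a roughly symmetric group, while a larger difference hints at skew.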

## Grouped Box Plot

You can also group box plots by the levels of a categorical variable.

There are two ways to create a grouped box plot:

### In the Same Plot

To plot the two supplement levels in the same plot, map the categorical variable `supp` to `fill`.

```
# Plot the two supplement levels in the same plot
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=supp)) +
geom_boxplot()
```

### In Panel Plot

To produce a panel plot by supplement level, add `facet_grid(. ~ supp)` to the plot.

```
# Plot the two supplement levels in separate (panel) plots
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
facet_grid(. ~ supp)
```

## Change the Order of Items

The order of items on a categorical axis can be changed by specifying `limits` in `scale_x_discrete()` or `scale_y_discrete()`.

Simply pass a vector of the levels in the desired order.

```
# Change the order of items on a categorical axis
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
scale_x_discrete(limits=c("2", "1", "0.5"))
```
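An alternative to setting `limits` on the scale is to fix the order in the data itself, by storing the variable as a factor whose levels are listed in the desired order. The sketch below uses base R only; `tg` and `dose_f` are hypothetical names, and this is a different technique from the `scale_x_discrete()` approach shown above.

```r
# Copy the data so the built-in data set is left untouched
tg <- ToothGrowth

# Store dose as a factor whose levels are listed in the desired display order
tg$dose_f <- factor(tg$dose, levels = c(2, 1, 0.5))
levels(tg$dose_f)
```

Mapping `x=dose_f` in `aes()` would then draw the boxes in this order without a `limits` argument.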

You can also omit items by leaving them out of this vector.

```
# Omit the 1 mg/day dose level
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
scale_x_discrete(limits=c("2", "0.5"))
```

## Box Plot with Dots

Overlaying a symmetrical dot density plot on a box plot combines the benefits of both plots. You can achieve this by adding `geom_dotplot()`.

```
# Overlay a symmetrical dot density plot on a box plot
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot() +
geom_dotplot(binaxis='y', stackdir='center', dotsize=1)
```

## Box Plot with Jittered Dots

Sometimes you may want the additional insight that you get from the raw data points.

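For example, overlaying each group's data points on its box gives you an idea of the group's sample size.

You can achieve this by adding `geom_jitter()`, which shifts each point by a small random amount so that points do not hide one another. Setting `outlier.shape=NA` in `geom_boxplot()` hides the box plot's own outlier markers, so those points are not drawn twice.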

```
# Add jitter over box plot
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose))) +
geom_boxplot(outlier.shape=NA) +
geom_jitter(position=position_jitter(0.2))
```