R Scatter Plot – Base Graph

A scatter plot is a graphical display of the relationship between two sets of data.

[Figure: a typical scatter plot]

Scatter plots are useful when you want to visualize how two variables are correlated, which is why they are also called correlation plots.

The plot() function

The plot() function is a generic function: the kind of plot it draws depends on the class of the object you pass to it. When you give it two numeric vectors, it creates a scatter plot.

It accepts many arguments that control the plot type, labels, titles, colors and more.
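To see the generic behavior in action, here is a small sketch using the built-in iris data set (introduced below). The same plot() call produces a scatter plot for two numeric vectors but a bar chart of counts for a factor, because R dispatches to a different method in each case.

```r
# plot() is generic: R picks a method based on the class of the first argument.

# Two numeric vectors are handled by plot.default(), which draws a scatter plot
plot(iris$Sepal.Length, iris$Sepal.Width)

# A factor is handled by a different method and produces a bar chart of counts
plot(iris$Species)
```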

Syntax

The syntax for the plot() function is:

plot(x, y, type, main, xlab, ylab, pch, col, las, bty, bg, cex, ...)

Parameters

The plot() function arguments

Parameter   Description
x           The x coordinates of points in the plot
y           The y coordinates of points in the plot
type        The type of plot to be drawn (for example, "p" for points, "l" for lines)
main        An overall title for the plot
xlab        The label for the x axis
ylab        The label for the y axis
pch         The shape of the points (plotting symbol)
col         The foreground color of symbols and lines
las         The style of axis labels (for example, las = 1 makes them horizontal)
bty         The type of box drawn around the plot area
bg          The background (fill) color of symbols; used only by pch 21 through 25
cex         The factor by which plotting text and symbols are scaled
...         Other graphical parameters
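As a rough illustration of how these arguments combine, the sketch below uses most of them in a single call on the iris data set. The particular values (colors, box style, scaling) are arbitrary choices for demonstration.

```r
# A sketch that combines several arguments from the table above
plot(iris$Petal.Length, iris$Petal.Width,
     type = "p",            # draw points only
     main = "Petal size",   # overall title
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)",
     pch = 21,              # circle with a separate fill color
     col = "darkblue",      # border color of each circle
     bg = "lightblue",      # fill color (used only by pch 21 to 25)
     cex = 1.2,             # draw symbols 20% larger than the default
     las = 1,               # print all axis labels horizontally
     bty = "l")             # draw only the left and bottom sides of the box
```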

Create a Scatter Plot

To get started with plot(), you need a set of data to work with. This tutorial uses the built-in iris flower data set as an example.

Here are the first six observations of the data set.

# First six observations of the 'Iris' data set
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

[Figure: Iris data set]

The iris data set contains 150 observations of three species of iris flower: setosa, versicolor and virginica (50 of each). Each observation records four measurements of the flower, in centimeters: sepal length, sepal width, petal length and petal width.
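If you want to confirm the size and layout of the data set yourself before plotting, a quick check with base R functions might look like this:

```r
# Check the size and layout of the built-in iris data set
dim(iris)            # number of rows and columns
table(iris$Species)  # number of observations per species
str(iris)            # column types: four numeric measurements and a factor
```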

To create a scatter plot, pass any two numeric variables of the data set to the plot() function.

# Plot the ‘Iris’ data set
plot(iris$Petal.Length, iris$Petal.Width)

If your data is stored in a data frame, you can use any of the following approaches to access the variables; they all produce essentially the same plot (only the default axis labels may differ).

# $ syntax
plot(iris$Petal.Length, iris$Petal.Width)

# with() function
with(iris, plot(Petal.Length, Petal.Width))

# attach() function
attach(iris)
plot(Petal.Length, Petal.Width)
detach(iris)

# formula syntax
plot(Petal.Width ~ Petal.Length, data=iris)

The formula syntax lists the variables in the order y ~ x, which is the reverse of the standard syntax plot(x, y).
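One convenience of the formula syntax is that the formula method of plot() also accepts a subset argument, which is evaluated inside the data frame. The sketch below, for example, plots only the setosa flowers.

```r
# Formula syntax (y ~ x) with a subset evaluated inside the iris data frame;
# only the 50 setosa observations are plotted
plot(Petal.Width ~ Petal.Length, data = iris,
     subset = Species == "setosa",
     main = "Setosa only")
```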

Change the Shape and Size of the Points

You can use the pch (plotting character) argument to specify symbols to use when plotting points.

Here’s a list of the symbols you can use.

[Figure: plotting symbols available through pch]
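If the symbol chart is not at hand, you can draw one yourself. This sketch plots the standard symbols pch = 0 to 25 in a grid and labels each with its code; the grid layout (nine symbols per row) and the gold fill color are arbitrary choices.

```r
# Draw the standard plotting symbols (pch = 0 to 25), nine per row,
# each labelled with its pch code
codes <- 0:25
x <- codes %% 9          # column position within a row
y <- -(codes %/% 9)      # row position; the first row is at the top
plot(x, y, pch = codes, cex = 2,
     bg = "gold",         # fill color, visible only for pch 21 to 25
     axes = FALSE, xlab = "", ylab = "",
     xlim = c(-0.5, 8.5), ylim = c(-2.8, 0.4),
     main = "pch values 0 to 25")
text(x, y, labels = codes, pos = 1, offset = 1.2)  # code below each symbol
```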

With the cex (character expansion) argument, you can change the size of the plotted symbols.

# Use solid circles (pch=16) and scale them down to 0.6 times the default size
plot(Petal.Width ~ Petal.Length, data=iris,
     pch=16,
     cex=0.6)

Changing the Color

You can change the foreground color of symbols using the col argument.

# Change the color of symbols to blue
plot(Petal.Width ~ Petal.Length, data=iris,
     pch=16,
     col="dodgerblue1")

R has a large number of predefined colors that you can use in graphics. Use the colors() function to get the complete list of available color names.

# List of predefined colors in R
colors()
[1] "white"         "aliceblue"     "antiquewhite" 
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
...
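Because colors() returns an ordinary character vector, you can search it like any other. For example, a sketch that lists every predefined color whose name contains "blue":

```r
# Search the predefined color names for ones containing "blue"
blues <- grep("blue", colors(), value = TRUE)
head(blues)
length(colors())   # total number of predefined color names
```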

Alternatively, you can refer to the following color chart.

[Figure: chart of R's predefined colors]

You can specify colors by palette index, name, hexadecimal code, or RGB value. For example, with the default palette col=1, col="black", col="#000000" and col=rgb(0,0,0) are all equivalent.
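To check that different specifications refer to the same color, you can convert each one to its red, green and blue components with col2rgb(). This sketch assumes the default palette, in which index 1 is black.

```r
# col2rgb() converts any color specification to its red, green and blue values
col2rgb(1)             # palette index 1 (black in the default palette)
col2rgb("black")       # color name
col2rgb("#000000")     # hexadecimal string
col2rgb(rgb(0, 0, 0))  # rgb() builds the hexadecimal string "#000000"
```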

Adding Titles and Axis Labels

You can add your own title and axis labels by specifying the following arguments.

Argument    Description
main        Main plot title
xlab        x-axis label
ylab        y-axis label

plot(Petal.Width ~ Petal.Length, data=iris,
     pch=16,
     col="dodgerblue1",
     main = "Iris Flower Data Set",
     xlab = "Petal Length (cm)",
     ylab = "Petal Width (cm)")

Creating a Scatter Plot of Multiple Groups

Plotting multiple groups in one scatter plot without distinguishing them creates an uninformative mess. The graphic is far more informative if you distinguish one group from another.

The following example uses the col and pch arguments to draw each point with a color and plotting character determined by the factor Species.

# A scatter plot that shows the points in groups according to their "species"
plot(Petal.Width ~ Petal.Length, data=iris,
     col=c("brown1","dodgerblue1","limegreen")[as.integer(Species)],
     pch=c(1,2,3)[as.integer(Species)])

legend(x="topleft",
       legend=c("setosa","versicolor","virginica"),
       col=c("brown1","dodgerblue1","limegreen"),
       pch=c(1,2,3))

The legend() function adds a legend to your plot: a small box that decodes the graphic for the viewer.

The position of the legend can be specified using one of the following keywords: "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
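As a variation on the example above, the sketch below stores the colors in one vector so the points and legend cannot drift apart, takes the labels from levels(iris$Species) instead of typing them, and uses the inset argument to nudge the legend slightly away from the plot edge. The variable name species_cols is just an illustrative choice.

```r
# One color vector shared by the points and the legend
species_cols <- c("brown1", "dodgerblue1", "limegreen")
plot(Petal.Width ~ Petal.Length, data = iris,
     col = species_cols[as.integer(Species)], pch = 16)

# Keyword position plus a small inset (a fraction of the plot region);
# labels come from the factor levels so they stay in sync with the data
leg <- legend("bottomright", inset = 0.02,
              legend = levels(iris$Species),
              col = species_cols, pch = 16, title = "Species")
```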

Plotting the Regression Line

To add a regression line (line of best fit) to an existing plot, you first need to fit a linear regression model with the lm() function.

The result is an object of class lm. Passing this object to the abline() function draws the fitted regression line directly.

m <- lm(Petal.Width ~ Petal.Length, data=iris)
plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
abline(m, col="brown2")
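If you also want the numbers behind the line, coef() extracts the intercept and slope from the lm object. The sketch below draws the same line by passing those values explicitly to abline() as its a (intercept) and b (slope) arguments, dashed so it is easy to tell apart.

```r
m <- lm(Petal.Width ~ Petal.Length, data = iris)
b <- coef(m)   # b[1] is the intercept, b[2] the slope
b
plot(Petal.Width ~ Petal.Length, data = iris, col = "dodgerblue1")
# Same line as abline(m), drawn from an explicit intercept (a) and slope (b)
abline(a = b[1], b = b[2], col = "brown2", lty = 2)
```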

Plotting the Lowess Line

The lowess() function performs the computations for locally weighted scatter plot smoothing (LOWESS).

Its result can be passed to the lines() function to add a lowess line to the existing plot.

plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
lines(lowess(iris$Petal.Length, iris$Petal.Width), col = "brown2")
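lowess() returns a list with components x (sorted) and y (smoothed values), which is why lines() can draw it directly. Its smoother span f (default 2/3) is the proportion of points that influence the smooth at each value; smaller values follow the data more closely. A sketch comparing the default span with a narrower one (0.2 is an arbitrary choice):

```r
# Default span (f = 2/3) versus a narrower span that follows the data more closely
lw_default <- lowess(iris$Petal.Length, iris$Petal.Width)
lw_narrow  <- lowess(iris$Petal.Length, iris$Petal.Width, f = 0.2)
plot(Petal.Width ~ Petal.Length, data = iris, col = "dodgerblue1")
lines(lw_default, col = "brown2")
lines(lw_narrow, col = "darkgreen", lty = 2)
```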

Scatterplot Matrix

If your data set contains a large number of variables, finding the relationships between them is difficult. In R, you can create scatter plots of all pairs of variables at once.

The following example plots all columns of the iris data set, producing a matrix of scatter plots (a pairs plot).

plot(iris,
     col=rgb(0,0,1,.15),
     pch=19)

By default, the plot() function takes all the columns in a data frame and creates a matrix of scatter plots. This becomes messy if you have many columns.
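For a data frame with more than two columns, plot() hands the work to the pairs() function. Calling pairs() directly exposes extra options; for example, this sketch hides the lower triangle, which only repeats the upper one with the axes swapped.

```r
# Pairs plot of the four numeric iris columns, showing each pair only once
pairs(iris[, 1:4],
      col = rgb(0, 0, 1, 0.15),
      pch = 19,
      lower.panel = NULL)   # omit the mirrored lower-triangle panels
```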

You can choose which columns you want to display by using the formula notation.

# Use formula notation to create customized pairs plots
plot(~ Petal.Length + Petal.Width + Sepal.Width,
     col=rgb(0,0,1,.15),
     pch=19,
     data=iris)

Coplots (conditioning scatter plots)

Datasets often contain a mixture of continuous and discrete variables, and it can be tedious to find out how the relationship between a pair of variables differs among groups.

Conditioning plots (coplots) are designed to show this kind of information.

A coplot is a multi-panel display in which each panel contains the scatter plot for one group.

coplot(Petal.Length ~ Petal.Width | Species,
       data=iris,
       columns=3,
       bar.bg=c(fac="lightskyblue"),
       col="dodgerblue1")
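The conditioning variable does not have to be a factor. When it is numeric, coplot() splits it into intervals and draws one panel per interval; co.intervals() shows the intervals it would use. A sketch conditioning on sepal length with three non-overlapping intervals (the choice of three is arbitrary):

```r
# Intervals of Sepal.Length that coplot() will use for its panels
ints <- co.intervals(iris$Sepal.Length, number = 3, overlap = 0)
ints
coplot(Petal.Length ~ Petal.Width | Sepal.Length, data = iris,
       number = 3, overlap = 0, col = "dodgerblue1")
```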

3D scatter plots – scatterplot3d package

There are many packages in R (such as scatterplot3d, rgl, lattice, …) for creating 3D plots. The scatterplot3d package is among the simplest and easiest to use.

To create a 3D scatter plot, use the scatterplot3d() function and pass in three variables representing the x, y, and z coordinates.

library(scatterplot3d)
attach(iris)
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length)
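The scatterplot3d() function also returns a set of helper functions for adding to the plot. As a sketch of the 3D counterpart to the regression line shown earlier, the code below fits a plane with lm() (two predictors, matching the x and y axes) and draws it with the returned plane3d() helper. The drawing step is guarded so the code still runs if the package is not installed.

```r
# Regression plane: Petal.Length explained by Sepal.Length (x) and Sepal.Width (y)
fit <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

# Draw only if the optional scatterplot3d package is installed
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- scatterplot3d::scatterplot3d(iris$Sepal.Length, iris$Sepal.Width,
                                      iris$Petal.Length,
                                      xlab = "Sepal length",
                                      ylab = "Sepal width",
                                      zlab = "Petal length")
  s3d$plane3d(fit, col = "brown2")   # add the fitted regression plane
}
```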

You can alter the appearance of your 3D scatter plot by using the following parameters.

Most common scatterplot3d() function arguments

Parameter          Description
type               The type of item to plot: "p" for points, "l" for lines,
                   "h" for line segments from z = 0
color              The color to be used for plotted items
pch                Plotting symbol to use
angle              Angle between the x and y axes
xlab, ylab, zlab   Labels for the coordinates
main, sub          Title and subtitle

# Changing the appearance of the 3D scatterplot
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length,
              pch = 16,
              type="h",
              angle = 45,
              xlab = "Sepal length",
              ylab = "Sepal width",
              zlab = "Petal length",
              color = c("brown1","dodgerblue1","limegreen")[as.integer(Species)])

legend("top",
       pch = 16,
       cex = 0.8,
       horiz = TRUE,
       legend = levels(iris$Species),
       col =  c("brown1","dodgerblue1","limegreen"))

3D scatter plots – rgl package

When it comes to 3D plots, it’s important to be able to view them from different angles.

The rgl package offers simple functions for creating 3D plots that you can rotate and zoom in and out of. rgl uses OpenGL to render the graphics on your computer screen.

To create a 3D scatter plot, use rgl's plot3d() function and pass in three variables representing the x, y, and z coordinates.

# Create a spinning 3D scatter plot
library(rgl)
attach(iris)
plot3d(Sepal.Length, Sepal.Width, Petal.Length)

You can rotate the plot by clicking and dragging with the mouse, and zoom in and out with the scroll wheel.

You can alter the appearance of your 3D scatter plot by using the following parameters.

Most common plot3d() function arguments

Parameter          Description
type               The type of item to plot: "p" for points, "s" for spheres,
                   "l" for lines, "h" for line segments from z = 0,
                   "n" for nothing
col                The color to be used for plotted items
size               The size for plotted points
xlab, ylab, zlab   Labels for the coordinates
main, sub          Title and subtitle

# Changing the appearance of the 3D scatterplot
plot3d(Sepal.Length, Sepal.Width, Petal.Length,
       pch = 16,
       size = 1,
       type = "s",
       xlab = "Sepal length",
       ylab = "Sepal width",
       zlab = "Petal length",
       col = c("brown1","dodgerblue1","limegreen")[as.integer(Species)])

legend3d("topright",
         col=c("brown1","dodgerblue1","limegreen"),
         legend=levels(Species),
         pch=16)