A scatter plot is a graphical display of the relationship between two sets of data.
Scatter plots are useful when you want to visualize how two variables are correlated, which is why they are also called correlation plots.
The plot() function
The basic plot() function is a generic function that can be used for a variety of different purposes. For the time being, however, you can use the plot() function to create scatter plots.
It has many arguments that control aspects such as the plot type, labels, titles and colors.
Syntax
The syntax for the plot() function is:
plot(x,y,type,main,xlab,ylab,pch,col,las,bty,bg,cex,…)
Parameters
Parameter | Description |
x | The x coordinates of points in the plot |
y | The y coordinates of points in the plot |
type | The type of plot to be drawn |
main | An overall title for the plot |
xlab | The label for the x axis |
ylab | The label for the y axis |
pch | The shape of points |
col | The foreground color of symbols as well as lines |
las | The axes label style |
bty | The type of box drawn around the plot area |
bg | The background (fill) color of symbols (only for pch 21 through 25) |
cex | The factor by which plotting text and symbols are scaled |
… | Other graphical parameters |
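Several of these arguments (type, las, bty and bg) do not appear in the examples that follow, so here is a minimal sketch that combines them in a single call. It uses the built-in iris data set; the particular values chosen here (pch=21, las=1, bty="l", cex=0.9) and the variable name point_fill are illustrative assumptions, not settings prescribed by this tutorial.

```r
# Sketch: one call that exercises type, pch, col, bg, las, bty and cex.
point_fill <- "lightskyblue"   # illustrative choice for the symbol fill

plot(iris$Petal.Length, iris$Petal.Width,
     type = "p",               # "p" draws points (the default)
     pch  = 21,                # filled circle, one of the symbols that honors bg
     col  = "dodgerblue4",     # border color of the symbol
     bg   = point_fill,        # fill color (only used by pch 21 to 25)
     las  = 1,                 # draw all axis tick labels horizontally
     bty  = "l",               # draw only the left and bottom sides of the box
     cex  = 0.9,               # scale the symbols to 90% of the default size
     main = "Petal dimensions",
     xlab = "Petal Length (cm)",
     ylab = "Petal Width (cm)")
```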
Create a Scatter Plot
To get started with plot, you need a set of data to work with. Let’s consider the built-in iris flower data set as an example data set.
Here are the first six observations of the data set.
# First six observations of the 'Iris' data set
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Iris data set
The iris data set contains 150 observations on three species of iris flower (setosa, versicolor and virginica), 50 per species. Every observation contains four measurements in centimeters: petal length, petal width, sepal length and sepal width.
To create a scatter plot, just pass any two variables of the data set to the plot() function.
# Plot two variables of the 'iris' data set
plot(iris$Petal.Length, iris$Petal.Width)
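Because a scatter plot is mainly used to judge correlation, it can help to put a number next to the picture. The sketch below computes the Pearson correlation coefficient for the same two variables with cor(); the variable name petal_cor is just an illustrative choice. A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.

```r
# Pearson correlation between the two variables shown in the scatter plot.
petal_cor <- cor(iris$Petal.Length, iris$Petal.Width)

# Print the coefficient rounded to two decimals
print(round(petal_cor, 2))
```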
If you have your data contained in a data frame, you can use one of the following approaches to get at the variables; they all produce a similar result.
# $ syntax
plot(iris$Petal.Length, iris$Petal.Width)
# with() function
with(iris, plot(Petal.Length, Petal.Width))
# attach() function
attach(iris)
plot(Petal.Length, Petal.Width)
detach(iris)
# formula syntax
plot(Petal.Width ~ Petal.Length, data=iris)
The formula syntax requires your variables to be in the order y ~ x, which is the opposite of the standard syntax plot(x, y).
Change the Shape and Size of the Points
You can use the pch (plotting character) argument to specify the symbols to use when plotting points.
The values 0 through 25 select the built-in symbols; symbols 21 through 25 can additionally be filled with a background color using bg.
With the cex (character expansion) argument, you can change the size of the plotted characters.
# Change the shape of the points and scale them to 0.6 of the default size
plot(Petal.Width ~ Petal.Length, data=iris,
pch=16,
cex=0.6)
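The built-in symbols are easier to choose from a chart than from a description. The following sketch draws all 26 of them (pch 0 through 25) in a labelled grid, using only base graphics. The grid layout of 7 symbols per row and the names pch_values, grid_x and grid_y are arbitrary choices made for this illustration.

```r
# Draw the 26 built-in plotting symbols (pch 0 to 25) in a labelled grid.
pch_values <- 0:25
grid_x <- pch_values %% 7          # column position (7 symbols per row)
grid_y <- 3 - pch_values %/% 7     # row position, first row at the top

plot(grid_x, grid_y,
     pch  = pch_values,
     cex  = 2,
     col  = "dodgerblue4",
     bg   = "lightskyblue",        # fill color, only used by pch 21 to 25
     xlim = c(-0.5, 6.5), ylim = c(-0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Built-in plotting symbols (pch)")

# Print each symbol's pch number just below it
text(grid_x, grid_y, labels = pch_values, pos = 1, offset = 1, cex = 0.8)
```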
Changing the Color
You can change the foreground color of symbols using the col argument.
# Change the color of symbols to blue
plot(Petal.Width ~ Petal.Length, data=iris,
pch=16,
col="dodgerblue1")
R has a number of predefined colors that you can use in graphics. Use the colors() function to get a complete list of available color names.
# List of predefined colors in R
colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
...
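Or you can refer to a color chart.
You can specify colors by index, name, hexadecimal, or RGB value. For example, col="white", col="#FFFFFF" and col=rgb(1,1,1) are equivalent. A numeric index such as col=1 refers to the current palette() (black by default), not to the list returned by colors().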
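A quick way to check how R resolves a color specification is col2rgb(), which converts any valid specification to its red, green and blue components. The sketch below compares three ways of writing white and then shows where a numeric index is looked up; the variable names are illustrative.

```r
# Each of these specifications resolves to the same RGB triple.
white_by_name <- col2rgb("white")
white_by_hex  <- col2rgb("#FFFFFF")
white_by_rgb  <- col2rgb(rgb(1, 1, 1))   # rgb() builds a hex color string

# TRUE when all three specifications describe the same color
print(all(white_by_name == white_by_hex, white_by_hex == white_by_rgb))

# A numeric col value is looked up in the current palette, not in colors().
print(palette()[1])   # first entry of the active palette
print(colors()[1])    # first entry of the list of color names
```

If you change the palette with palette(), the same numeric index can refer to a different color, which is one reason to prefer names or hex values in scripts you intend to share.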
Adding Titles and Axis Labels
You can easily add your own title and axis labels by specifying the following arguments.
Argument | Description |
main | Main plot title |
xlab | x-axis label |
ylab | y-axis label |
plot(Petal.Width ~ Petal.Length, data=iris,
pch=16,
col="dodgerblue1",
main = "Iris Flower Data Set",
xlab = "Petal Length (cm)",
ylab = "Petal Width (cm)")
Creating a Scatter Plot of Multiple Groups
Plotting multiple groups in one scatter plot without telling them apart creates an uninformative mess. The graphic is far more informative if you distinguish one group from another.
The following example uses the col and pch arguments to plot each point with a different color and plotting character, according to the parallel factor “Species”.
# A scatter plot that shows the points in groups according to their "species"
plot(Petal.Width ~ Petal.Length, data=iris,
col=c("brown1","dodgerblue1","limegreen")[as.integer(Species)],
pch=c(1,2,3)[as.integer(Species)])
legend(x="topleft",
legend=c("setosa","versicolor","virginica"),
col=c("brown1","dodgerblue1","limegreen"),
pch=c(1,2,3))
With the legend() function, you can add a legend to your plot: a little box that decodes the graphic for the viewer.
The position of the legend can be specified using the following keywords: “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
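Typing the colors and symbols once for plot() and again for legend() makes it easy for the two to drift apart. One possible way to avoid that, sketched below, is to keep the styles in named vectors keyed by species and derive both the per-point styles and the legend from them. The names species_cols, species_pch, point_cols and point_pch are hypothetical helpers introduced for this illustration.

```r
# One style definition per species, reused for the points and the legend.
species_cols <- c(setosa = "brown1", versicolor = "dodgerblue1", virginica = "limegreen")
species_pch  <- c(setosa = 1, versicolor = 2, virginica = 3)

# Look up the style of every observation by its species name
point_cols <- unname(species_cols[as.character(iris$Species)])
point_pch  <- unname(species_pch[as.character(iris$Species)])

plot(Petal.Width ~ Petal.Length, data = iris,
     col = point_cols,
     pch = point_pch)

# The legend is built from the same vectors, so it always matches the points
legend("topleft",
       legend = names(species_cols),
       col    = species_cols,
       pch    = species_pch)
```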
Plotting the Regression Line
To add a regression line (line of best fit) to the existing plot, you first need to estimate a linear regression model using the lm() function.
The result is an object of class lm. You can simply pass the lm object to the abline() function to draw the regression line directly.
m <- lm(Petal.Width ~ Petal.Length, data=iris)
plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
abline(m, col="brown2")
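The fitted model also carries the numbers behind the line. As a small sketch, the code below extracts the intercept, slope and R-squared from the lm object and writes them onto the plot with legend(). The variable names and the label format are illustrative choices.

```r
# Fit the model and pull out the quantities that describe the line.
m <- lm(Petal.Width ~ Petal.Length, data = iris)
intercept <- coef(m)[["(Intercept)"]]
slope     <- coef(m)[["Petal.Length"]]
r_squared <- summary(m)$r.squared

plot(Petal.Width ~ Petal.Length, data = iris, col = "dodgerblue1")
abline(m, col = "brown2")

# Annotate the plot with the fitted equation; bty = "n" suppresses the box
legend("topleft", bty = "n",
       legend = sprintf("y = %.2f + %.2f x,  R^2 = %.2f",
                        intercept, slope, r_squared))
```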
Plotting the Lowess Line
The lowess() function performs the computations for locally weighted scatter plot smoothing (LOWESS).
Its result can be passed to the lines() function to add a lowess line to the existing plot.
plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
lines(lowess(iris$Petal.Length, iris$Petal.Width), col = "brown2")
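How closely the lowess line follows the points is governed by the smoother span f, the proportion of points that influence the smooth at each value (2/3 by default). The sketch below overlays the default span and a narrower one so the difference is visible; the value f = 0.2 and the variable names are arbitrary choices for illustration.

```r
x <- iris$Petal.Length
y <- iris$Petal.Width

# Larger f gives a smoother curve; smaller f follows local structure more closely
smooth_wide   <- lowess(x, y, f = 2/3)   # the default span
smooth_narrow <- lowess(x, y, f = 0.2)   # illustrative narrower span

plot(x, y, col = "dodgerblue1",
     xlab = "Petal Length (cm)", ylab = "Petal Width (cm)")
lines(smooth_wide,   col = "brown2",    lwd = 2)
lines(smooth_narrow, col = "limegreen", lwd = 2, lty = 2)
legend("topleft",
       legend = c("f = 2/3 (default)", "f = 0.2"),
       col    = c("brown2", "limegreen"),
       lty    = c(1, 2), lwd = 2)
```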
Scatterplot Matrix
If your data set contains a large number of variables, finding the relationships between them is difficult. In R, you can create scatter plots of all pairs of variables at once.
The following example plots all columns of the iris data set, producing a matrix of scatter plots (pairs plot).
plot(iris,
col=rgb(0,0,1,.15),
pch=19)
By default, the plot() function takes all the columns in a data frame and creates a matrix of scatter plots. This becomes messy if you have many columns.
You can choose which columns you want to display by using the formula notation.
# Use formula notation to create customized pairs plots
plot(~ Petal.Length + Petal.Width + Sepal.Width,
col=rgb(0,0,1,.15),
pch=19,
data=iris)
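When plot() is given a data frame like this, the drawing is done by the pairs() function, which you can also call directly. The sketch below does so for the four numeric columns and colors the points by species, which often makes the panels easier to read. The color vector species_cols reuses the colors from the earlier examples and is otherwise an illustrative choice.

```r
# Scatter plot matrix of the four measurements, colored by species.
species_cols <- c("brown1", "dodgerblue1", "limegreen")

pairs(iris[, 1:4],                                  # numeric columns only
      col  = species_cols[as.integer(iris$Species)],
      pch  = 19,
      cex  = 0.6,
      main = "Iris measurements by species")
```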
Coplots (conditioning scatter plots)
Often your data set contains a mixture of both continuous and discrete variables. It can be quite tedious to find out how a relationship between a pair of variables differs among groups.
Information of that nature can be gained using conditioning plots (or coplots).
A conditioning scatter plot is a multipanel display in which each panel contains a scatter plot for one group.
coplot(Petal.Length ~ Petal.Width | Species,
data=iris,
columns=3,
bar.bg=c(fac="lightskyblue"),
col="dodgerblue1")
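The conditioning variable does not have to be a factor. When it is numeric, coplot() splits it into intervals that may overlap, and the helper co.intervals() shows which intervals would be used. The following sketch conditions on Sepal.Length; the choice of 4 intervals with an overlap of 0.1 and the name sepal_bins are assumptions made for illustration.

```r
# Inspect the intervals used when conditioning on a numeric variable.
sepal_bins <- co.intervals(iris$Sepal.Length, number = 4, overlap = 0.1)
print(sepal_bins)   # one row per interval: lower and upper bound

# One panel per Sepal.Length interval
coplot(Petal.Width ~ Petal.Length | Sepal.Length,
       data    = iris,
       number  = 4,       # number of conditioning intervals
       overlap = 0.1,     # fraction of overlap between neighboring intervals
       rows    = 1,
       col     = "dodgerblue1")
```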
3D scatter plots – scatterplot3d package
There are many packages in R (such as scatterplot3d, rgl, lattice, …) for creating 3D plots. The scatterplot3d package is among the simplest and easiest to use.
To create a 3D scatter plot, use the scatterplot3d() function and pass in three variables representing the x, y, and z coordinates.
library(scatterplot3d)
attach(iris)
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length)
You can alter the appearance of your 3D scatterplot by using the following parameters.
Parameter | Description |
type | The type of item to plot: ‘p’ for points, ‘l’ for lines, ‘h’ for vertical lines to the x-y plane |
color | The color to be used for plotted items |
pch | Plotting symbol to use |
angle | Angle between x and y axis |
xlab, ylab, zlab | Labels for the coordinates |
main, sub | Title and subtitle |
# Changing the appearance of the 3D scatterplot
# with() makes the columns available without relying on an earlier attach()
with(iris, scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length,
pch = 16,
type="h",
angle = 45,
xlab = "Sepal length",
ylab = "Sepal width",
zlab = "Petal length",
color = c("brown1","dodgerblue1","limegreen")[as.integer(Species)]))
legend("top",
pch = 16,
cex = 0.8,
horiz = TRUE,
legend = levels(iris$Species),
col = c("brown1","dodgerblue1","limegreen"))
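scatterplot3d() returns an object containing helper functions that draw into the existing 3D plot, among them plane3d(). As a sketch, the code below uses it to add a regression plane for Petal.Length as a function of the two sepal measurements. The model and the names fit and s3d are illustrative; the code is wrapped in requireNamespace() so that it is simply skipped if the scatterplot3d package is not installed.

```r
# Linear model with the z variable as response and x, y as predictors
# (the predictor order matches the x and y axes of the plot)
fit <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- with(iris, scatterplot3d::scatterplot3d(
    Sepal.Length, Sepal.Width, Petal.Length,
    pch   = 16,
    color = "dodgerblue1",
    angle = 45,
    xlab  = "Sepal length",
    ylab  = "Sepal width",
    zlab  = "Petal length"))

  # Draw the fitted plane into the existing 3D plot
  s3d$plane3d(fit, lty.box = "solid")
}
```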
3D scatter plots – rgl package
When it comes to 3D plots, it’s important to be able to view them from different angles.
The rgl package offers some simple functions to create 3D plots that you can rotate and zoom in/out. rgl utilizes OpenGL to render the graphics on your computer screen.
To create a 3D scatter plot, use the plot3d() function from rgl and pass in three variables representing the x, y, and z coordinates.
# Create an interactive 3D scatter plot
library(rgl)
attach(iris)
plot3d(Sepal.Length, Sepal.Width, Petal.Length)
You can rotate the plot by clicking and dragging with the mouse, and zoom in and out with the scroll wheel.
You can alter the appearance of your 3D scatterplot by using the following parameters.
Parameter | Description |
type | The type of item to plot: ‘p’ for points, ‘s’ for spheres, ‘l’ for lines, ‘h’ for line segments from z = 0, and ‘n’ for nothing |
col | The color to be used for plotted items |
size | The size for plotted points |
xlab, ylab, zlab | Labels for the coordinates |
main, sub | Title and subtitle |
# Changing the appearance of the 3D scatterplot
# with() makes the columns available without relying on an earlier attach()
with(iris, plot3d(Sepal.Length, Sepal.Width, Petal.Length,
size = 1,
type = "s",
xlab = "Sepal length",
ylab = "Sepal width",
zlab = "Petal length",
col = c("brown1","dodgerblue1","limegreen")[as.integer(Species)]))
legend3d("topright",
col=c("brown1","dodgerblue1","limegreen"),
legend=levels(iris$Species),
pch=16)