Plotting

Scatterplots

Especially for presentations, it is useful to plot data in order to get an intuition about the relationships of interest. In regression analysis this is very often done with scatterplots, where the values of two variables are plotted against each other for each observation. For this purpose we use the plot() function.

In this example I use the data set from the previous section on summary statistics:

library(foreign)  # provides read.dta()
ceosal1 <- read.dta("ceosal1.dta")
plot(ceosal1$salary, ceosal1$roe)

A graph appears in the lower right window of RStudio. Each point represents one observation in the sample, with its salary on the x-axis and its ROE on the y-axis.
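As a small sketch of the same idea, the code below draws a scatterplot with explicit axis labels, once with two vectors and once with the formula interface of plot(). The data frame dat and its values are simulated stand-ins (an assumption of this example), not the ceosal1 data.

```r
# Simulated stand-in data; the values are made up for illustration only.
set.seed(123)
n <- 50
dat <- data.frame(salary = rlnorm(n, meanlog = 7, sdlog = 0.5))
dat$roe <- 15 + 0.002 * dat$salary + rnorm(n, sd = 8)

# First argument goes on the x-axis, second argument on the y-axis.
plot(dat$salary, dat$roe, xlab = "Salary", ylab = "ROE")

# Equivalent call with a formula: y ~ x, with the variables looked up in dat.
plot(roe ~ salary, data = dat, xlab = "Salary", ylab = "ROE")
```

Both calls should produce the same picture; the formula version only saves you from repeating the name of the data frame.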

Personally, I like the scatterplotMatrix() function from the car package. It automatically produces a series of scatterplots with regression lines and smooths for all variables. But since this requires a lot of calculations, you might want to make sure that the data set does not include too many variables, or that you have a good computer.

The output of scatterplotMatrix() allows you to get an intuition about the correlations between the dependent variable and the independent variables, i.e., covariates, and about the correlations among the covariates. This is especially useful when you want to check for multicollinearity, which can be a serious problem for OLS estimation.

library(car)
scatterplotMatrix(ceosal1)
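If the full matrix is too slow or too crowded, one option is to look at only a few variables. The sketch below uses base R only: pairs() as a plain stand-in for scatterplotMatrix() (no regression lines or smooths) and cor() for the pairwise correlations. The data frame dat is simulated and merely borrows three column names from ceosal1; all values are made up.

```r
# Simulated stand-in for a few ceosal1 columns; the values are made up.
set.seed(1)
n <- 50
dat <- data.frame(
  sales = rlnorm(n, meanlog = 8, sdlog = 1),
  roe   = rnorm(n, mean = 17, sd = 8)
)
dat$salary <- 500 + 0.05 * dat$sales + 10 * dat$roe + rnorm(n, sd = 300)

# Keep only the variables of interest to limit the number of panels.
vars <- c("salary", "sales", "roe")

# Base R scatterplot matrix of the selected variables.
pairs(dat[, vars])

# Pairwise correlations as a numeric complement to the plots.
cor_mat <- cor(dat[, vars])
print(round(cor_mat, 2))
```

With the car package loaded, the same selection can be written with a formula, for example scatterplotMatrix(~ salary + sales + roe, data = ceosal1).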

Time series (longitudinal) data

If you have observations of the same variable over time, you might want to plot how it evolves. You can also do this with the plot() function. All you have to do is to put the time variable on the x-axis. Nothing else changes. So, download the data, read it into R, call plot() with the year of the observation on the x-axis and see what happens.

download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/phillips.dta','phillips.dta',mode="wb")
phillips <- read.dta('phillips.dta')

plot(phillips$year,phillips$inf)

Well, this series of unconnected circles does not look very satisfying, does it? The reason why we have no lines between the data points is that the plot() function does not differentiate between cross-sectional and time series data in the first place. Thus, we have to make a small adaptation by using the option type="l" to tell R that we want to connect the data points by lines.

plot(phillips$year,phillips$inf,type="l")

Better, but since a good graph should always contain a title and axis labels, you should add the second line of the following code to the previous command. main and the text between the quotation marks that follows it specify the title of the graph. xlab and ylab are used to set the labels of the x- and y-axis, respectively.

plot(phillips$year,phillips$inf,type="l",
     main="Inflation",xlab="Time",ylab="Inflation")
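The type option accepts other values as well. As a rough sketch, the code below draws the same series with points ("p"), lines ("l") and both ("b") next to each other. The vectors year and inf_sim are simulated stand-ins (an assumption of this example), not the phillips data.

```r
# Simulated yearly series; the values are made up for illustration only.
set.seed(42)
year <- 1950:1999
inf_sim <- 4 + cumsum(rnorm(length(year), sd = 1))

# Draw three panels in one row and remember the previous settings.
old_par <- par(mfrow = c(1, 3))

plot(year, inf_sim, type = "p", main = "Points", xlab = "Time", ylab = "Inflation")
plot(year, inf_sim, type = "l", main = "Lines", xlab = "Time", ylab = "Inflation")
plot(year, inf_sim, type = "b", main = "Both", xlab = "Time", ylab = "Inflation")

# Restore the previous layout so that later plots fill the whole window again.
par(old_par)
```

For a series over time, "l" or "b" is usually easier to read than the default points, because the lines make the order of the observations visible.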

If you are interested in more sophisticated time series graphs, you can also go through my post on plotting time series using the ggplot2 package.

Congratulations! You successfully went through the introduction. If you want to proceed with some simple regressions, click here. Otherwise have a nice day!
