Usually, the first thing you do before estimating econometric models is to analyze the data. R has a simple command, summary(), which provides a first intuition of the data structure.
Recall the script from the post on data import and execute it. Then call summary() on the imported data set.
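As a minimal sketch of that call: the data frame name ceo and its values are made up for illustration and stand in for whatever object your import script created.

```r
# Hypothetical stand-in for the imported data set (toy values, not real data)
ceo <- data.frame(
  salary = c(500, 750, 1000, 1250, 1500),
  roe    = c(8, 12, 10, 16, 14)
)

# One column of summary statistics per variable
overview <- summary(ceo)
print(overview)
```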
This gives the minimum, the quartiles, the median, the mean and the maximum of every numeric variable in the dataset. Sometimes you might want those summary statistics only for some of the variables. This is achieved with the “$” sign. Think of it as accessing the whole data set and returning only the variable that follows it. For example, call names() on the data set
to get the names of all the variables in the set and choose some interesting ones. Since we will use “salary” and “roe” in the next example, we will look at their summaries now.
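The two steps might look as follows. Again, ceo is a hypothetical data frame with made-up values, used only so that the snippet runs on its own.

```r
# Hypothetical stand-in for the imported data set (toy values, not real data)
ceo <- data.frame(
  salary = c(500, 750, 1000, 1250, 1500),
  roe    = c(8, 12, 10, 16, 14)
)

# List all variable names in the data set
vars <- names(ceo)
print(vars)

# Summaries of single variables, selected with "$"
salary_summary <- summary(ceo$salary)
roe_summary    <- summary(ceo$roe)
print(salary_summary)
print(roe_summary)
```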
A further approach is to calculate the moments of a variable of interest. The first moment is the variable’s mean and the second central moment is its variance; the skewness and the kurtosis are the standardized third and fourth moments. The first two are the most commonly used ones. For “salary” you get them with mean() and var().
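A short sketch of both calls, using a hypothetical data frame ceo with made-up values in place of the imported data. Note that var() returns the sample variance, which divides by n - 1.

```r
# Hypothetical stand-in for the imported data set (toy values, not real data)
ceo <- data.frame(
  salary = c(500, 750, 1000, 1250, 1500),
  roe    = c(8, 12, 10, 16, 14)
)

salary_mean <- mean(ceo$salary)  # first moment
salary_var  <- var(ceo$salary)   # sample variance (denominator n - 1)

print(salary_mean)
print(salary_var)
```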
In order to obtain the skewness and the kurtosis you need to install and load the package “e1071”. If you have not installed it, type
install.packages("e1071"). Afterwards load it with library(e1071) and call skewness() and kurtosis() on the variable.
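A sketch of these calls, assuming a hypothetical data frame ceo with made-up values. The requireNamespace() guard is an addition so that the snippet does not fail when the package is missing. By default, kurtosis() in “e1071” reports excess kurtosis, so a normal distribution scores about 0 rather than 3.

```r
# Hypothetical stand-in for the imported data set (toy values, not real data)
ceo <- data.frame(
  salary = c(500, 750, 1000, 1250, 1500),
  roe    = c(8, 12, 10, 16, 14)
)

# install.packages("e1071")  # run once if the package is not installed
if (requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  salary_skew <- skewness(ceo$salary)
  salary_kurt <- kurtosis(ceo$salary)  # excess kurtosis by default
  print(c(skewness = salary_skew, kurtosis = salary_kurt))
}
```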
Covariances and correlations between “salary” and “roe” are obtained through cov() and cor().
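For illustration, with a hypothetical data frame ceo whose values are made up, the two calls could look like this. By default, cor() computes the Pearson correlation.

```r
# Hypothetical stand-in for the imported data set (toy values, not real data)
ceo <- data.frame(
  salary = c(500, 750, 1000, 1250, 1500),
  roe    = c(8, 12, 10, 16, 14)
)

salary_roe_cov <- cov(ceo$salary, ceo$roe)  # sample covariance
salary_roe_cor <- cor(ceo$salary, ceo$roe)  # Pearson correlation

print(salary_roe_cov)
print(salary_roe_cor)
```

Passing the whole data frame, as in cov(ceo) or cor(ceo), returns the full covariance or correlation matrix instead of a single number.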
Admittedly, this is quite laborious. Fortunately, the package “psych” provides a very handy function, describe(), which calculates a series of interesting values at once. Install it with
install.packages("psych"), load it with library(psych) and run describe() on the data set.
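A sketch under the same assumptions as before: ceo is a hypothetical data frame with made-up values, and the requireNamespace() guard is only there so that the snippet does not fail when “psych” is not installed.

```r
# Hypothetical stand-in for the imported data set (toy values, not real data)
ceo <- data.frame(
  salary = c(500, 750, 1000, 1250, 1500),
  roe    = c(8, 12, 10, 16, 14)
)

# install.packages("psych")  # run once if the package is not installed
if (requireNamespace("psych", quietly = TRUE)) {
  library(psych)
  ceo_stats <- describe(ceo)  # one row of statistics per variable
  print(ceo_stats)
}
```

The result has one row per variable, with columns such as the mean, standard deviation, median, skew and kurtosis.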
The last section of our introduction covers (scatter)plots.