Warning: This crash course is only meant to help you learn econometrics by providing an intuitive, graphics-oriented overview of widely taught methods. It does not exempt you in any way from studying one of those tomes with their technical explanations. You will have to learn them one day (soon!) if you want to do responsible quantitative research.
What is econometrics? Well, which student really cares…?
Usually, when you open an econometrics textbook, you will find one (or more, if the textbook is a good one) of the many definitions that exist in the literature. And since so many people have come up with bright ideas on the meaning of the word “econometrics” before me, I will skip it.
As a humble, average student like me, you are probably more interested in one central question:
Is there a causal relationship between two or more observed variables and, if yes, how large is it?
In order to answer this question, you apply the results of econometric research. (Yeah, I know, that was an indirect definition. Sorry…) But since, for some reason, you are reading this small page and not a time-consuming textbook, I will present a somewhat intuitive approach in what follows.
Econometrics has to do with models
At the beginning of each research project on the impact of one or more variables (the so-called exogenous or independent variables) on another variable (usually called the endogenous or dependent variable), there is an idea about how the values of the independent variables interact or add up to produce the value of the dependent variable. In order to use econometric methods, you have to formulate such a relation mathematically. This means that you have to create a model. But how should you start?
Get to know your data (visually)
Since humans are visual beings, it is worthwhile to begin with visualisations of your data that might give you an intuition about the relation between two or more variables. Take, for example, the following three figures.
They are called scatterplots and plot the value of one variable (here y) against the value of another variable (here x). Intuitively speaking, if your data reveals a pattern like the cloud-like collection of points in figure 1, you can be quite sure that statistical methods will yield no useful results. And since I made the figure myself, I can tell you that the points were randomly assigned by my computer.
In contrast to this, figures 2 and 3 suggest a linear relation between the variables x and y, because y seems to become larger with larger values of x. Although the dispersion in the middle figure is much larger than in the bottom one, the points in both seem to move along an imaginary line from the lower left corner to the upper right. If you observe something like this, you might have found something. At the very least, it will be a correlation.
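To put a number on the difference between a cloud and a band, one option is the Pearson correlation coefficient, which is close to 0 for unrelated variables and close to 1 for a tight upward-sloping band. The following is only a minimal sketch in plain Python (not the R used for the figures). The data are simulated stand-ins, and the function names, sample size, noise levels and seed are all assumptions made for illustration, not the data behind figures 1 to 3.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_cloud(n, seed):
    """Hypothetical stand-in for figure 1: x and y are drawn independently."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [rng.uniform(0, 100) for _ in range(n)]
    return xs, ys


def make_band(n, noise_sd, seed):
    """Hypothetical stand-in for figures 2 and 3: y equals x plus random noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


# A larger noise_sd gives a wider band and therefore a weaker correlation.
print("cloud:      ", round(pearson(*make_cloud(200, seed=1)), 2))
print("wide band:  ", round(pearson(*make_band(200, noise_sd=30, seed=1)), 2))
print("narrow band:", round(pearson(*make_band(200, noise_sd=5, seed=1)), 2))
```

A high correlation only says that the points line up. It says nothing about which variable drives the other.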
Note that all the figures above seem to allow for the point (0,0) to be part of the crowd, since there are many points in its near neighbourhood. This does not have to be the case, as the following figure shows. For a value of x=0, figure 4 suggests that the corresponding y-value lies somewhere between 150 and 250. However, the linear relationship between x and y still holds.
Therefore, we assume that a linear model is a good way to describe the relation between the variables x and y. More precisely, we assume that y is a linear function of x, where the causality runs from x to y. Mathematically, this can be described by a function that might be familiar from high school: y = k * x + d, where k is the impact of a one-unit change in x on y and d is the value of y when x is zero.
Given that your data looks like figure 2, 3 or 4 (a broader or narrower band of points that runs from one corner of the graph to the other), you can use ordinary least squares, the most frequently used statistical method, to estimate the values of k and d in the equation above.
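To make the roles of k and d concrete, here is a tiny sketch in plain Python. The function name and the values k = 0.8 and d = 200 are purely illustrative assumptions, chosen so that the line passes through the range figure 4 suggests for x = 0.

```python
def line(x, k, d):
    """Value of y on the straight line y = k * x + d."""
    return k * x + d


k, d = 0.8, 200.0  # hypothetical values, for illustration only

# At x = 0 the line takes the value d (the intercept).
print(line(0, k, d))

# Moving x up by one unit changes y by k (the slope), wherever you start.
print(round(line(11, k, d) - line(10, k, d), 6))
```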
Ordinary least squares
Intuitively, ordinary least squares, or OLS for short, fits a line through the points in the above figures in a way that fulfils certain mathematical conditions. At this point I leave it to you to look those conditions up in an econometrics textbook. (Seriously, do it!)
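As a pointer for that textbook lookup: in the simple case with a single x, the central condition is usually stated as choosing k and d so that the sum of squared vertical distances between the points and the line is as small as possible. The notation below (n observations indexed by i, hats for estimates, bars for sample means) is an assumption made here, and the further conditions on the error term are left to the textbook.

```latex
\begin{align}
(\hat{k}, \hat{d}) &= \arg\min_{k,\,d} \sum_{i=1}^{n} \left( y_i - k\,x_i - d \right)^2 \\
\hat{k} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\hat{d} &= \bar{y} - \hat{k}\,\bar{x}
\end{align}
```

The second and third lines are the well-known closed-form solution of the minimisation problem in the first line for one explanatory variable.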
In assessing the size of the effect of x on y, the slope of the red line in figure 5 (the line obtained by OLS) is of particular interest. It gives you the reaction of y to a one-unit change in x. In the example above, R estimates a so-called “coefficient” of 0.8, which indicates that a one-unit increase in x leads, on average, to a 0.8-unit increase in y. The intercept d is estimated to be 200, which can also be seen in figure 5, where the red line has a value close to 200 when x is zero.
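If you want to see such an estimate come out of a computation, the following sketch may help. It is written in plain Python rather than R (where the usual tool would be lm(y ~ x)), and it uses the textbook closed-form solution for a single x. The data behind figure 5 are not available here, so the sketch simulates its own points around a line with slope 0.8 and intercept 200. The function names, sample size, noise level and seed are all assumptions.

```python
import random


def ols_fit(xs, ys):
    """Slope k and intercept d of the least-squares line y = k * x + d."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx              # slope: co-movement of x and y over spread of x
    d = mean_y - k * mean_x    # intercept: the line passes through the means
    return k, d


def simulate(n, k, d, noise_sd, seed):
    """Hypothetical data: points scattered around the line y = k * x + d."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [k * x + d + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


xs, ys = simulate(n=500, k=0.8, d=200.0, noise_sd=10.0, seed=42)
k_hat, d_hat = ols_fit(xs, ys)
print(f"estimated k = {k_hat:.2f}, estimated d = {d_hat:.1f}")
```

Because of the random noise, the estimates will typically land near 0.8 and 200 without hitting them exactly, which is also what you should expect with real data.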
[To be continued… June 1st, 2015]