This is some code to download and read data from the homepage of Kenneth R. French.
First, visit the data library of Kenneth R. French, decide which data sets you want to work with, and copy their links. Then set your working directory, download the relevant files with download.file(), and unzip them with unzip() to get the .txt files. You can do all of this in R once you know the link to the data set. In the example below, I download monthly values for the three-factor model and the returns of portfolios formed on firm size.
download.file("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_TXT.zip", destfile = "F-F_Research_Data_Factors.zip")
download.file("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_ME_TXT.zip", destfile = "Portfolios_Formed_on_ME.zip")
unzip("F-F_Research_Data_Factors.zip")
unzip("Portfolios_Formed_on_ME.zip")
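If you work with several data sets, the download step can be wrapped in a small function. The following is only a sketch: the names ff_base, ff_url and ff_fetch are my own and not part of any package, and I assume that every archive follows the same naming pattern as the two links above (data set name plus "_TXT.zip").

```r
# Base address taken from the two links used above.
ff_base <- "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"

# Hypothetical helper: build the download link for a data set name.
# Assumes the "<name>_TXT.zip" pattern seen in the links above.
ff_url <- function(name) {
  paste0(ff_base, name, "_TXT.zip")
}

# Hypothetical helper: download and unzip one data set into the working
# directory. The download is skipped if the archive already exists.
ff_fetch <- function(name) {
  zipfile <- paste0(name, ".zip")
  if (!file.exists(zipfile)) {
    # mode = "wb" requests a binary transfer, which zip archives need.
    download.file(ff_url(name), destfile = zipfile, mode = "wb")
  }
  unzip(zipfile)
}

# Usage (needs an internet connection, so it is left commented out):
# ff_fetch("F-F_Research_Data_Factors")
# ff_fetch("Portfolios_Formed_on_ME")
```

Skipping the download when the archive is already present avoids fetching the same file every time the script is rerun. Delete the .zip file if you want the most recent version.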
Read the data. Note that we use the read.delim() command with sep = "", since the observations are not separated by a delimiter character but by empty space. Additionally, since the first lines of each file contain a description, we have to skip the first four rows in the factor sample and the first thirteen in the portfolio sample. You will have to inspect the .txt files with an ordinary editor to get an idea of how many rows you can skip. After those 4 (13) rows we tell R to read only 1067 further rows, since the table containing monthly observations ends there. (This number has to be updated every now and then as the sample grows over time. Thus, add an appropriate number of rows to the value in "nrows" to get the most recent data. I used head() to see when the sample starts and tail() to check that I really read the right number of rows. Just try it out yourself: increase "nrows = 1067" by a few rows and tail() will show where the section with annual data begins.)
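Instead of counting rows in an editor, the two numbers can also be estimated in R. The sketch below rests on assumptions of mine: that every monthly row starts with a six-digit YYYYMM stamp, that annual rows start with a four-digit year, and that the first uninterrupted block of monthly rows is the table we want. The function name find_monthly_block is made up for this example.

```r
# Hypothetical helper: find the first block of monthly rows in the lines
# of a data file and return the matching values for "skip" and "nrows".
find_monthly_block <- function(lines) {
  # A monthly row starts with optional blanks, six digits, then a blank.
  is_monthly <- grepl("^[[:space:]]*[0-9]{6}[[:space:]]", lines)
  first <- which(is_monthly)[1]
  stopifnot(!is.na(first))  # stop if no monthly row was found
  # Length of the first uninterrupted run of monthly rows.
  runs <- rle(is_monthly[first:length(lines)])
  list(skip = first - 1, nrows = runs$lengths[1])
}

# Small made-up example that mimics the layout of the factor file.
toy <- c("Some description of the data",
         "",
         "          Mkt-RF     SMB     HML      RF",
         "192607      2.96   -2.30   -2.87    0.22",
         "192608      2.64   -1.40    4.19    0.25",
         "",
         " Annual Factors: January-December",
         "  1927     29.47   -2.46   -3.75    3.12")
block <- find_monthly_block(toy)
print(block)
```

With the real files, pass the result of readLines("F-F_Research_Data_Factors.txt") to the function and use block$skip and block$nrows in read.delim(). It is still worth checking the result with head() and tail(), since the layout of the files may change.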
After we read the data, we make sure that the samples contain the same periods by checking the number of observations in each set (just look in the upper right window, where the samples are listed) and whether the variable t starts and ends with the same numbers in both (use head() and tail() for this).
fffactors <- read.delim("F-F_Research_Data_Factors.txt", col.names = c("t", "mkt.rf", "smb", "hml", "rf"), sep = "", nrows = 1067, header = FALSE, skip = 4, stringsAsFactors = FALSE)
head(fffactors)
tail(fffactors)
fffactors <- fffactors[, -1]
portfolio <- read.delim("Portfolios_Formed_on_ME.txt", col.names = c("t", "smaller.0", "Lo.30", "Med.40", "Hi.30", "Lo.20", "Qnt.2", "Qnt.3", "Qnt.4", "Hi.20", "Lo.10", "Dec.2", "Dec.3", "Dec.4", "Dec.5", "Dec.6", "Dec.7", "Dec.8", "Dec.9", "Hi.10"), sep = "", nrows = 1067, header = FALSE, skip = 13, stringsAsFactors = FALSE)
head(portfolio)
tail(portfolio)
portfolio <- portfolio[, -1]
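The comparison of the two samples can also be left to R. This is a minimal sketch with made-up data; the function name check_same_periods is my own. With the real data it would have to run before the column t is dropped from the two data frames.

```r
# Hypothetical helper: stop with an error unless both samples have the
# same number of rows and identical time stamps in column t.
check_same_periods <- function(a, b) {
  stopifnot(nrow(a) == nrow(b),
            isTRUE(all.equal(a$t, b$t)))
  invisible(TRUE)
}

# Made-up samples that cover the same three months.
toy.factors   <- data.frame(t = c(192607, 192608, 192609), rf = c(0.22, 0.25, 0.23))
toy.portfolio <- data.frame(t = c(192607, 192608, 192609), Lo.30 = c(1.2, 2.3, -0.5))
ok <- check_same_periods(toy.factors, toy.portfolio)

# A sample that is one month short makes the check fail.
short <- toy.portfolio[-3, ]
failed <- inherits(try(check_same_periods(toy.factors, short), silent = TRUE), "try-error")
```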
Next, we calculate the excess returns of the portfolios by subtracting the risk-free rate from the returns. Combining the resulting data frame with the factor sample gives the final data set. However, we should add time values so that we can use R's commands for time series. For this purpose I use the known dates of the first and the last period of the sample to generate a monthly sequence of dates, which I append to the sample.
portfolio.rf <- portfolio - fffactors$rf # Excess returns
sample <- cbind(portfolio.rf, fffactors) # Combine samples
dates <- seq(as.Date("1926-07-01"), as.Date("2015-05-01"), by = "month") # Generate time stamp
sample <- cbind(dates, sample) # Combine time values with the sample
View(sample) # Take a look at the sample
save(sample, file = "3factors_size_portfolios.RData") # Store sample for later use
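As an alternative to typing the first and last date by hand, the dates could be derived from the YYYYMM stamps in the column t of the raw files. This is only a sketch with made-up values, and it assumes that you keep a copy of t before dropping that column. The last lines check that the hand-made sequence used above has the 1067 elements that were read from the files.

```r
# Made-up YYYYMM stamps, as they appear in the first column of the files.
t <- c(192607, 192608, 192609)

# Append "01" as the day and convert to Date. sprintf() is used so that
# the numbers are never written in scientific notation.
dates.from.t <- as.Date(sprintf("%d01", as.integer(t)), format = "%Y%m%d")
print(dates.from.t)

# The hand-made monthly sequence should have as many elements as there
# are rows in the sample.
dates <- seq(as.Date("1926-07-01"), as.Date("2015-05-01"), by = "month")
n.months <- length(dates)
```

If the number of elements in dates differs from the number of rows in the sample, cbind() will either fail or silently recycle values, so the comparison is a cheap safeguard whenever "nrows" is updated.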