Cincera (1997): Patents, R&D, and technological spillovers at the firm level

Michele Cincera studies the influence of R&D spending on the research output of firms. Since research output (the dependent variable) cannot be measured directly, he uses the number of patent applications of a firm as a proxy for it. Besides R&D expenditures, he also includes technological spillovers as well as geographic and technological dummies as explanatory variables.

As the number of patents is a non-negative integer, the dependent variable is constrained. Thus, methods other than OLS have to be used in order to obtain valid coefficient estimates. Cincera uses count models for this purpose.
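For reference, the basic Poisson regression model that underlies such count models can be written as follows. The notation is my own shorthand and not necessarily Cincera's: y_it is the number of patents of firm i in year t, x_it collects the regressors and beta the coefficients.

```latex
\Pr(Y_{it} = y_{it} \mid x_{it})
  = \frac{e^{-\lambda_{it}} \, \lambda_{it}^{y_{it}}}{y_{it}!},
\qquad
\lambda_{it} = \exp(x_{it}'\beta),
\qquad
y_{it} = 0, 1, 2, \dots
```

The exponential link keeps the expected count positive, which is exactly the constraint that a linear OLS specification ignores.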

The data can be retrieved from the server of the Journal of Applied Econometrics. The readme-file contains a description of the variables and the zip-file contains the data that we are going to use. But before we can use it, we have to rearrange it, since it comes in an old file format and is not yet organized in a way that allows us to apply panel data methods to it in R.

So, download the zip-file, extract it and open the data file with a spreadsheet program – the columns are separated by tabs. Save the file as a comma-separated csv-file (I named it “cincera.csv”) in your working directory, close it and open R.
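The spreadsheet step can probably be skipped, since R reads tab-separated files directly. The following is only a minimal sketch under the assumption that the extracted file is tab-separated and has no header row. It uses a small stand-in file with made-up content written to a temporary folder, not the real data set, and the file names are hypothetical.

```r
# Stand-in for the extracted data file: two rows, tab-separated, no header.
# (Made-up content; the real file has one row per firm.)
raw.file <- tempfile(fileext = ".txt")
writeLines(c("1\t3\t2\t10\t12", "2\t7\t1\t0\t4"), raw.file)

# read.delim() uses tabs as separators; header=FALSE names the columns V1, V2, ...
raw <- read.delim(raw.file, header = FALSE)

# Save as a comma-separated file without header and row names, so that
# read.csv(..., header=FALSE) reads it back unchanged
csv.file <- file.path(tempdir(), "cincera_demo.csv")
write.table(raw, csv.file, sep = ",", row.names = FALSE, col.names = FALSE)

check <- read.csv(csv.file, header = FALSE)
check
```

With the real data, the path of the extracted file would take the place of raw.file and “cincera.csv” the place of csv.file.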

Next, we are going to rearrange the data in R. As usual, set the working directory and remove everything in your R memory.

setwd("...")

rm(list=ls())

# Read the file
patents<-read.csv('cincera.csv',header=FALSE)

# Stack the yearly columns into long vectors for patents, R&D flows and spillovers
# (all firms of the first year, then all firms of the second year, and so on)
p<-unlist(patents[paste0('V',4:12)],use.names=FALSE)
k<-unlist(patents[paste0('V',13:21)],use.names=FALSE)
spill<-unlist(patents[paste0('V',22:30)],use.names=FALSE)

# Attribute the firm ID
fi<-rep(1:181,9)

# Years (each year repeated once per firm, matching the stacking order)
year<-rep(1983:1991,each=181)

# Geographic dummies
g<-rep(patents$V3,9)

g.1<-as.numeric(g==1)
g.2<-as.numeric(g==2)
g.3<-as.numeric(g==3)
g.4<-as.numeric(g==4)

# Technological dummies
s<-rep(patents$V2,9)

s.1<-as.numeric(s==1)
s.2<-as.numeric(s==2)
s.3<-as.numeric(s==3)
s.4<-as.numeric(s==4)
s.5<-as.numeric(s==5)
s.6<-as.numeric(s==6)
s.7<-as.numeric(s==7)
s.8<-as.numeric(s==8)
s.9<-as.numeric(s==9)
s.10<-as.numeric(s==10)
s.11<-as.numeric(s==11)
s.12<-as.numeric(s==12)
s.13<-as.numeric(s==13)
s.14<-as.numeric(s==14)
s.15<-as.numeric(s==15)

# Lags of R&D Spending
k.1<-as.vector(c(rep(NA,181),k[1:(length(k)-181)]))
k.2<-as.vector(c(rep(NA,2*181),k[1:(length(k)-2*181)]))
k.3<-as.vector(c(rep(NA,3*181),k[1:(length(k)-3*181)]))
k.4<-as.vector(c(rep(NA,4*181),k[1:(length(k)-4*181)]))

# Lags of spillovers
spill.1<-as.vector(c(rep(NA,181),spill[1:(length(spill)-181)]))
spill.2<-as.vector(c(rep(NA,2*181),spill[1:(length(spill)-2*181)]))
spill.3<-as.vector(c(rep(NA,3*181),spill[1:(length(spill)-3*181)]))
spill.4<-as.vector(c(rep(NA,4*181),spill[1:(length(spill)-4*181)]))

# Generate the final data frame
data<-data.frame(year,fi,p,k,k.1,k.2,k.3,k.4,spill,spill.1,spill.2,spill.3,spill.4,g.1,g.2,g.3,g.4,
                 s.1,s.2,s.3,s.4,s.5,s.6,s.7,s.8,s.9,s.10,s.11,s.12,s.13,s.14,s.15)

# Give labels to the variables
attributes(data)$var.labels<-c('Year','FirmID','# of patents','R&D spending','Lag R&D 1',
                               'Lag R&D 2','Lag R&D 3','Lag R&D 4','Spillover','Lag spillover 1',
                               'Lag spillover 2','Lag spillover 3','Lag spillover 4',
                               'Geographic dummy 1','Geographic dummy 2','Geographic dummy 3',
                               'Geographic dummy 4','Sector dummy 1','Sector dummy 2',
                               'Sector dummy 3','Sector dummy 4','Sector dummy 5',
                               'Sector dummy 6','Sector dummy 7','Sector dummy 8',
                               'Sector dummy 9','Sector dummy 10','Sector dummy 11',
                               'Sector dummy 12','Sector dummy 13','Sector dummy 14',
                               'Sector dummy 15')


# Export to csv-file
# row.names=FALSE avoids an extra column of row numbers in the file
write.csv(data,'patents.csv',row.names=FALSE)
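The lags in the script only work because the data are stacked year by year: shifting a vector by one block of 181 firms moves every firm back by exactly one year. A toy version with three firms and three years makes this easy to check by eye. All numbers and the names with the suffix .toy are made up for illustration.

```r
# Toy wide data: 3 firms (rows), 3 years of R&D spending (columns)
wide <- data.frame(V1 = c(10, 20, 30),   # year 1
                   V2 = c(11, 21, 31),   # year 2
                   V3 = c(12, 22, 32))   # year 3
n <- nrow(wide)                          # number of firms

# Stack the year columns: all firms of year 1, then year 2, then year 3
k.toy  <- unlist(wide, use.names = FALSE)
fi.toy <- rep(1:n, times = ncol(wide))
yr.toy <- rep(1:ncol(wide), each = n)

# First lag: shift the stacked vector down by one block of n firms
k1.toy <- c(rep(NA, n), k.toy[1:(length(k.toy) - n)])

toy <- data.frame(fi.toy, yr.toy, k.toy, k1.toy)

# For a single firm the lag column should hold last year's value
toy[toy$fi.toy == 2, ]
```

If the rows were sorted by firm instead of by year, the same shift would mix values of different firms, so the ordering is worth verifying before any sorting or merging.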

Now we have a data set that R can work with. It might be reasonable to close the current script and to open a new one. After setting the working directory, clearing the memory, loading the “pglm”-package and reading the data, we are going to replicate table 1 of the paper, which contains the characteristics of the data.

setwd("...")
rm(list=ls())

library('pglm')

patents<-read.csv('patents.csv')

# Table 1
table.1a<-data.frame(Mean=NA,Standard.error=NA,Minimum.value=NA,Maximum.value=NA)

table.1a[1,1]<-round(mean(patents$p),digits=2)
table.1a[1,2]<-round(sqrt(var(patents$p)),digits=2)
table.1a[1,3]<-round(min(patents$p),digits=2)
table.1a[1,4]<-round(max(patents$p),digits=2)
table.1a[2,1]<-round(mean(patents$k),digits=2)
table.1a[2,2]<-round(sqrt(var(patents$k)),digits=2)
table.1a[2,3]<-round(min(patents$k),digits=2)
table.1a[2,4]<-round(max(patents$k),digits=2)
table.1a[3,1]<-round(mean(patents$spill),digits=2)
table.1a[3,2]<-round(sqrt(var(patents$spill)),digits=2)
table.1a[3,3]<-round(min(patents$spill),digits=2)
table.1a[3,4]<-round(max(patents$spill),digits=2)


table.1b<-data.frame(P=NA,k=NA,k.1=NA,k.2=NA,k.3=NA)

table.1b[1,1]<-round(cor(patents$k,patents$p),digits=2)
table.1b[2,1]<-round(cor(patents$k.1,patents$p,use="complete.obs"),digits=2)
table.1b[3,1]<-round(cor(patents$k.2,patents$p,use="complete.obs"),digits=2)
table.1b[4,1]<-round(cor(patents$k.3,patents$p,use="complete.obs"),digits=2)
table.1b[5,1]<-round(cor(patents$k.4,patents$p,use="complete.obs"),digits=2)

# The use="complete.obs" option tells R to proceed with the evaluation even though there
# are missing values. It will only use observations that are complete, i.e.
# where both columns of an observation contain values.

table.1b[2,2]<-round(cor(patents$k.1,patents$k,use="complete.obs"),digits=2)
table.1b[3,2]<-round(cor(patents$k.2,patents$k,use="complete.obs"),digits=2)
table.1b[4,2]<-round(cor(patents$k.3,patents$k,use="complete.obs"),digits=2)
table.1b[5,2]<-round(cor(patents$k.4,patents$k,use="complete.obs"),digits=2)

table.1b[3,3]<-round(cor(patents$k.2,patents$k.1,use="complete.obs"),digits=2)
table.1b[4,3]<-round(cor(patents$k.3,patents$k.1,use="complete.obs"),digits=2)
table.1b[5,3]<-round(cor(patents$k.4,patents$k.1,use="complete.obs"),digits=2)

table.1b[4,4]<-round(cor(patents$k.3,patents$k.2,use="complete.obs"),digits=2)
table.1b[5,4]<-round(cor(patents$k.4,patents$k.2,use="complete.obs"),digits=2)

table.1b[5,5]<-round(cor(patents$k.4,patents$k.3,use="complete.obs"),digits=2)

table.1a
table.1b
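The cell-by-cell correlations can also be obtained in a single call, which is less error-prone. Here is a minimal sketch on made-up data, a random variable and its first two lags. The object names are hypothetical and not part of the replication script.

```r
# Toy data with leading NAs, mimicking a variable and its first two lags
set.seed(1)
x  <- rnorm(20)
x1 <- c(NA, x[1:19])
x2 <- c(NA, NA, x[1:18])
toy <- data.frame(x, x1, x2)

# One call for all pairs; each pair uses the rows where both columns are observed
cm <- cor(toy, use = "pairwise.complete.obs")

# The same number as the two-column call with use = "complete.obs"
single <- cor(toy$x2, toy$x, use = "complete.obs")

round(cm, 2)
```

Note that use="complete.obs" applied to the whole data frame would instead drop every row that has a missing value in any column, so the numbers would generally differ from the pairwise ones computed in the script.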

In his paper, Cincera (1997) first estimates a simple Poisson model for panel data as a benchmark. In R this can be done with the “pglm”-package for generalized linear models for panel data.

# Table 2
# (3) Conditional Poisson
s.poisson<-summary(poisson<-pglm(p ~ k + k.1 + k.2 + k.3 + k.4 + 
                      spill + spill.1 + spill.2 + spill.3 + spill.4,
                     data=patents,
                     index=c('fi','year'),
                     model='within',
                     effect="individual",
                     family=poisson()))

s.poisson
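A cross-check that needs only base R: for the Poisson model, the conditional (within) estimator is known to yield the same slope estimates as an ordinary Poisson regression that includes a full set of firm dummies. The sketch below illustrates the dummy-variable version on simulated data. The data, the true slope of 0.5 and all object names are assumptions made for illustration, not part of Cincera's data or results.

```r
# Simulated panel: 50 firms observed over 9 years (made-up data)
set.seed(123)
n.firms <- 50
n.years <- 9
firm  <- rep(1:n.firms, times = n.years)             # stacked year by year
alpha <- rnorm(n.firms, mean = 1, sd = 0.5)[firm]    # firm-specific effects
x     <- rnorm(n.firms * n.years)                    # one regressor
y     <- rpois(n.firms * n.years, lambda = exp(alpha + 0.5 * x))

# Poisson regression with firm dummies absorbing the individual effects
fe.glm <- glm(y ~ x + factor(firm), family = poisson)

# Slope estimate; it should be close to the true value of 0.5
coef(fe.glm)["x"]
```

Applied to the patents data, the analogous call would add factor(fi) to the formula of a glm() with family=poisson. With 181 firms this is still feasible, although firms that never patent contribute nothing to the slope estimates and may cause warnings.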
