/* April2006PEPSbuild  Rebuild Data file for PEPS Project --- DO file #2 */
/* for final runs to revise "Voting Counts" for SCID */
/* DO file #2 creates the basic PEPS file used in the analysis */
/*     "C:\Research\Democracy\Data\PEPS1.dta" */
/* It incorporates IDEA data on turnout, UN population data, our updates, */
/* Paxton suffrage data, Vanhanen data, IDEA compulsory voting data, */
/* and Penn World Tables data into the previous Polity Freedom House file */
* This do file follows immediately from Rebuild.for.PEPS.2006.do --- DO file #1
****************************************************************************
/* Requires: */
/*   Polity */
/*   Turnout data from IDEA and updates, via Access from Patrick */
/*   Vanhanen data on turnout and voting age population, via Access */
/*   UN population data */
/* x Freedom House data */
/*   Real GDP capita data from Penn World Tables or World Bank */
/*   Maddison data on GDP per capita and population for early period */
/*   Suffrage data from Paxton & Bollen via Access */
/*   Compulsory voting dummies from IDEA */
/*   Life expectancy and literacy rates */
/*   Gender Development Index from UNDP */

* Read voting data (mostly IDEA), including UN population
insheet using "C:/Research/Democracy/Data/stataVotes_UNPopulation.csv", clear

* Adjust nation codes in voting data to match Polity/FH file
replace ccode=255 if ccode==260 & year>=1990 // unified Germany, not West
drop if ccode==317 & year<1993 // Slovakia not independent yet
drop if ccode==316 // duplicate entries
replace ccode=316 if ccode==315 & year>=1993 // Czech Rep., not Czechoslovakia
replace ccode=347 if ccode==345 & year>=1992 // Serbia, not Yugoslavia
replace ccode=364 if ccode==365 & year>=1922 & year<=1991 // USSR, not Russia
replace ccode=529 if ccode==530 & year>=1993 // new Ethiopia minus Eritrea
replace ccode=769 if ccode==770 & year<=1971 // unified Pakistan before split
replace ccode=818 if ccode==816 & year>=1976 // unified VietNam, not North
drop if ccode==680 & year>=1990 // mistaken Yemen entry by coders, always missing
drop if ccode==679 // mistaken Yemen entry by coders, always missing
replace ccode=679 if ccode==678 & year>=1990 // unified Yemen, not N Arab

drop sourcea sourceb
rename country Iname
rename vapa AVAP
rename populationa Apop
rename vapb BVAP
rename populationb Bpop
rename unpopulation UNpop
rename unvap UNVAP
drop Apop Bpop

* Population corrections for cases in which IDEA's VAP is men only
replace AVAP=AVAP * 2 if ccode==223 & year<1986 // Liechtenstein
replace AVAP=UNVAP if ccode==225 & year<1975 // count women in Switzerland
replace AVAP=UNVAP if ccode==350 & year<1956 // count women in Greece
replace AVAP=UNVAP if ccode==690 & year<2000 // count women in Kuwait
replace AVAP=UNVAP if ccode==404 & year>1999 // Guinea-Bissau

* Patch in UN voting age population if IDEA VAP is missing
replace AVAP = UNVAP if AVAP<0 & UNVAP>0 & UNVAP!=.
replace BVAP = UNVAP if BVAP<0 & UNVAP>0 & UNVAP!=.

/* Save temporary file and merge with Polity file */
sort ccode year
*save "C:\Research\Democracy\Data\tempIDEA2006.dta", replace // corrected turnout
*************************************************************************
merge ccode year using "C:/Research/Democracy/Data/PolFH.dta", unique
drop if year>2005
*drop if year<1950
drop if _merge==1 // No data in Polity/FH file. The vast majority
*   of these are countries too small to be included in Polity/FH
*   but a few records are years before Polity's independence year

*********************************************************************************
*Assign scores to countries lacking IDEA records
*It is easy to determine why most of them are missing and we can recode those
*Only ambiguous cases should be regarded as missing data
*********************************************************************************
*Most of the countries for which there is no voting record have been disqualified
*by IDEA because they do not have competitive elections. Assign them -2 DQ
*The following countries have obviously been disqualified by IDEA
egen DQidea = eqany(ccode), values(40,265,315,345,620,630,670,694,696,710,760,835)
replace DQidea=0 if year<1950
*Also, missing records with 0 democracy have probably been disqualified by IDEA
*tab DQidea
replace DQidea = 1 if totalvotea==. & democraw == 0 & year>=1950
replace totalvotea = -2 if totalvotea==. & DQidea==1
replace totalvoteb = -2 if totalvoteb==. & DQidea==1
tab totalvotea if totalvotea<=0 & year>=1950
drop if totalvotea==. & democraw==. // No Polity or voting data: drop

* No choice but to call the following countries missing data since IDEA has no record
* and it cannot be determined if they disqualified the country for the absence of
* competitive election. Most of them are former countries (e.g. SVN)
replace totalvotea= -9 if totalvotea==. & year>=1950
replace totalvoteb= -9 if totalvoteb==. & year>=1950
tab totalvotea if totalvotea<=0 & year>=1950

* -1 codes are assigned prior to the first election, but this gives missing data
* for many countries that are more accurately described as DQ (0 participation)
* Those with 0 democracy should be considered DQ
replace totalvotea = -2 if totalvotea == -1 & democraw == 0 & year>=1950
replace totalvoteb = -2 if totalvoteb == -1 & democraw == 0 & year>=1950
tab totalvotea if totalvotea<=0 & year>=1950

*Recode exception codes in voting data and create Avote/Bvote
*Sundry solutions to missing data
recode totalvotea (-15=-2 DQ) (-14=-9 nodata) ///
    (-13=-4 noelection) (-10=-9) (-1=-9) (.=-9), gen(Avote)
recode totalvoteb (-15=-2 DQ) (-14=-9 nodata) ///
    (-13=-4 noelection) (-10=-9) (-1=-9) (.=-9), gen(Bvote)

* -9 nodata      No data is available and the case should be treated as missing
*                No presumption concerning participation can be made
*                The largest block of nations earning this code result from.......
*     -1         New nations have not yet had elections with available data
*                (though some new nations are coded -9 originally because of A/B)
*     -9/-10     IDEA shows an election but no votes & no other sources
*     -14        A future election is noted.
*     .          IDEA doesn't code former nations like SVN & the split Yemens
* -4 noelection  Participation is regarded as zero because no election
*                was held. Almost all Parliamentary systems get -4 codes
*                for Presidential votes, which do not figure in the xtout
*                Most -4's result from overdue elections believed not to
*                have been held as a result of the seven year rule
* -2 DQ          For those nations having no voting data because of disqualification
*                by IDEA, assign them -2 DQ which will eventually be recoded
*                to zero to reflect that nations without competitive elections
*                should be seen as having zero participation. A very few updated
*                elections were judged -2 by Moon
*                This will permit computing PEPS1 and PEPS2
tab totalvotea if totalvotea<=0 & year>=1950

* Calculate turnout two ways.
* toI is the result when "DQ" and "no elections" are considered missing data
* toQ is the result when "DQ" and "no elections" are considered 0 participation

* Computing toI
gen toutA = Avote / AVAP
replace toutA = -9 if AVAP<=0
replace toutA = Avote if Avote<=0 // reinstating missing data codes
* Trim extreme scores if due to AVAP data
replace toutA = Avote / UNVAP if ///
    toutA>=1 & toutA<1000 & UNVAP > AVAP & UNVAP>0 & UNVAP!=.
gen toutB = Bvote / BVAP
replace toutB = -9 if BVAP<=0
replace toutB = Bvote if Bvote<=0 // reinstating missing data codes
* Trim extreme scores if due to BVAP data
replace toutB = Bvote / UNVAP if ///
    toutB>=1 & toutB<1000 & UNVAP > BVAP & UNVAP>0 & UNVAP!=.
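
* Optional check (not part of the original do-file): a minimal sketch for
* inspecting the turnout ratios just computed. It counts how many ratios
* still exceed 1 after the UNVAP trimming, and tabulates the negative values,
* which are the exception codes carried over from Avote/Bvote
* (-2 DQ, -4 noelection, -9 nodata). These commands only report;
* they do not change the data.
count if toutA>1 & toutA<.
count if toutB>1 & toutB<.
tab toutA if toutA<0
tab toutB if toutB<0
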
* Computing toQ through the very same process except using vote totals
* with greatly reduced missing data
gen AvoteQ = Avote
gen BvoteQ = Bvote
recode AvoteQ BvoteQ (-2 = 0) (-4=0)
gen toutAQ = AvoteQ / AVAP
replace toutAQ = -9 if AVAP<=0
replace toutAQ = AvoteQ if AvoteQ<=0 // reinstating missing data codes
* Trim extreme scores if due to AVAP data
replace toutAQ = AvoteQ / UNVAP if ///
    toutAQ>=1 & toutAQ<1000 & UNVAP > AVAP & UNVAP>0 & UNVAP!=.
gen toutBQ = BvoteQ / BVAP
replace toutBQ = -9 if BVAP<=0
replace toutBQ = BvoteQ if BvoteQ<=0 // reinstating missing data codes
* Trim extreme scores if due to BVAP data
replace toutBQ = BvoteQ / UNVAP if ///
    toutBQ>=1 & toutBQ<1000 & UNVAP > BVAP & UNVAP>0 & UNVAP!=.
mvdecode AVAP BVAP UNVAP, mv(-1 -4 -9)
*compare toutA toutB

* Create xtout, the larger of the two touts (Presidential or Parliamentary)
* 1. It should be coded missing if there is no IDEA data
gen xtout = -9
replace xtout = toutA if toutA<0 & toutB<0
* 2. If either is missing, use the other
replace xtout = toutA if toutA>=0 & toutB<0
replace xtout = toutB if toutB>=0 & toutA<0
* 3. If both are present, use the larger
replace xtout = toutA if toutA>=0 & toutB >= 0 & toutA>=toutB
replace xtout = toutB if toutA>=0 & toutB >= 0 & toutA< toutB
* 4. Trim extreme score
replace xtout=1 if xtout>1 & xtout<1000
mvdecode toutA toutB xtout , mv(-1 -2 -4 -9)

* Compute xtoutQ, as above
* 1. It should be coded missing if there is no IDEA data
gen xtoutQ = -9
replace xtoutQ = toutAQ if toutAQ<0 & toutBQ<0
* 2. If either is missing, use the other
replace xtoutQ = toutAQ if toutAQ>=0 & toutBQ<0
replace xtoutQ = toutBQ if toutBQ>=0 & toutAQ<0
* 3. If both are present, use the larger
replace xtoutQ = toutAQ if toutAQ>=0 & toutBQ >= 0 & toutAQ>=toutBQ
replace xtoutQ = toutBQ if toutAQ>=0 & toutBQ >= 0 & toutAQ< toutBQ
* 4. Trim extreme score
replace xtoutQ=1 if xtoutQ>1 & xtoutQ<1000
mvdecode toutAQ toutBQ xtoutQ , mv(-1 -9)

format %5.2fc toutA toutB xtout toutAQ toutBQ xtoutQ
label variable toutA "IDEA Votes/VAP, Parliamentary"
label variable toutB "IDEA Votes/VAP, Presidential"
label variable Avote "IDEA Votes, Parliamentary"
label variable Bvote "IDEA Votes, Presidential"
label variable AVAP "IDEA Voting age population, Parliamentary"
label variable BVAP "IDEA Voting age population, Presidential"
*mvdecode Avote AVAP toutA Bvote BVAP IVAP Ivote toutB xtout , mv(-1=.a \-2=.b \ -3=.c \ -4=.d \-5= .e \ -9=.f \-99=.f)
rename xtout toI
label variable toI "IDEA Votes/VAP"
rename xtoutQ toQ
label variable toQ "Votes/VAP with presumed zero"

gen PEPS1i = (Pdemocracy * toI) - Pautocracy
gen PEPS1q = (Pdemocracy * toQ) - Pautocracy
replace PEPS1i = (0 - Pautocracy) if Pdemocracy == 0
replace PEPS1q = (0 - Pautocracy) if Pdemocracy == 0
gen PEPS2i = (((toI/.05)-10) + Polity3) / 2
gen PEPS2q = (((toQ/.05)-10) + Polity3) / 2
format %5.2fc PEPS1i PEPS2i PEPS1q PEPS2q
order ccode Pname Iname FHname year Polity1 Polity2 Polity3 Pdemocracy Pautocracy ///
    FHfree toI PEPS1i PEPS2i PEPS1q PEPS2q polity1raw
drop totalvotea-toutB
sort ccode year
tsset ccode year, yearly
save "C:/Research/Democracy/Data/tempVoting.dta", replace

***************************************************************
/* Read Paxton suffrage data into Stata and save */
* .csv prepared by Patrick from Ken Bollen's website
***************************************************************
clear
insheet using "C:/Research/Democracy/Data/Suffrage.csv"
rename value suffrage
mvdecode suffrage, mv(-7=.)
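
* Worked example (not part of the original do-file): a minimal sketch of the
* PEPS1 and PEPS2 formulas used earlier in this file for tempVoting.dta.
* All input values are hypothetical and chosen only for illustration.
* Assumption: turnout is a proportion between 0 and 1, so (turnout/.05)-10
* rescales it to the range -10 to 10 before it is averaged with Polity3.
* By hand: PEPS1 = (8 * 0.75) - 1 = 5 and PEPS2 = ((15 - 10) + 7) / 2 = 6.
* The scalars do not touch the data in memory and are dropped afterwards.
scalar ex_dem = 8       // hypothetical Pdemocracy
scalar ex_aut = 1       // hypothetical Pautocracy
scalar ex_pol = 7       // hypothetical Polity3
scalar ex_to  = 0.75    // hypothetical turnout (toI or toQ)
scalar ex_peps1 = (scalar(ex_dem) * scalar(ex_to)) - scalar(ex_aut)
scalar ex_peps2 = (((scalar(ex_to)/.05) - 10) + scalar(ex_pol)) / 2
display "PEPS1 = " %5.2f scalar(ex_peps1)
display "PEPS2 = " %5.2f scalar(ex_peps2)
scalar drop ex_dem ex_aut ex_pol ex_to ex_peps1 ex_peps2
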
* Adjust nation codes in suffrage data to match Polity/FH/toI file
replace ccode=255 if ccode==260 & year>=1990 // unified Germany, not West
replace ccode=316 if ccode==315 & year>=1993 // Czech Rep., not Czechoslovakia
replace ccode=347 if ccode==345 & year>=1992 // Serbia, not Yugoslavia
replace ccode=364 if ccode==365 & year>=1922 & year<=1991 // USSR, not Russia
replace ccode=529 if ccode==530 & year>=1993 // new Ethiopia minus Eritrea
replace ccode=769 if ccode==770 & year<=1971 // unified Pakistan before split
replace ccode=818 if ccode==816 & year>=1976 // unified VietNam, not North
replace ccode=679 if ccode==678 & year>=1990 // unified Yemen, not N Arab
sort ccode year
merge ccode year using "C:\Research\Democracy\Data\tempVoting.dta", unique
drop if _merge==1 & suffrage == . // No voting or suffrage data
sort ccode year
tab _merge

* Compulsory voting dummies (IDEA)
gen cv_strong=0
replace cv_strong=1 if (ccode==900|ccode==211|ccode==352|ccode==950|ccode==212|ccode==970|ccode==830|ccode==225|ccode==165)
gen cv_weak=0
replace cv_weak=1 if (ccode==640|ccode==135|ccode==70|ccode==325|ccode==350|ccode==130|ccode==155|ccode==140|ccode==305|ccode==160|ccode==223)
gen cv_na = 0
replace cv_na=1 if (ccode==145|ccode==651|ccode==220|ccode==481|ccode==150)
gen cv_not = 0
replace cv_not=1 if (ccode==94|ccode==42|ccode==90|ccode==91|ccode==210|ccode==840|ccode==800)
gen cv = (cv_strong*4) + (cv_weak*3) + (cv_na*2) + (cv_not*1)
drop if _merge ==1
drop _merge
save "C:\Research\Democracy\Data\tempVoting.dta", replace

/********************************************************************/
/* 2. Merge Summers & Heston, "Penn World Tables, 6.1" (1950-2000) */
/* (pwt61 STATA file created from Bill's zipfile) */
/* clean up file first */
/********************************************************************/
use "C:/Research/Democracy/Data/pwt61.dta", clear

* Adjust nation codes in Penn GDP data to match Polity/FH/toI file
replace ccode=255 if ccode==260 & year>=1990 // unified Germany, not West
replace ccode=316 if ccode==315 & year>=1993 // Czech Rep., not Czechoslovakia
replace ccode=364 if ccode==365 & year>=1922 & year<=1991 // USSR, not Russia
replace ccode=529 if ccode==530 & year>=1993 // new Ethiopia minus Eritrea
replace ccode=679 if ccode==678 & year>=1990 // unified Yemen, not N Arab
replace ccode=769 if ccode==770 & year<=1971 // unified Pakistan before split
replace ccode=818 if ccode==816 & year>=1976 // unified VietNam, not North
*replace ccode=316 if ccode==315 & year>= 1993
replace ccode = 2113 if iso=="MAC"
sort ccode year

/* Now merge */
merge ccode year using "C:\Research\Democracy\Data\tempVoting.dta", unique
drop xrat cc cg ci kc kg ki csave rgdpl cgdp rgdptt y p pc pg pi ///
    cgnp rgdpeqa rgdpwok openc openk
*drop cv_strong-cv_not
drop if _merge==1 & rgdpch==. // Penn World Tables records exist, but contain no data
drop if _merge==1 // no Polity or voting data
drop _merge
sort ccode year
tsset ccode year, yearly
save "C:\Research\Democracy\Data\tempPEPS.dta" , replace

/********************************************************************/
/* Build Vanhanen and merge with previous PEPS file */
/* (vanhanen STATA file created from xls file acquired on Web) */
*Step 1 : Read revised Vanhanen data (.csv) into Stata & save
* Note: .csv file created by Patrick Schmid from original Vanhanen Access file, bypassing weights
** insheet using "C:\STATADATA\vanhanen_ALL.csv",clear
*insheet using vanhanendataexpanded.csv, clear
/********************************************************************/
insheet using "C:/Research/Democracy/Data/VanhanenExpanded.csv", clear
rename electiontype election_type
rename voteshare vote_share

*Correct Vanhanen population error
replace population=population*2 if ccode==481& year<=1978 //Gabon
sort ccode year election_type

*Calculate toV in place of vote_share contained in the original vanhanen file,
*which is frequently wrong by a fraction and reported in two significant digits
gen toV = vote_share
replace toV = votes/population if votes > 0 & population > 0
replace toV = vote_share if toV >1000
pwcorr
summarize

*********************************************************************
*Remove duplicate records when multiple elections are held in same year
duplicates tag ccode year election_type, generate(dup) //find duplicates
*summarize
*duplicates list ccode year election_type, separator(5) //list duplicates
*list if dup>0 // examine duplicate cases with all variables

*First, drop duplicate cases that are mostly missing data
drop if dup>0 & population==0 & year==year[_n-1] //drop the second of two zero duplicates
drop dup //restart
duplicates tag ccode year election_type, generate(dup) //find remaining duplicates
drop if dup>0 & population==0 & year==year[_n+1] //drop the first of two zero duplicates
*list if dup>0 // examine remaining duplicates
drop dup //restart
duplicates tag ccode year election_type, generate(dup) //find remaining duplicates
*duplicates list ccode year election_type, separator(5) //list remaining duplicates
*list if dup>0 // examine remaining duplicates

* Drop the duplicate case with the smaller turnout
drop if dup>0 & year==year[_n-1] & toV<= toV[_n-1] //drop the second of duplicates if smaller
drop dup
duplicates tag ccode year election_type, generate(dup)
*list if dup>0
drop if dup>0 & year==year[_n+1] & toV< toV[_n+1] //drop the first of duplicates if smaller
drop dup
duplicates tag ccode year election_type, generate(dup)
duplicates list ccode year election_type, separator(5) //confirms that no duplicates remain
drop votes vote_share population dup

******************************************************************
*Reshape data set by pulling A & B elections into the same record,
* then choosing the larger turnout of the two
reshape wide toV, i( ccode year ) j( election_type ) string
summarize
* the new file has a toVA for parliamentary elections and toVB for presidential
mvencode toVA toVB, mv(.=-1) // temporarily remove missing data code
gen toV = toVA
replace toV = toVB if toVB>toVA // toV is now the larger of toVA & toVB
list if toV<=0 & (toVA>0 | toVB>0) // check for wrong missing data codes
mvdecode toV toVA toVB, mv(-1) // recognize -1 as missing data code
replace toV=toV*100 /66.67 // TRANSFORM toV FROM POPULATION TO VAP DENOMINATOR
replace toV=1 if toV>1 & toV<1000 // truncate extreme scores
summarize
pwcorr
drop toVA toVB

* Adjust nation codes in voting data to match Polity/FH file
drop if ccode==265 & year== 1990 // E Germany ends in 1989
replace ccode=305 if ccode==300 // Austria
replace ccode=342 if ccode==345 & year<=1920 // Recode from Yugoslavia to Serbia
replace ccode=99 if ccode==100 & year<=1831 // Recode from Colombia to GranColombia
replace ccode=255 if ccode==260 & year>=1990 // unified Germany, not West
replace ccode=316 if ccode==315 & year>=1993 // Czech Rep., not Czechoslovakia
replace ccode=347 if ccode==345 & year>=1992 // Serbia, not Yugoslavia
replace ccode=364 if ccode==365 & year>=1922 & year<=1991 // USSR, not Russia
replace ccode=529 if ccode==530 & year>=1993 // new Ethiopia minus Eritrea
replace ccode=769 if ccode==770 & year<=1971 // unified Pakistan before split
replace ccode=818 if ccode==816 & year>=1976 // unified VietNam, not North
replace ccode=679 if ccode==678 & year>=1990 // unified Yemen, not N Arab
duplicates tag ccode year, generate(dup)
duplicates list ccode year , separator(5)
drop dup
sort ccode year
tsset ccode year, yearly

// Now merge Vanhanen with Polity & Freedom House
merge ccode year using "C:\Research\Democracy\Data\tempPEPS.dta" , unique
tabulate _merge
summarize
generate PEPS1v = (Pdemocracy * toV) - Pautocracy
replace PEPS1v = (0 - Pautocracy) if Pdemocracy == 0
generate PEPS2v = (((toV/.05)-10) + Polity3) / 2
summarize
sort ccode year
tsset ccode year, yearly
compress
drop _merge
drop AvoteQ-toutBQ
order ccode year isocode Pname Iname FHname polity1raw Polity1-FHfree toI toV toQ ///
    PEPS1i-PEPS2q PEPS1v PEPS2v

label variable ccode "country code, Correlates of War numeric codes from Polity"
label variable cv "compulsory voting,4=strongly enforced,3=weakly,2=NA,1=not enforced,0=none,IDEA"
label variable isocode "country code, 3 digit alpha,from Penn World Tables "
label variable Pname "country name, Polity"
label variable Iname "country name, IDEA"
label variable FHname "country name, Freedom House"
label variable toV "Vanhanen votes/ two-thirds of Vanhanen population"
label variable toQ "(mostly)IDEA votes/VAP, with participation coded zero for noncompetitive elections"
label variable PEPS1i "PEPS1, using toI"
label variable PEPS2i "PEPS2, using toI"
label variable PEPS1q "PEPS1, using toQ"
label variable PEPS2q "PEPS2, using toQ"
label variable PEPS1v "PEPS1, using toV"
label variable PEPS2v "PEPS2, using toV"
label variable suffrage "legal suffrage as % of VAP, Paxton and Bollen"
label variable rgdpch "real per capita GDP, chain method, Penn World Tables"
label variable pop "population in thousands, Penn World Tables"

label data "PEPS1, version used in SCID 2006 analysis. Moon et al., Lehigh University Dept. of International Relations"
save "C:\Research\Democracy\Data\PEPS1.dta" , replace

label data "PEPS1pub, public version described in SCID 2006. Moon et al., Lehigh University Dept. of International Relations"
drop cv_strong-cv_not
save "C:\Research\Democracy\Data\PEPS1pub.dta" , replace
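
* Optional check (not part of the original do-file): a minimal sketch that
* confirms ccode and year jointly identify each record in the file just saved.
* isid exits with an error if they do not, or if either variable is missing.
* It runs after both save commands, so a failure here leaves the saved files intact.
isid ccode year
describe, short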