Election data processing

We obtained our data from IDEA's website by extracting it from all pages with a Java program. This could be repeated in the future if IDEA updates its database again; however, the program requires some manual setup to work correctly. The data was then read into the Access database as table rawDataIDEA. The Java program made the following modifications to the data:

As a first step in Access, certain rows were filtered out of the dataset because they represent cases that would throw off our algorithm. The following table lists all these cases. In almost all of them, two elections of the same type were held in the same year; we retained the election with the highest voteVAP (totalVote / VAP). The Malawi cases are different: IDEA appears to have reported every Malawi election (parliamentary and presidential, 1994 and 1999) twice, so one copy of each has to be removed from the dataset. Unfortunately, this error in the IDEA data meant that the unique row identifier (field ID) had to be used to determine which rows to filter out. Therefore, if the original IDEA dataset were updated, the cases would have to be identified and entered in the table rawDataIDEA_DeleteRecords again. This should not be a problem, because after an IDEA update we would have to check again for years with two elections of the same type anyway. The IDEA data without these records is stored in rawDataIDEA_Without_Deleted_Record.

rawDataIDEA_DeleteRecords
ID    country          type  year  totalVote      registration   VAP            voteReg  invalid  voteVAP  population
408   DENMARK          A     1953  2,077,615.00   2,571,311.00   2,752,470.00   80.80%   0.30%    75.50%   4,369,000.00
628   GREECE           A     1989  6,669,228.00   7,892,904.00   7,769,300.00   84.50%   2.10%    85.80%   10,090,000.00
727   ICELAND          A     1959  86,147.00      95,050.00      96,320.00      90.60%   1.60%    89.40%   172,000.00
918   LIECHTENSTEIN    A     1953  3,025.00       3,333.00       4,013.00       90.80%   3.40%    75.40%   15,000.00
966   MALAWI           A     1994  3,021,239.00   3,775,256.00   4,446,670.00   80.00%   2.40%    67.90%   9,461,000.00
968   MALAWI           A     1999  4,680,262.00   5,071,822.00   4,419,210.00   92.30%   4.10%    105.90%  9,692,808.00
970   MALAWI           B     1994  3,040,665.00   3,775,256.00   4,446,670.00   80.50%   2.00%    68.40%   9,461,000.00
972   MALAWI           B     1999  4,755,422.00   5,071,822.00   4,419,210.00   93.80%   1.90%    107.60%  9,692,808.00
1706  SAINT LUCIA      A     1987  50,511.00      83,153.00      74,834.00      60.70%   2.30%    67.50%   142,000.00
1591  UNITED KINGDOM   A     1974  29,226,810.00  40,072,970.00  40,298,400.00  72.90%   0.10%    72.50%   55,970,000.00
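The selection rule for duplicate years can be sketched in Python. This is a minimal sketch, not the original Access implementation: it assumes the IDEA rows are already in memory, and the names `Election`, `ids_to_delete` and `without_deleted` are hypothetical. In the actual workflow the IDs were chosen by hand and entered into rawDataIDEA_DeleteRecords; the sketch only illustrates the voteVAP comparison and the subsequent filtering by ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Election:
    id: int          # IDEA row identifier (field ID)
    country: str
    type: str        # election type code, e.g. "A" or "B"
    year: int
    total_vote: float
    vap: float       # voting-age population

    @property
    def vote_vap(self) -> float:
        # voteVAP = totalVote / VAP
        return self.total_vote / self.vap


def ids_to_delete(rows):
    """Return the IDs of every election that shares (country, type, year)
    with another election and does not have the highest voteVAP."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row.country, row.type, row.year)].append(row)
    doomed = set()
    for group in groups.values():
        # max() keeps the first row on ties, so exactly one row survives per group
        keep = max(group, key=lambda r: r.vote_vap)
        doomed.update(r.id for r in group if r is not keep)
    return doomed


def without_deleted(rows, delete_ids):
    """Hypothetical analogue of building rawDataIDEA_Without_Deleted_Record."""
    return [r for r in rows if r.id not in delete_ids]
```

For the Denmark row in the table, `Election(408, "DENMARK", "A", 1953, 2077615, 2752470).vote_vap` is about 0.755, which matches the 75.50% reported by IDEA.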

As a next step, the Polity country code was assigned to the IDEA records. Several "countries" covered by IDEA were assigned non-Polity codes in this step because they are not independent countries: American Samoa (2004), Anguilla (1312), Aruba (1314), Cook Islands (3000), Guam (2276), Macau (1546) and the Occupied Palestinian Territories ("West Bank & Gaza", 2203). "Independence" years for these entities were set to the year in which they came into existence in their present-day form, according to the CIA World Factbook.
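A minimal sketch of the code assignment, assuming a hypothetical `polity_codes` dictionary that maps IDEA country names to Polity codes (its contents are not part of this document; the United States entry in the usage example uses the code 2 seen in the tables below). Matching on the country name is also an assumption of the sketch. The fallback table holds the non-Polity codes listed above.

```python
# Non-Polity codes for entities that are not independent countries,
# as listed in the text above.
NON_POLITY_CODES = {
    "American Samoa": 2004,
    "Anguilla": 1312,
    "Aruba": 1314,
    "Cook Islands": 3000,
    "Guam": 2276,
    "Macau": 1546,
    "West Bank & Gaza": 2203,
}


def assign_ccode(country: str, polity_codes: dict) -> int:
    """Return the Polity code for `country`, falling back to the project's
    own codes for non-independent entities.

    Raises KeyError if the name is in neither table, so unmatched
    countries surface instead of silently receiving a code.
    """
    if country in polity_codes:
        return polity_codes[country]
    return NON_POLITY_CODES[country]
```

For example, `assign_ccode("Guam", {"United States": 2})` returns 2276.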

Next, the updates and additions we compiled to cover gaps in the IDEA data were added to the dataset. The result of this and the previous step is stored in IDEA_voteUpdates. Where both IDEA and the update dataset contained votes for the same election, the IDEA figure was used, with one exception: where the IDEA dataset contained a special code or 0 (hence <= 0) but a vote count was available from the update dataset, the vote count from the update dataset was used.
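The precedence rule for a single election's vote count might look like this in Python. It is a sketch under stated assumptions: the function name `merged_vote` is hypothetical, `None` stands for "no record in that source", and an update value is treated as an available vote count only if it is greater than 0.

```python
from typing import Optional


def merged_vote(idea_vote: Optional[float], update_vote: Optional[float]) -> Optional[float]:
    """Pick totalVote for one election from IDEA and the update dataset.

    None means the source has no record of the election. IDEA values <= 0
    are special codes or zero. Treating only an update value > 0 as an
    "available vote count" is an assumption of this sketch.
    """
    update_has_count = update_vote is not None and update_vote > 0
    if idea_vote is None:
        # Election missing from IDEA: take whatever the update dataset has.
        return update_vote
    if idea_vote <= 0 and update_has_count:
        # IDEA holds only a special code or 0: use the update's count.
        return update_vote
    # Otherwise the IDEA figure always wins.
    return idea_vote
```

For example, `merged_vote(-9, 500)` returns 500, while `merged_vote(1200, 500)` returns 1200.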

If no elections were reported for a country in 1945, the following decisions were then made for all types of elections reported in that country:

The output of this step is stored in IDEA_voteUpdates_With_IndependenceMarkers.

All records were then transposed so that there is one row per country per year, with the fields for each election type (A, B) side by side. For example, before this step the United States records were:

IDEA_voteUpdates_With_IndependenceMarkers
ccode country type year totalVote VAP registration population voteReg invalid voteVAP Source
2 United States A 1945 -1 -1 -1 -1 -1 -1 -1 Independence Marker
2 United States A 1946 34279158 88388000 -9 142049065 -9 -9 0.388 IDEA
2 United States A 1948 45839622 95310150 -9 146631000 -9 -9 0.481 IDEA
2 United States A 1950 40253267 94403000 -9 151325798 -9 -9 0.426 IDEA
2 United States A 1952 57582333 96466000 -9 157022000 -9 -9 0.597 IDEA
2 United States A 1954 42509905 98527000 -9 162725667 -9 -9 0.431 IDEA
2 United States A 1956 58434811 106408890 -9 168903000 -9 -9 0.549 IDEA
2 United States A 1958 45966070 103221000 -9 175038232 -9 -9 0.445 IDEA
2 United States A 1960 68838204 109159000 -9 180684000 -9 -9 0.631 IDEA
2 United States A 1962 53141227 112423000 -9 186512143 -9 -9 0.473 IDEA
2 United States A 1964 70644592 114090000 -9 192119000 -9 -9 0.619 IDEA
2 United States A 1966 56188046 116132000 -9 197730744 -9 -9 0.484 IDEA
2 United States A 1968 73211875 120328186 81658180 200710000 0.897 -9 0.608 IDEA
2 United States A 1970 58014338 124498000 82496747 203211926 0.703 -9 0.466 IDEA
2 United States A 1972 77718554 140776000 97328541 208840000 0.799 -9 0.552 IDEA
2 United States A 1974 55943834 146336000 96199020 214305134 0.582 -9 0.382 IDEA
2 United States A 1976 81555789 152309190 105037989 218035000 0.776 -9 0.535 IDEA
2 United States A 1978 58917938 158373000 103291265 221537514 0.57 -9 0.372 IDEA
2 United States A 1980 86515221 164597000 113043734 227738000 0.765 -9 0.526 IDEA
2 United States A 1982 67615576 169938000 110671225 233697676 0.611 -9 0.398 IDEA
2 United States A 1984 92652680 174466000 124150614 236681000 0.746 -9 0.531 IDEA
2 United States A 1986 64991128 178566000 118399984 239529693 0.549 -9 0.364 IDEA
2 United States A 1988 91594693 182778000 126379628 245057000 0.725 -9 0.501 IDEA
2 United States A 1990 67859189 185812000 121105630 248709873 0.56 -9 0.365 IDEA
2 United States A 1992 104405155 189529000 133821178 255407000 0.78 -9 0.551 IDEA
2 United States A 1994 75105860 193650000 130292822 262090745 0.576 -9 0.388 IDEA
2 United States A 1996 96456345 196511000 146211960 265679000 0.66 -9 0.491 IDEA
2 United States A 1998 73117022 210446120 141850558 280298524 0.515 -9 0.347 IDEA
2 United States A 2000 99738383 213954023 156421311 284970789 0.638 -9 0.466 IDEA
2 United States A 2002 -10 -9 -9 -9 -9 -9 -9 IFES
2 United States A 2004 -10 -9 -9 -9 -9 -9 -9 IFES
2 United States B 1948 48692442 95310150 -9 146631000 -9 -9 0.511 IDEA
2 United States B 1952 61551118 102064300 -9 157022000 -9 -9 0.603 IDEA
2 United States B 1956 62026908 106408890 -9 168903000 -9 -9 0.583 IDEA
2 United States B 1960 68838219 109159000 -9 180684000 -9 -9 0.631 IDEA
2 United States B 1964 70644592 114090000 73715818 192119000 0.958 -9 0.619 IDEA
2 United States B 1968 73211875 120328186 81658180 200710000 0.897 -9 0.608 IDEA
2 United States B 1972 77718554 140776000 97328541 208840000 0.799 -9 0.552 IDEA
2 United States B 1976 81555889 152309190 105037986 218035000 0.776 -9 0.535 IDEA
2 United States B 1980 86515221 164597000 113043734 227738000 0.765 -9 0.526 IDEA
2 United States B 1984 92652842 174466000 124150614 236681000 0.746 -9 0.531 IDEA
2 United States B 1988 91594809 182778000 126379628 245057000 0.725 -9 0.501 IDEA
2 United States B 1992 104600366 189529000 133821178 255407000 0.782 -9 0.552 IDEA
2 United States B 1996 92712803 196511000 146211960 265679000 0.634 -9 0.472 IDEA
2 United States B 2000 105404546 213954023 156421311 284970789 0.674 -9 0.493 IDEA
2 United States B 2004 122293548 -9 -9 -9 -9 -9 -9 IFES

After the transpose step, the United States records became:

allVotes
ccode country year totalVoteA VAPA registrationA populationA voteRegA invalidA voteVAPA SourceA totalVoteB VAPB registrationB populationB voteRegB invalidB voteVAPB SourceB
2 United States 1945 -1 -1 -1 -1 -1 -1 -1 Independence Marker -9 -9 -9 -9 -9 -9 -9
2 United States 1946 34279158 88388000 -9 142049065 -9 -9 0.388 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1948 45839622 95310150 -9 146631000 -9 -9 0.481 IDEA 48692442 95310150 -9 146631000 -9 -9 0.511 IDEA
2 United States 1950 40253267 94403000 -9 151325798 -9 -9 0.426 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1952 57582333 96466000 -9 157022000 -9 -9 0.597 IDEA 61551118 102064300 -9 157022000 -9 -9 0.603 IDEA
2 United States 1954 42509905 98527000 -9 162725667 -9 -9 0.431 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1956 58434811 106408890 -9 168903000 -9 -9 0.549 IDEA 62026908 106408890 -9 168903000 -9 -9 0.583 IDEA
2 United States 1958 45966070 103221000 -9 175038232 -9 -9 0.445 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1960 68838204 109159000 -9 180684000 -9 -9 0.631 IDEA 68838219 109159000 -9 180684000 -9 -9 0.631 IDEA
2 United States 1962 53141227 112423000 -9 186512143 -9 -9 0.473 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1964 70644592 114090000 -9 192119000 -9 -9 0.619 IDEA 70644592 114090000 73715818 192119000 0.958 -9 0.619 IDEA
2 United States 1966 56188046 116132000 -9 197730744 -9 -9 0.484 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1968 73211875 120328186 81658180 200710000 0.897 -9 0.608 IDEA 73211875 120328186 81658180 200710000 0.897 -9 0.608 IDEA
2 United States 1970 58014338 124498000 82496747 203211926 0.703 -9 0.466 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1972 77718554 140776000 97328541 208840000 0.799 -9 0.552 IDEA 77718554 140776000 97328541 208840000 0.799 -9 0.552 IDEA
2 United States 1974 55943834 146336000 96199020 214305134 0.582 -9 0.382 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1976 81555789 152309190 105037989 218035000 0.776 -9 0.535 IDEA 81555889 152309190 105037986 218035000 0.776 -9 0.535 IDEA
2 United States 1978 58917938 158373000 103291265 221537514 0.57 -9 0.372 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1980 86515221 164597000 113043734 227738000 0.765 -9 0.526 IDEA 86515221 164597000 113043734 227738000 0.765 -9 0.526 IDEA
2 United States 1982 67615576 169938000 110671225 233697676 0.611 -9 0.398 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1984 92652680 174466000 124150614 236681000 0.746 -9 0.531 IDEA 92652842 174466000 124150614 236681000 0.746 -9 0.531 IDEA
2 United States 1986 64991128 178566000 118399984 239529693 0.549 -9 0.364 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1988 91594693 182778000 126379628 245057000 0.725 -9 0.501 IDEA 91594809 182778000 126379628 245057000 0.725 -9 0.501 IDEA
2 United States 1990 67859189 185812000 121105630 248709873 0.56 -9 0.365 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1992 104405155 189529000 133821178 255407000 0.78 -9 0.551 IDEA 104600366 189529000 133821178 255407000 0.782 -9 0.552 IDEA
2 United States 1994 75105860 193650000 130292822 262090745 0.576 -9 0.388 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 1996 96456345 196511000 146211960 265679000 0.66 -9 0.491 IDEA 92712803 196511000 146211960 265679000 0.634 -9 0.472 IDEA
2 United States 1998 73117022 210446120 141850558 280298524 0.515 -9 0.347 IDEA -9 -9 -9 -9 -9 -9 -9
2 United States 2000 99738383 213954023 156421311 284970789 0.638 -9 0.466 IDEA 105404546 213954023 156421311 284970789 0.674 -9 0.493 IDEA
2 United States 2002 -10 -9 -9 -9 -9 -9 -9 IFES -9 -9 -9 -9 -9 -9 -9
2 United States 2004 -10 -9 -9 -9 -9 -9 -9 IFES 122293548 -9 -9 -9 -9 -9 -9 IFES

The output of this step was stored in allVotes.
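The transpose is essentially a pivot from one row per (country, type, year) to one row per (country, year). A minimal Python sketch, assuming each record is a dict keyed by the column names shown in the tables above; the function name `transpose` is hypothetical. Missing numeric fields are filled with -9 and a missing Source with an empty string, mirroring the allVotes rows for years without a type B election.

```python
FIELDS = ["totalVote", "VAP", "registration", "population",
          "voteReg", "invalid", "voteVAP", "Source"]
TYPES = ["A", "B"]   # election types appearing in the tables above
MISSING = -9         # code used for missing values in allVotes


def transpose(rows):
    """Pivot long election records into one row per (ccode, country, year).

    Each output row carries totalVoteA ... SourceA followed by
    totalVoteB ... SourceB; a type with no election that year is filled
    with MISSING (and an empty Source).
    """
    by_key = {}
    for row in rows:
        key = (row["ccode"], row["country"], row["year"])
        by_key.setdefault(key, {})[row["type"]] = row

    result = []
    for ccode, country, year in sorted(by_key):
        elections = by_key[(ccode, country, year)]
        out = {"ccode": ccode, "country": country, "year": year}
        for t in TYPES:
            src = elections.get(t)
            for field in FIELDS:
                if src is not None:
                    out[field + t] = src[field]
                elif field == "Source":
                    out[field + t] = ""
                else:
                    out[field + t] = MISSING
        result.append(out)
    return result
```

Given the 1946 and 1948 United States records, this yields two rows; the 1946 row has totalVoteB = -9 because no type B election took place that year.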

In the following step, the data was expanded to cover every year from independence (or from the first year in the dataset) through 2006. We used the following rules for row expansion:

The results of this step were stored in expandedVotes and exported as stataVotes. All conversions are performed by executing the macro createDataset. Election data combined with UN population data was exported as stataVotes_UNPopulation.
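The year-range part of the expansion can be sketched as follows. The expansion rules themselves are not reproduced in this document, so the sketch leaves them to a caller-supplied function; `expand_years` and `fill` are hypothetical names, and the caller is assumed to pass either the independence year or the country's first year in the dataset as `start_year`.

```python
END_YEAR = 2006  # last year covered by the expanded dataset


def expand_years(rows_by_year, start_year, fill):
    """Return one row per year from start_year through END_YEAR for one country.

    rows_by_year: {year: row} for the years that already have a record.
    fill(year, previous_rows): builds the row for a year without a record.
    The project's actual expansion rules would go inside `fill`; they are
    not implemented here.
    """
    expanded = []
    for year in range(start_year, END_YEAR + 1):
        if year in rows_by_year:
            expanded.append(rows_by_year[year])
        else:
            # previous_rows lets a rule look back at earlier years if needed
            expanded.append(fill(year, expanded))
    return expanded
```

For a country whose data starts with the 1945 independence marker, this produces 62 rows (1945 through 2006).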


©2006 by Department of International Relations, Lehigh University. Last update: 2006-05-18 13:38:20 -0400.
Contact corresponding author Bruce Moon.