Culture... One Step at a Time


John B. Gatewood

[ Copyright (c) 1999, John B. Gatewood ]

Original of paper published in two parts by The Behavioral Measurement Letter (1999 and 2000).

There is a disciplinary joke in fisheries management circles. A commercial fishery is in crisis, and managers commission studies to ascertain the causes and possible solutions. The biologist evaluates a handful of variables pertaining to fish stock growth and mortality and writes a three-page report concluding with a recommended policy action. The economist considers a dozen or so variables pertaining to the costs and benefits of the fishery based on alternative future scenarios and submits a fifteen-page report with an either/or recommendation. The anthropologist considers the full multitude of factors contributing to how things got to be in such a messed-up state and hands in a 200-page report with no recommendation.

Contrary to such folk humor concerning the idiosyncratic nature of anthropological research, in what follows I review some ways in which anthropologists can and do make headway in understanding the cultural part of social life using explicit, replicable methods. I will focus on a few of the rather specialized data collection techniques and analyses being used in cognitive anthropology--specifically, free-listing, pile-sorts, triadic comparisons, and consensus analysis. Each of these topics is well presented in several recent anthropological methods books (Weller & Romney, 1988; Bernard, 1994; D'Andrade, 1995; Borgatti, 1996; and Bernard, ed., 1998). After introducing the four topics individually, I'll use one small data set to illustrate how these techniques can be used in concert.


One of the initial problems facing anthropologists interested in a cultural domain is how to ask meaningful questions, meaningful to the natives. In particular, one needs to phrase questions using natives' own categories (Frake, 1962, 1964). But what are these categories, what are the elements of the cultural domain, and how can a non-native discover them? The free-listing task is a good way to explore native vocabularies, and it also provides interesting information concerning the informants in one's sample.

A free-listing task is virtually the same as a free-recall task in psychology, except that the period of learning is not controlled by the researcher but rather consists of the previous (and variable) life experiences of the informants. Informants are simply asked to name (if non-literate) or write (if literate) all the items they can think of that match a given description. Examples might include:

"Please make a list of all the contagious diseases you can think of."
"Please name all the parts of a human body you can think of."
"Please write down all the phases of the human life cycle you can think of."

Conceptually, such tasks are quite simple and generally well understood by informants, they do not impose preconceived response categories, and they work well with non-literate as well as literate informants. Further, although the ideal is to work with informants one at a time, the same free-listing task can be given to multiple literate informants at the same time or even as a communal task for focus groups. Still, there are a few choices and problems the researcher should know beforehand.

First, the usual free-listing task (such as the instructions above) is virtually unconstrained, such that it is not always clear when informants are finished. Typically, informants recall the first several of their items rather quickly, but then slow down dramatically. Depending on the domain in question, the informant's knowledge of the domain, and his or her motivation, the task can take from a few seconds to ten minutes before the informant runs out of steam. Related to this 'no clear endpoint' problem, informants seldom enjoy free-listing tasks. The instructions sound easy, and often informants are initially eager. But once they get started and realize the full open-endedness of the task, many begin to feel apologetic for the shortness of their lists or resentful at being made to encounter their own memory limitations.

There are two effective ways to bring closure to the task, each of which makes the task less onerous and more standardized across informants. The instructions can include an explicit time limit, e.g., "Please make a list of all the kinds of diseases you can think of in the next 3 minutes. Ready? Begin." Alternatively, the task can specify a maximum number of items to be identified, e.g., "Can you remember any sponsors of last year's Super Bowl? If yes, please name up to three of them." Of course, the specific limitation chosen should be guided by pilot testing the unconstrained version of the task with a few informants to observe when they bog down and, also, by the research objectives and sample size. Specifying a maximum number of items to be listed works well enough if the domain itself is the primary research objective and the task is given to hundreds of informants. On the other hand, I prefer the time limit approach when working with small numbers of informants (e.g., 40 or fewer) and, especially, if I am interested in informant-level attributes. Also, when working with American college students, 90 seconds seems to be a pretty good time limit for many domains, such as kinds of kin, trees, mammals, fish, mixed drinks, hand tools, fabrics, or musical instruments. Less indoctrinated informants might prefer a somewhat longer task time.

The second main problem with free-listing arises after the data have been collected and analysis begins. The aggregated findings of free-listing tasks are displayed as a table in which each row contains the name of an item followed by the number of lists the item appeared in, and usually the table items are sorted by decreasing frequency of mention. Many cultural domains, however, are organized taxonomically (relations based on set inclusion), and hierarchical relations among items in a domain are not captured by a listing task. This, along with ordinary synonymy, can make establishing how many different items appear in the sample's lists quite problematic. Suppose, for instance, that you asked four informants to list 'five kinds of American cars' and obtained the lists in Table 1. How many different kinds of American cars appear in the four lists?

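Table 1. Four Free-Lists of Kinds of American Cars
List A. Ford, Chevy, Dodge, Cadillac, Jeep
List B. Ford Taurus, Chevrolet Corvette, Jeep Cherokee, Jeep Wrangler, Dodge Neon
List C. Taurus, Corvette, Neon, Explorer, Seville
List D. minivan, sport utility vehicle, sedan, convertible, station wagon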

Because we are very familiar with this cultural domain, we can see patterns in these responses. Apparently, Informant A interpreted the instructions to mean 'car companies.' Informant B likes binomial nomenclature but, like Informant C, has interpreted the task as asking for 'car models.' And, Informant D has interpreted the question functionally rather than along brand lines. (This sort of diversity in response indicates that our original expression, 'kinds of American cars,' is ambiguous and needs to be refined.) On the other hand, if we did not know quite a bit about this domain, then our initial tally would have to rely on linguistic differences to identify different items. And on this basis, there are 20 "differently named" items in the four lists.

There are two customary ways to deal with these sorts of item identification problems. First, immediately after completing the task, each informant can be asked to list alternative names for each item in his or her list. Second, the researcher can compile an initial aggregate table, then ask several of the more knowledgeable natives to judge the distinctiveness of the items. Either way, the idea is to enlist native experts to eliminate redundancy, but even these ploys are often indeterminate, as illustrated in Table 2.

Table 2. Alternative Aggregations of Free-Lists A, B, C, and D

    Item                           Freq.      Item                                    Freq.
 1. Ford Taurus : Taurus           2       1. Ford : Ford Taurus : Taurus : Explorer  4
 2. Chevrolet Corvette : Corvette  2       2. Chevy : Chevrolet Corvette : Corvette   3
 3. Dodge Neon : Neon              2       3. Dodge : Dodge Neon : Neon               3
 4. Ford                           1       4. Jeep : Jeep Cherokee : Jeep Wrangler    3
 5. Chevy                          1       5. Cadillac : Seville                      2
 6. Dodge                          1       6. minivan                                 1
 7. Cadillac                       1       7. sport utility vehicle                   1
 8. Jeep                           1       8. sedan                                   1
 9. Jeep Cherokee                  1       9. convertible                             1
10. Jeep Wrangler                  1      10. station wagon                           1
11. Explorer                       1
12. Seville                        1
13. minivan                        1
14. sport utility vehicle          1
15. sedan                          1
16. convertible                    1
17. station wagon                  1
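
The two aggregations in Table 2 can be reproduced with a short tally. What follows is only a minimal sketch in Python: the four lists are written out so as to be consistent with the Table 2 frequencies (the order of items within each list is an assumption), and the two merge maps, BY_MODEL and BY_MAKE, are hypothetical stand-ins for the judgments that native experts would supply. Note that the sketch counts mentions rather than lists, since that is what yields a frequency of 3 for Jeep in the second aggregation.

```python
from collections import Counter

# Four five-item free-lists consistent with the tallies in Table 2.
# The order of items within each list is an assumption.
FREE_LISTS = {
    "A": ["Ford", "Chevy", "Dodge", "Cadillac", "Jeep"],
    "B": ["Ford Taurus", "Chevrolet Corvette", "Jeep Cherokee",
          "Jeep Wrangler", "Dodge Neon"],
    "C": ["Taurus", "Corvette", "Neon", "Explorer", "Seville"],
    "D": ["minivan", "sport utility vehicle", "sedan",
          "convertible", "station wagon"],
}

# Hypothetical expert judgments: each name maps to the label it is merged into.
BY_MODEL = {"Taurus": "Ford Taurus", "Corvette": "Chevrolet Corvette",
            "Neon": "Dodge Neon"}
BY_MAKE = {
    "Ford Taurus": "Ford", "Taurus": "Ford", "Explorer": "Ford",
    "Chevrolet Corvette": "Chevy", "Corvette": "Chevy",
    "Dodge Neon": "Dodge", "Neon": "Dodge",
    "Jeep Cherokee": "Jeep", "Jeep Wrangler": "Jeep",
    "Seville": "Cadillac",
}


def aggregate(free_lists, merge_map=None):
    """Tally mentions across all lists after merging names judged equivalent."""
    merge_map = merge_map or {}
    counts = Counter()
    for items in free_lists.values():
        for name in items:
            counts[merge_map.get(name, name)] += 1
    return counts


raw = aggregate(FREE_LISTS)                 # purely linguistic distinctions
by_model = aggregate(FREE_LISTS, BY_MODEL)  # left-hand side of Table 2
by_make = aggregate(FREE_LISTS, BY_MAKE)    # right-hand side of Table 2

for label, freq in by_make.most_common():
    print(f"{label:25s}{freq}")
```

The same 20 mentions thus yield 20, 17, or 10 "different" items depending on which equivalences are accepted, which is the indeterminacy the table illustrates.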

These problem areas aside, free-listing tasks are an excellent research tool to explore cultural domains (Gatewood, 1983), with the added benefit that their results enable interesting comparisons both across domains and across informants (Gatewood, 1984). If a single sample of informants is used, domains can be compared in terms of various indices, such as average/median length of list and number of different items generated. At the same time, informants--either as individuals or grouped by age, gender, expertise, ethnicity, etc.--can also be compared.


Once the items in a cultural domain are identified, we can begin to examine how they are defined and interrelated, their similarities and differences. In the 1950s and 1960s, the preferred technique was componential or semantic feature analysis (Lounsbury, 1956; Goodenough, 1956). But as different feature analyses can be formulated to account for the same data (the "psychological reality" problem), research interest shifted to studying the most salient or important semantic features in a given domain (see D'Andrade, 1995: 31-91). Pile-sorts are an easy way to collect data toward this end.

The single pile-sort asking informants to group items based on their overall similarity is the most commonly used variant of the pile-sort task. Informants are given a collection of stimuli--either the items themselves, cards with names of items, or pictures of items. The basic task is to group the stimuli such that similar items are in the same pile. Informants are free to make as many piles as they want and to place very unusual items in piles by themselves. Indeed, the only constraints are (1) there must be more than one pile: the extreme "lumper" solution is disallowed, and (2) not every item can be in a pile by itself: the extreme "splitter" solution is disallowed. The researcher then records each informant's pile-sort and asks why items were placed in their piles. For example, Fred's and Janine's sortings of 19 kinds of fish might be recorded as shown in Table 3.

Table 3. Two Informants' Single Pile-Sorts of 19 Kinds of Fish
Fred's Piles [Rationales]:
1. barracuda, piranha, shark [dangerous to humans]
2. bass, carp, pike, sunfish, trout [freshwater (mostly sport) fish]
3. marlin [ocean sport fish]
4. catfish, cod, flounder, herring, salmon, swordfish, tuna [grocery store fish]
5. goldfish, minnow [weird, little fish compared to the rest]
6. shad [don't know what this is]
Janine's Piles [Rationales]:
1. piranha, shark [dangerous fish]
2. barracuda, cod, flounder, swordfish, tuna [ocean fish]
3. bass, catfish, pike, sunfish, trout, salmon [fish found in lakes and streams]
4. goldfish, herring, minnow [very small fish]
5. carp, marlin, shad [don't know what these are], recorded as three singleton piles:
(5. carp ... unknown item)
(6. marlin ... unknown item)
(7. shad ... unknown item)

Singleton items--such as marlin and shad in Fred's sorting--end up being scored as dissimilar with every other item. Given his rationales, Fred's two singleton items are properly separated. By contrast, Janine's fifth pile is problematic. She has lumped carp, marlin, and shad together because she doesn't know anything about the three of them, i.e., what they have in common is only that Janine doesn't know anything about them. The proper way to handle such "unknown" items is to treat each as a singleton; hence, when recording Janine's pile-sort, the researcher should split her residual fifth category into three singleton piles, as illustrated. Only by asking informants for brief rationales, however, can we catch such bogus groupings and prevent the variable of informant ignorance from distorting the results.

In doing a single pile-sort, each informant is essentially judging the similarity of every item vis-à-vis every other item using a dichotomous scale--any two items are either in the same pile (similar) or they are in different piles (not similar). Thus, when preparing these data for analysis, an informant's similarity judgments are represented as an item-by-item matrix in which each cell contains either 1 (items appear in the same pile) or 0 (items appear in different piles), and each informant produces one such matrix. The aggregate judged similarity for any two items, item_i and item_j, is calculated by adding the values in cell_ij across all N informants and dividing by N. The resulting number is the percentage of informants in the sample who placed item_i and item_j in the same pile. The most salient similarities and differences among items in the domain can then be examined by subjecting the aggregate similarity matrix to multi-dimensional scaling or hierarchical clustering.
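
As an illustration of this scoring, here is a minimal Python sketch that aggregates the two sortings from Table 3, with Janine's "unknown" items already split into singletons. The function names are my own, and the sketch stores only the pairs that share a pile at least once (every other cell is 0) rather than a full item-by-item matrix.

```python
from itertools import combinations

FRED = [
    ["barracuda", "piranha", "shark"],
    ["bass", "carp", "pike", "sunfish", "trout"],
    ["marlin"],
    ["catfish", "cod", "flounder", "herring", "salmon", "swordfish", "tuna"],
    ["goldfish", "minnow"],
    ["shad"],
]
# Janine's sort as recorded: her three unknown items are singleton piles.
JANINE = [
    ["piranha", "shark"],
    ["barracuda", "cod", "flounder", "swordfish", "tuna"],
    ["bass", "catfish", "pike", "sunfish", "trout", "salmon"],
    ["goldfish", "herring", "minnow"],
    ["carp"], ["marlin"], ["shad"],
]


def same_pile_pairs(piles):
    """Return the unordered item pairs that share a pile (the cells scored 1)."""
    pairs = set()
    for pile in piles:
        pairs.update(frozenset(pair) for pair in combinations(pile, 2))
    return pairs


def aggregate_similarity(sorts):
    """Proportion of informants who placed each pair of items in the same pile."""
    n_informants = len(sorts)
    totals = {}
    for piles in sorts:
        for pair in same_pile_pairs(piles):
            totals[pair] = totals.get(pair, 0) + 1
    return {pair: count / n_informants for pair, count in totals.items()}


def similarity(aggregate, item_i, item_j):
    """Look up one cell; pairs never sorted together have similarity 0."""
    return aggregate.get(frozenset((item_i, item_j)), 0.0)


agg = aggregate_similarity([FRED, JANINE])
print(similarity(agg, "piranha", "shark"))    # 1.0: together in both sorts
print(similarity(agg, "barracuda", "shark"))  # 0.5: together only for Fred
print(similarity(agg, "carp", "marlin"))      # 0.0: never together as recorded
```

Had Janine's fifth pile been left intact, carp and marlin would have scored 0.5, which is exactly the distortion that recording unknown items as singletons avoids.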

The principal advantages of the single pile-sort are that the task is easy to administer and informants enjoy doing it. Another advantage is that the task can be done with relatively many items, such as 30 to 50. And, one can achieve reliability coefficients of .90 or higher with respect to the aggregate similarity matrix using as few as thirty to forty informants (Weller & Romney, 1988: 25). On the other hand, since informants are free to come up with different numbers of piles, single pile-sorts have limited utility with respect to comparing individuals. In particular, the 'lumper' versus 'splitter' variation tends to overwhelm other characteristics that might differentiate informants. Other variants of the pile-sort task--multiple sorts, successive pile-sorts, and the Boster pile-sort--all obtain more information per informant and, by imposing uniform constraints, are better suited to making comparisons among informants. Weller and Romney (1988: 20-31) provide an excellent discussion of the various pile-sort tasks, including their strengths and weaknesses.

Triadic Comparisons

Another way to obtain overall similarity judgments is to present items, three at a time, and ask informants to pick the one that is most different from the other two. The procedure is repeated until each item has been presented in a triad with every pair of other items. To avoid uncontrolled order effects and response biases, however, it is important to randomize the presentations, both among and within the triadic sets. For example, using this method to obtain similarity judgments among four kinds of fish, we might obtain results like those in Table 4 (where capitalization indicates the informant's 'odd item out' judgments).

Table 4. One Informant's Triadic Similarity Judgments among Four Kinds of Fish
Triad 1: BASS salmon trout
Triad 2: tuna BASS salmon
Triad 3: trout bass TUNA
Triad 4: salmon tuna TROUT

Like the single pile-sort, such data can be represented as an item-by-item matrix. Each triad involves three pairwise comparisons, i.e., ABC breaks down into three pairs: AB, AC, and BC. Thus, for each triad, there are three similarity scores: the pair of 'not chosen' items are judged similar (scored 1), and the other two pairings are judged not similar (scored 0). Following this scoring procedure, Table 5 shows the matrix representation of the information contained in Table 4. (Note: Because similarity data are symmetric, we display only the lower half of the matrix.)

Table 5. Matrix Representation of Data from Table 4

          Bass    Salmon  Trout   Tuna
Bass      ---     ---     ---     ---
Salmon     0      ---     ---     ---
Trout      1       1      ---     ---
Tuna       0       2       0      ---
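
The scoring rule can be checked against Tables 4 and 5 with a few lines of Python. This is a sketch only; the data structure (each triad paired with its 'odd item out') and the function name are assumptions made for illustration.

```python
from itertools import combinations

# Table 4, with the informant's 'odd item out' recorded alongside each triad.
TRIADS = [
    (("bass", "salmon", "trout"), "bass"),
    (("tuna", "bass", "salmon"), "bass"),
    (("trout", "bass", "tuna"), "tuna"),
    (("salmon", "tuna", "trout"), "trout"),
]


def score_triads(triads):
    """For each pair, count the triads in which it was the 'not chosen' pair."""
    scores = {}
    for items, odd_one in triads:
        for pair in combinations(items, 2):
            key = frozenset(pair)
            scores.setdefault(key, 0)   # every presented pair gets a cell
            if odd_one not in pair:     # the two remaining items are similar
                scores[key] += 1
    return scores


scores = score_triads(TRIADS)

# Print the lower half of the matrix in the item order used in Table 5.
ITEMS = ["bass", "salmon", "trout", "tuna"]
for i, row in enumerate(ITEMS):
    cells = [str(scores[frozenset((row, col))]) for col in ITEMS[:i]]
    print(row.ljust(8), " ".join(cells))
```

Each triad adds 1 to exactly one cell, so the four triads in Table 4 account for the four points distributed in Table 5 (1 + 1 + 2).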

Informants are quick to understand this task, whether it is administered in a written form or orally. Another advantage of the method is that, because all informants perform the same task, the resulting data enable comparisons among individuals. And so long as there are only a few items in the domain, informants find the task mildly amusing. Unfortunately, as the number of items goes up, the number of triads required increases dramatically--the combination of n things taken three at a time, or n! / (3! (n-3)!)--and informants quickly lose patience when confronted with a large number of triads. For example, 56 triads are required for eight items, 120 for ten items, and 969 for nineteen items. Although one can use a balanced incomplete block design (BIB) to reduce the number of triads given each informant, this forces the researcher to choose whether the primary objective concerns items in the domain or informant differences with respect to the domain. If the objective is to compare informants, then the same subset of triads should be given to each informant, whereas if the focus is more on the items themselves, then each informant should get a different randomly selected BIB instantiation (see Borgatti, 1996, for a fuller discussion of this point). Thus, practical concerns dictate that the number of items be no more than 8 to 10 for complete designs and no more than about 25 for balanced incomplete block designs (Weller & Romney, 1988: 37).
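
The arithmetic above, and the randomization of a complete design, can be sketched in Python as follows. The function names are hypothetical. Note also that bib_triad_count only counts the triads a balanced incomplete block design would need (each triad contains three pairs, so three times the number of triads must equal lambda times the number of pairs); it does not construct the design, and a design need not exist for every combination of n and lambda.

```python
import random
from itertools import combinations
from math import comb


def complete_triads(items, seed=None):
    """All triads of `items`, shuffled both among and within the triadic sets."""
    rng = random.Random(seed)
    triads = [list(triad) for triad in combinations(items, 3)]
    for triad in triads:
        rng.shuffle(triad)   # randomize position within each triad
    rng.shuffle(triads)      # randomize the order of presentation
    return triads


def bib_triad_count(n_items, lam):
    """Number of triads needed if every pair of items is to occur `lam` times."""
    total_pair_slots = lam * comb(n_items, 2)
    if total_pair_slots % 3:
        raise ValueError("lam * C(n, 2) is not divisible by 3; no such design")
    return total_pair_slots // 3


for n in (8, 10, 19):
    print(n, "items:", comb(n, 3), "triads in a complete design")

design = complete_triads(["bass", "salmon", "trout", "tuna"], seed=1)
print(len(design), "triads for four items")
print(bib_triad_count(19, 2), "triads for n = 19, lambda = 2")
```

With nineteen items and each pair presented twice, the count comes to 114 triads, compared with 969 for the complete design.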

In a complete design, such as the example above, each pair of items occurs in n-2 triadic sets. For example, salmon and tuna were judged similar in both of the triads in which they occurred, whereas salmon and trout were judged similar in only one of their two co-occurrences. If we had eight items, then each pair would co-occur in six triadic sets, and so forth. Following this logic (and modified appropriately for balanced incomplete block designs), the aggregate similarity for any two items can be expressed as a percentage, i.e., the number of triads in which the two items are judged similar by all informants divided by the total number of triads in which the two items are presented to all informants. Thus, aggregated data from triadic comparisons are similar in form to the results from pile-sorts and susceptible to the same kinds of subsequent analyses.

Consensus Analysis

So far, I have been reviewing somewhat unusual methods for collecting data. By contrast, consensus analysis is an interesting technique cognitive anthropologists have devised to analyze data. Although many of the ingredient ideas have been percolating in a variety of literature for a couple of decades, the theory and mathematical procedures of consensus analysis were drawn together by Romney, Weller, and Batchelder (1986). Here I can do no more than sketch the basics and refer readers interested in details to the original source.

The essential problem that consensus theory addresses is as follows. Given that members of a culture do not uniformly agree with one another in their beliefs about what is true or proper, how can an outsider tell if there is a 'common culture' underlying their diverse opinions? The key to answering this question lies in realizing that (1) no one knows all of his or her group's culture and (2) agreement is a matter of degree. In particular, experts in a cultural domain should agree with one another more than non-experts do (see Boster, 1985). Following this intuition, consensus theory assumes that "the correspondence between the answers of any two informants is a function of the extent to which each is correlated with the truth" (Romney, Weller, & Batchelder, 1986: 316) and focuses precisely on the variable extent to which informants converge on the same answers to systematically asked questions.

For example, suppose Mr. Smith gives a multiple-choice test to his class but, on arriving home, discovers that he has lost the answer key. Could he grade the students' answer sheets anyway? Yes, he could (Batchelder & Romney, 1988). If students do not know the correct answer to a question, then they will just guess, and guessing should produce predictable proportions of agreement across the available answers. On the other hand, if students know the correct answer, then they will converge on the same answer (the "correct" one) more frequently than expected just by chance. Knowledge--cultural competence in a domain--produces deviations from equiprobability, and more knowledgeable individuals will agree with one another more often than less knowledgeable individuals do.

The ingenuity of consensus analysis is that it provides a way to estimate the cultural competence of individual informants from the patterning of their agreement. The formal model rests on three assumptions (Romney, Weller, & Batchelder, 1986: 317-318):

1. Common Truth. The informants all come from a common culture, such that whatever their cultural version of the truth is, it is the same for all informants.

2. Local Independence. Informants' answers are given independently of other informants, i.e., there is no collusion or influence among informants.

3. Homogeneity of Items. Questions are all of the same difficulty, such that each informant has a fixed cultural competence over all questions.

If these three assumptions are met, then the Eigenvalue of the first factor of a minimum residual factor analysis of a chance-corrected agreement matrix to a battery of questions will be three or more times larger than the Eigenvalue of the second factor. When this condition obtains, generally informants' loadings on the first factor should all be positive and the mean loading will be between .5 and .9. For such data, each informant's first factor loading is his or her relative "competence score," or how well that individual represents the entire sample's answers to the questions asked. In short, a competence score can be interpreted as the percentage of questions an informant answered "correctly."
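
To make the logic concrete, here is a rough Python sketch for multiple-choice data. It is not the procedure of Romney, Weller, and Batchelder: a simple iterated principal-axis factoring (leading eigenvector by power iteration, with communalities re-estimated on the diagonal) stands in for the minimum residual factor analysis, all function names are my own, and the eigenvalue-ratio check is omitted. The sketch relies on the model's implication that the chance-corrected agreement between two informants should approximate the product of their competences.

```python
def match_proportions(answers):
    """answers[i][k] is informant i's answer to question k.
    Returns the proportion of questions on which each pair of informants match."""
    n, n_questions = len(answers), len(answers[0])
    return [[sum(a == b for a, b in zip(answers[i], answers[j])) / n_questions
             for j in range(n)] for i in range(n)]


def chance_corrected(matches, n_options):
    """Remove the agreement expected from guessing among n_options answers."""
    return [[(n_options * m - 1) / (n_options - 1) for m in row]
            for row in matches]


def first_factor_loadings(agreement, outer=200, inner=100):
    """Estimate one loading per informant such that loading_i * loading_j
    approximates the off-diagonal agreements (assumes positive agreement)."""
    n = len(agreement)
    a = [row[:] for row in agreement]
    for i in range(n):
        a[i][i] = 0.0                  # the diagonal is not data
    loadings = [0.0] * n
    for _ in range(outer):
        v = [1.0] * n
        for _ in range(inner):         # power iteration: leading eigenvector
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * a[i][j] * v[j]
                         for i in range(n) for j in range(n))
        loadings = [eigenvalue ** 0.5 * x for x in v]
        for i in range(n):
            a[i][i] = loadings[i] ** 2  # re-estimate communalities
    return loadings


# Hypothetical competences, used to build the agreement matrix the model implies.
true_competence = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
implied = [[1.0 if i == j else di * dj
            for j, dj in enumerate(true_competence)]
           for i, di in enumerate(true_competence)]
estimated = first_factor_loadings(implied)
print([round(x, 2) for x in estimated])
```

With real answer sheets, one would pass the output of match_proportions through chance_corrected before factoring; with few informants or few questions the estimates will be noisy, and negative corrected agreements would break this simple routine.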

Conversely, if the ratio of the first to the second Eigenvalues is less than about three, then one or more of the three assumptions must not hold for the data. Because the local independence assumption can be upheld during data collection and because the homogeneity of items assumption is robust to deviations, should the ratio of Eigenvalues be low, we are generally safe concluding that the reason is lack of a common culture, e.g., sub-cultures (systematically different ways of answering) may exist in our sample.

Consensus analysis works well with many kinds of data: true-false, check lists, belief-frames, multiple-choice, rankings, ratings, and even proximity matrices (such as similarity matrices). There are, however, two important limitations: (1) the battery of questions must be of a single type, such as all multiple-choice, all similarity matrices, etc., and (2) the questions must ask informants for conventional truths or judgments, not their personal preferences or histories. For instance, consensus analysis is well-suited for questions asking informants to check all diseases on a list that are serious, contagious, result in a high fever, or for which one should go to see a doctor, but makes little sense if informants are asked to check all the diseases that they have actually had in their lives. Similarly, consensus analysis is appropriate for pile-sort data of mixed drinks based on their similarities, but not pile-sorts based on whether the drinks taste good. The reason is simple: agreement with respect to preferences or histories is not knowledge-driven.

An Integrative Example

Results from a small class project I did in 1993 will illustrate how the different methods and analyses, discussed above, can be used in concert. The data come from fourteen college students, whose knowledge of a single cultural domain (kinds of fish) was probed via free-listing, single pile-sort, triadic comparison, and some ranking and rating tasks.

Free-listing was the first task. The instructions were to list all the different kinds of fish they could think of in 90 seconds. Collectively, the fourteen informants produced lists containing 226 items in total, naming 115 different kinds of fish. Table 6 shows the aggregated results for the 43 kinds of fish mentioned by at least two people.

Table 6. The 43 Most Frequently Listed Kinds of Fish (N=14)

Goldfish 10
Trout 8
Salmon 8
Flounder 7
Catfish 7
Sunfish 7
Bass 7
Tuna 7
Piranha 6
Shark 5
Swordfish 5
Rainbow trout 5
Carp 4
Guppy 4
Cod 3
Muskie 3
Red salmon 3
Bluefish 3
Largemouth bass 3
Pickerel 3
Blowfish 2
Walleye 2
Steelhead 2
Rock bass 2
Beta 2
Great white shark 2
Bullhead catfish 2
Brown trout 2
Brook trout 2
Pink salmon 2
Barracuda 2
Bluegill 2
Striped bass 2
Blue catfish 2
Remora 2
Jellyfish 2
Channel catfish 2
Dog salmon 2
Smallmouth bass 2
Clams 2
Minnow 2
Yellowfin tuna 2
Scrod 2

From the 115 kinds of fish the students free-listed, I chose 19 varieties for further study. Given the large number of items in this cultural domain, single pile-sorting was the most obvious method for obtaining similarity judgments, and I could have chosen more items for this task. However, I wanted students to see that similar results could be obtained via triadic comparisons. By using a balanced incomplete block design (n = 19, k = 3, lambda = 2) for the triads task and the same items for the pile-sort, the results of both methods would be directly comparable without overburdening my captive students. Thus, students did the single pile-sort task one day and the triadic comparisons the subsequent class period.

Figure 1 is a non-metric multi-dimensional scaling of the aggregate similarity matrix from the single pile-sort data, and Figure 2 is from the triads data. (In these plots, the closer items appear to one another, the more similar they were judged to be.) Visual comparison of the two figures indicates the two methods produced similar results. Indeed, the Pearson r between the two similarity matrices is .74. Normally, we would expect to find greater reliability between these two methods, but bear in mind that we had only fourteen students doing the single pile-sort rather than the recommended thirty to forty. Also, when instantiating the BIB triads design, I opted for each student receiving the same subset of triads (so I could better compare informants) rather than the domain-focused choice of different instantiations of the design for each student. Given these shortcomings of the class project, the obtained reliability of .74 is not so bad.

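Figure 1. Non-metric multi-dimensional scaling of the aggregate similarity matrix from the single pile-sort data.

Figure 2. Non-metric multi-dimensional scaling of the aggregate similarity matrix from the triads data.
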
As my major interest in the project was to compare informants with respect to their knowledge of fish using a variety of methods, I included three auxiliary rating and ranking tasks after we had completed the free-listing, pile-sort, and triadic comparisons. The first of these asked informants to rate their own knowledge for each of the 19 fish varieties using a 5-point scale, where 1 indicated "never heard of it before [before the class project]" and 5 indicated "know quite a bit." And, because there had been some class discussion concerning kinds of fish following the pile-sort and triads tasks, the second and third auxiliary tasks asked each informant to rate all the students on a 5-point 'novice to expert' scale, then to rank order everyone in the class with respect to how much they knew about fish.

In all, the project garnered four direct measures of informants' knowledge of the domain (length of free-list, self-rating of knowledge, social-rating of expertise, and social-ranking of expertise) and three indirect measures (competence scores from consensus analyses). These informant-level measures are presented in Table 7.

Table 7. Informant-Level Measures

                       Mean         Mean         Mean (a)     Pile-Sort    Triads       Triads
          Items in     Self-Rating  Soc-Rating   Soc-Ranking  Comp. Score  Comp. Score  Comp. Score
          Free-List    of 19 fish   of Expertise of Expertise (IndProx)    (IndProx)    (UnRnd)
Josh      34           4.95         5.00         14.00        .78          .43          .38
Matt      30           3.84         3.50         10.00        .32          .41          .37
Sara      26           3.05         2.64         7.79         .80          .62          .63
Judy      21           3.68         3.00         8.07         .74          .61          .62
Corey     19           3.47         3.07         8.21         .55          .56          .63
Hassan    17           4.16         3.93         11.50        .82          .37          .24
Nora      15           2.95         2.43         6.00         .25          .57          .61
Jennifer  13           2.53         1.86         3.71         .03          .67          .68
Kevin     12           3.37         3.50         10.07        .89          .56          .59
Derek     10           4.00         3.43         10.14        .77          .47          .46
Lisa      10           2.84         1.86         2.36         .70          .66          .69
Beth      7            2.79         1.93         4.50         .74          .52          .53
Serene    7            2.84         1.64         3.14         .68          .62          .64
Hope      5            2.58         2.14         5.50         .43          .63          .67
Mean      16.143       3.361        2.852        7.500        .607         .551         .553
StDev.    8.601        .673         .927         3.317        .247         .093         .133

(a) The scores for expertise ranking (fourth data column) have been inverted to keep the meaning of correlation coefficients' signs clear in what follows. For example, Josh was uniformly regarded as the most knowledgeable in this domain; hence, his inverted score is 14 rather than 1.

Note that Table 7 displays three competence scores: one for the pile-sort data, and two for the triads data. This is because there are two ways to handle consensus analysis for the triads task. Either one can use the lower-half of each individual's similarity matrix, as is done for pile-sort data (resulting in n(n-1) / 2, or 171, "multiple-choice" items), or one can use the unrandomized tallies of the triadic sets that were actually given to each informant (114 "multiple-choice" items, given the BIB design employed in this instance).

Incidentally, all three consensus analyses meet the conditions of the formal model. For the pile-sort data, the ratio of the first factor's Eigenvalue to the second is 6.873. For the two analyses of the triads data, the ratios are 5.476 and 4.521, respectively. Hence, the patterning of agreement among the fourteen students indicates there is a culturally correct way of doing the pile-sort, and likewise for the triads task. Kevin best exemplifies the sample's "correct" way of pile-sorting (competence score of .89), whereas Jennifer (.67 and .68) and Lisa (.66 and .69) best represent the sample's "correct" responses to the triads task.

The last points to be made from this example concern the correlations among different individual-level measures. Table 8 shows the correlation matrix of all seven measures, which includes a couple of surprising findings.

Table 8. Pearson Correlation Coefficients among Informant-Level Measures
( values marked * are significant at p < .05, two-tailed )

             Free-List   Self-Rating Soc-Rating  Soc-Ranking P-S: Comp.  Tri: Comp.1 Tri: Comp.2
                         of 19 Fish                          (IndProx)   (IndProx)   (UnRnd)
Free-List     ---         .685*       .703*       .675*       .042       -.457       -.460
Self-Rating   .685*       ---         .952*       .914*       .429       -.801*      -.788*
Soc-Rating    .703*       .952*       ---         .981*       .378       -.777*      -.753*
Soc-Ranking   .675*       .914*       .981*       ---         .384       -.789*      -.758*
P-S: Comp.    .042        .429        .378        .384        ---        -.274       -.279
Tri: Comp.1  -.457       -.801*      -.777*      -.789*      -.274        ---         .974*
Tri: Comp.2  -.460       -.788*      -.753*      -.758*      -.279        .974*       ---
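
Readers who wish to check entries in Table 8 can recompute them from the columns of Table 7. The Python sketch below does so for four of the seven measures. Because the Table 7 values are rounded as printed, the recomputed coefficients should be expected to agree with Table 8 only approximately, and the variable and function names are my own.

```python
from math import sqrt

# Columns transcribed from Table 7, in informant order (Josh ... Hope).
FREE_LIST = [34, 30, 26, 21, 19, 17, 15, 13, 12, 10, 10, 7, 7, 5]
PILE_SORT = [.78, .32, .80, .74, .55, .82, .25, .03, .89, .77, .70, .74, .68, .43]
TRIADS_1 = [.43, .41, .62, .61, .56, .37, .57, .67, .56, .47, .66, .52, .62, .63]
TRIADS_2 = [.38, .37, .63, .62, .63, .24, .61, .68, .59, .46, .69, .53, .64, .67]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


pairs = {
    "Free-List vs. P-S: Comp.": (FREE_LIST, PILE_SORT),
    "Free-List vs. Tri: Comp.1": (FREE_LIST, TRIADS_1),
    "P-S: Comp. vs. Tri: Comp.1": (PILE_SORT, TRIADS_1),
    "Tri: Comp.1 vs. Tri: Comp.2": (TRIADS_1, TRIADS_2),
}
for label, (x, y) in pairs.items():
    print(f"{label:30s}{pearson_r(x, y):6.3f}")
```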

The seven measures form into three logical groupings. (1) Length of free-list and self-rating of knowledge with respect to 19 fish are both related to how much informants really know about this cultural domain. But as Don Campbell was fond of saying, "All measures are fallible;" and the correlation between these two is somewhat lower than one would like (r = .685). Length of one's free-list is affected by motivation, and self-ratings presume shared understanding of the response scale. (2) The two social evaluations differ only in the manner in which informants are allowed to express their opinions--rating versus ranking--and are highly correlated (r = .981). And, (3) because the instructions for both the pile-sorting and triadic comparisons tasks asked for overall similarity judgments, we are tempted to think that all three competence scores should be positively interrelated. But as inspection of Table 8 reveals, such is not the case.

The two ways of assessing triadic competence are highly correlated (r = .974), as expected, but competence doing the pile-sort task is inversely related to competence at triads (r = -.274 and -.279, respectively). How can this be? What does it mean? Recall that the two methods did produce convergent results with respect to their aggregate item-by-item similarity matrices (e.g., Figures 1 and 2); hence, the negative correlation does not bear on the question of inter-method reliability. Rather, competence scores indicate the degree to which an individual's responses represent the patterning of agreement in the entire sample--how well an individual "measures" the group consensus. Thus, the 'surprising' negative correlation simply means that students whose pile-sorts resembled those of other students tended to be more idiosyncratic in the ways they answered the triadic comparisons, and vice versa.

Another interesting finding in Table 8 is the ability of students to evaluate one another's expertise. That is, the correlations between the social evaluations and the self-generated indicators of knowledge are quite high. Having been in the classroom and listened to the brief occasions when anyone demonstrated knowledge or lack thereof, I am awed by the students' sensitivity to and convergent interpretations of very subtle clues.

Lastly, perhaps the most puzzling and important finding in Table 8 is the non-correspondence between the direct measures of informants' knowledge of fish and their cultural competence scores on the pile-sort and triads tasks. The more knowledgeable informants (as determined by the four direct measures) are slightly more typical of the group with respect to their pile-sorts. But contrary to other studies (Brewer, 1995), they are very atypical with respect to their triadic judgments. Indeed, for the triads task, the most domain-knowledgeable informants are the poorest representatives of the sample's common culture. Their greater knowledge of fish did not produce greater agreement; rather, the sample's consensus was formed by relatively ignorant informants who agreed among themselves. This finding provides a general caution to those who might use consensus analysis uncritically. There are situations where 'knowledge of the common culture' means being fairly ignorant, and sometimes ignorance produces its own patterning of agreement. For some tasks, less knowledgeable people see only one way of doing it, whereas experts are aware of alternatives (see also Boster & Johnson, 1989). In such cases, the more informants know, the less likely they are to agree with others. As a way of reminding ourselves that such possibilities exist, perhaps it would be wise to translate consensus analysis's competence scores as "representativeness scores."
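A toy example may make the caution concrete. In the sketch below, four hypothetical novices answer six triads almost identically, while two hypothetical experts each apply their own criteria. All answers are invented (each number is the position, 0 to 2, of the item picked as most different), and the agreement score is a simple stand-in for a consensus-analysis competence score, not the formal model.

```python
def match_proportion(a, b):
    """Proportion of triads on which two informants picked the same odd item out."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_agreement_with_others(answers):
    """Average match proportion between each informant and all other informants."""
    scores = {}
    for name, mine in answers.items():
        others = [match_proportion(mine, theirs)
                  for other, theirs in answers.items() if other != name]
        scores[name] = sum(others) / len(others)
    return scores

# Invented triad answers: novices share one obvious cue, experts diverge.
answers = {
    "novice1": [0, 0, 1, 2, 1, 0],
    "novice2": [0, 0, 1, 2, 1, 0],
    "novice3": [0, 0, 1, 2, 1, 1],
    "novice4": [0, 0, 1, 2, 0, 0],
    "expert1": [1, 2, 1, 0, 2, 0],
    "expert2": [2, 0, 0, 1, 1, 2],
}

scores = mean_agreement_with_others(answers)
novice_scores = [s for name, s in scores.items() if name.startswith("novice")]
expert_scores = [s for name, s in scores.items() if name.startswith("expert")]

print("mean novice score:", round(sum(novice_scores) / len(novice_scores), 3))
print("mean expert score:", round(sum(expert_scores) / len(expert_scores), 3))
```

Because the novices outnumber the experts and agree with one another, they define the "consensus," and the experts look unrepresentative even though, by construction, they know more.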


My main purpose in this essay has been to explain enough about free-listing, pile-sorting, triadic comparisons, and consensus analysis to provoke readers to learn more. I might also mention that these techniques are being used increasingly by medical anthropologists and others doing applied research (e.g., Mathews, 1983; Weller, 1984; Garro, 1986; Boster & Weller, 1990; Ryan, Martinez, & Pelto, 1996; Weller & Mann, 1997).

In closing, let me note that these data collection and analytical techniques are much less laborious than they once were thanks to ANTHROPAC (Borgatti, 1998), a PC-based software package. Whereas tabulating a free-list task with forty informants used to take hours or days of uninterrupted work, ANTHROPAC reads in text files, tallies items, and computes the appropriate descriptive statistics in seconds. Similarly, ANTHROPAC not only reads and analyzes pile-sort and triadic comparison data, but it can also generate randomized triads questionnaires using many different BIB designs. It also includes a variety of analytical tools, such as consensus analysis, multi-dimensional scaling, and hierarchical clustering. To learn more about this worthwhile software package, including pricing, see Analytic Technologies' web page.


Batchelder, W. & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53, 71-92.

Bernard, H. R. (1994). Research Methods in Anthropology, 2nd Edition. Thousand Oaks, CA: Sage Publications.

Bernard, H. R. (Ed.) (1998). Handbook of Methods in Cultural Anthropology. Walnut Creek, CA: Altamira Press.

Borgatti, S. P. (1996). ANTHROPAC 4.0 Methods Guide. Natick, MA: Analytic Technologies.

Borgatti, S. P. (1998). ANTHROPAC, Version 4.95X. Natick, MA: Analytic Technologies.

Boster, J. S. (1985). "Requiem for the omniscient informant": There's life in the old girl yet. In J. Dougherty (Ed.), Directions in Cognitive Anthropology (pp. 177-197). Urbana: University of Illinois Press.

Boster, J. S. & Johnson, J. C. (1989). Form or function: A comparison of expert and novice judgments of similarity among fish. American Anthropologist, 91, 866-889.

Boster, J. S. & Weller, S. C. (1990). Cognitive and contextual variation in hot-cold classification. American Anthropologist, 92, 171-179.

Brewer, D. D. (1995). Cognitive indicators of knowledge in semantic domains. Journal of Quantitative Anthropology, 5, 107-128.

D'Andrade, R. (1995). The Development of Cognitive Anthropology. New York: Cambridge University Press.

Frake, C. O. (1962). The ethnographic study of cognitive systems. In T. Gladwin and W.C. Sturtevant (Eds.), Anthropology and Human Behavior (pp. 72-85). Washington, D.C.: Anthropological Society of Washington.

Frake, C. O. (1964). Notes on queries in ethnography. American Anthropologist, 66 (3, Part 2), 132-145.

Garro, L. C. (1986). Intracultural variation in folk medical knowledge: A comparison between curers and noncurers. American Anthropologist, 88, 351-370.

Gatewood, J. B. (1983). Loose talk: Linguistic competence and recognition ability. American Anthropologist, 85, 378-387.

Gatewood, J. B. (1984). Familiarity, vocabulary size, and recognition ability in four semantic domains. American Ethnologist, 11, 507-527.

Goodenough, W. H. (1956). Componential analysis and the study of meaning. Language, 32, 195-216.

Lounsbury, F. G. (1956). A semantic analysis of Pawnee kinship usage. Language, 32, 158-194.

Mathews, H. F. (1983). Context-specific variation in humoral classification. American Anthropologist, 85, 826-847.

Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 313-338.

Ryan, G., Martinez, H., & Pelto, G. (1996). Methodological issues for eliciting local signs/symptoms/illness terms associated with acute respiratory illnesses. Archives of Medical Research, 27, 359-365.

Weller, S. C. (1984). Consistency and consensus among informants: Disease concepts in a rural Mexican town. American Anthropologist, 86, 966-975.

Weller, S. C. & Mann, N. C. (1997). Assessing rater performance without a "gold standard" using consensus theory. Medical Decision Making, 17, 71-79.

Weller, S. C. & Romney, A. K. (1988). Systematic Data Collection. Qualitative Research Methods, Volume 10. Newbury Park, CA: Sage Publications.
