We psychologists hit a wall when we try to publish in sociology or political science journals: Our data is almost never obtained from nationally representative samples. We usually obtain convenience samples, either of undergraduate students at our universities, or from visitors to a website (such as YourMorals.org). These two methods invariably produce samples that are much better educated than would be obtained from nationally representative samples such as those used by the major polling organizations. Sociologists and political scientists therefore think that our findings are ungeneralizable and useless.
But the truth is more complex, and in some ways, our data is better than theirs. In an article just published in Public Opinion Quarterly, Linchiat Chang and Jon Krosnick report the results of a national field experiment in which they collected responses to a complex survey on political knowledge and voting behavior using three parallel methods:
- Representative-Telephone: They used Random Digit Dialing to obtain a nationally representative sample, which was then interviewed by a live interviewer (supposedly the gold standard).
- Representative-Internet: They hired Knowledge Networks to administer the survey. KN has painstakingly assembled a panel of people that is more or less nationally representative. Many of these people were given free Web-TV to allow them to connect to the internet. They are paid to respond to specific studies that they do not select.
- Volunteer-Internet: They used volunteers who had signed up for the Harris Poll Online in response to advertisements on the search engine Excite.com. Harris then emailed invitations to a subset of these volunteers in an effort to approximate representativeness by age, sex, and region of the country. But because each person chose whether to respond to each invitation, based on its content, the final sample was not at all representative.
Chang and Krosnick found, not surprisingly, that the two representative methods (Representative-Telephone and Representative-Internet) yielded data that was more representative of the US population than the Volunteer-Internet method did. So if what you really need to know is the percentage of Americans who believe X, then you should do your darndest to assemble a nationally representative sample.
But psychologists rarely want to know the percentage of Americans who believe X. We’re trying to figure out how minds work, particularly different kinds of minds, or minds that have just been exposed to different primes or procedures. We need participants who can read and understand directions and then answer questions honestly and thoughtfully. On that score, Chang and Krosnick found that the Volunteer-Internet method yielded the highest-quality data.
Data collected by telephone was of the lowest quality: these participants showed more random measurement error, more survey satisficing (just giving any answer to keep moving), and more social desirability bias (because they were talking to a live person).
Data collected by internet was much better. The internet has many advantages over the telephone; for example, people can see the questions and response choices, so they don’t have to hold it all in their heads as they must in a telephone survey. But the best data of all, with the lowest rates of random error and satisficing, came from the Volunteer-Internet sample, because those participants were, on average, more interested in the topic, more motivated to do a good job, more experienced with internet usage and web surveys, and (most likely) somewhat smarter than the Representative-Internet population.
And here’s another problem with probability samples: they are so expensive to obtain that sample sizes are usually not very large (typically just one or two thousand), and the researcher is usually forced to use very short instruments, often just a single item, to measure the key constructs. In contrast, on a volunteer site such as YourMorals.org, where people come because they are interested, we can use much longer and better instruments, and we can collect data from tens of thousands of people.
The bottom line is that each method of data collection has advantages and disadvantages. There are times when psychologists want to know the percent of Americans who believe X (as in a survey about explicit attitudes toward affirmative action); in such cases they should try to obtain a representative sample. But in the great majority of cases our questions are different. If we want to know how explicit attitudes about race relate to implicit attitudes, we’re asking a question about mental mechanisms, not about population distributions. We’re probably better off using a Volunteer-Internet sample, such as YourMorals.org or ProjectImplicit.org.
Of course, generalizability matters. If we’re going to make claims about differences between liberals and conservatives from data collected at YourMorals, then we’ve got to be confident that those claims will generalize to liberals and conservatives more broadly. One simple way we do this is to track participants by the site that referred them to YourMorals. If our results hold up for people who came from a social conservative site, a libertarian site, a scientist’s site, and a mainstream media site, then we have good reason to believe that they will generalize quite broadly. (Ravi Iyer has done such an analysis and demonstrated robustness across referral sites.)
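The referral-site check described above can be sketched as a per-subsample comparison. The Python below is a minimal sketch, not Ravi Iyer’s actual analysis: the site labels, the (site, ideology, score) record layout, and every score are hypothetical toy values invented for illustration. It computes the liberal-minus-conservative gap separately within each referral site and reports whether the gap points the same way at every site.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy scores (think of a Harm/Fairness scale) grouped by a made-up
# referral-site label. These numbers are illustrative only, NOT YourMorals data.
TOY_DATA = {
    "social_conservative_site": {"liberal": [3.8, 4.0], "conservative": [3.1, 3.3]},
    "libertarian_site": {"liberal": [3.5, 3.7], "conservative": [2.9, 3.0]},
    "science_blog": {"liberal": [3.9, 4.1], "conservative": [3.2, 3.4]},
    "mainstream_media_site": {"liberal": [3.7, 3.9], "conservative": [3.0, 3.2]},
}

# Flatten into (referral_site, ideology, score) rows, the shape a survey log might have.
records = [
    (site, ideology, score)
    for site, groups in TOY_DATA.items()
    for ideology, scores in groups.items()
    for score in scores
]


def gap_by_referrer(rows, high="liberal", low="conservative"):
    """Mean score of `high` minus mean score of `low`, computed separately per referral site."""
    by_site = defaultdict(lambda: defaultdict(list))
    for site, ideology, score in rows:
        by_site[site][ideology].append(score)
    gaps = {}
    for site, groups in by_site.items():
        # Skip sites that lack participants from either group.
        if groups.get(high) and groups.get(low):
            gaps[site] = mean(groups[high]) - mean(groups[low])
    return gaps


def holds_at_every_site(gaps):
    """True only if there is at least one site and the gap is positive at all of them."""
    return bool(gaps) and all(gap > 0 for gap in gaps.values())


if __name__ == "__main__":
    gaps = gap_by_referrer(records)
    for site, gap in sorted(gaps.items()):
        print(f"{site}: {gap:+.2f}")
    print("Direction holds at every referral site:", holds_at_every_site(gaps))
```

A sign check like this is only a first pass; a fuller analysis would also test whether each within-site difference is statistically reliable and whether the size of the gap varies across sites.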
A second method of establishing generalizability is to attempt to replicate our findings in a nationally representative sample. However, such replication may be prohibitively expensive (if the task takes more than 10 minutes), and a failure to replicate would be inconclusive: it may simply be a Type II error due to the lower interest, motivation, or ability of the participants. The effect might still be generalizable; it just can’t be detected in lower-quality data.
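To illustrate how a failed replication in noisier data could simply be a Type II error, here is a toy Python simulation. It is an assumption-laden sketch, not anything from Chang and Krosnick: it models “lower-quality data” purely as extra random measurement error (ignoring satisficing and social desirability bias), and the effect size, noise levels, and sample size are arbitrary choices. Holding the true group difference and the sample size fixed, it estimates how often a simple two-group test detects the difference.

```python
import math
import random
from statistics import mean, variance


def detection_rate(true_diff, noise_sd, n_per_group=100, reps=500, seed=0):
    """Share of simulated two-group studies in which the group difference is 'detected'.

    Each participant has a latent trait ~ Normal(group mean, 1); the survey records
    that trait plus independent measurement error ~ Normal(0, noise_sd).
    A study counts as a detection when |Welch t| > 1.96, a normal
    approximation to a two-sided test at alpha = .05.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        b = [rng.gauss(true_diff, 1.0) + rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        se = math.sqrt(variance(a) / n_per_group + variance(b) / n_per_group)
        t = (mean(b) - mean(a)) / se
        if abs(t) > 1.96:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Same true effect, same sample size; only the amount of measurement error differs.
    for noise_sd in (0.3, 1.5):
        rate = detection_rate(true_diff=0.3, noise_sd=noise_sd)
        print(f"noise_sd={noise_sd}: effect detected in {rate:.0%} of simulated studies")
```

Because measurement error inflates the observed variance without changing the true group difference, the standardized effect shrinks, and the noisier condition detects the same real effect in fewer simulated studies. Larger samples can offset some of that loss.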
There’s an old saw in the social sciences: Psychologists can’t sample, sociologists can’t measure. Both skills are valuable, but for psychologists, good measurement is usually more important than good sampling.
I’m obviously biased to agree with you, but I would add a caveat, one that psychologists could use to improve research and that I think would make people listen to us more: we need to replicate our research more in diverse samples. I could see two things that might help with this:
– Give more research credit for replication, especially in interesting, diverse, or difficult-to-sample populations. As a research consumer, I’d rather see people build on existing theories and data than come up with something that is presented as entirely new but is really a slight twist on what exists. Unfortunately, in my experience the latter is rewarded more.
– More subsample cross-validation in internet research. These days, it’s easy to collect more data on the internet. I’m not sure psychologists are aware of how simple it is to track participants recruited through different methods. It’s a natural and easy way to show that effects replicate and generalize to a larger sample. Maybe that generalizes only to internet users, but that’s no longer an insignificant share of the world, so if a finding replicates across all the different recruitment methods used, there are likely few populations it won’t generalize to.
At the risk of biting the hand that advises me:
I disagree with Jon here. Psychologists may not be that interested in making claims about exact percentages of national or international populations, but we DO want to make claims like “liberals care more than conservatives about Harm and Fairness, and conservatives care more than liberals about Ingroup, Authority and Purity.” And if we’re only studying liberals and conservatives who are educated and interested enough to find academic websites, what evidence do we have that our findings apply to liberals and conservatives in general? We’re not psychologically interested in educated and engaged ideologues; they’re just the ones easiest to reach. Subsample analyses help (and I agree with Ravi about the importance of such replication in our science), but if those subsamples are characterizing different kinds of educated, internet-savvy people, then we still don’t know whether we can make this claim about libs/cons in general, or only about the kind of libs/cons who come to our website. Saying “but it’s higher-quality data this way” seems like field-justification, as does the other common rejoinder from psychologists, “but I’m doing experiments, not correlational studies.” With that rejoinder we as a field temporarily forget about interactions and pretend that experimental effects reveal universal, invariable human processes, so we can go on doing all those experiments with college undergrads. Anyone interested in morality, or prejudice, or any number of psychological phenomena also needs to know how these phenomena play out among people who are harder to reach and more likely to give us “low-quality” data. Otherwise we keep studying political ideology as the difference between very liberal and slightly liberal rich university students.
Saying measurement is better than sampling is like a worker in a jelly factory convincing herself that peanut butter sucks and we’re better off without it. Clearly the goal here isn’t all-jelly sandwiches, but workers in both factories stealing from their employers and coming together to make delicious science sandwiches. (In this analogy, statisticians bring the bread.)
Hi Jonathan,
An outsider’s comment … I became a fan of MFT a couple of years ago after my son introduced me to your TED talk from back then. As someone who has found value in the self-awareness that comes from understanding these morals and how they impact our opinions, I’d like to see more people become aware of the test and results.
I’m also a Facebook user, and I notice how much people there enjoy taking short surveys, exploring themselves, and comparing themselves with others. Some surveys/tests are silly, like “Which Hollywood star am I most like?” but others are more interesting. For a while all my friends were taking and reporting their results from a “Myers-Briggs-like” test you could take online through Facebook.
I have no idea if you’d view Facebook as a more useful cross section or not, but I assure you, if you wanted to create a “buzz” around the concepts and theory, integrating the YourMorals.org test into Facebook so that people can learn of and take the test would open up a large audience. The Facebook-integrated tests/surveys I’ve seen usually make it optional whether your results are announced (to friends and/or family) or not … or whether “just the fact you took the test” is announced (with a link that lets your friends take it too). (NOTE: I suspect if I heard “Sally” took the test, that would be one thing. If I learned “Sally” took the test and she was an XYZ … I might be influenced when I jumped in and took the test myself. So the sharing of individual results may pollute the study … I’ll let the experts judge that.)
Integrating with Facebook would serve two purposes: A) it would create a larger awareness of the theory, which serves those who possess the knowledge, and B) it would expand the sample base. I should also add that I’m a conservative, and it bothers me a little that there are 5 times as many liberals as conservatives in your current sample base. That’s nothing like what I’ve read is the rough cross section of the population (USA?). 🙂
Anyway … 2 cents from an outside observer.
Dave… Facebook is definitely on the agenda… if only there were more hours in the day… :) Thanks for the suggestion.
Jesse, I agree with you too, but I do think Jon has an overall point that demographic representativeness is not the be-all and end-all that it is sometimes made out to be. Interactions don’t just occur on demographic variables, but also on psychological variables, which are rarely measured. Being able to generalize results depends not only on not having a weird sample but also on good measurement… and these days, I would say there is response bias in any research. I guess that’s the point you were making with your science-sandwich analogy, so maybe I’m not adding anything new. Personally, I think all research is imperfect, and therefore I only believe any data (representative, well measured, well designed, or not) if it is replicated enough times (à la fivethirtyeight.com in the last election). Non-representativeness is but one of many ways to be imperfect and is often not the most important imperfection to address.
Dear Jon, Ravi
I am not a psychologist either, but I have been following your work for a couple of years, and as a student of the social sciences I am somewhat aware of the methodological challenges we face in creating representative samples.
I have often wondered about the possibility of doing experiments (and I don’t mean just dictator games or ethnographic work) in real-world settings. Of course they are messy to carry out and resource-intensive, but isn’t most empirical work in economics also based on household surveys? I agree with Jesse that educated, internet-savvy individuals are a group in themselves (already exposed to lots of ideas and interested in scientific research). So yes, it’s good data, but demographic variables are also correlated with psychological variables to an extent. It might be interesting to learn more about communities that are not studied very often by mainstream social scientists.
PC
[…] Earlier Jon Haidt discussed the “problem” of representativeness of the YourMorals data and concluded that it wasn’t such a problem after all. Convenience samples drawn from the internet can produce reliable data. This is particularly true when we are more interested in taking valid measurements than in painting a representative picture of some underlying population. […]
[…] – Our users are intrinsically motivated volunteers, and research has shown that such volunteers provide better data because they take surveys seriously (see Jon Haidt's review of a Chang & Krosnick paper that showed this). […]
[…] the Knowledge Networks panel study (referred to elsewhere) included an item asking individuals what they felt should be done about the gap. […]
[…] read some of the YourMorals.org blog entries on the subject. For example, Haidt’s entry entitled Nationally Representative Data is (sometimes) Bad Data for Psychology, and Ravi Iyer’s entitled Sampling limitations and what you can deduce from YourMorals […]