A common critique that we get about results using the YourMorals dataset is that our sample is a volunteer sample prone to self-selection bias, since all participants have a demonstrated interest in morality research and access to the internet. This clearly does not describe all people, or perhaps even a majority of people. This critique has merit and is important to keep in mind when considering any result we report, and so I’m writing this blog post so that I can link to it to acknowledge such limitations. However, this critique is not unique to our dataset: the problem of generalizing findings from one group to the population at large exists to varying degrees (and in many cases, to greater degrees) in all published research.
All research recruits some part of the population and generalizes findings to others. Nationally representative samples attempt to sample completely at random, achieving greater generalizability, but as a result they often cannot go into great depth with people called at random on the phone. At the other end of the spectrum are the many studies conducted on college students, who are a very specific part of the population. The YourMorals dataset falls somewhere in between: it is more diverse in terms of age and life situation than college samples (mean age is around 35, with a standard deviation of 13 years), but clearly not representative.
The limitations of college samples and representative studies are well known, and they inform the types of questions you can ask of each group. Representative samples are the ONLY group you can reasonably use to ask questions about percentage rates in the population. For example, if you find that 70% of college students think Jon Stewart is cool, that percentage is unlikely to hold in pretty much any other group. However, you might instead look at liberals and conservatives within a college student sample and find that the liberals like Jon Stewart while the conservatives don’t, or that liberal college students preferred Obama to McCain in the 2008 election. Those results are more likely to generalize, since they are not about percentages in the population but about relationships between variables. Average height may differ across samples, but in almost any sample, taller people will tend to be better at basketball. Exceptions exist, but they are not the norm. The toy simulation below makes this concrete.
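As a small illustration of the height and basketball point, here is a simulation sketch in Python (my own toy example with invented numbers, not YourMorals data). Two samples are drawn from populations with different average heights, yet the correlation between height and basketball skill comes out essentially the same in both, which is exactly the sense in which relationships between variables travel across samples better than percentages do.

# Toy simulation: mean levels differ across samples, but the
# relationship between two variables replicates in each.
# (Illustrative only; all numbers are invented.)
import numpy as np

rng = np.random.default_rng(0)

def simulate_sample(mean_height, n=1000):
    height = rng.normal(mean_height, 10, n)       # heights in cm
    skill = 0.5 * height + rng.normal(0, 10, n)   # skill tracks height, plus noise
    return height, skill

# Two samples drawn from populations with different average heights
for label, mu in [("college sample", 178), ("general sample", 170)]:
    height, skill = simulate_sample(mu)
    r = np.corrcoef(height, skill)[0, 1]
    print(f"{label}: mean height = {height.mean():.1f} cm, "
          f"height-skill correlation = {r:.2f}")

# The mean heights differ by several cm across the two samples,
# but the correlation is roughly 0.45 in both.

The percentage-style statistic (the sample’s mean height) shifts depending on whom you happened to recruit; the relationship-style statistic (the correlation) stays put.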
Experimental research (which we are doing more of) is even more likely to generalize, but the issue of generalizability still remains. For example, all medical trials are done on volunteer populations, similar to our YourMorals sample, and it is hoped that random assignment will mean that any result is due to the treatment differing from the control. However, it is possible that the treatment works better than the control only in that group. Logically, that seems unlikely, as human physiology is relatively standard, but one could imagine a case where the volunteers for a medical study all had similar diets or shared certain genetic traits that interacted with the treatment. Again, this is the exception rather than the rule, and even when we encounter such exceptions, they are usually informative.
With YourMorals data, we never suggest that we can generalize our mean values to the general population (e.g. if 6% of our users say they are libertarian, that tells us nothing about the rate of libertarianism in the population). But we do feel that if we find a pattern in our data (e.g. our libertarians score lower on measures of disgust), our evidence for the existence of that general pattern is at least as good as that of most published research using non-representative samples, for the following reasons:
– Our users are well-educated and have stable, well-formed opinions (see David Sears’ article “College Sophomores in the Laboratory” for more on this).
– Our users are intrinsically motivated volunteers, and research has shown that such volunteers provide better data because they take surveys seriously (see Jon Haidt’s review of a Chang & Krosnick paper that showed this).
– Internet samples compare very well to non-internet samples in terms of data quality (see Gosling et al.).
– Our results replicate a lot of published research (see our libertarians paper), which means our findings have already been replicated in different populations, and we have also replicated our results in representative samples (to be published as Smith, C. & Vaisey, S., “Charitable giving and moral foundations in a nationally representative sample”). Empirically, the relationships found in our users’ data are typical of other samples of the kind that other researchers use.
– Our users come from a variety of sources. New York Times readers are different from Dallas Morning News readers, search engine users, and people from the UK, and we are able to replicate findings across these different sub-groups in our data. We also routinely replicate findings within our sample using different measurement methods on different groups.
– Our samples are large and diverse enough that even if our results do not generalize to all individuals, they likely describe a large number of people. In that case, our findings would represent an interesting interaction effect among better-educated, internet-savvy users.
Still, we readily admit that our users are not ‘average’, and we seek to confirm our findings in materially different groups (uneducated, non-internet users). No research by any single method or group of researchers is definitive, and any study is evidence for the existence of something, not conclusive proof. Our findings are equally in need of replication and confirmation. Certainly, some studies are more conclusive than others, and I feel confident that our findings are at least comparable, in terms of generalizability, to other research that asks similar types of questions. It is possible that interaction effects exist with the types of people who are interested in morality (which would be interesting in and of itself), but our research provides evidence at least as good as most published research in our area, especially given that the vast majority of research using psychological measures is done on non-representative samples with fewer, less diverse, and less motivated participants.
– Ravi Iyer