## 4 Fair Sampling

Introduction

A search of the phrase “New Study Shows…” using Google churns out a variety of results. A glance over the first page of results yields articles with titles such as “Cheese Is as Addictive as Drugs, New Medical Study Shows…” from US Weekly or “New Study Shows that People Stop Listening to New Music at 33” from AV Club. One might wonder who is participating in these studies and whether they are accurate. This is a relevant question, because the validity of all social statistics hinges on the representativeness of the samples of populations they study (Best, p. 52). In the many wide-ranging fields of science, the analysis of data is used as a means to draw conclusions about the features of a given population (Figure 1).

Figure 1. Random Sampling

[Untitled Illustration of Random Sampling]. Retrieved October 29, 2015 from

The application of statistics to social science research began to form during the 19th century when it was used as a partisan tool to shape discussions about social issues. Researchers made arithmetical estimations about the prevalence of certain societal problems and molded the numbers to achieve political ends (Best, p. 11). Statistics began to take on a quantitative nature with time, shifting away from the abstract in favor of “accuracy and objectivity” that “offered a way of making studies more precise” (Best, p. 12). While statistics are still manipulated today in order to sway public opinion and achieve specific goals, they have assumed a more numerical and accurate nature.

Scientific studies seek to make accurate yet widely applicable statements about populations ranging in size from the very small to the enormously large. More often than not, researchers cannot study every member of a given population, as such an approach would be time-consuming, costly, and impractical. Statistics are utilized to study a trend within a small but representative subset of a larger population. This trend can then be used to make inferences about the population as a whole. This process of estimating the characteristics of a population through analysis of a reduced test group is referred to as sampling (Diamond, p. 110). Those who contribute to studies involving sampling are known as subjects or participants (Psychology, p. 62). An example of a statistical study of this nature would be, “What is the average income of full time workers in Great Britain?” (Diamond, p. 110). In this example, the population would be all full-time workers in Great Britain. Because this study is interested in exploring a large group of people, investigators develop a sample that is representative of that population in all its nuances. Statisticians may choose to employ random sampling to select a group of 1,000 full-time workers by their national insurance numbers (Diamond, p. 110). The resulting statistic may be considered an accurate portrayal of the average income of full-time workers in Great Britain and would be a good example of fair sampling.
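The income example can be sketched in a few lines of Python. This is only an illustration: the population below is synthetic, and the log-normal shape, its parameters, and the population size of 100,000 are assumptions made for the sketch, not figures from Diamond.

```python
import random
import statistics

# Hypothetical population: synthetic annual incomes for 100,000 full-time workers.
# The log-normal shape and its parameters are assumptions for illustration only.
rng = random.Random(42)
population = [rng.lognormvariate(10.3, 0.5) for _ in range(100_000)]

# Simple random sample: every worker has the same chance of being selected.
sample = rng.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
relative_gap = abs(sample_mean - population_mean) / population_mean

print(f"Population mean income: {population_mean:,.0f}")
print(f"Sample mean income:     {sample_mean:,.0f}")
print(f"Relative difference:    {relative_gap:.1%}")
```

In practice the population mean is unknown, which is the reason for sampling in the first place; it is computed here only so the sample estimate has something to be compared against.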

What Is Fair Sampling?

Accurate, well-drawn samples can be established using several diverse methods. Ideally, these sampling methods make representative assessments of trends within a population of interest. These conditions constitute fair sampling (Diamond, p. 110). Unfortunately, these “accurate, well-drawn” samples are deceptively difficult to produce. When a sample does not adequately represent the population it is drawn from, the result is known as sampling error. Sampling error often leads to the misleading or misinterpreted statistical studies mentioned in the introduction. Additionally, deceptively presented statistical analysis can contribute to the lay public’s faulty interpretations of scientific data.

Sample size is typically far smaller than the population the sample represents, as a smaller sample is more economical and efficient to study. Consequently, sampling can be likened to generalization (Best, p. 52; see Figure 2).

Figure 2. Samples as Generalizations of Population

[Untitled Image of Sampling]. Retrieved October 29, 2015 from

While sample size can influence the accuracy of statistical analysis, it is possible to obtain accurate results with a smaller sample size (Best, p. 53). More important than sample size is the representativeness of the sample. Even a large sample can reflect bias in sampling tactics (Diamond, p. 111). Creating a representative sample of the broader population is central to statistical analysis in the social sciences.
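The point that representativeness matters more than size can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of 200,000 voters, the 30% "high income" share, and the support rates are assumptions chosen only to make the contrast visible.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 200,000 voters; 1 = supports candidate A, 0 = does not.
# Assumption: 30% of voters are "high income" and support A at a rate of 35%,
# while the remaining 70% support A at a rate of 70%.
population = []
for _ in range(200_000):
    high_income = rng.random() < 0.30
    supports_a = rng.random() < (0.35 if high_income else 0.70)
    population.append((high_income, int(supports_a)))

true_share = statistics.mean(vote for _, vote in population)

# Small simple random sample: 500 voters drawn with equal probability.
random_sample = rng.sample(population, 500)
random_estimate = statistics.mean(vote for _, vote in random_sample)

# Large biased sample: 50,000 voters drawn only from the high-income group.
high_income_voters = [person for person in population if person[0]]
biased_sample = rng.sample(high_income_voters, 50_000)
biased_estimate = statistics.mean(vote for _, vote in biased_sample)

print(f"True support for A:         {true_share:.1%}")
print(f"Random sample (n = 500):    {random_estimate:.1%}")
print(f"Biased sample (n = 50,000): {biased_estimate:.1%}")
```

Under these assumptions the biased sample is a hundred times larger, yet its estimate sits far from the true share because it only ever reaches one part of the population.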

In order to create a fair, representative sample, both the population and the property of interest “must be clearly defined” (Diamond, p. 111). Standard deviation is used to express the variability in a defined population. As the sample size increases, the standard deviation of the sample mean (the standard error) decreases, so estimates from larger samples tend to be more precise. An unexpectedly high standard deviation may reflect problems in how the sample was drawn. When a sample is biased, sampling error has occurred, and the sample may no longer be an accurate portrayal of the population about which generalizations are to be made (Diamond, p. 111).
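The relationship between sample size and variability can be checked with a small simulation. Note that it is the spread of the sample means (often called the standard error) that shrinks as samples get larger; the spread of the population itself stays the same. The population below is hypothetical, and the helper name `spread_of_sample_means` is introduced only for this sketch.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population with a fixed spread (standard deviation near 15).
population = [rng.gauss(100, 15) for _ in range(50_000)]

def spread_of_sample_means(sample_size, repeats=300):
    """Draw many simple random samples and return the standard deviation of their means."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

print(f"Population standard deviation: {statistics.pstdev(population):.2f}")
for n in (25, 100, 400):
    print(f"n = {n:>3}: spread of sample means = {spread_of_sample_means(n):.2f}")
```

For simple random samples, the spread of the sample means is expected to fall roughly in proportion to the square root of the sample size, so quadrupling the sample should roughly halve it.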

When sampling error occurs, it affects the reliability of a study in several ways. For example, the mean of a sample may not reflect the mean of the population at large. This estimate will be “in error” (Diamond, p. 111). Sampling error can point to flaws in the sampling technique used to compose a test group for a study (Diamond, p. 111). The inaccuracies reflected in the data obtained from such studies can be avoided if effective sampling methods are used.

It is important to specify some potential oversights that affect the accuracy of studies but are not rooted in sampling error. Poorly worded survey questions may be misunderstood by participants, and the resulting inaccuracies cannot be attributed to sampling error. In addition, interview bias on the part of the researchers can negatively influence or skew results (Diamond, p. 112). It is important to differentiate between these factors and the elements of sampling in order to understand fair sampling.

Most Commonly Used Sampling Strategies

The model sampling method is a simple random sample. Ideally, in a simple random sample, every member of a population has an equal likelihood of being chosen. This means that there can be no bias in selection. A truly random sample is fairly simple to obtain from a quantified, catalogued population, such as the students enrolled at a school. In this situation, “purely mechanical methods” (Savage, p. 389) can be used in assembling a sample group. However, because most populations are extremely dynamic and diverse, their exact makeup is rarely known. This renders genuinely random samples particularly difficult to come by (Best, p. 54).

Although less optimal than random sampling, convenience sampling is the most prevalent method employed in psychological studies. However, it is difficult to know whether this type of sampling can create a representative investigation of a population because many factors can bias the numbers (Best, p. 56). Recall the trope of the introductory psychology student participating in a research study at college. This cliché has its basis in history, as studies have shown that a great bulk of psychological studies have utilized college students as participants. However, while college students are easy to recruit for academic studies, they are not necessarily representative of the broader population in question. College students usually don’t reflect the characteristics of a more general population due to a number of factors (Psychology, p. 62). Because convenience sampling relies on volunteers, it cannot be random in the literal sense of the word. The typical college student presents a narrow and biased representation of a general population (Savage, p. 389).

Although less common than convenience sampling, stratified sampling can duplicate the effects of random sampling to a certain extent. The aim of stratified sampling is to create a representative depiction of the various strata in a given population. The most common variables used for stratification are gender and age (Diamond, p. 112). When composing a stratified sample, the population of interest is divided into several strata, such as gender or socioeconomic groups. Subjects are then selected from each stratum. Random sampling becomes simpler when selecting participants from smaller strata rather than from a much larger population. The sample drawn from each stratum will ideally be in proportion to that stratum’s share of the population being studied (Savage, p. 391). More variable strata are often given a larger group of participants, which decreases the standard error of the estimate. More consistent strata, on the other hand, can be smaller (Diamond, p. 112). While stratified sampling can be more effective than flawed methods of “random” sampling, it requires in-depth knowledge of the nuances of a population and can often be expensive (Savage, p. 391).
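A minimal sketch of stratified sampling follows, assuming proportional allocation (each stratum contributes in proportion to its size). The age groups, their shares, and the function name `stratified_sample` are hypothetical and chosen only for illustration; allocating extra participants to more variable strata would need a different quota rule.

```python
import random
from collections import defaultdict

rng = random.Random(7)

# Hypothetical population: (person_id, age_group) pairs with assumed group shares.
age_groups = ["18-34", "35-54", "55+"]
weights = [0.30, 0.40, 0.30]
population = [(i, rng.choices(age_groups, weights)[0]) for i in range(10_000)]

def stratified_sample(people, total_size):
    """Proportional allocation: sample each stratum in proportion to its size."""
    strata = defaultdict(list)
    for person in people:
        strata[person[1]].append(person)
    sample = []
    for members in strata.values():
        quota = round(total_size * len(members) / len(people))
        # Simple random sample within the stratum.
        sample.extend(rng.sample(members, quota))
    return sample

sample = stratified_sample(population, 500)
for group in age_groups:
    pop_share = sum(1 for _, g in population if g == group) / len(population)
    sample_share = sum(1 for _, g in sample if g == group) / len(sample)
    print(f"{group}: population {pop_share:.1%}, sample {sample_share:.1%}")
```

Because each quota is rounded, the final sample can differ from the requested size by a participant or two, but the share of each age group in the sample closely tracks its share in the population.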

Examples of Sampling Error in the Real World

Unfortunately, sampling error has detrimental effects on many research-based studies. The notorious 1936 Literary Digest poll, which involved survey research (Figure 3), is a prime example. The Literary Digest, a weekly publication, prided itself on having correctly predicted the outcome of every presidential election since 1920. It attributed its success to “a [straw] Poll fairly and correctly conducted” (Squire, p. 126). On the eve of the 1936 election, the Literary Digest predicted 41% of the vote for Roosevelt and 55% of the vote for his opponent, Landon. These results turned out to be entirely off the mark; come election day, Roosevelt won 61% of the vote in a landslide victory, with Landon receiving only 37% (Squire, p. 127). Explanations of the Literary Digest’s embarrassing gaffe have persisted over the years. The popular hypothesis is that the magazine’s sampling methods excluded the “supposed core of Roosevelt’s support, the poor” (Squire, p. 128). This blunder constitutes a major sampling error on the part of the Literary Digest pollsters. Their failure to include voters of low socioeconomic status in their poll created a sample that was not representative of the voting population. This mistake was reflected in their inaccurate results.

Figure 3. Survey Sampling

(Psychology, p. 52)

In a more harmful case of sampling error, a discredited 1998 paper linking autism to MMR vaccines continues to influence parents’ decisions about whether to immunize their children (Haberman, 2015). Although sampling error was not the only flaw in the now-refuted study, it was certainly an important contributing factor. The study, published in the British medical journal The Lancet, claimed to have investigated 12 children with enterocolitis and regressive developmental disorders, concluding that MMR vaccination was associated with the onset of these conditions (Eggertson, 2010). However, subsequent investigations into the study concluded that participants were “carefully selected” (Eggertson, 2010). This would indicate sampling bias in the testing group. Furthermore, the representativeness of such a small sample is questionable. Ultimately, the harmful effects of sampling error are apparent in this example. In 2008 and 2009, sizable measles outbreaks in the United Kingdom were attributed to the reduced number of children receiving the MMR vaccine (Eggertson, 2010). In a more recent example, a measles outbreak linked to Disneyland in Southern California spread in late 2014 and early 2015 (Haberman, 2015). These are just two instances in a string of measles outbreaks since the release of the study. Despite the fact that measles can be fatal, the influence of this 1998 study persists.

Applying Your Knowledge of Fair Sampling

As is clear from the above example, sampling error can contribute to detrimental conceptions about science and society. Sociology professor Joel Best (2012), in his book Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists, advocates for a critical approach to interpreting statistics, stating that “we think of statistics as facts that we discover, not as numbers we create” (p. 160). When analyzing studies, it is important to be aware of how sampling might have affected the results. Sampling error can “simplify reality in ways that distort our understanding” (Best, p. 161). Fair sampling, on the other hand, informs studies that impart important truths about society. Keeping in mind both the limitations of sampling and the method of sampling used can frame the public’s understanding of scientific research. This approach does not allow for blind confidence in any given study, nor does it promote an entirely distrustful approach. Rather, it creates a productive and thoughtful environment from which to examine and interpret science.

References

Best, Joel. (2012). Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. Berkeley, CA: University of California Press.

Diamond, Ian. (2001). Beginning Statistics: an Introduction for Social Scientists. Thousand Oaks, CA: Sage.

Eggertson, Laura. (2010, March). Lancet retracts 12-year-old article linking autism to MMR vaccines. CMAJ. Retrieved from http://www.cmaj.ca

Haberman, Clyde. (2015, February). A Discredited Vaccine Study’s Continuing Impact on Public Health. The New York Times. Retrieved from http://www.nytimes.com

Savage, R. D. (Ed.). (1966). Readings in Clinical Psychology. London: Pergamon Press.

Squire, Peverill. (1988). Why the 1936 Literary Digest Poll Failed. Public Opinion Quarterly, 52(1), 125-133.

OpenStax College. (2014, December 8). Psychology. OpenStax College. Retrieved from http://cnx.org/content/col11629/latest/