Recall from the previous lesson that all generalizations can be restated in the following standard form:
(P1) S is a sample of Xs.
(P2) Proportion 1 of Xs in S are Y.
(C) Proportion 2 of Xs are Y.
In the last lesson, we learned how to evaluate the first premise of a generalization. The first premise can fail the acceptability requirement if the sample isn’t representative of the target population. This can happen in two ways: (1) the sample can be too small, which we call ‘hasty generalization’; and (2) the sample can fail to capture the relevant diversity in the target population, which we call a ‘biased sample.’
In this lesson, we’ll learn a few more ways Premise 1 can fail to be acceptable, and we’ll learn to evaluate Premise 2. Very broadly, we’ll be looking at measurement errors. To learn about measurement errors, we’re going to introduce a special kind of generalization called a poll. Polls are just generalizations about people’s attitudes, beliefs, behaviors, and values. You’ve probably encountered lots of them in the media, particularly in election years.
Since polls are generalizations about people, the same structure and rules we learned for generalizations apply. Let’s review:
(a) The sample: Is the sample representative of the target population? And how big is the sample?
(b) The (target) population: What is the group I’m trying to make the generalization about?
(c) The property in question: What is that belief, attitude, behavior, or value I’m trying to attribute to the population?
(P1) S is a representative sample of Xs.
(P2) Proportion 1 of Xs in S have property Y (have attitude/belief Y).
(C) Proportion 2 of Xs have the property Y (have attitude/belief Y).
Let’s consider an example:
Suppose I want to know what proportion of BGSU students support Trump for president. I set up a booth in the business administration building and randomly ask every second student who walks by. By the end of the day I’ve asked around 600 students. About 60% of my sample said they’re voting for Trump. I conclude that about 60% of BGSU students are voting for Trump.
If I formalize the argument, it looks like this:
P1. The students at the business administration building are a representative sample of the students at BGSU (with respect to political views).
P2. 60% of the sample say they’re voting for Trump.
C. Therefore, 60% of BGSU students are voting for Trump.
As you may have noticed, Premise 1 is going to fail the acceptability criterion. Let’s take a closer look at why…
Premise 1 Measurement Errors: Selection Bias
Premise 1 will fail to be acceptable because the sample (students in the business administration building) doesn’t capture the proportional diversity of political views in the student population. The students in that building are disproportionately business majors and business majors disproportionately vote Republican. Had you randomly surveyed students in the Fine Arts department, you probably would have ended up with different results.
What we have here is a specific kind of measurement error called selection bias. Selection bias occurs when your method of selecting your sample results in a biased sample. Because of where you set up your survey stand, business students’ views were over-represented in your sample. As such, the sample will not strongly support any conclusions about the political views of the entire student population.
To recap, selection bias leads to a biased sample which in turn weakens the acceptability of Premise 1.
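To see how much the location of the booth alone can move the number, here is a toy calculation in Python. Every proportion in it is hypothetical, made up purely for illustration: we assume business majors are 20% of all students and 60% of them support the candidate, while 35% of all other students do, and we assume 90% of the people passing the business-building booth are business majors. It ignores random sampling noise and only shows the effect of who ends up in the sample.

```python
def weighted_support(share_business, support_business, support_other):
    """Overall support in a group, given the share of business students in it
    and the support rate within each subgroup (all inputs are proportions 0-1)."""
    return share_business * support_business + (1 - share_business) * support_other

# Hypothetical support rates within each subgroup
SUPPORT_BUSINESS = 0.60
SUPPORT_OTHER = 0.35

# Whole campus: business students assumed to be 20% of all students
campus = weighted_support(0.20, SUPPORT_BUSINESS, SUPPORT_OTHER)

# Booth in the business building: assume 90% of passers-by are business students
booth = weighted_support(0.90, SUPPORT_BUSINESS, SUPPORT_OTHER)

print(f"Campus-wide support: {campus:.1%}")
print(f"Support measured at the booth: {booth:.1%}")
```

Under these made-up assumptions the booth reports roughly 57.5% support while the campus as a whole sits around 40%. Nobody lied and nothing was miscounted; the gap comes entirely from how the sample was selected, which is exactly why Premise 1 fails.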
It should be noted that selection bias leads to one more thing: we aren’t measuring what we think we’re measuring. This may be painfully obvious, but it’s worth stating: if I conduct an on-campus poll of political attitudes and I only have a booth in front of the business building, I’m not measuring BGSU students’ political attitudes; I’m measuring the political attitudes of a subgroup of BGSU students.
While my data isn’t particularly relevant to conclusions about the general BGSU student population, I can still draw legitimate conclusions about the political attitudes of BGSU business students. In other words, I can keep my data, but I must change my conclusion by changing my target population.
Evaluating Premise 2:
Premise 2 of any generalization states that
Proportion n of my sample of Xs have property Y.
There are several ways Premise 2 can fail to be acceptable. In the broadest terms, I can either mismeasure the proportion n or I can mismeasure property Y. I should add that evaluating Premise 2 is a much more difficult skill than evaluating Premise 1 since it often requires background knowledge of the topic the generalization is about. All that means is that we need to start with baby steps!
Here are two of the most common causes of measurement errors:
1. Measurement problem due to indirect measurement: Sometimes the way data is collected affects whether I’m actually measuring the property I think I’m measuring (i.e., the property in question). Suppose a marketing research company asked me to figure out how many undergrads live in the dorms. For whatever reason, the university prohibits researchers from interacting directly with students. I’m not allowed to simply conduct interviews so I’ll have to collect my data indirectly by observing behavior. I collected my data in Premise 2 by tagging students’ ears with a tracking device and tracking their movements from 9am-5pm.
My data shows that 50% of them went to on-campus residences in this time window. From this I conclude that 50% actually live there. It turns out that 15% of them were just visiting their friends but didn’t in fact live there. The way I indirectly measured the trait I’m interested in (i.e., where students live) didn’t exactly track the actual trait. As we will see with polling, measurement errors happen A LOT. They also occur with causal generalizations, which we’ll study in detail next week.
Let me briefly stop to emphasize the difference between the above measurement error and selection bias. Selection bias is an error in how I collect my sample which in turn causes me to have a biased sample (i.e., a problem for Premise 1). A measurement error in Premise 2 has to do with mismeasuring the property in question. I can have a perfectly representative sample but still measure the wrong property.
2. False attribution: This is another type of measurement error that affects the property in question, and hence Premise 2. Sometimes two properties are closely correlated, causing a researcher to confuse one for the other. For example, in education research it’s fairly well-established that children from wealthy homes do better in school than children from poor families. So, I might conclude that wealthy students are better students than poor students because they’re from wealthy families.
It turns out that wealthy parents tend to have higher levels of education themselves and also typically have more time to help their children. So, it’s not that wealthy students are better than poor students, it’s that students with parents who have the education and time to help their children do better. It just so happens that wealth, education, and time are closely correlated. The correct conclusion is that children of parents who have a high level of education and the time to help do better than students whose parents have neither, not that having wealth makes a child a better student.
Another example of false attribution occurred a few years ago with a study on happiness. An earlier study had concluded that religious people were happier than non-religious people. It turns out they weren’t measuring religiosity but rather being part of a close-knit community. The correct conclusion is that people who are part of close-knit communities are happier. It just so happens that church-members/religious groups are a particularly common kind of close-knit community.
Again, false attribution has to do with errors in measuring the property in question (i.e., Premise 2), not the sample (i.e., Premise 1).
Additional Factors Leading to Measurement Errors in Polling for P1 or P2:
Some of these factors affect the proportion (n) we measure for the property in question. Some cause mismeasurements in the property in question (i.e., we aren’t measuring the property that we think we’re measuring). Some affect representativeness. And some will affect some combination of these.
1. The medium through which questions are asked can bias results. People are more likely to lie through impersonal media (think: internet surveys, phone interviews) than in face-to-face surveys. Note that this effect can be reversed in some contexts: e.g., a teacher asking a student, face to face, what they think of the course vs the anonymous survey handed out at the end of the semester.
2. Vagueness: If the terms aren’t properly operationalized, the target population can interpret the terms differently than the researchers intend.
E.g., Do you drink frequently, rarely, or occasionally? Different people will interpret these adjectives differently leading to measurement errors in the proportion (n). It’s important that terms are operationalized to avoid these problems (e.g., Instead of ‘frequently’ use ‘1-3 times per week’).
3. Time: The time at which a poll takes place can have a tremendous impact on results. E.g., asking Americans whether they’d be willing to give up some civil liberties for greater security immediately after 9/11 or a (Muslim) terrorist event will get different answers than asking after the event has faded from the news.
If a poll is intended to measure people’s baseline attitudes on an issue but the survey was conducted after a major event related to that issue, the results won’t measure baseline attitudes. Again this will cause a mismeasurement of proportion (n).
Tip: Look at the date the poll was conducted vs the date it was published and consider if any major relevant events could have influenced the number.
4. Place: For a variety of reasons, location can cause measurement problems. Because similar people tend to cluster in the same areas, a sample can easily be biased if not selected from multiple locations (i.e., affects P1). Also, when people are asked questions while in groups or among friends, they’re more likely to want to conform to what they think are the group’s views rather than express their own. This will lead to measurement problems in P2.
5. Second-hand reporting: Newspapers and media outlets want eyeballs, so they might over-emphasize certain aspects of a poll or interpret the results in a way that sensationalizes them. In short, we should approach polls that are reported second-hand with a grain of salt, especially if they come from a source that favors a particular viewpoint (e.g., political media). In fact, you can be nearly certain that headlines announcing poll results on political websites misrepresent the poll, so assume by default that a headline containing polling results misrepresents the actual poll. (See: Evaluating the Inference from Premise 2 to the Conclusion, below.)
6. People are dumb/don’t want to look uninformed. People will give you an answer even if they have no idea what you’re asking about. This means you aren’t measuring what you think you’re measuring. You’re mostly measuring people’s willingness to give an opinion, to be on TV, or to avoid looking like an idiot, not their informed position on some issue.
7. Phrasing: How you phrase a question can have a large impact on people’s responses. In political polls, interviewers can use loaded questions and set the tone in order to get the kinds of answers they want.
E.g., When Republicans are polled regarding approval/disapproval of Obamacare, the numbers are very low. When no name for the policy is given but its features are merely listed, the approval numbers are similar to those of Democrats.
E.g., Tone: “Given that Monsanto also produced agent orange, a highly toxic chemical, do you think the food they produce is safe to eat?”
E.g., (Thank you Anna Irwin for the example)
Tip: Track down the original poll being quoted and read the “Interview” Section to find out what the actual questions were. If they were worded in a way to favor one result this implies possible problems with proportion n or the property in question or both.
8. Method of collecting data can lead to self-selection: A self-selected group leads to a non-representative sample. E.g., if I’m offering $10 to fill out a survey, students, the unemployed, and low-wage workers will be overrepresented. Because I have a biased sample, I’m no longer measuring what I think I’m measuring (attitudes in the general population). Self-selection leads to problems with P1 and sometimes P2 as well.
9. Bonus measurement problem: Attrition. Often with human trials, participants will drop out over the course of the study. When you read a study, it will say it had n participants. That might sound like a decent study size. The problem is that people drop out. This not only affects the size of the sample but might also affect representativeness, and consequently whether we can trust the result.
For example, in weight loss studies, a lot of people often drop out. This skews the results because it’s not random who drops out. People who lost weight usually stay in, while those for whom it didn’t work drop out. This makes the effect size seem larger than it really is. Researchers think they’re measuring the effects of the particular treatment (generally), but in fact they’re likely only measuring the effects on a certain physiological or psychological profile.
For an example of the problem of attrition, watch this discussion of trying to compare different weight loss diets from 8:20.
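To see how much non-random dropout can inflate a result, here is a toy calculation in Python. The numbers are hypothetical, chosen only for illustration: 1,000 people enroll, the diet works for 300 of them, all 300 of those stay, and only 100 of the 700 for whom it didn’t work stick around to the end. The function name is likewise made up for this example.

```python
def success_rates(enrolled, successes, failures_who_stayed):
    """Compare the success rate among people who finished the study with the
    rate among everyone who enrolled.

    Hypothetical assumption: everyone the treatment helped stayed in the study,
    so every dropout came from the group it did not help.
    """
    completers = successes + failures_who_stayed
    among_completers = successes / completers
    among_enrolled = successes / enrolled
    return among_completers, among_enrolled

completers_rate, enrolled_rate = success_rates(
    enrolled=1000, successes=300, failures_who_stayed=100
)
print(f"Success among completers: {completers_rate:.0%}")
print(f"Success among all enrolled: {enrolled_rate:.0%}")
```

With these made-up numbers, a study that only reports its completers would announce a 75% success rate, while the diet actually worked for 30% of the people who started it. The treatment didn’t change; only who was left to be measured did.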
Evaluating the Inference from Premise 2 to the Conclusion:
There are two main ways the inference from Premise 2 to the conclusion can be weak.
1. The proportion in the conclusion is greater than the proportion in the sample:
If 50% of the students in my sample live on campus, I can’t infer that 60% of all BGSU students live on campus. In other words, the proportion in my conclusion can’t be higher than it is in my sample (without additional justification). This may seem like a fairly obvious point and it doesn’t usually happen within scientific studies but it almost always happens when scientific studies get reported by the media.
Example: A study might say 51% of people believe Y is a good politician. Media supportive of this politician might report, “The majority of Americans approve of Y.” In our mind we might think most people (i.e., well above 51%) support Y.
Rule of thumb: When you read of a scientific study or political survey that’s reported online, assume the results of the study have been misinterpreted and exaggerated.
2. Margin of Error Ignored in the Conclusion:
Margin of error measures the degree to which the measurements are dependable. It tells us that the stated numbers in the survey represent a range of values rather than a precise number. Let’s illustrate with an example:
Suppose a survey says 46% of students think Ami should be burned at the stake while 50% say Ami should be hailed as the next messiah. The margin of error is +/-5%. Is Ami’s new cult of critical thinking guaranteed a successful start? Or will Ami have to wait a few more years for world domination (when he arises from the ashes like a phoenix)?
If we ignore margin of error it looks like Ami should undoubtedly be hailed as the next messiah. However, the margin of error tells us that our proportion n could be plus or minus 5%.
This means the number of students who think Ami should be burned at the stake could be as high as 51% (46+5) or as low as 41% (46-5). The number of students who think Ami should be hailed as the next messiah could be as high as 55% (50+5) or as low as 45% (50-5). If we take the highest possible number for the ‘burn him’ crowd (51%) and the lowest number for the messiah crowd (45%), we see that the real values could actually be such that the ‘burn him at the stake’ crowd is larger.
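The range check in the Ami example can be written as a few lines of Python. This is a minimal sketch using the 46%, 50%, and +/-5% figures from the example; the function names are hypothetical, and it treats the margin of error simply as plus or minus the stated percentage, exactly as the text does.

```python
def interval(percent, margin):
    """Return the (low, high) range implied by a reported percentage
    and a +/- margin of error (both in percentage points)."""
    return (percent - margin, percent + margin)

def lead_is_secure(leader, trailer, margin):
    """True only if the leader's lowest plausible value still beats
    the trailer's highest plausible value."""
    leader_low, _ = interval(leader, margin)
    _, trailer_high = interval(trailer, margin)
    return leader_low > trailer_high

burn = interval(46, 5)       # 'burn him at the stake' crowd
messiah = interval(50, 5)    # 'next messiah' crowd
print(burn, messiah)
print(lead_is_secure(50, 46, 5))
```

Running it prints the two ranges, (41, 51) and (45, 55), followed by False: the messiah crowd’s 4-point lead is not secure. In this sketch a lead only counts as secure when the gap between the two numbers is larger than twice the margin of error; here the gap is 4 points and twice the margin is 10.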
Case Study: The 2020 Election Polls
A. Suggest how the selection method might lead to (a) a biased sample (b) a measurement error.
Question: I want to survey BGSU student attitudes towards various political candidates. I set up a booth in the philosophy department and ask every second person who walks by to fill out a survey.
Answer: Your method of selecting your sample (a booth in the philosophy department) will lead to selection bias. People in philosophy departments tend to vote Democratic at higher rates than the general population, and people in and near the philosophy department are likely to be philosophy majors and grad students. Because Democratic-leaning people will be over-represented in the sample, you’ll have a biased sample. Also, you’ll probably have a measurement error since you’re more likely measuring philosophers’ political preferences, not those of the general student body.
1. You want to conduct a study of infidelity rates at BGSU. You set up a booth in the cafeteria and ask every second person if they’ve cheated on a boyfriend/girlfriend in the last 12 months.
2. City Hall wants to know public attitudes towards building a new seniors’ center. They call every third name from an alphabetical list of landlines in BG.
3. Read the Sample section of the following study. Explain how selection bias may have caused a biased sample and a measurement error (there are several).
4. Here are the results from some internet polls regarding beliefs about who won the first presidential debate. Explain how selection bias may have caused a biased sample and a measurement error (there are several).
5. I want to conduct a weight loss study on a new diet and exercise plan. I put an ad in the local paper and on a community service messaging board for 300 participants. Over 50% lose an average of 10% of their body weight within the first month. Success! (There are many, many problems with this one. See how many you can find.)
6. During the coronavirus pandemic in late April 2020, two California doctors, Drs. Dan Erickson and Artin Massihi, held a press conference claiming that the death rate was much lower than other estimates. They arrived at this conclusion because they estimated the total number of infections was higher than estimated (i.e., the denominator was much greater than in other estimates). Here was how they arrived at their numbers: 340 / 5213 (6.5%) diagnostic tests were positive at their urgent cares. They conclude, scaling up, that 6.5% of the entire Central Valley is therefore positive. For Bakersfield, CA: it would mean about 58,000 people had the virus, far more than the nearly 700 confirmed cases. We should calculate mortality and morbidity (hospitalization, ICU) rates accordingly, they argue. What is their mistake? (Watch the first 2 min of this video for more hints)
7. I’ve developed a new method for studying, using this ONE WEIRD TRICK that THEY don’t want you to know about!!!! In small, short trials students have shown a 20% increase in their test scores compared to their previous test in the same class. I want to run a year-long trial to make sure the effects “stick”. To get participants, I advertise for volunteers with flyers all over campus and I do a campus-wide email blast. I get 1000 participants to volunteer. As part of the study they have to log their study hours using my one weird study trick. By the end of the first month about 10% of the students stop submitting data and drop out of the study. By the end of the study, I’ve lost about 60% of my volunteers. But chin up, folks! My method works because of the 40% that remained, 87% of them increased their GPA by at least a full letter grade.
How does the selection method bias my results and lead to measurement errors?
B. In class we looked at eight ways measurement problems can occur in polling. In the examples below, identify the source(s) of the measurement problem(s) (there may be more than one source). If there’s selection bias, identify it.
1. Three days following (another) mass shooting, a poll is conducted to survey people’s attitudes toward gun-control regulations. The poll finds a large majority support regulations.
2. The following study (summary) was reported in some media with the headline: Over 40% of Doctors Willing to Assist with the Death Penalty.
(a) Read the Section on How Was the Study Done and What did the Researchers Find. Explain how the headline and the study results differ.
(b) With the same study, it was discovered that only 3% of participants knew that 8 of the 10 practices are against AMA guidelines. How might this have affected the results of the study?
(c) In the What did the Researchers Find section, look at the response rate. How might selection bias be leading to a biased sample and measurement problem?
3. In Canada we have a popular show called This Hour Has 22 Minutes. The most popular segment is called Talking to Americans. Watch the video and explain why a poll on American attitudes towards the issues they are asked about might not show the true values (i.e., why there might be measurement errors). FYI: Saskatchewan is north of Montana and North Dakota.
4. You are doing research on student mental health and stress levels. You want to get an idea of how stressed students feel during the year. In order to collect data you set up several booths around campus for a full week from Dec. 12-16. You randomly ask every 2nd person to fill out a survey indicating their stress level from 1-7. You conclude that being a student is extremely stressful. Indicate the various measurement problems you might have and how they were caused.
5. Many studies on police violence rely on self-reported data by police after the incident. Explain how this may cause a measurement problem and what that measurement problem might be.
C. Explain (Omit this question)
Interesting survey and an argument in favor of sending people to mandatory critical thinking FEMA camps:
Significant minorities of Americans (40%+) believe the government is hiding information about extraterrestrials and global warming. Up to 1/3 of Americans find several other conspiracy theories are convincing including plans for a one world government (33%), Obama’s birth certificate is fake (30%) and the AIDS virus was a government plot.
About a fourth of Americans believe there is something suspicious about the death of Supreme Court justice Antonin Scalia (28%) and slightly less than a fourth think the government is covering up information on the (faked) moon landing (24%).
Perhaps most indicative of the conspiratorial nature of Americans is the tenth conspiracy theory we asked about…one which, to our knowledge, we created.
Respondents to the Chapman University Survey of American Fears were asked if “The government is concealing what they know about…the North Dakota crash.” A third of Americans (33%) think the government is concealing information about this invented event.
Were the North Dakota crash added to the ranked list of conspiracies (see above), this invention would rank as number six, just under plans for a one world government.
Bonus: Can you think of what polling/measurement issue might have led to the high numbers for the North Dakota Crash?