Generalizations and Associated Problems: Part 1

Introduction:

Suppose you want to figure out what percentage of US college students commute to school. One way would be to go to each university in the US and ask every individual student if they commute. This would give you an accurate answer, but by the time you got it, 5 years would have passed, you would have spent a lot of money, and your answer might be obsolete. There must be a better way…but how?

As you may have figured out, the best way to get our number is to ask a sample of US students if they commute. Then we will generalize from the sample to a conclusion about the total US student population.

We can think of a generalization as a two-premise inductive argument. Supposing we use my critical thinking students as my sample, the argument will look like this:

Example:

(P1). The students in this class are a representative sample of BGSU undergraduates.
(P2). 30% of the students in this class commute to campus.
(C). Therefore, 30% of BGSU undergrads commute to campus.

All generalizations can be put into a standard form which looks like this:

(P1). S is a representative sample of population/group P.
(P2). Proportion 1 of S are Y (Y=the trait/attribute we’re interested in).
(C). Proportion 2 of P are Y.

Before we learn to evaluate generalizations we need to learn some important terminology.

Key Terms

Suppose I want to know the proportion of Ohio students who commute to school. Before I go about conducting my survey, I’m going to need to specify my sampling frame.

A sampling frame precisely defines both the total population and the trait you want to study. The total population is called the target group/population. In my example, I want to know how many students in Ohio commute to school. Before I even begin to study my sample I need to specify my terms. In fancy talk, when you define your target population and property we say you operationalize your terms. It is essential to operationalize your terms in order to eliminate vagueness.

First, I’d need to operationalize “student.” Do I include elementary, high school, and college? Do graduate students count? Until I specify the term, “student” is left vague. Once I specify, I have identified my target population. I also need to operationalize the property in question (sometimes called the relevant property). This is the attribute or quality that we want to measure in the target population. In our example, the property in question is “commutes to school.” Does this include students driven by parents? Does it include students who ride bikes or skateboards? What about students who live within walking distance or those who take public transportation? Do I count them as commuters? Once I specify, I have operationalized the trait that I want to study in the population. Depending on what I want to learn, I will have to make choices about how I define my sampling frame, otherwise vagueness will diminish the usefulness of my study.

The sample is the subset of individuals from the total population you want to study. A sample should be representative of the population you wish to study. That is, you want to ensure that the relevant traits or variables in the target population are also represented proportionally in the sample. For example, suppose I want to study college students in Ohio. For my sample I use BGSU students to infer what proportion of Ohio students commute by car to school. Suppose it’s 30% of BGSU students. Can I now reasonably infer that 30% of Ohio college students commute to school?

Probably not. Many BGSU students live on campus. That might not be true of other Ohio universities. Also, Bowling Green is a small town and most students who live off campus live within walking distance. These transportation and living patterns probably aren’t equally represented among university students who live and go to school in Cleveland or Columbus. If I base my generalization about Ohio college students only on BGSU students, I will have a biased sample. A biased sample is one that doesn’t have the same distribution of properties as the target population. A biased sample isn’t representative of the target population and therefore weakens my generalization.

In order to avoid having a biased sample, I use a random sample. A random sample uses a selection method to avoid bias: it ensures that every member of the target population has an equal chance of being included in the sample. BGSU students are a biased sample of Ohio students with respect to commuting because they won’t proportionally represent the different transportation and living patterns of Ohio students attending universities in large cities. To further guard against a biased sample, we can use stratified random sampling. This involves identifying the relevant population clusters and making sure that they are represented in the sample in the same proportion that they exist in the population. In short, we want the diversity of properties in our target population to be represented in our sample. For example, if I’m studying all Ohio college students I’ll want to stratify across small town colleges and big city colleges as well as perhaps across various social groups and income levels.
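The idea of stratified random sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_sample`, the made-up population of 1,000 students, and the 70/30 split between big city and small town campuses are all hypothetical and do not come from any real study.

```python
import random

def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw a proportional stratified random sample.

    population: list of individuals
    stratum_of: function that returns the stratum label of an individual
    sample_size: approximate total sample size (rounding may shift it slightly)
    """
    rng = random.Random(seed)

    # Group the population by stratum (e.g., type of campus).
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)

    sample = []
    for members in strata.values():
        # Each stratum gets the same share of the sample as it has of the population.
        quota = round(sample_size * len(members) / len(population))
        quota = min(quota, len(members))
        # Within a stratum, every member has an equal chance of selection.
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 700 big city students and 300 small town students.
students = [
    {"id": i, "campus": "big city" if i < 700 else "small town"}
    for i in range(1000)
]
sample = stratified_sample(students, lambda s: s["campus"], sample_size=100, seed=1)
```

Here the sample of 100 contains 70 big city students and 30 small town students, matching the 70/30 split in the made-up population, while the individuals within each group are still chosen at random.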

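The sample margin of error is the range around a sample’s result within which the true value for the target population is likely to fall, given the chance variation that comes with random sampling. It does not measure bias: a biased sample can have a small margin of error and still be wrong. To minimize the margin of error, larger samples are preferred to smaller samples.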
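To see why larger samples are preferred, here is a rough sketch using the standard textbook formula for the margin of error of a sample proportion. It assumes a simple random sample from a large population and a 95% confidence level (z = 1.96); the function name `margin_of_error` and the default values are choices made for this illustration, and the formula only accounts for chance variation, not for a sample that is unrepresentative.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample from a large population.
    proportion=0.5 is the worst case (it gives the widest margin).
    z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Compare a single class, a few classes, and poll-sized samples.
for n in (30, 100, 1200, 10000):
    print(f"n = {n:>5}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions a sample of 30 has a margin of error of almost 18 percentage points, while a sample of 1200 is already under 3 points. Going from 1200 to 10,000 buys less than 2 more points of precision, which is why very large samples are rarely worth the cost.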

Evaluating Generalizations

Basic Method: We’re going to
(a) evaluate each premise for acceptability as well as
(b) evaluate each premise for relevance (i.e., the strength of the logical force between the premises and the conclusion).

Let’s use the sample argument from earlier to learn how to evaluate a generalization argument:

(P1). The students in this class are a representative sample of BGSU undergraduates.
(P2). 30% of the students in this class commute to campus.
(C). Therefore, 30% of BGSU undergrads commute to campus.

I. Evaluating Premise 1:

With Premise 1, we’re primarily concerned with acceptability. The acceptability of Premise 1 is determined by two criteria:

A. Sample size: Is the sample large enough to capture all the relevant diversity in the target group?

Part of our answer will have to do with what trait we’re interested in studying. If we’re interested in what proportion of students live on campus, my critical thinking class is too small a sample to capture all the diversity in the BGSU undergraduate population. Critical thinking is a 100 level class, so freshmen will be over-represented in the sample (compared to the target population). BGSU has a policy that requires freshmen to live on campus. If I take my sample only from freshmen, I won’t get information about other sub-groups within the student population, such as seniors, who are probably more likely to live off campus. So, even if I select my sample randomly, if it’s too small it might not include all the relevant diversity in a population. The more diverse a target group, the larger you’ll want a sample to be in order to ensure that the sample captures the target group’s diversity. Generally, you don’t need a sample larger than around 1200, even for national-level polls.

When a sample is too small, the generalization that follows commits a fallacy called hasty generalization. A hasty generalization fails the relevance criterion in supporting its conclusion. In the example, my 100 level critical thinking class is too small a sample to capture relevant diversity and so it isn’t relevant to conclusions about the entire BGSU population. The argument fails because it is a hasty generalization.

B. Representativeness: Does the sample group have the same relevant characteristics and distribution of relevant characteristics as the target group?

Suppose that, to correct for my sample being too small, I randomly select ten intro level classes for my sample. Now my sample is about 1,000 students. No one can accuse me of using too small a sample. Unfortunately my sample still won’t allow me to make a good generalization since it isn’t representative of the relevant diversity in the student population. Again, most students in the sample are freshmen. First year students at BGSU are mostly required to live on campus. Seniors, for example, are not. Since seniors aren’t represented in our sample in the same proportion as they exist in the target population, we have a biased sample. Hence, we cannot accept premise 1.

When someone makes a generalization from a non-representative sample we say the argument fails because it relies on a biased sample. As such, Premise 1 will be only weakly relevant to the conclusion at best.
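The point that a large sample can still be biased can be illustrated with a small simulation. Everything in it is hypothetical: the population of 10,000 students, the off-campus counts for each class year, and the variable names are invented for the sketch and are not real BGSU figures.

```python
import random

rng = random.Random(0)

# Made-up numbers: students living off campus, out of 2,500 in each class year.
off_campus_count = {"freshman": 250, "sophomore": 1000, "junior": 1500, "senior": 1750}

# Build the hypothetical target population of 10,000 undergraduates.
population = []
for year, count in off_campus_count.items():
    for i in range(2500):
        population.append({"year": year, "off_campus": i < count})

def share_off_campus(group):
    """Proportion of a group that lives off campus."""
    return sum(s["off_campus"] for s in group) / len(group)

true_rate = share_off_campus(population)  # 4,500 of 10,000, i.e. 45%

# Biased sample: 1,000 students, all drawn from freshmen (like intro classes).
freshmen = [s for s in population if s["year"] == "freshman"]
biased_sample = rng.sample(freshmen, 1000)

# Random sample: 1,000 students drawn from the whole population.
random_sample = rng.sample(population, 1000)

print(f"True rate:     {true_rate:.0%}")
print(f"Biased sample: {share_off_campus(biased_sample):.0%}")
print(f"Random sample: {share_off_campus(random_sample):.0%}")
```

Both samples have 1,000 students, so neither is a hasty generalization. The freshmen-only sample still lands near the freshman rate of 10% rather than the population rate of 45%, while the random sample lands close to the true value.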

A famous case of a biased sample: For most of the history of academic psychology and social psychology, experiments were performed on undergrads and then generalized to all humans. This is especially true for studies about moral intuitions. For example, social psychologists and economists use something called the Ultimatum Game to test people’s intuitions of fairness and willingness to punish those who don’t abide by fairness norms.

The game works like this. There are two participants in the study. The researcher gives participant A something of value, perhaps $100. Participant A can choose to give any amount to participant B, from $0 to all of it. Why give any at all? Here’s the catch: if participant B rejects the offer, neither participant gets anything. In other words, participant A has an incentive to give an amount to B that isn’t too low, otherwise they risk getting nothing themselves. It turns out that most proposers will offer 30% to 50% and people in B’s position will usually reject anything below around 30%. This study has been replicated many times all over the US. So, for decades economists and social psychologists inferred that these numbers represent universal human intuitions about fairness. Can you spot the problem?

It turns out that if you conduct Ultimatum Game experiments in non-Western countries you get completely different results: In some cultures people reject money if you offer too much! And some are happy to get nothing! The mistake social scientists made was that their samples, although large, were taken from only one culture. And within the context of the world population, Western culture is a minority! In fact, the results you get in Western culture are abnormal compared to the rest of the world. Social scientists and economists had built up large interconnected theories about human behavior that relied on a biased outlier sample. All those theories about universal human norms are now suspect and must be reevaluated in light of new internationally collected data. Many will be overturned. To learn about some of the theories that are in trouble and differences between Western and other intuitions, here’s the main article.

Anecdotes: Anecdotes are personal experiences that people usually generalize from.

Key point: Generalizations that come from anecdotal evidence (personal experience) commit both errors. Can you think of why? They violate both the sample size and the representativeness criteria required for good generalizations. Generalizations based on anecdotal evidence are bad arguments and should be rejected or at least be viewed with extreme skepticism.

There are other ways that premise 1 can be unacceptable which we’ll examine in the lesson on polling. For this lesson we’ll focus only on sample size and representativeness since they are the most common and most important.

Practice (In-Class)

Put the arguments into the correct standard form then evaluate Premise 1 for sample size and representativeness. If the sample size is too small, identify it as a hasty generalization. If the sample isn’t representative, explain why.

A
Informal Presentation: 20% of Ami’s students own trucks therefore around 20% of university students own trucks.

Formal Structure
(P1) The students in Ami’s class are representative of all university students.
(P2) 20% of the students in Ami’s class are truck owners.
(C) Therefore around 20% of university students must own trucks.

B
75% of my friends at school have student loans therefore 75% of students at BGSU have student loans.

C
All my friends have happier and more interesting lives than mine. Every time I check my Facebook feed, they’re posting about doing something interesting or fun. My life sucks.

D
Conrad Hilton started out dirt poor and became super-rich, therefore anyone can do it.

E
We asked anyone who was motivated to lose weight to try our new magic diet of eating only natural organic pine tree bark. Over 80% of participants lost weight. 80% of people who try our new magic diet will lose weight.

F
Fox just did a call-in telephone poll of over 10,000 people and 80% of them agreed that Obama is doing a terrible job. That shows that around 80% of Americans think Obama’s doing a horrible job.

G
If you’re sick you should use this homeopathic remedy. It worked for me last time I was sick.

H
1/3 of students in two-year programs at Washington State community colleges graduate within 3 years. Therefore, about 30% of people in two-year programs graduate within 3 years.

HOMEWORK

A. Suppose you are asked to conduct a study but only given vague terms. Operationalize the sampling frame to eliminate vagueness in the target population and in the trait we’re interested in. You are free to operationalize any way you like so long as you eliminate vagueness.

Example:

Generalization: Azodicarbonamide (a chemical used in bread) is hazardous to human health.

Target population: Azodicarbonamide in bread products. I’ll restrict my investigation to the average quantity of azodicarbonamide in one serving of store-bought breads.

Trait in question: Being hazardous to human health. I’ll investigate whether it causes any long-term health problems such as respiratory disease and cancers if ingested.

Exercises:

1. Students like junk food.

2. Professors drink frequently.

3. Small dogs are aggressive.

B. Put the argument into standard form for generalizations. Evaluate each premise according to criteria we covered in class:

(P1) Check for sample size, representativeness.

(P2) Check for measurement errors. (Probably not relevant for this set of exercises. We’ll look at this in more detail next class).

(C) Is the proportion in (P2) the same as the one in the conclusion?

1. Most of my friends are going to the football game tonight therefore most BGSU students are going to the football game tonight.

2. LeBron won the world title through hard work and perseverance, therefore anyone can win a world basketball title with hard work and perseverance. All you have to do is believe in yourself.

3. I never found calculus to be useful, therefore it isn’t useful.

4. 50% of students living in Founders Hall stay on campus every non-holiday weekend, therefore 50% of BGSU students stay on campus every non-holiday weekend.

5. 20% of residents of Wood County do agricultural work, therefore 20% of Americans do agricultural work.

6. 40% of bees in Nebraska die every winter, therefore 40% of bees in the US die every winter.

7. 90% of people in LA, San Francisco, Manhattan, San Diego, Austin, and Seattle support gay marriage, therefore 90% of Americans support it.

8. 45% of people in Colorado are going to vote for Clinton and 35% will vote for Trump, therefore Clinton will win the election.

C. Critical Thinking in the Real World

It’s conventional wisdom that 1 in 5 (20%) female students will experience sexual violence before graduating. However, some studies say it’s more (1 in 4 or 25%) while others say it’s fewer (1 in 6 or about 17%). A gap of roughly 8 percentage points between studies is quite a wide margin.

By taking into account the concepts and methods used for establishing a sampling frame and the various problems that can happen with Premise 1 of a generalization, suggest a few ways for how different studies could end up with different rates. Assume there were no measurement errors.

Reasoning for the Digital Age

How to Fool and Be Fooled
