Homework 6: Chi-square problems

Homework objectives

  • Differentiate and choose between goodness of fit and contingency table chi-square analysis.
  • Practice use of R to conduct statistical tests.
  • Practice reading and extracting R output from statistical commands.
  • Evaluate error rates (Type I, Type II), critical values, and p-values.

Homework 6 expectations

Read through the entire homework before starting to answer a question. You are expected to have read the chapter and to have completed the preceding homework. Answers are provided for odd-numbered problems; turn in your work for even-numbered problems.

How to work this homework

You may work together, but each of you must turn in your own report. Don’t “plagiarize” from each other. Do include in your report the names of those you worked with.

What to turn in: A PDF file containing your R code, statistical results, and your answers to the questions. RMarkdown is recommended; however, copying and pasting into a Word document is also acceptable.

Submit your work to CANVAS. Obey proper file naming formats.

Resources for this homework

Chapter 9. Mike’s Biostatistics Book

Mike’s Workbook for Biostatistics: A quick look at R and R Commander, Part01 – Part10 and previous homework pages presented in this workbook.

Additional R commands and/or code are provided below.


Answers to selected problems


Questions

1. Testing for homogeneity of Mendel’s data on seed shape in five F2 plants. (Hint: first get the test for all 5 plants combined, then try separately for each plant. Try the 3:1 ratio for the null hypothesis.)

Plant   Observed round   Expected round   Observed wrinkled   Expected wrinkled   χ2   p-value
1       45                                12
2       27                                8
3       24                                7
4       19                                10
5       32                                11
Total   147                               48

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Should you apply the Yates’ correction?

c. Write out the null and alternate hypotheses, then test the null hypotheses.

d. Complete the table.

e. How many experimental units in Mendel’s experiments on seed shape?

    1. One
    2. Five
    3. Forty-eight
    4. One hundred forty-seven
    5. One hundred ninety-five

f. At what level is there replication in Mendel’s experiment?

    1. Plants
    2. Seeds
    3. Traits
    4. None of the above

2. One hundred thirteen (113) F2 tomato seeds were planted in a hydroponics setup, grown for two weeks, then scored for leaf color. The F2 progeny came from F1 plants produced by a cross between YY and yy plants. The y allele is lethal when homozygous recessive; yy plants don’t live much past two weeks. In contrast to complete dominance, the F1 leaf morph is a blend of the two parents, thus exhibiting incomplete dominance. YY plants have green leaves, Yy heterozygotes are green-yellow, and yy homozygotes are yellow. Fifteen seeds failed to germinate.

Phenotypes Observed counts Expected
Green 30
Green-Yellow 49
Yellow 19

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Write out the null and alternate hypotheses, then test the null hypothesis.

c. What if all 15 seeds that failed to germinate were of the yy genotype? Repeat your analysis and compare the results.

3. A dihybrid cross between tall, potato-leaf tomatoes and dwarf, cut-leaf tomatoes. Assume tall dominant over dwarf, cut leaf dominant over potato-leaf. Write out the null and alternate hypotheses, then test the null hypothesis.

Phenotypes Observed Counts Expected
Tall, cut-leaf 926
Tall, potato-leaf 288
Dwarf, cut-leaf 293
Dwarf, potato-leaf 104

data cited by Sokal and Rohlf 1995, Biometry, 3rd ed.

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Write out the null and alternate hypotheses, then test the null hypothesis.

4. An early study on the effectiveness of a potential treatment of AIDS (progression defined as a substantial decrease of CD4+ cells). Write out the null and alternate hypotheses, then test the null hypothesis.

AZT treatment
           Disease progressed   No progression
AZT        76                   399
Placebo    129                  332
NEJM 329:297-303, 1993

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Should you apply the Yates’ correction?

c. Write out the null and alternate hypotheses, then test the null hypothesis.

5. The association between early diabetic nephropathy and mortality in a sample of men 50-75 years old with type 2 diabetes (diabetes diagnosed by age 45). At the start of the study, each subject was characterized as having normal or abnormally low levels of albumin excretion. The subjects were followed for 10 years. Write out the null and alternate hypotheses, then test the null hypothesis.

Albumin excretion group
            low   normal
Died        55    59
Survived    73    17
NEJM 310:356-360, 1984

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Write out the null and alternate hypotheses, then test the null hypothesis.

6. Vienna Maternity Hospital in Austria had two clinics. From 1840 through 1846, the maternal mortality rate in the first clinic was 98 per 1000 births, while the rate in the second clinic, the midwives’ clinic, was only 36 per 1000 births. Almost all the maternal deaths were due to puerperal fever. (You may recognize this story: it’s about the hospital where Ignaz Phillip Semmelweis worked; he’s famous for introducing the importance of hand washing by health-care workers.)

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Write out the null and alternate hypotheses.

c. Test the null hypothesis; report the results.

d. What can we conclude about the statistical evidence for/against the null hypothesis?

7. Bumpus reported differences in body size that correlated with survival (Bumpus 1899), and this report is often taken as an example of Natural Selection (cf. Johnston et al 1972). The study was discussed in Question 5, Chapter 5 of Mike’s Biostatistics Book, and again in Chapter 5.6 of Mike’s Biostatistics Book.

Table 1. Bumpus data set, summarized by sex of birds.

House sparrows   Lived   Died
Female           21      28
Male             51      36

a. Is this a “goodness of fit” or a “contingency table type” of problem?

b. Write out the null and alternate hypotheses.

c. Test the null hypothesis; report the results.

d. What can we conclude about the statistical evidence for/against the null hypothesis?

R or Rcmdr commands

Chi-square “goodness of fit” (gof) test

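chisq.test(c(O1, O2, ... On), p = c(P1, P2, ... Pn))

where O1, O2, ... On are the observed counts for the first group, second group, and so on up to the nth group. P1, P2, ... Pn are the expected proportions under the null hypothesis for group 1, group 2, and so on up to the nth group; they must sum to 1. Note that the p argument takes proportions, not expected counts (to supply expected counts or a ratio instead, add rescale.p = TRUE). The correct argument applies only to 2×2 tables, so it is not needed for a goodness of fit test.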
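
Because the p argument of chisq.test() expects proportions, a ratio such as 3:1 has to be converted before it is used. The following is a minimal sketch of two equivalent ways to do that, using the plant 1 counts from problem 1. The object names obs, ratio, test_a, and test_b are chosen for this illustration only.

```r
# Observed counts for plant 1 in problem 1: round, wrinkled
obs <- c(45, 12)

# Expected 3:1 ratio under the null hypothesis
ratio <- c(3, 1)

# Option 1: convert the ratio to proportions by hand (0.75, 0.25)
test_a <- chisq.test(obs, p = ratio / sum(ratio))

# Option 2: pass the ratio as is and let R rescale it to sum to 1
test_b <- chisq.test(obs, p = ratio, rescale.p = TRUE)

# Both calls test the same null hypothesis, so the statistics agree
all.equal(test_a$statistic, test_b$statistic)
```

Without rescale.p = TRUE, chisq.test() stops with an error when the values given to p do not sum to 1.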

Example

I’ll do the first plant from problem 1. You could plug in the numbers, one pair at a time, but here’s a simple way to take advantage of R’s indexing system. I create an object and store the records for each of the five plants. I then call the element by number to retrieve the counts for plant 1 by adding [1] after the name of the object.

round <- c(45, 27, 24, 19, 32)
wrinkled <- c(12, 8, 7, 10, 11)
chisq.test (c(round[1], wrinkled[1]), p =(c(0.75, 0.25)))

and the results were

Chi-squared test for given probabilities

data: c(round[1], wrinkled[1])
X-squared = 0.47368, df = 1, p-value = 0.4913
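
The printed output is only a summary. chisq.test() returns an object whose parts can be extracted by name, which is convenient when filling in a table of expected counts, test statistics, and p-values. The sketch below assumes the object name gof1 (chosen for this illustration) and a Type I error rate of 5%, which is a conventional choice rather than a requirement of the problem.

```r
# Counts for the five plants, as in the example above
round <- c(45, 27, 24, 19, 32)
wrinkled <- c(12, 8, 7, 10, 11)

# Store the result for plant 1 instead of only printing it
gof1 <- chisq.test(c(round[1], wrinkled[1]), p = c(0.75, 0.25))

gof1$expected    # expected counts under the 3:1 null hypothesis
gof1$statistic   # chi-square test statistic
gof1$parameter   # degrees of freedom
gof1$p.value     # p-value

# Critical value of the chi-square distribution at alpha = 0.05
# (alpha = 0.05 is an assumption made for this sketch)
crit <- qchisq(0.95, df = gof1$parameter)

# TRUE means the statistic exceeds the critical value (reject the null)
gof1$statistic > crit
```

Changing the index from [1] to another plant number repeats the extraction for that plant.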

 

The 2×2 contingency table analyses

Assuming you have already summarized the data, you can enter the counts directly in the Rcmdr contingency table form.

Doll and Hill (1950) Example

Smokers Non-smokers
Case controls, no lung cancer 622 27
Lung cancer 647 2

Statistics > Contingency tables > Enter and analyze two-way table… (Fig. 1)


Figure 1. Screenshot, select enter and analyze two-way table Rcmdr menu

Enter your numbers (Fig. 2)


Figure 2. Screenshot with 2×2 data entered

Next, select Statistics tab and select options. Default Hypothesis test is the chi-square option (Fig. 3).


Figure 3. Screenshot Rcmdr 2×2 statistics options.

After completing the 2×2 entry, click the OK button. The results were

Pearson's Chi-squared test

data: .Table
X-squared = 22.044, df = 1, p-value = 0.000002664

Note: Rcmdr also reports the commands it runs. For our report we want only the result of the hypothesis test, so the output shown above has been edited down to that result.
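
The same analysis can be run from a script rather than the Rcmdr menus. The following is a minimal sketch using the Doll and Hill (1950) counts from the table above. The object names dh and dh_test and the dimension labels are chosen for this illustration; correct = FALSE turns off the Yates continuity correction, which reproduces the uncorrected statistic reported above.

```r
# Doll and Hill (1950) counts, entered row by row
dh <- matrix(c(622, 27,
               647,  2),
             nrow = 2, byrow = TRUE,
             dimnames = list(
               Group   = c("No lung cancer", "Lung cancer"),
               Smoking = c("Smokers", "Non-smokers")))

# Pearson chi-square test without the continuity correction
dh_test <- chisq.test(dh, correct = FALSE)
dh_test            # X-squared = 22.044, df = 1

# Expected counts under the null hypothesis of independence
dh_test$expected

# Same table with the Yates continuity correction,
# which is R's default for 2x2 tables
dh_yates <- chisq.test(dh, correct = TRUE)
dh_yates
```

Comparing the two results shows how much the correction shrinks the test statistic for a given table.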

2×2 Contingency table: Use RcmdrPlugin.EBM as alternative

If you have not already done so (we introduced this plugin in Chapter 7), download and install RcmdrPlugin.EBM (Leucuta et al 2014).

install.packages("RcmdrPlugin.EBM")

Then, from Rcmdr, select Tools > Load Rcmdr plugin(s)… and select RcmdrPlugin.EBM from the list. Close and restart Rcmdr and locate the EBM menu. Select Enter two-way table… and proceed as before. The EBM plugin has additional options. Pay attention to how you set up the tables. For data consistent with the Prognosis option, the table should be set up as

             Disease +   Disease –
Exposure +
Exposure –

 

References

Leucuța, D. C., Călinici, T., Drugan, T., Istrate, D., & Achimaș, A. (2014). Graphical User Interface Extension in R Commander for Evidence Based Medicine Indicators. Applied Medical Informatics, 35(3), 11-16.

Loudon, I. (2013). Ignaz Phillip Semmelweis’ studies of death in childbirth. Journal of the Royal Society of Medicine, 106(11), 461-463.

Sokal, R.R. and Rohlf, F.J. (1995) Biometry: The Principles and Practice of Statistics in Biological Research. 3rd Edition, W.H. Freeman and Co., New York.