Homework 7: t-tests and ANOVA
Objectives:
- Explore null and alternative hypothesis concepts.
- Explain the relationship between the outcome of a test of hypothesis and the interpretation of a diagnostic test result.
- Evaluate error rates (Type I and Type II), critical values, and p-values.
Homework 7 expectations
Read through the entire homework before starting to answer a question; all questions are intended to help you achieve the learning outcomes for the chapter. There’s also a BONUS, with three questions for you to consider. You are expected to have read the chapter and to have completed the preceding homework. Answers are provided to odd-numbered problems; turn in your work for even-numbered problems.
How to work this homework
You may work together, but each of you must turn in your own report. Don’t “plagiarize” from each other. Do include in your report the names of those you worked with.
What to turn in: A pdf file containing the relevant R code, statistical results (edited to support your answers to the questions), and your answers to the questions (even-numbered only). Using R Markdown is recommended because it is a simple way to include generated graphs; however, copying and pasting into a Word document is also acceptable.
Notes. By relevant, we mean only the R code and the results from R functions necessary to support your answers to the questions. For example, do not include
- the entire data set when head(dataset) will do
- screenshots of R output; R output is text, so copy and paste it
- all statistical output from an R function.
See Part09: Making a report for an example homework file.
Submit your work to CANVAS. Follow the required file-naming format.
Resources for this homework
Chapter 10. Mike’s Biostatistics Book
Chapter 12. Mike’s Biostatistics Book
Mike’s Workbook for Biostatistics: A quick look at R and R Commander, Part01 – Part10 and previous homework pages presented in this workbook.
Additional R commands and/or code are provided below.
Questions
1. In an experiment, immortalized lung epithelial cells were exposed to dilute copper solutions for 30 minutes and then washed with PBS. The comet assay was applied to these cells and, for comparison, to cells that were not exposed to copper but were otherwise treated the same way (controls). Comet tail length indicates DNA damage.
- Make a box plot
- Test for assumption of normality
- Perform the t-test (not the Welch test), i.e., a two-tailed test with the assumption of equal variances.
- Which cell group had the greater mean value, Copper-exposed or Control cells?
- What are the assumptions necessary for you to use the independent sample t-test?
- What does “two-sided” mean?
- What was the null hypothesis?
- Was this a one-tailed or two-tailed test of the null hypothesis?
- What is the value of the test statistic?
- How many degrees of freedom?
- What is the critical value for this test?
- What is the value of the lower limit of the 95% confidence interval?
- What is the value of the lower limit of the 99% confidence interval?
- True or False. If the null hypothesis is accepted, then zero is a value included in the 95% confidence interval.
- Do you accept the null hypothesis? Explain your selection.
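The critical value depends only on alpha and the degrees of freedom, not on the data values. A minimal sketch, assuming the pooled-variance (Student's) test requested above and the group sizes in the comet data table (20 Control rows, 20 Copper rows):

```r
# Group sizes taken from the comet data table (20 Control rows, 20 Copper rows)
n_control <- 20
n_copper  <- 20

# Pooled-variance (Student's) two-sample t-test: df = n1 + n2 - 2
dof <- n_control + n_copper - 2

# Two-tailed test: alpha is split equally between the two tails
t_crit_05 <- qt(1 - 0.05 / 2, df = dof)  # alpha = 0.05
t_crit_01 <- qt(1 - 0.01 / 2, df = dof)  # alpha = 0.01
c(df = dof, alpha_0.05 = t_crit_05, alpha_0.01 = t_crit_01)
```

If the absolute value of the test statistic exceeds the critical value, reject the null hypothesis. The confidence level of the interval reported by t.test() is set with its conf.level argument (for example, conf.level = 0.99).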
2. Microsoft Excel, LibreOffice Calc, and Google Sheets all include t-test functions that return the p-value. Consider two variables, big (100, 110, 120, 100, 110, 210, 200) and small (0, 1, 1, 2, 0, 1, 0). (Note: these two groups are obviously very different, and running a t-test on their difference is silly; it is done here just for this question.) If cell formatting is set to the default of two decimal places for the Number category, the p-value will display as “0.00.” How should you report the p-value in this case?
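For comparison, R keeps the full-precision p-value, so you can see what the spreadsheet's two-decimal display hides. A minimal sketch, assuming Student's (equal-variance) two-sample test; the spreadsheet functions let you choose the test type, so their p-value may differ slightly:

```r
big   <- c(100, 110, 120, 100, 110, 210, 200)
small <- c(0, 1, 1, 2, 0, 1, 0)

# Student's two-sample t-test (equal variances assumed)
res <- t.test(big, small, var.equal = TRUE)
p <- res$p.value

# Show the p-value with significant digits rather than fixed decimal places
signif(p, 3)
format(p, scientific = TRUE, digits = 3)

# Or report it against a threshold, e.g. "< 0.001"
format.pval(p, eps = 0.001)
```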
3. For the t-test, and in general for reporting of all statistical tests, what three numbers reported in the R output should you minimally report?
4. For the following abstract, please identify
- Reference population? Type of sampling from population?
- Type of study: Observational or experimental study?
- Identify the names of the variables
- Identify data types for each variable
- What is the main scientific hypothesis that was tested?
- Identify the Treatment (Predictor) variables and the Outcome variables
- Identify levels (groups) of Treatment or Predictor variables
- What was(were) the sampling unit(s)?
- How were subjects (sampling units) assigned to Treatment or Predictor variables?
Abstract01
Determining the costs of sexual ornaments is complicated by the fact that ornaments are often integrated with other, non-sexual traits, making it difficult to dissect the effect of ornaments independent of other aspects of the phenotype. Hybridization can produce reduced phenotypic integration, allowing one to evaluate performance across a broad range of multivariate trait values. Here we assess the relationship between morphology and performance in the swordtails Xiphophorus malinche and X. birchmanni, two naturally-hybridizing fish species that differ extensively in non-sexual as well as sexual traits. We took advantage of novel trait variation in hybrids to determine if sexual ornaments incur a cost in terms of locomotor ability. For both fast-start and endurance swimming, hybrids performed at least as well as the two parental species. The sexually-dimorphic sword did not impair swimming performance per se. Rather, the sword negatively affected performance only when paired with a sub-optimal body shape. Studies seeking to quantify the costs of ornaments should consider that covariance with non-sexual traits may create the spurious appearance of costs.
5. Test scores on a school achievement test where students received differing types of test preparation (group A = individual instruction with tutor; group B = lecture; group C = computer). Example from Kirby 1993, p. 277.
- Write out the null and alternative hypotheses.
- Test assumption of normality
- Conduct one-way ANOVA using GLM
- Use and interpret an appropriate post hoc test.
6. ʻŌhiʻa plants were collected from three elevations and grown in a common garden. Plant height was recorded after several weeks of growth.
- Write out the null and alternative hypotheses.
- Test assumption of normality
- Conduct one-way ANOVA using GLM
- Use and interpret an appropriate post hoc test.
BONUS
Conduct a basic data analysis by one-way ANOVA. This example has three groups; pick two of them and also compare the one-way ANOVA results with those from an independent-sample t-test.
The data set: mass (g) of ten bags of Mini chocolate M&Ms, ten bags of Mini peanut M&Ms, and nine bags of Mini Skittles.
0. The data set
myBags <- read.table(header=TRUE, sep=",", strip.white=TRUE, text="
bag, Mass
M_Mc, 14.50
M_Mc, 14.65
M_Mc, 14.51
M_Mc, 13.35
M_Mc, 13.28
M_Mc, 14.59
M_Mc, 12.89
M_Mc, 14.09
M_Mc, 14.40
M_Mc, 12.81
M_Mp, 17.6
M_Mp, 18.43
M_Mp, 18.87
M_Mp, 18.99
M_Mp, 20.74
M_Mp, 18.32
M_Mp, 20.27
M_Mp, 17.74
M_Mp, 16.98
M_Mp, 19.97
M_S, 15.53
M_S, 16.58
M_S, 14.57
M_S, 16.38
M_S, 16.3
M_S, 16.03
M_S, 15.27
M_S, 17.35
M_S, 15.08
")
head(myBags)
attach(myBags)
BONUS Question 1. What to report: include your answers to Items 1, 2, and 3.
Item 1. Explore
Make a
- Histogram (combined groups)
hist(Mass)
- Box plot (mass by M&M category)
boxplot(Mass~bag)
Item 2. Create and test a hypothesis about mass
Write out your hypotheses (null and alternative) in two ways: (1) as an English sentence and (2) using symbols.
What is the hypothesis of a test of normality? Report whether you accept or reject it.
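For a Shapiro-Wilk test, the null hypothesis is that the sample was drawn from a normally distributed population. A minimal sketch, assuming the Shapiro-Wilk test (base R shapiro.test()) and re-entering the bag masses so the block runs on its own:

```r
# Bag masses (g) from the BONUS data set
myBags <- data.frame(
  bag = rep(c("M_Mc", "M_Mp", "M_S"), times = c(10, 10, 9)),
  Mass = c(14.50, 14.65, 14.51, 13.35, 13.28, 14.59, 12.89, 14.09, 14.40, 12.81,
           17.6, 18.43, 18.87, 18.99, 20.74, 18.32, 20.27, 17.74, 16.98, 19.97,
           15.53, 16.58, 14.57, 16.38, 16.3, 16.03, 15.27, 17.35, 15.08),
  stringsAsFactors = FALSE
)

# Shapiro-Wilk test within each group; small p-value -> reject normality
sw_p <- sapply(split(myBags$Mass, myBags$bag),
               function(x) shapiro.test(x)$p.value)
sw_p

# Alternatively, test the residuals from the one-way model
fit <- lm(Mass ~ bag, data = myBags)
sw_resid <- shapiro.test(residuals(fit))
sw_resid
```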
Item 3. Conduct the test using one-way ANOVA
In Rcmdr: Statistics → Means → One-way ANOVA
Here’s the R code
out2 <- aov(Mass~bag, data=myBags)
summary(out2)
Alternatively, run as a general linear model
out3 <- lm(Mass~bag, data=myBags)
# We also want the ANOVA table
anova(out3)
# Compare one-way ANOVA on two groups with an independent-sample t-test on the same two groups.
# Note: our data set has three groups, so you have to pick two. A t-test doesn't work on more than two groups.
# Option 1: subset the data and create a new data frame containing only the two groups you want.
# Option 2: use logical indexing. For example (var.equal = TRUE requests the pooled-variance t-test, which matches one-way ANOVA):
t.test(Mass ~ bag,
       data = myBags[myBags$bag %in% c("M_Mc", "M_Mp"), ],
       var.equal = TRUE)
You should try additional tests. For three groups there are three possible pairwise comparisons: M_Mc vs M_Mp is just one.
Interpret the results and report whether you accept or reject the null hypothesis.
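All three pairwise comparisons can be run in one pass, which also shows that for two groups the one-way ANOVA and the pooled-variance t-test are the same test: F equals t squared and the p-values agree. This holds only with var.equal = TRUE; the default Welch test gives slightly different results. A minimal sketch; the helper compare_pair() is a hypothetical name introduced for illustration:

```r
myBags <- data.frame(
  bag = rep(c("M_Mc", "M_Mp", "M_S"), times = c(10, 10, 9)),
  Mass = c(14.50, 14.65, 14.51, 13.35, 13.28, 14.59, 12.89, 14.09, 14.40, 12.81,
           17.6, 18.43, 18.87, 18.99, 20.74, 18.32, 20.27, 17.74, 16.98, 19.97,
           15.53, 16.58, 14.57, 16.38, 16.3, 16.03, 15.27, 17.35, 15.08),
  stringsAsFactors = FALSE
)

# Hypothetical helper: t-test and one-way ANOVA on the same two groups
compare_pair <- function(groups) {
  two <- myBags[myBags$bag %in% groups, ]
  tt <- t.test(Mass ~ bag, data = two, var.equal = TRUE)  # pooled-variance t-test
  av <- anova(lm(Mass ~ bag, data = two))                 # one-way ANOVA table
  data.frame(comparison = paste(groups, collapse = " vs "),
             t = unname(tt$statistic),
             F_value = av[["F value"]][1],
             p_t = tt$p.value,
             p_F = av[["Pr(>F)"]][1])
}

# All three pairs of groups
pairs <- combn(c("M_Mc", "M_Mp", "M_S"), 2, simplify = FALSE)
results <- do.call(rbind, lapply(pairs, compare_pair))
results$t_squared <- results$t^2
results
```

Running three separate t-tests does not adjust for multiple comparisons; a post hoc test such as TukeyHSD() on the three-group model is the usual alternative.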
Additional R or Rcmdr commands
myData <- read.table(header=TRUE, sep="\t", text = " insert your data table here ")
head(myData)
Test normality:
Rcmdr → Statistics → Summaries → Test for normality
(General) linear model:
Rcmdr → Statistics → Fit models → Linear model
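Outside Rcmdr, roughly equivalent console commands are sketched below. This assumes the Shapiro-Wilk test (base R shapiro.test()) for normality; Rcmdr writes its own code, which may call different functions. The built-in PlantGrowth data set (weight by group) stands in for your data:

```r
# PlantGrowth is a built-in R data set: weight for groups ctrl, trt1, trt2
data(PlantGrowth)

# Test normality: Shapiro-Wilk test within each group
sw_p <- sapply(split(PlantGrowth$weight, PlantGrowth$group),
               function(x) shapiro.test(x)$p.value)
sw_p

# (General) linear model: one-way ANOVA fitted with lm()
fit <- lm(weight ~ group, data = PlantGrowth)
summary(fit)  # coefficients: each group's difference from the reference level
anova(fit)    # ANOVA table: Df, Sum Sq, F value, Pr(>F)
```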
Data
Comet data set (Dohm unpublished)
| Treatment | CometTail |
| --- | --- |
| Control | 17.856139 |
| Control | 16.52125 |
| Control | 14.925449 |
| Control | 14.029174 |
| Control | 13.332945 |
| Control | 8.811185 |
| Control | 14.701654 |
| Control | 9.261025 |
| Control | 21.779311 |
| Control | 6.180284 |
| Control | 9.201752 |
| Control | 5.54472 |
| Control | 6.717885 |
| Control | 2.625092 |
| Control | 7.191583 |
| Control | 5.392866 |
| Control | 11.284813 |
| Control | 15.441254 |
| Control | 17.857176 |
| Control | 4.250956 |
| Copper | 53.214287 |
| Copper | 38.92857 |
| Copper | 18.928572 |
| Copper | 30 |
| Copper | 28.928572 |
| Copper | 15.357142 |
| Copper | 17.857143 |
| Copper | 17.5 |
| Copper | 21.071428 |
| Copper | 29.285715 |
| Copper | 28.214285 |
| Copper | 16.785715 |
| Copper | 21.071428 |
| Copper | 37.5 |
| Copper | 38.214287 |
| Copper | 17.857143 |
| Copper | 29.642857 |
| Copper | 11.071428 |
| Copper | 35 |
| Copper | 49.285713 |
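The steps in Question 1 can be sketched in R by entering the comet table above directly. A minimal sketch, assuming the Shapiro-Wilk test for normality and Student's (equal-variance) t-test as the question specifies:

```r
# Comet data set (Dohm unpublished), entered from the table above
comet <- data.frame(
  Treatment = factor(rep(c("Control", "Copper"), each = 20)),
  CometTail = c(
    # Control
    17.856139, 16.52125, 14.925449, 14.029174, 13.332945, 8.811185, 14.701654,
    9.261025, 21.779311, 6.180284, 9.201752, 5.54472, 6.717885, 2.625092,
    7.191583, 5.392866, 11.284813, 15.441254, 17.857176, 4.250956,
    # Copper
    53.214287, 38.92857, 18.928572, 30, 28.928572, 15.357142, 17.857143,
    17.5, 21.071428, 29.285715, 28.214285, 16.785715, 21.071428, 37.5,
    38.214287, 17.857143, 29.642857, 11.071428, 35, 49.285713
  )
)

# Box plot of comet tail length by treatment
boxplot(CometTail ~ Treatment, data = comet, ylab = "CometTail")

# Group means
means <- tapply(comet$CometTail, comet$Treatment, mean)
means

# Shapiro-Wilk test within each group (H0: sample comes from a normal population)
sw_p <- sapply(split(comet$CometTail, comet$Treatment),
               function(x) shapiro.test(x)$p.value)
sw_p

# Student's two-sample t-test: two-tailed, equal variances (not Welch).
# With the formula interface the difference is mean(Control) - mean(Copper).
tt95 <- t.test(CometTail ~ Treatment, data = comet, var.equal = TRUE)
tt99 <- t.test(CometTail ~ Treatment, data = comet, var.equal = TRUE,
               conf.level = 0.99)
tt95           # t statistic, df, p-value, and 95% confidence interval
tt99$conf.int  # 99% confidence interval for the same difference
```

Because the interval is for Control minus Copper, check its sign before deciding whether zero falls inside it.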
Kirby et al. data set
| Student | Instruction.method | test.score |
| --- | --- | --- |
| 1 | A | 94.4 |
| 2 | A | 75.7 |
| 3 | A | 88.1 |
| 4 | A | 108.1 |
| 5 | A | 94.8 |
| 6 | A | 130.6 |
| 7 | A | 121.1 |
| 8 | A | 82.9 |
| 9 | A | 112 |
| 10 | A | 85.2 |
| 11 | A | 98.7 |
| 12 | A | 50.1 |
| 13 | A | 86.1 |
| 14 | A | 99.8 |
| 15 | A | 121.8 |
| 16 | B | 95.3 |
| 17 | B | 117.2 |
| 18 | B | 97.9 |
| 19 | B | 82.7 |
| 20 | B | 105.3 |
| 21 | B | 85.2 |
| 22 | B | 86.7 |
| 23 | B | 104.8 |
| 24 | B | 67.9 |
| 25 | B | 106.1 |
| 26 | C | 98.1 |
| 27 | C | 101.2 |
| 28 | C | 120.1 |
| 29 | C | 77.5 |
| 30 | C | 124.7 |
| 31 | C | 136.1 |
| 32 | C | 132.6 |
| 33 | C | 130.5 |
| 34 | C | 130 |
| 35 | C | 138.8 |
| 36 | C | 105.8 |
| 37 | C | 70.4 |
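The steps in Question 5 can be sketched in R by entering the table above directly. A minimal sketch, assuming the Shapiro-Wilk test on model residuals for normality and Tukey's HSD (base R TukeyHSD()) as the post hoc test; your chapter may recommend a different post hoc procedure:

```r
kirby <- data.frame(
  Instruction.method = factor(rep(c("A", "B", "C"), times = c(15, 10, 12))),
  test.score = c(
    # A: individual instruction with tutor
    94.4, 75.7, 88.1, 108.1, 94.8, 130.6, 121.1, 82.9, 112, 85.2,
    98.7, 50.1, 86.1, 99.8, 121.8,
    # B: lecture
    95.3, 117.2, 97.9, 82.7, 105.3, 85.2, 86.7, 104.8, 67.9, 106.1,
    # C: computer
    98.1, 101.2, 120.1, 77.5, 124.7, 136.1, 132.6, 130.5, 130, 138.8,
    105.8, 70.4
  )
)

# One-way ANOVA fitted as a linear model
fit <- lm(test.score ~ Instruction.method, data = kirby)
anova(fit)

# Normality of residuals (H0: residuals come from a normal distribution)
shapiro.test(residuals(fit))

# Post hoc pairwise comparisons (TukeyHSD needs an aov fit)
tk <- TukeyHSD(aov(test.score ~ Instruction.method, data = kirby))
tk
```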
ʻŌhiʻa data set
| Site | Height |
| --- | --- |
| M-1 | 12.5567 |
| M-1 | 13.2019 |
| M-1 | 8.0699 |
| M-1 | 6.0952 |
| M-1 | 11.3879 |
| M-1 | 12.2242 |
| M-1 | 16.0147 |
| M-1 | 19.7403 |
| M-1 | 36.4824 |
| M-1 | 13.1233 |
| M-1 | 21.7725 |
| M-1 | 14.2013 |
| M-1 | 37.7629 |
| M-1 | 2.8652 |
| M-1 | 0.6456 |
| M-1 | 29.623 |
| M-1 | 10.5812 |
| M-1 | 18.3046 |
| M-1 | 19.0528 |
| M-1 | 2.5693 |
| M-2 | 45.0162 |
| M-2 | 40.8404 |
| M-2 | 27.1032 |
| M-2 | 29.8036 |
| M-2 | 63.8316 |
| M-2 | 42.107 |
| M-2 | 30.0322 |
| M-2 | 34.0516 |
| M-2 | 15.7664 |
| M-2 | 35.1262 |
| M-2 | 43.6988 |
| M-2 | 26.7585 |
| M-2 | 36.7895 |
| M-2 | 30.9458 |
| M-2 | 26.8465 |
| M-2 | 40.3883 |
| M-2 | 30.6555 |
| M-2 | 19.9736 |
| M-2 | 27.676 |
| M-2 | 44.084 |
| M-3 | 15.2646 |
| M-3 | 19.6745 |
| M-3 | 23.275 |
| M-3 | 16.1161 |
| M-3 | 16.8393 |
| M-3 | 23.107 |
| M-3 | 21.5322 |
| M-3 | 13.4191 |
| M-3 | 14.7273 |
| M-3 | 18.4245 |
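The same workflow applies to Question 6. A minimal sketch, assuming the Shapiro-Wilk test for normality and Tukey's HSD as the post hoc test; the site labels M-1, M-2, and M-3 are used as given, since the table does not say which elevation each represents:

```r
ohia <- data.frame(
  Site = factor(rep(c("M-1", "M-2", "M-3"), times = c(20, 20, 10))),
  Height = c(
    # M-1
    12.5567, 13.2019, 8.0699, 6.0952, 11.3879, 12.2242, 16.0147, 19.7403,
    36.4824, 13.1233, 21.7725, 14.2013, 37.7629, 2.8652, 0.6456, 29.623,
    10.5812, 18.3046, 19.0528, 2.5693,
    # M-2
    45.0162, 40.8404, 27.1032, 29.8036, 63.8316, 42.107, 30.0322, 34.0516,
    15.7664, 35.1262, 43.6988, 26.7585, 36.7895, 30.9458, 26.8465, 40.3883,
    30.6555, 19.9736, 27.676, 44.084,
    # M-3
    15.2646, 19.6745, 23.275, 16.1161, 16.8393, 23.107, 21.5322, 13.4191,
    14.7273, 18.4245
  )
)

# Normality: Shapiro-Wilk p-value within each site
sapply(split(ohia$Height, ohia$Site), function(x) shapiro.test(x)$p.value)

# One-way ANOVA fitted as a linear model, plus a residual normality check
fit <- lm(Height ~ Site, data = ohia)
anova(fit)
shapiro.test(residuals(fit))

# Post hoc pairwise comparisons among sites
tk <- TukeyHSD(aov(Height ~ Site, data = ohia))
tk
```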
/MD