Answers: Measurement day


Homework 2: Measurement day

BI311 Biostatistics

1. n = 63 observations

2. Three variables

  • Person (categorical, nominal)
  • Darts thrown (discrete, meristic)
  • Distance from center (continuous, ratio)

3. NA = 12
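A quick way to check the counts in questions 1 and 3 from a script. The data frame below is a hypothetical stand-in with made-up values, using the same column names that appear later in this key (Student, Dart, Distance). With the class data you would skip the data.frame() step.

```r
# Hypothetical stand-in for the class data: same column names, made-up values
darts <- data.frame(
  Student  = c("aar", "aar", "aar", "aas", "aas", "aas"),
  Dart     = c(1, 2, 3, 1, 2, 3),
  Distance = c(2.5, NA, 7.1, 1.8, 3.2, NA)
)

nrow(darts)                  # total observations (rows): 6 here
sum(is.na(darts$Distance))   # missing distances (NA): 2 here
sum(!is.na(darts$Distance))  # usable distances: 4 here
```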

4. The data are truncated, not simply missing (not recorded). They are truncated because darts that did not hit or stick to the target were never recorded as trials; in many cases additional darts were thrown and only those that stuck were counted.
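Why truncation matters: dropping the tosses that missed removes the largest distances, so the recorded mean is pulled toward the center. A small simulation sketch; the exponential distances, the 200 tosses, and the 12-inch cutoff are all illustrative assumptions, not properties of the class data.

```r
set.seed(311)  # arbitrary seed so the example is reproducible

# Hypothetical "true" distances (inches) for 200 tosses, right-skewed
all_tosses <- rexp(200, rate = 1 / 5)

# Assumption for illustration: any toss farther than 12 inches from center
# misses the board and is never written down
board_radius <- 12
recorded <- all_tosses[all_tosses <= board_radius]

length(all_tosses) - length(recorded)  # tosses silently dropped
mean(all_tosses)                       # mean of every toss
mean(recorded)                         # mean of the recorded tosses (smaller)
```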

5. 63 distances were reported, but we don’t know how many darts were thrown in total because the class as a whole did not agree on how many darts each student should throw. And it was not 5 students: 12 participated that day.

6.

Histogram of distance [insert plot]

Describe plot: skewed to the right; most observations are less than 10 inches, with a few as great as 25 inches.
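For reference, a base-R sketch of the histogram. The distances below are hypothetical; with the class data you would pass darts$Distance instead. The 2-inch bins are an arbitrary choice.

```r
# Hypothetical distances (inches), including one missing value
distance <- c(1.2, 2.0, 2.8, 3.5, 3.9, 4.4, 5.0, 6.3, 8.1, 9.7, 14.2, 24.6, NA)

# hist() keeps only finite values, so the NA is dropped before binning
h <- hist(distance, breaks = seq(0, 26, by = 2),
          main = "Histogram of distance",
          xlab = "Distance from center (inches)")
h$counts  # number of tosses in each 2-inch bin
```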

Boxplot distance by student [insert plot]

Describe plot: medians vary from about 2 inches to about 7.5 inches. Outlier tosses reach 20+ inches.

Boxplot distance by dart [insert plot]

Describe plot: a trend for the third dart tossed to have a greater median distance (about 5 inches vs. about 2 and 2.5 inches for the first and second darts).

Note: the trick to making the boxplot with darts as the group is to create a new factor variable from the dart number. Otherwise, R treats 1, 2, 3 as numbers, not group labels. The menu command to make the new variable is:

Rcmdr: Data > Manage variables in active data set > Convert numeric variable to factors
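In a script, the same conversion is a call to factor(). A sketch on hypothetical toy data; DartF is a made-up name for the new variable (you can choose your own).

```r
# Hypothetical toy data with the same column names as the class data set
darts <- data.frame(
  Student  = rep(c("aar", "aas", "aat"), each = 3),
  Dart     = rep(1:3, times = 3),
  Distance = c(1.5, 2.2, 6.0, 3.1, 2.4, 4.8, 1.9, 7.7, 5.2)
)

# Script equivalent of "Convert numeric variable to factors":
# factor() turns the numbers 1, 2, 3 into group labels
darts$DartF <- factor(darts$Dart)
levels(darts$DartF)  # "1" "2" "3"

# One box per dart number, then one box per student
boxplot(Distance ~ DartF, data = darts,
        xlab = "Dart", ylab = "Distance from center (inches)")
boxplot(Distance ~ Student, data = darts,
        xlab = "Student", ylab = "Distance from center (inches)")
```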

The second part of the question asked you to describe “what the graph allows you to tell about the data.” For example, the histogram: results stack up at 5 inches or less but tail off to the right, with a few very large distances. For the boxplot by student, we see individual differences in performance: medians ranged from about 2 inches to more than 7 inches. In contrast, the boxplot by dart shows no obvious trend between the first and last dart thrown.

7. Repeat the graphs, but use the KMggplot2 plugin (RcmdrPlugin.KMggplot2).
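The plugin draws its graphs with the ggplot2 package. The hand-written sketch below is not the code the plugin generates; it just shows roughly equivalent ggplot2 calls on hypothetical toy data (DartF is again a made-up name for the factor version of Dart).

```r
library(ggplot2)

# Hypothetical toy data with the same column names as the class data set
darts <- data.frame(
  Student  = rep(c("aar", "aas", "aat"), each = 3),
  Dart     = rep(1:3, times = 3),
  Distance = c(1.5, 2.2, 6.0, 3.1, 2.4, 4.8, 1.9, 7.7, 5.2)
)
darts$DartF <- factor(darts$Dart)  # dart number as a group label

# Histogram of distance, 2-inch bins
p_hist <- ggplot(darts, aes(x = Distance)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Distance from center (inches)", y = "Count")

# Boxplot of distance by dart number
p_box <- ggplot(darts, aes(x = DartF, y = Distance)) +
  geom_boxplot() +
  labs(x = "Dart", y = "Distance from center (inches)")

print(p_hist)
print(p_box)
```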

8. Darts thrown per student

Mean = 9.692308

Median = 6

Mode = 6 (see question 9 answer)
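These three values can be checked directly from the per-student x column printed in the question 9 answer (nine students at 6, four at 18).

```r
# Per-student values of x copied from the myDarts printout in question 9
x <- c(6, 6, 6, 6, 6, 18, 18, 18, 18, 6, 6, 6, 6)

mean(x)    # 126 / 13 = 9.692308
median(x)  # 6: the 7th of 13 sorted values
table(x)   # 6 appears 9 times, 18 appears 4 times, so the mode is 6
```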

9. You could extract what you need from Statistics > Summaries > Table of statistics. I used the following commands:

mean(darts$Distance, na.rm=TRUE)
[1] 4.988431
median(darts$Distance, na.rm=TRUE)
[1] 3.54
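The argument is spelled na.rm, with a period. A misspelled na_rm does not raise an error: mean() passes unknown arguments into its ... and ignores them, so the NA is not removed. A tiny illustration with made-up numbers:

```r
x <- c(2, NA, 4)

mean(x)                # NA: one missing value makes the result NA
mean(x, na_rm = TRUE)  # still NA: the misspelled argument is silently ignored
mean(x, na.rm = TRUE)  # 3: the NA is removed before averaging
```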

Mode is tricky. Because the number of students was small, it’s probably easiest to just look at the data and count by hand to get the mode. However, I have provided a way to calculate the mode in R (remember, the R function mode() returns an object’s storage mode, such as "numeric", not the statistical mode).

Here’s a function based on the work presented in Mike’s Biostatistics Book, Chapter 3.1

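getMode <- function(x) {
  # count how often each distinct value occurs
  temp <- table(as.vector(x))
  # return the value(s) with the highest count, as character labels
  names(temp)[temp == max(temp)]
}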
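A few quick checks of how this kind of getMode() behaves, compared with R’s built-in mode(). The function is repeated so the snippet runs on its own; the input vectors are made up.

```r
# Same logic as getMode() above
getMode <- function(x) {
  temp <- table(as.vector(x))     # frequency of each distinct value
  names(temp)[temp == max(temp)]  # value(s) with the highest count
}

mode(c(6, 6, 18))                  # "numeric": storage mode, not the statistic
getMode(c(6, 6, 18))               # "6"
getMode(c(6, 6, 18, 18))           # "6" "18": ties return every modal value
as.numeric(getMode(c(6, 6, 18)))   # 6, if you need a number rather than a label
```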

Next, I need to get the darts thrown per student (question 8):

myDarts <- aggregate(darts$Dart, list(darts$Student), FUN = sum)
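A note on the FUN argument: FUN = sum adds up the values stored in Dart for each student, while FUN = length counts how many rows (recorded tosses) each student has. Which one you want depends on what the Dart column holds. A sketch on hypothetical toy data where the darts are numbered 1, 2, 3 in each round:

```r
# Hypothetical toy data: "aar" has 3 recorded tosses, "aas" has 6
darts <- data.frame(
  Student  = c(rep("aar", 3), rep("aas", 6)),
  Dart     = c(1:3, 1:3, 1:3),
  Distance = c(2.1, 4.0, 3.3, 1.0, 5.5, 2.2, 6.1, 0.8, 3.9)
)

# FUN = sum adds the Dart values: 1 + 2 + 3 = 6 per round of three
by_sum <- aggregate(darts$Dart, list(darts$Student), FUN = sum)
by_sum     # aar 6, aas 12

# FUN = length counts the rows recorded for each student
by_count <- aggregate(darts$Dart, list(darts$Student), FUN = length)
by_count   # aar 3, aas 6
```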

I then print out myDarts and get:

Group.1 x
1 aar 6
2 aas 6
3 aat 6
4 aau 6
5 aav 6
6 aaw 18
7 aax 18
8 aay 18
9 aaz 18
10 aba 6
11 abb 6
12 abc 6
13 abd 6

So the column I want is x from myDarts:

getMode(myDarts$x)

And, at last, my value is

[1] "6"

10. Summary statistics for distance from center:

range   sd        cv         skewness  kurtosis  n   NA
20.08   4.467423  0.8955566  1.902113  3.585679  51  12
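Most of these can be reproduced in base R. The cv is sd / mean; for example, 4.467423 / 4.988431 ≈ 0.8956, matching the table. Note that "range" here means max minus min, whereas R’s range() returns the pair c(min, max). A sketch with hypothetical distances (swap in darts$Distance for the class data); Rcmdr also reports skewness and kurtosis, which this sketch leaves out.

```r
# Hypothetical distances (inches) with two missing values
distance <- c(1.2, 2.0, 2.8, 3.5, NA, 4.4, 5.0, 6.3, 8.1, NA, 14.2, 21.3)

ok <- distance[!is.na(distance)]  # drop the missing values once

summ <- c(
  range   = diff(range(ok)),       # max - min (range() gives c(min, max))
  sd      = sd(ok),                # sample standard deviation
  cv      = sd(ok) / mean(ok),     # coefficient of variation, as a proportion
  n       = length(ok),            # non-missing observations
  missing = sum(is.na(distance))   # the "NA" column
)
round(summ, 4)
```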

11. Given the choices: sd for accuracy, cv for precision.

12. There are lots of ways to address this, but I’m not looking for a discussion of errors; I’m looking for a discussion of bias.

To quote a user on Stack Exchange (qquito, 2015):

We can talk about the error of a single measurement, but bias is the average of errors of many repeated measurements. Bias is a statistical property of the error of a measuring technique.

For example, reporting bias was evident for darts that were tossed but did not stick to the target or cabinet. Some groups just threw the dart again and did not count the toss that “failed.” Each group, in effect, was repeating a simple experiment: tossing darts and recording the distance from the bullseye.