Part 04. Create functions in R
What’s on this page?
What to do
Complete the exercises on this page
- Create simple functions
- Add user input to a function
- A simple coder’s philosophy
How to do it
For these exercises, you may work
- within the Rcmdr script window
- at the command line and R prompt
- from a script document and the R GUI app
- in RStudio
Choose one way. The quickest way is to just work within R.
Let’s begin.
Before you start
A reminder: always set your working directory in R first (see Part 02. Getting started with R and Rcmdr) before starting your analyses. It makes life a lot easier.
1. Write simple functions to save you time.
In programming, a function is a set of instructions (code) to carry out a specific task. R has many built-in functions, like mean(), sd(), and more. Packages increase available capacity by adding user-defined functions. We’ll look at extending R with packages in more depth in the next workbook session, Part 04. R packages: Making R do more. Put simply, packages contain user-defined functions, and this ability to write functions is what makes R so powerful. Our purpose this semester in Biostatistics is not to teach you programming, but because R is a programming language, you should at least be introduced to the promise that writing your own functions may have.
2. Our first function
I’m an NFL fan and like to keep track of passer ratings of the quarterbacks, so I’ll share a simple function to calculate the NFL version of the passer rating. The NFL passer rating is on a scale from 0 to 158.3, with higher values reflecting higher passer (quarterback) performance. You can read more about passer rating at Wikipedia.
QBR <- function(ATT, COMP, INT, TD, YDS){
  a = ((COMP/ATT)-.3)*5
  b = ((YDS/ATT)-3)*.25
  c = (TD/ATT)*20
  d = 2.375-((INT/ATT)*25)
  PR = ((a+b+c+d)/6)*100
  return(PR)
}
Writing a function in R begins with the function() call. Parameters like ATT (short for pass attempts) are listed inside the parentheses of function(). Code is entered between curly brackets { and }. return() returns a single object from the function.
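The same three pieces (parameters, a body in curly brackets, and return()) appear in the sketch below. It is a hypothetical variant, not part of the original exercise: it assumes, as the published NFL formula is usually described, that each of the four components is limited to the range 0 to 2.375, which is what keeps the rating between 0 and 158.3. The names passer_rating_capped and clamp are illustrative choices.

```r
# Hypothetical variant of QBR; the function name and the clamp() helper
# are illustrative assumptions, not part of the original page.
passer_rating_capped <- function(ATT, COMP, INT, TD, YDS) {
  # Keep a single component within the range 0 to 2.375
  clamp <- function(x) max(0, min(x, 2.375))
  a <- clamp(((COMP / ATT) - 0.3) * 5)
  b <- clamp(((YDS / ATT) - 3) * 0.25)
  c <- clamp((TD / ATT) * 20)
  d <- clamp(2.375 - ((INT / ATT) * 25))
  PR <- ((a + b + c + d) / 6) * 100
  return(PR)
}

# A made-up, extreme stat line (ATT, COMP, INT, TD, YDS) reaches the
# top of the scale, about 158.3
passer_rating_capped(10, 10, 0, 5, 200)
```

For typical season totals every component already falls inside the limits, so this variant and QBR give the same answer. The exercises on this page continue with the original QBR function.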
Let’s try our function. Enter values for Russell Wilson (Denver Broncos) for the 2022 regular season (source: NFL.com).
ATT <- 483
COMP <- 292
INT <- 11
TD <- 16
YDS <- 3524
Now, call our function (note that instead of writing the variable names you could simply type in the actual numbers — I don’t really advise this because there are so many and it would be easy to make a mistake).
QBR(ATT, COMP, INT, TD, YDS)
And the output is
[1] 84.41598
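If you do prefer to type the numbers directly, one way to reduce the risk of mixing them up is to name each argument in the call. The sketch below repeats the function so that it runs on its own, and uses the same Russell Wilson values; the variable name rating is an illustrative choice.

```r
# Same function as above, repeated so this block is self-contained
QBR <- function(ATT, COMP, INT, TD, YDS) {
  a <- ((COMP / ATT) - 0.3) * 5
  b <- ((YDS / ATT) - 3) * 0.25
  c <- (TD / ATT) * 20
  d <- 2.375 - ((INT / ATT) * 25)
  PR <- ((a + b + c + d) / 6) * 100
  return(PR)
}

# Named arguments can be given in any order; R matches them by name
rating <- QBR(YDS = 3524, TD = 16, INT = 11, COMP = 292, ATT = 483)
rating   # same result as the positional call: 84.41598
```

Naming the arguments makes the call longer, but each number is labeled where it is typed.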
On your own 1. Copy the function into your R script window. Calculate the passer rating for another quarterback, Geno Smith (Seattle Seahawks), given his statistics for the 2022 NFL season: 572 (attempts), 399 (completions), 4282 (yards), 11 (interceptions), and 30 (touchdowns) (source: NFL.com).
It doesn’t make much sense to report a rating to 7 significant figures (5 decimal places). Use the built-in round() function or the signif() function to report the passer rating to four significant digits.
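As a quick illustration of how the two functions differ, here is a small sketch using an arbitrary value (not a passer rating). The second argument means decimal places for round() and significant digits for signif().

```r
x <- 123.456    # arbitrary example value

round(x, 2)     # 2 decimal places:     123.46
signif(x, 2)    # 2 significant digits: 120
signif(x, 4)    # 4 significant digits: 123.5
```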
On your own 2. Assign the output for Tom Brady from our QBR function to x, then compare round(x, n) and signif(x, n), where n is the number of digits (decimal places for round(), significant digits for signif()). Do you get the same number?
On your own 3. Compare Microsoft Excel’s round function to R’s round() and signif().
Note: In Microsoft Excel, click on any cell and enter “=round(x,n)”, replacing x and n with the value and number of digits. And of course, don’t type the quotes. The function call is written the same in Google Sheets and LibreOffice Calc.
For more about significant figures, see the blog by Neil Gunther at perfdynamics.blogspot.com.
Try another simple function, but get user input.
It’s pretty routine in a biology lab: you have a stock solution at 1 Molar concentration, and you need to make a working solution, say, at 200 millimolar (0.2 M). What volume (mL) of the stock solution do you need to make a 200 mL working solution? So, you drag out your calculator and apply the C1V1 = C2V2 equation and solve for V1. And repeat as necessary for all of your working solutions. Any time you have to repeat a calculation, that’s a sign you should write a function!
Let C1 = 1 M, C2 = 0.2 M, and V2 = 200 mL. Here’s a simple R function:
dilution <- function(){
  n1 <- readline(prompt="Enter c1:")
  c1 <- as.numeric(n1)
  n2 <- readline(prompt="Enter c2:")
  c2 <- as.numeric(n2)
  n3 <- readline(prompt="Enter v2:")
  v2 <- as.numeric(n3)
  v1 <- (c2*v2)/c1
  return(v1)
}
After loading the function, type dilution() at the R prompt (the lines beginning with “Enter” are R requesting information)
dilution()
Enter c1: 1
Enter c2: .2
Enter v2: 200
and R returns
[1] 40
Question. Why does R return “[1]” ?
Answer. Even though this function returns only one number, the object is treated as a vector. As we know, elements in vectors are numbered by their location within the vector. So, the first and in this case only element in this vector is 40 at position 1 of the vector.
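A short sketch can make this concrete. The object names v1 and volumes are illustrative, and the three volumes are made-up values.

```r
v1 <- 40
is.vector(v1)   # TRUE: a single number is a vector of length one
length(v1)      # 1
v1[1]           # 40, the element at position 1

# With more elements, positions work the same way
volumes <- c(40, 80, 120)
volumes[2]      # 80, the element at position 2
```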
Note that this is a pretty limited function. A better function would allow the user to choose which of the parameters, C2, V1, or V2, to calculate and to keep track of units. I also haven’t provided any error checking. For example, what happens if V1 returns a value like 0.01 microliters? That value is unrealistic for a volumetric pipettor. All of these scenarios could be accommodated with judicious (e.g., nested) use of if/else statements, demonstrated with this next function, a temperature unit conversion function.
tempConvert <- function() {
  prompt <- readline(prompt="Enter 0 to convert C to F or 1 to convert F to C: ")
  deg <- readline(prompt="Enter the temperature: ")
  deg <- as.numeric(deg)
  if(prompt==0){
    #Convert Celsius to Fahrenheit
    out <- (deg*1.8)+32
  } else {
    #Convert Fahrenheit to Celsius
    out <- (deg-32)*5/9
  }
  return(out)
}
After loading the function, type tempConvert() at the R prompt (the lines beginning with “Enter” are R requesting information)
tempConvert()
Enter 0 to convert C to F or 1 to convert F to C: 0
Enter the temperature: 60
and R returns
[1] 140
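Both functions above rely on readline(), which waits for typed input and so is only useful in an interactive session. As a rough sketch of an alternative, the dilution calculation can take its values as arguments, which also makes it easy to add the kind of error checking mentioned earlier. The name dilution_v1 and the particular checks are illustrative assumptions, not the only reasonable choices.

```r
# Hypothetical argument-based variant of dilution(); the name and the
# specific checks below are illustrative assumptions.
dilution_v1 <- function(c1, c2, v2) {
  # Basic error checking: inputs must be numeric and positive
  stopifnot(is.numeric(c1), is.numeric(c2), is.numeric(v2))
  if (any(c1 <= 0) || any(c2 <= 0) || any(v2 <= 0)) {
    stop("c1, c2, and v2 must all be greater than zero")
  }
  # A working solution cannot be more concentrated than the stock
  if (any(c2 > c1)) {
    stop("c2 cannot exceed c1")
  }
  # Solve C1V1 = C2V2 for V1
  v1 <- (c2 * v2) / c1
  return(v1)
}

dilution_v1(c1 = 1, c2 = 0.2, v2 = 200)                 # 40
dilution_v1(c1 = 1, c2 = c(0.1, 0.2, 0.5), v2 = 200)    # 20 40 100
```

Because R arithmetic works on whole vectors, the second call returns the stock volumes for three working solutions at once.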
On your own 4. Create a function to calculate the number of days between any date and today’s date.
On your own 5. Modify the V1 C1 = V2 C2 script so that the user can choose between calculating any of the V1, C1, V2, C2 parameters.
On your own 6. We often need to convert between the different units of measurement, and programs like the tempConvert function can be helpful. Create a general function to allow a user to convert between Metric and the Imperial/US systems of length measures, e.g., centimeters and inches, meters and yards, etc. (e.g., see Wikipedia Unit of length).
3. About programming style
Style is about rules or guidelines for writing computer code. In many ways our goal is to just get the code to work. However, we should also be aware that others will need to read and interpret the code we write. That said, you should at least make useful comments and be consistent. For more, see the chapter on R style in Hadley Wickham’s book, Advanced R, also available free of charge on the web at http://adv-r.had.co.nz/ .
A word of advice for getting started: spend some time searching the internet, e.g., Google search, to see if someone has already written applicable code that you can study and modify. Always give proper credit, but it’s a good way to start. More importantly, spend some time outlining the problem. This outline, sometimes called pseudocode, helps make sure you understand the steps needed to solve the problem.
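As one possible illustration of outlining first, the sketch below writes the steps as comments and then fills in the code under each step. The function name f_to_c is a hypothetical choice; the conversion is the same Fahrenheit to Celsius formula used in tempConvert.

```r
# Outline (pseudocode), written before any code:
#   1. take a temperature in Fahrenheit
#   2. stop if it is not a number or is below absolute zero
#   3. subtract 32, then multiply by 5/9
#   4. return the Celsius value

# Hypothetical helper; the name f_to_c is an illustrative choice
f_to_c <- function(deg_f) {
  # Step 2: check the input (absolute zero is -459.67 degrees F)
  if (!is.numeric(deg_f) || any(deg_f < -459.67)) {
    stop("deg_f must be numeric and not below -459.67")
  }
  # Step 3: apply the conversion
  deg_c <- (deg_f - 32) * 5 / 9
  # Step 4: return the result
  return(deg_c)
}

f_to_c(212)   # 100
```

The comments that started as the outline stay in the finished function, so the code documents itself.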
4. Page quiz
quiz in development