Part 04. Create functions in R
The instructions on this page apply to R in general, whether you use R on your local computer or in Google Colab.
Learning objectives
- Write user-defined functions in R to automate repetitive tasks.
- Write and test a custom function using real-world examples.
- Enhance your coding efficiency by reusing functions across analyses.
What’s on this page?
- Write simple functions to save you time
- Our first function
- A more advanced function with iteration
- Get user input
- About programming style
- Quiz
What to do
Complete the exercises on this page
- Create simple functions
- Add user input to a function
- A simple coder’s philosophy
How to do it
For these exercises, you may work
- within the Rcmdr script window
- at the command line and R prompt
- from a script document and the R GUI app
- in RStudio
- in Google Colab
Choose one way. The quickest way is to just work within R.
Let’s begin.
Before you start
A reminder: always set your working directory in R first (see Part 02. Getting started with R and Rcmdr) before you start your analyses. It makes life a lot easier.
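A minimal sketch of doing this from the R prompt: getwd() reports the current working directory, and setwd() changes it. Here tempdir() is only a stand-in so the example runs anywhere; in practice, replace it with the path to your own project folder.

```r
# Where is R currently reading and saving files?
getwd()

# Change the working directory. tempdir() is a placeholder here;
# replace it with the path to your own project folder.
old_wd <- setwd(tempdir())   # setwd() invisibly returns the previous directory
getwd()

# Switch back to the original directory
setwd(old_wd)
```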
1. Write simple functions to save you time.
In programming, a function is a set of instructions (code) that carries out a specific task. R has many built-in functions, like mean(), sd(), and more. Packages extend R by adding functions written by other users. We'll look at extending R with packages in more depth in the next workbook session, R packages: Making R do more. Put simply, packages contain user-defined functions, and this ability to write functions is what makes R so powerful. Our purpose this semester in Biostatistics is not to teach you programming, but because R is a programming language, you should at least be introduced to the promise of writing your own functions.
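To see that built-in and user-defined functions work the same way, here is a minimal sketch of a hypothetical function, myMean(), that reproduces what the built-in mean() does for a simple numeric vector. Unlike mean(), this sketch does not handle missing values (mean() can, with na.rm = TRUE).

```r
# Hypothetical user-defined function that reproduces the built-in mean()
myMean <- function(x) {
  sum(x) / length(x)   # add up the values, then divide by how many there are
}

values <- c(2, 4, 6, 8)
myMean(values)   # user-defined version
mean(values)     # built-in version
```

Both calls return 5.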
2. Our first function
I'm an NFL fan and like to keep track of quarterbacks' passer ratings, so I'll share a simple function to calculate the NFL version of the passer rating. The NFL passer rating ranges from 0 to 158.3, with higher values reflecting better passer (quarterback) performance. You can read more about passer rating at Wikipedia.
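QBR <- function(ATT, COMP, INT, TD, YDS){
  # Four components of the NFL passer rating
  a <- ((COMP/ATT) - 0.3) * 5     # completions per attempt
  b <- ((YDS/ATT) - 3) * 0.25     # yards per attempt
  c <- (TD/ATT) * 20              # touchdowns per attempt
  d <- 2.375 - ((INT/ATT) * 25)   # interceptions per attempt
  # Average the four components and convert to the rating scale
  PR <- ((a + b + c + d)/6) * 100
  return(PR)
}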
Writing a function in R begins with a call to function(). The parameters, like ATT (short for pass attempts), are listed inside the parentheses of function(), and the code that does the work goes between the curly brackets { and }. return() returns a single object from the function; if you leave it out, R returns the value of the last expression evaluated.
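The official NFL formula includes one step that QBR() leaves out: each of the four components (a, b, c, d) is capped, set to 2.375 if it is larger and to 0 if it is negative. The caps are what keep the rating between 0 and 158.3. Here is a sketch of a hypothetical variant, QBRcapped(), that adds the caps with pmin() and pmax(). For a typical season, such as Russell Wilson's 2022 season, none of the caps apply and the result matches QBR().

```r
# Hypothetical variant of QBR() that applies the NFL caps:
# each component is limited to the range 0 to 2.375
QBRcapped <- function(ATT, COMP, INT, TD, YDS) {
  cap <- function(v) pmin(pmax(v, 0), 2.375)   # clamp a value to [0, 2.375]
  a <- cap(((COMP/ATT) - 0.3) * 5)
  b <- cap(((YDS/ATT) - 3) * 0.25)
  c <- cap((TD/ATT) * 20)
  d <- cap(2.375 - ((INT/ATT) * 25))
  PR <- ((a + b + c + d)/6) * 100
  return(PR)
}

# Russell Wilson, 2022: no caps triggered, same as QBR()
QBRcapped(483, 292, 11, 16, 3524)   # 84.41598
# A "perfect" stat line reaches the 158.3 maximum
QBRcapped(ATT = 10, COMP = 10, INT = 0, TD = 10, YDS = 200)   # 158.3333
```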
Let's try our function. Enter the values for Russell Wilson (Denver Broncos) from the 2022 regular season (source: NFL.com).
ATT <- 483
COMP <- 292
INT <- 11
TD <- 16
YDS <- 3524
Now call the function. Instead of the variable names, you could type the actual numbers directly, but I don't advise it: with five values to enter, it would be easy to make a mistake.
QBR(ATT, COMP, INT, TD, YDS)
And the output is
[1] 84.41598
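If you do type the numbers directly, a safer option is to name the arguments. R matches named arguments by name rather than by position, so their order no longer matters. A short sketch (QBR() is repeated so the example runs on its own):

```r
# QBR() as defined above, repeated so this example is self-contained
QBR <- function(ATT, COMP, INT, TD, YDS){
  a <- ((COMP/ATT) - 0.3) * 5
  b <- ((YDS/ATT) - 3) * 0.25
  c <- (TD/ATT) * 20
  d <- 2.375 - ((INT/ATT) * 25)
  PR <- ((a + b + c + d)/6) * 100
  return(PR)
}

# Positional call: values must follow the order ATT, COMP, INT, TD, YDS
QBR(483, 292, 11, 16, 3524)

# Named call: any order works, so values are harder to mix up
QBR(YDS = 3524, TD = 16, ATT = 483, INT = 11, COMP = 292)
```

Both calls return 84.41598.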
On your own 1. Copy the function into your R script window. Calculate the passer rating for another quarterback, Geno Smith (Seattle Seahawks), given his statistics for the 2022 NFL season: 572 attempts, 399 completions, 4282 yards, 11 interceptions, and 30 touchdowns (source: NFL.com).
It doesn't make much sense to report a rating to seven significant figures (five decimal places). Use the built-in round() or signif() function to report the passer rating to four significant digits.
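On your own 2. Assign the output of our QBR function for Tom Brady (look up his season statistics at NFL.com) to x, then compare round(x, n) and signif(x, n) using the same value of n. Do you get the same number? (Hint: for round(), n is the number of decimal places; for signif(), n is the number of significant digits.)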
On your own 3. Compare Microsoft Excel's round function to R's round() and signif().
Note 1: In Microsoft Excel, click on any cell and enter “=round(x,n)”, replacing x with the value and n with the number of digits. Of course, don't type the quotes. The function call is written the same way in Google Sheets and LibreOffice Calc.
For more about significant figures, see Neil Gunther's blog at perfdynamics.blogspot.com.
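For reference, here is how the two R functions treat Russell Wilson's 2022 rating. The key difference is what the second argument means: round() counts decimal places, while signif() counts significant digits.

```r
x <- 84.41598   # Russell Wilson's 2022 passer rating from QBR()

round(x, 2)    # 2 decimal places:      84.42
signif(x, 4)   # 4 significant digits:  84.42
round(x, 4)    # 4 decimal places:      84.416 (not 4 significant digits)
```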
3. A more advanced function with iteration
In Note 1, Chapter 2.2 of Mike's Biostatistics Book, we obtained the number of base R downloads for August 2025. That was one-at-a-time code; a much better approach is iteration. Iteration refers to a program repeating a set of code multiple times, usually over a list of items or a range of numbers, and R has many ways to accomplish it. Here's code that obtains the August downloads for a range of years, from the start of the mirror site until 2025.
Our first script uses a for loop; the second replaces the for loop with the Map() function. Install the cranlogs package (install.packages("cranlogs")) before running either script.
Output from either version looks like this:
   Year Downloads
1  2015     75052
2  2016     45437
3  2017     62631
4  2018     61715
5  2019    153197
6  2020    340546
7  2021    370634
8  2022    539847
9  2023    500332
10 2024    704176
11 2025    582524
For loop version.
library(cranlogs)
# Parameters
startYear <- 2015
endYear <- 2025
monthDay <- c("-08-01", "-08-31")
# Build start and end date vectors
years <- seq(startYear, endYear)
startDate <- as.Date(paste0(years, monthDay[1]))
endDate <- as.Date(paste0(years, monthDay[2]))
# Download counts
output_vec <- numeric(length(years))
for (i in seq_along(years)) {
  out <- cran_downloads("R", from = startDate[i], to = endDate[i])
  output_vec[i] <- sum(out$count)
}
# Results table
results <- data.frame(Year = years, Downloads = output_vec)
print(results)
Map() version, without the for loop.
library(cranlogs)
# Parameters
startYear <- 2015
endYear <- 2025
monthDay <- c("-08-01", "-08-31")
# Build start and end date vectors
years <- seq(startYear, endYear)
startDate <- as.Date(paste0(years, monthDay[1]))
endDate <- as.Date(paste0(years, monthDay[2]))
# Download counts with Map (pairs start/end dates)
output_vec <- unlist(Map(function(s, e) {
  out <- cran_downloads("R", from = s, to = e)
  sum(out$count)
}, startDate, endDate))
# Results table
results <- data.frame(Year = years, Downloads = output_vec)
print(results)
Note 2: Map() offers no speed benefit over the for loop version; in both scripts, the limiting factor is the network call made by cran_downloads(). If you want to confirm this, wrap each script in the system.time() function: system.time({script here}).
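The same for loop and Map() patterns can be tried offline, with no network or cranlogs needed. As a sketch, the two hypothetical functions below, monthlyMeanLoop() and monthlyMeanMap(), compute the mean daily temperature for each month in R's built-in airquality dataset, one with a for loop and one with Map(), and each call is wrapped in system.time(). On a dataset this small, both should finish in a fraction of a second.

```r
# Hypothetical helper: mean of one column for each month, using a for loop
monthlyMeanLoop <- function(df, var) {
  months <- sort(unique(df$Month))
  out <- numeric(length(months))        # pre-allocate the result vector
  for (i in seq_along(months)) {
    out[i] <- mean(df[df$Month == months[i], var], na.rm = TRUE)
  }
  data.frame(Month = months, Mean = out)
}

# Same task, with Map() replacing the for loop
monthlyMeanMap <- function(df, var) {
  months <- sort(unique(df$Month))
  out <- unlist(Map(function(m) mean(df[df$Month == m, var], na.rm = TRUE),
                    months))
  data.frame(Month = months, Mean = out)
}

# Time both versions on the built-in airquality data
system.time(loopRes <- monthlyMeanLoop(airquality, "Temp"))
system.time(mapRes <- monthlyMeanMap(airquality, "Temp"))

# The two approaches should give identical results
all.equal(loopRes, mapRes)
print(loopRes)
```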
4. Try another simple function, but get user input.
It's routine in a biology lab: you have a stock solution at 1 molar (1 M) concentration, and you need to make a working solution at, say, 200 millimolar (0.2 M). What volume (mL) of the stock solution do you need to make 200 mL of working solution? So you drag out your calculator, apply the equation C1V1 = C2V2, and solve for V1. Then you repeat as necessary for all of your working solutions. Any time you have to repeat a calculation, that's a sign you should write a function!
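Rearranging C1V1 = C2V2 gives the formula for V1. With a 1 M stock (C1), a 0.2 M working concentration (C2), and a 200 mL working volume (V2), the worked calculation is:

```latex
V_1 = \frac{C_2 V_2}{C_1} = \frac{(0.2\ \mathrm{M})(200\ \mathrm{mL})}{1\ \mathrm{M}} = 40\ \mathrm{mL}
```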
Let C1 = 1 M, C2 = 0.2 M, and V2 = 200 mL. Here's a simple R function:
dilution <- function(){
  # Ask for each value; readline() returns text, so convert it to a number
  n1 <- readline(prompt="Enter c1:")
  c1 <- as.numeric(n1)
  n2 <- readline(prompt="Enter c2:")
  c2 <- as.numeric(n2)
  n3 <- readline(prompt="Enter v2:")
  v2 <- as.numeric(n3)
  # Solve C1*V1 = C2*V2 for V1
  v1 <- (c2*v2)/c1
  return(v1)
}
After loading the function, type at the R prompt (the pink text is R requesting information)
dilution()
Enter c1: 1
Enter c2: .2
Enter v2: 200
and R returns
[1] 40
Question. Why does R return “[1]”?
Answer. R has no separate type for a single number; even one value is stored as a vector of length 1. Elements in vectors are numbered by their position within the vector, so [1] tells you that 40 is the first (and, in this case, only) element of the vector.
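One practical limitation of readline(): in a non-interactive session (for example, a script run with Rscript), it does not wait for input and returns an empty string, which as.numeric() turns into NA. A sketch of a hypothetical alternative, dilutionArgs(), takes the values as arguments instead and adds some simple error checking with if statements.

```r
# Hypothetical argument-based version of dilution(): no readline() prompts,
# so it also works when R is not running interactively
dilutionArgs <- function(c1, c2, v2) {
  # Basic error checking: inputs must be positive numbers
  if (!is.numeric(c(c1, c2, v2)) || any(c(c1, c2, v2) <= 0)) {
    stop("c1, c2, and v2 must be positive numbers")
  }
  # A working solution cannot be more concentrated than its stock
  if (c2 > c1) {
    stop("c2 cannot be greater than c1")
  }
  v1 <- (c2 * v2) / c1   # C1*V1 = C2*V2 solved for V1
  return(v1)
}

dilutionArgs(c1 = 1, c2 = 0.2, v2 = 200)   # 40 (mL of stock)
```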
Note that dilution() is a pretty limited function. A better function would let the user choose which parameter (C1, C2, V1, or V2) to calculate and would keep track of units. I also haven't provided any error checking; for example, what happens if the calculated V1 is something like 0.01 microliters? That volume is unrealistic for a volumetric pipettor. All of these scenarios could be handled with judicious (possibly nested) use of if/else statements, demonstrated in this next function, which converts temperature units.
tempConvert <- function() {
  choice <- readline(prompt="Enter 0 to convert C to F or 1 to convert F to C: ")
  choice <- as.numeric(choice)
  deg <- readline(prompt="Enter the temperature: ")
  deg <- as.numeric(deg)
  if (choice == 0) {
    # Convert Celsius to Fahrenheit
    out <- (deg*1.8) + 32
  } else if (choice == 1) {
    # Convert Fahrenheit to Celsius
    out <- (deg - 32)*5/9
  } else {
    # Any other choice is an error
    stop("Please enter 0 or 1.")
  }
  return(out)
}
After loading the function, type at the R prompt (the pink text is R requesting information)
tempConvert()
Enter 0 to convert C to F or 1 to convert F to C: 0
Enter the temperature: 60
and R returns
[1] 140
On your own 4. Create a function to calculate the number of days between any date and today’s date.
On your own 5. Modify the dilution() function (C1V1 = C2V2) so that the user can choose to calculate any one of C1, V1, C2, or V2.
On your own 6. We often need to convert between units of measurement, and functions like tempConvert() can help. Create a general function that lets a user convert lengths between the metric and Imperial/US customary systems, e.g., centimeters and inches or meters and yards (see, e.g., the Wikipedia article Unit of length).
5. About programming style
Programming style refers to rules or guidelines for writing computer code. In many ways, our goal is simply to get the code to work. However, others will need to read and interpret the code we write, so at a minimum you should write useful comments and be consistent. For more, see the chapter on R style in Hadley Wickham's book, Advanced R, also available free of charge on the web at http://adv-r.had.co.nz/.
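As a small illustration, here are two functionally identical versions of the Fahrenheit-to-Celsius step from tempConvert(); the function names are hypothetical. The second uses a descriptive name, spaces around operators, and a comment. Its snake_case naming follows the convention recommended in the tidyverse style guide; another convention, such as camelCase, is also fine as long as you use it consistently.

```r
# Hard to read: cryptic names, no spacing, no comment
f<-function(x){(x-32)*5/9}

# Easier to read: descriptive names, consistent spacing, and a comment
fahrenheit_to_celsius <- function(deg_f) {
  # Remove the 32-degree offset, then rescale to Celsius-sized degrees
  (deg_f - 32) * 5 / 9
}

# Both return 100
f(212)
fahrenheit_to_celsius(212)
```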
A word of advice for getting started: spend some time searching the internet (e.g., with a Google search) to see whether someone has already written applicable code that you can study and modify. Always give proper credit, but it's a good way to start. More importantly, spend some time outlining the problem; this outline, sometimes called pseudocode, helps make sure you understand the steps needed to solve it.
6. Page quiz
quiz in development