Functions Explained
Adapted from Software Carpentry
Overview
Today we will learn to:
Define a function that takes arguments
Return a value from a function
Check argument conditions with stopifnot()
Test a function
Set default values for function arguments
Divide programs into small, single-purpose functions
Questions
How can I write a new function in R?
How do I make my code more reusable?
How can I check that my function is working correctly?
Why Functions?
If we only analyze one dataset:
Load in spreadsheet
Calculate simple statistics
Done!
But what if:
Data is updated periodically?
Need to re-run analysis?
Get similar data from different sources?
Functions let us repeat operations with a single command!
What is a Function?
Functions gather a sequence of operations into a whole, providing:
A name we can remember and invoke
Relief from remembering individual operations
A defined set of inputs and outputs
Rich connections to the larger programming environment
If you have written a function, you are a computer programmer!
Function Structure
The general structure of a function:
my_function <- function(parameters) {
  # perform action
  # return value
}
Three key parts: name, parameters, and body
Defining a Function
Convert Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
Function Anatomy
Breaking down fahr_to_kelvin:
Name: fahr_to_kelvin
Parameter: temp (in parentheses)
Body: Code within curly braces {}
Return: Value sent back to caller
Indentation (2 spaces) makes code readable!
Functions as Recipes
Think of creating functions like writing a cookbook:
Define “ingredients” (parameters)
Say what to do with them (body)
Serve the result (return value)
When we call the function, arguments are assigned to parameters.
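A minimal sketch of how arguments are matched to parameters. The helper scale_temp() and its parameter names are hypothetical and not part of the lesson:

```r
# Hypothetical helper for illustration: two parameters, `temp` and `multiplier`
scale_temp <- function(temp, multiplier) {
  scaled <- temp * multiplier
  return(scaled)
}

# Positional matching: 10 is assigned to `temp`, 2 to `multiplier`
by_position <- scale_temp(10, 2)

# Named matching: the order of the arguments no longer matters
by_name <- scale_temp(multiplier = 2, temp = 10)

identical(by_position, by_name)
```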
Return Statement
In R, the return statement is not required!
R automatically returns the value of the last expression evaluated in the function body.
But for clarity, we will explicitly use return() statements.
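To illustrate, here is a small sketch of two hypothetical functions (not from the lesson) that behave the same way, one with an explicit return() and one relying on the last evaluated expression:

```r
# Explicit return: the style used in this lesson
double_explicit <- function(x) {
  return(x * 2)
}

# Implicit return: the value of the last expression is returned
double_implicit <- function(x) {
  x * 2
}

double_explicit(4)
double_implicit(4)
```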
Calling Our Function
Test the function with known values:
# Freezing point of water
fahr_to_kelvin(32)
# Boiling point of water
fahr_to_kelvin(212)
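Because arithmetic in R is vectorized, the same function also accepts several temperatures at once. The sketch below checks both known values in one call. Using all.equal() here is a suggestion rather than part of the lesson; it compares with a small numeric tolerance, which is safer than == for floating-point results:

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# Freezing and boiling points of water in a single call
temps_k <- fahr_to_kelvin(c(32, 212))

# Compare against the known Kelvin values, allowing for rounding error
isTRUE(all.equal(temps_k, c(273.15, 373.15)))
```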
Challenge 1
Write a function called kelvin_to_celsius() that takes a temperature in Kelvin and returns that temperature in Celsius.
Hint: To convert from Kelvin to Celsius you subtract 273.15
Try it yourself before looking at the solution!
Challenge 1 Solution
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
# Test it
kelvin_to_celsius(273.15) # Should be 0
Combining Functions
The real power: mixing, matching, and combining functions!
Let’s define both conversion functions:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
Challenge 2
Define a function to convert directly from Fahrenheit to Celsius, by reusing the two functions above.
Think about how you can call one function from inside another!
Challenge 2 Solution
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}
# Test it
fahr_to_celsius(32) # Freezing point = 0°C
fahr_to_celsius(212) # Boiling point = 100°C
Defensive Programming
Important concept: Ensure functions only work in their intended use cases!
Defensive programming:
Frequently check conditions
Throw errors if something is wrong
Use assertion statements
Make debugging easier
Problem with Current Function
What happens with bad input?
fahr_to_kelvin("hot") # A string, not a number
This fails with an error about a non-numeric argument to a binary operator, which never mentions temp. We should fail early and clearly!
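One way to look at the error without halting a script is to capture it with tryCatch(). This is a sketch using the unchecked version of the function; the exact wording of the message depends on your R locale:

```r
# Version of fahr_to_kelvin() without any input checks
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# tryCatch() captures the error and hands it to our handler,
# which extracts the message text instead of stopping the script
msg <- tryCatch(
  fahr_to_kelvin("hot"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message comes from the subtraction inside the function, not from a check on temp, which is why it is hard to trace back to the bad argument.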
Using stop()
Check conditions with if statements:
fahr_to_kelvin <- function(temp) {
  if (!is.numeric(temp)) {
    stop("temp must be a numeric vector.")
  }
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
The stopifnot() Function
Better approach for multiple conditions:
fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
stopifnot() lists requirements that should be TRUE
Testing stopifnot()
Works with proper input:
fahr_to_kelvin(temp = 32)
Testing stopifnot() - Error
Fails instantly with improper input:
fahr_to_kelvin(temp = as.factor(32))
Error in fahr_to_kelvin():
! is.numeric(temp) is not TRUE
Clear error message!
Why stopifnot()?
Benefits of stopifnot():
Handles multiple conditions easily
Acts as extra documentation
Fails fast and clearly
Better than cryptic errors later
List all requirements at the start of your function!
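A sketch of what several requirements might look like in a single stopifnot() call. The length and missing-value checks are assumptions added for illustration; the lesson itself only checks is.numeric():

```r
fahr_to_kelvin <- function(temp) {
  # All requirements are listed up front; each must evaluate to TRUE
  stopifnot(
    is.numeric(temp),   # from the lesson
    length(temp) > 0,   # assumed extra requirement: not empty
    !anyNA(temp)        # assumed extra requirement: no missing values
  )
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

fahr_to_kelvin(c(32, 212))  # passes all checks

# Each call below violates one requirement;
# try() reports the error and lets the script continue
try(fahr_to_kelvin("hot"))
try(fahr_to_kelvin(numeric(0)))
try(fahr_to_kelvin(c(32, NA)))
```

stopifnot() reports the first requirement that fails, so putting the most basic check first tends to give the most useful message.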
Challenge 3
Use defensive programming to ensure that fahr_to_celsius() throws an error immediately if the argument temp is specified inappropriately.
Add a stopifnot() check to your function!
Challenge 3 Solution
fahr_to_celsius <- function (temp) {
stopifnot (is.numeric (temp))
temp_k <- fahr_to_kelvin (temp)
result <- kelvin_to_celsius (temp_k)
return (result)
}
More Complex Functions
Let’s work with the gapminder dataset!
Define a function to calculate GDP:
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
calcGDP <- function(dat) {
  gdp <- dat$pop * dat$gdpPercap
  return(gdp)
}
Function Structure Review
Breaking down calcGDP:
Argument: dat (a data frame)
Body: Multiply population by GDP per capita
Return: The calculated GDP values
Indentation makes code readable!
Testing calcGDP
Load the gapminder data and test:
gapminder <- read.csv(
  "https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/main/episodes/data/gapminder_data.csv"
)
calcGDP(head(gapminder))
Output: [1] 6567086330 7585448670 8758855797 9648014150 9678553274 11697659231
Not very informative!
Adding More Arguments
Make it more flexible with optional arguments:
calcGDP <- function(dat, year = NULL, country = NULL) {
  if (!is.null(year)) {
    dat <- dat[dat$year %in% year, ]
  }
  if (!is.null(country)) {
    dat <- dat[dat$country %in% country, ]
  }
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp = gdp)
  return(new)
}
Default Arguments
Notice year=NULL and country=NULL:
These are default values
Used if user doesn’t specify
Makes function more flexible
Use = in function definition for defaults
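A small sketch of a default value in action. The function round_temp() and its digits parameter are hypothetical, chosen only to show the mechanism:

```r
# Hypothetical example: `digits` defaults to 1 unless the caller overrides it
round_temp <- function(temp, digits = 1) {
  rounded <- round(temp, digits = digits)
  return(rounded)
}

round_temp(273.15678)              # default used: rounds to 1 decimal place
round_temp(273.15678, digits = 3)  # default overridden: 3 decimal places
```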
Loading Functions
Save functions to a file, then load them:
source("functions/functions-lesson.R")
Loads all functions from that file into your session!
How calcGDP Works
Plain English explanation:
Subset by year (if provided)
Subset by country (if provided)
Calculate GDP on the subset
Add GDP as new column
Return the enhanced data frame
Much more informative than just numbers!
Testing with Year
Extract data for a specific year:
head(calcGDP(gapminder, year = 2007))
Output includes country, year, population, continent, life expectancy, GDP per capita, and calculated GDP!
Testing with Country
Extract data for a specific country:
calcGDP(gapminder, country = "Australia")
Returns all years for Australia with calculated GDP.
Testing with Both
Combine year and country:
calcGDP(gapminder, year = 2007, country = "Australia")
Returns just one row: Australia in 2007!
Understanding NULL Checks
The condition checks:
if (!is.null(year)) {
  dat <- dat[dat$year %in% year, ]
}
Check if argument was provided
If yes: subset the data
If no: use full dataset
Function Flexibility
Now calcGDP can calculate GDP for:
The whole dataset
A single year
A single country
A combination of year and country
Multiple years or countries (using %in%)
Building in conditionals makes functions flexible!
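The combinations listed above can be tried without downloading anything. The sketch below uses a toy data frame with made-up values (not real gapminder figures) that has the same column names calcGDP() expects:

```r
calcGDP <- function(dat, year = NULL, country = NULL) {
  if (!is.null(year)) {
    dat <- dat[dat$year %in% year, ]
  }
  if (!is.null(country)) {
    dat <- dat[dat$country %in% country, ]
  }
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp = gdp)
  return(new)
}

# Toy data with made-up values, for illustration only
toy <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(100, 110, 200, 220),
  gdpPercap = c(10, 12, 20, 25)
)

calcGDP(toy)                                       # whole dataset
calcGDP(toy, year = 2007)                          # a single year
calcGDP(toy, country = "B")                        # a single country
calcGDP(toy, year = c(2002, 2007), country = "B")  # multiple years via %in%
```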
Pass by Value
Important: Functions in R work on copies of their arguments!
When we modify dat inside the function
We modify a copy, not the original
Original variable remains unchanged
This is “pass-by-value” behavior (R only makes the copy when a modification happens) - makes coding safer!
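A quick sketch that makes this visible. The function add_gdp() and the data values are hypothetical:

```r
# Hypothetical function: adds a gdp column to its own copy of `dat`
add_gdp <- function(dat) {
  dat$gdp <- dat$pop * dat$gdpPercap
  return(dat)
}

# Made-up values for illustration
original <- data.frame(pop = c(100, 200), gdpPercap = c(10, 20))
result <- add_gdp(original)

names(original)  # still only the two original columns
names(result)    # the returned copy has the extra gdp column
```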
Function Scope
Scoping: Variables created inside a function only exist during execution!
When we call calcGDP():
dat, gdp, new only exist inside the function
Don’t affect variables outside the function
Even if they have the same names!
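The sketch below shows a variable named gdp outside a function surviving a call to a function that creates its own gdp. The function calc_total() is hypothetical:

```r
# A variable in the global environment
gdp <- "defined outside"

calc_total <- function(pop, gdpPercap) {
  # This `gdp` is local to the function and disappears when it finishes
  gdp <- pop * gdpPercap
  return(gdp)
}

total <- calc_total(100, 10)

total  # the value computed inside the function
gdp    # the outside variable is untouched
```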
Challenge 4
Test out your GDP function by calculating the GDP for New Zealand in 1987.
How does this differ from New Zealand’s GDP in 1952?
Challenge 4 Solution
calcGDP(gapminder, year = 1987, country = "New Zealand")
calcGDP(gapminder, year = 1952, country = "New Zealand")
# Or compare both:
calcGDP(gapminder, year = c(1952, 1987),
        country = "New Zealand")
Challenge 5
The paste() function combines text together:
best_practice <- c("Write", "programs", "for",
                   "people", "not", "computers")
paste(best_practice, collapse = " ")
Write a function called fence() that takes two vectors: text and wrapper, and prints the text wrapped with the wrapper.
Challenge 5 Expected Output
Expected output:
fence(text = best_practice, wrapper = "***")
Should produce:
[1] "*** Write programs for people not computers ***"
Hint: paste() has a sep argument for separators!
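As a reminder of how the two arguments differ, here is a short sketch using a made-up vector: sep goes between the separate arguments given to paste(), while collapse joins the elements of one vector into a single string.

```r
words <- c("a", "b", "c")

joined    <- paste(words, collapse = "-")  # one string built from one vector
pair      <- paste("x", "y", sep = "-")    # separate arguments joined by sep
each_item <- paste(words, "!", sep = "")   # element-wise: still three strings

joined
pair
each_item
```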
Challenge 5 Solution
fence <- function(text, wrapper) {
  text_combined <- paste(text, collapse = " ")
  # sep = " " puts a space between wrapper and text, matching the expected output
  result <- paste(wrapper, text_combined, wrapper, sep = " ")
  return(result)
}
# Test it
fence(text = best_practice, wrapper = "***")
Testing and Documenting
Essential practices for functions:
Write a function
Comment parts to document behavior
Load the source file
Experiment in console to test
Fix any bugs
Repeat until it works!
Why Test Functions?
Testing ensures:
Function does what you think it does
Catches bugs early
Makes debugging easier
Builds confidence in your code
Always test with expected inputs and edge cases!
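One lightweight way to do this, sketched below, is to write the checks as stopifnot() calls that run silently when everything passes. The particular edge cases chosen (the point where both scales agree, an empty vector, a missing value, a non-numeric input) are suggestions rather than part of the lesson:

```r
fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
fahr_to_celsius <- function(temp) {
  stopifnot(is.numeric(temp))
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

# Expected inputs: known conversions (all.equal() tolerates rounding error)
stopifnot(isTRUE(all.equal(fahr_to_celsius(32), 0)))
stopifnot(isTRUE(all.equal(fahr_to_celsius(212), 100)))

# Edge cases
stopifnot(isTRUE(all.equal(fahr_to_celsius(-40), -40)))  # scales agree at -40
stopifnot(length(fahr_to_celsius(numeric(0))) == 0)      # empty in, empty out
stopifnot(is.na(fahr_to_celsius(NA_real_)))              # missing stays missing

# Bad input should raise an error rather than return a value
bad <- try(fahr_to_celsius("hot"), silent = TRUE)
stopifnot(inherits(bad, "try-error"))
```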
Documentation Matters
Good documentation:
Explains what the function does
Lists required parameters
Describes what is returned
Provides examples
Helps future you (and others!)
Comments are your friends!
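As an illustration, a comment block covering those points might look like the sketch below. The layout of the comment is just one possible convention, not a required format:

```r
# fahr_to_kelvin: convert temperatures from Fahrenheit to Kelvin.
#
# Arguments:
#   temp  numeric vector of temperatures in degrees Fahrenheit
# Returns:
#   numeric vector of the same length, in Kelvin
# Example:
#   fahr_to_kelvin(32)  # freezing point of water
fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
```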
Building Good Functions
Best practices:
Single purpose: One function, one task
Clear names: Describe what it does
Document: Comment your code
Test: Verify it works correctly
Defensive: Check inputs with stopifnot()
Modular: Build complex from simple
Why Small Functions?
Divide programs into small, single-purpose functions because:
Easier to understand
Easier to test
Easier to debug
Easier to reuse
Easier to maintain
Think LEGO blocks, not monoliths!
When to Write a Function?
Consider writing a function when:
You copy-paste code more than twice
Logic is complex and needs a name
You need to reuse an operation
Code would be clearer with abstraction
You want to test a specific piece
DRY principle: Don’t Repeat Yourself
Function Design Process
Write code that works once
Identify repeated patterns
Extract into function
Add parameters for flexibility
Add defensive checks
Document and test
Refine and improve
Creating a Script File
Organize your functions:
Create a functions/ directory
Create functions-lesson.R file
Write all functions there
Load with source("functions/functions-lesson.R")
Keeps workspace clean and organized!
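The same workflow can be tried end to end with a temporary file, so nothing has to exist on disk beforehand. In this sketch the temporary file stands in for functions/functions-lesson.R:

```r
# Write a function definition to a temporary script file
# (a stand-in for functions/functions-lesson.R)
script_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "kelvin_to_celsius <- function(temp) {",
    "  celsius <- temp - 273.15",
    "  return(celsius)",
    "}"
  ),
  con = script_path
)

# source() runs the file, which defines the function in the session
source(script_path)

kelvin_to_celsius(273.15)
```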
Key Points
Use function to define a new function in R
Use parameters to pass values into functions
Use stopifnot() to check function arguments
Load functions into programs using source()
Functions make code more readable and reusable
Key Points (Continued)
Functions can call other functions
Default values make functions flexible
R uses pass-by-value (copies data)
Variables inside functions have local scope
Always test and document your functions
Common Mistakes
Avoid these pitfalls:
Forgetting to return the result (e.g., ending with an assignment, which returns its value invisibly)
Not checking input arguments
Making functions too complex
Poor naming choices
Lack of documentation
Not testing edge cases
Example: Good vs Bad Names
Bad names:
f(), my_function(), do_stuff()
Good names:
calc_mean(), filter_by_year(), plot_histogram()
Names should describe what the function does!
Function Arguments Best Practices
Required arguments first, optional after
Use sensible defaults when possible
Check arguments with stopifnot()
Use descriptive parameter names
Document what types are expected
Composition is Powerful
Build complex operations from simple functions:
# Simple functions
kelvin_to_celsius <- function(temp) { ... }
fahr_to_kelvin <- function(temp) { ... }
# Composed function
fahr_to_celsius <- function(temp) {
  # Native pipe |> needs R 4.1 or later; %>% would require loading magrittr
  fahr_to_kelvin(temp) |> kelvin_to_celsius()
}
Return Values
What should functions return?
Single value: Number, string, logical
Vector: Multiple values of same type
List: Multiple values of different types
Data frame: Tabular results
Choose based on what makes sense!
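For instance, a list is a natural fit when the pieces of the result have different types. The function describe_temps() below is a hypothetical sketch, not part of the lesson:

```r
# Hypothetical function returning values of different types in one list
describe_temps <- function(temp_f) {
  stopifnot(is.numeric(temp_f))
  result <- list(
    unit = "Fahrenheit",           # character
    n = length(temp_f),            # integer
    below_freezing = temp_f < 32   # logical vector
  )
  return(result)
}

info <- describe_temps(c(20, 32, 50))
info$unit
info$n
info$below_freezing
```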
Advanced Topics (Preview)
Beyond this lesson:
Anonymous functions (lambda)
Functions as arguments
Environments and closures
S3 and S4 object systems
Package development
Resources: R Language Manual, Advanced R by Hadley Wickham
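As a small taste of the first two topics, the sketch below passes an anonymous function (one defined without a name) as an argument to sapply(), which applies it to each element of a vector:

```r
temps_f <- c(32, 212)

# The anonymous function is defined inline and passed as an argument
temps_c <- sapply(temps_f, function(f) (f - 32) * 5 / 9)

temps_c
```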
Practice Exercise
On your own, write a function that:
Takes a data frame and column name
Calculates summary statistics
Returns a named vector with min, max, mean, median
Bonus: Add error checking!
Recap
Today we learned:
How to define functions
Function anatomy (name, parameters, body, return)
Combining functions
Defensive programming with stopifnot()
Default arguments for flexibility
Testing and documentation
Remember
Functions are the building blocks of programming!
Start simple, build complexity gradually
Test early and often
Document for future you
Don’t repeat yourself - write functions!
Resources
Software Carpentry: r-novice-gapminder
R Documentation: ?function, ?stopifnot
Advanced R by Hadley Wickham
R Language Manual