Adapted from Software Carpentry
Today we will learn to:
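- Use if...else statements
- Use the ifelse() function for vectorized conditions
- Write for() loops
- Write while() loops

Often when coding, we want to control the flow of our actions: run some code only when a condition is met, or repeat it a number of times.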
Control flow is essential for automating complex analyses!
The most common approaches for conditional statements:
Print a message if a variable has a particular value:
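A minimal sketch of a bare if() statement. The value assigned to x is an assumption chosen for illustration.

```r
# Example value (assumed for illustration)
x <- 8

# The body runs only when the condition evaluates to TRUE
if (x >= 10) {
  print("x is greater than or equal to 10")
}
```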
The print statement doesn’t appear because x is not greater than 10.
To print a message for numbers less than 10, add else:
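A sketch of the same check with an else branch. The variable message_text is a hypothetical name used here so the result can be inspected afterwards.

```r
# Example value (assumed for illustration)
x <- 8

if (x >= 10) {
  message_text <- "x is greater than or equal to 10"
} else {
  # Runs whenever the condition above is FALSE
  message_text <- "x is less than 10"
}
print(message_text)
```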
Test multiple conditions using else if:
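A sketch of chained conditions, again with an assumed value for x. R checks the branches top to bottom and runs only the first one whose condition is TRUE.

```r
# Example value (assumed for illustration)
x <- 8

if (x >= 10) {
  result <- "x is greater than or equal to 10"
} else if (x > 5) {
  result <- "x is greater than 5, but less than 10"
} else {
  result <- "x is 5 or less"
}
print(result)
```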
Important: R looks for a logical element (TRUE or FALSE) inside if() statements.
The “not equal” message was printed because x is FALSE:
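One way to see this, as a sketch: store the result of a comparison in x and pass x directly to if(). The comparison 4 == 3 is an assumed example.

```r
# The comparison is evaluated first, so x holds a single logical value
x <- 4 == 3

if (x) {
  msg <- "4 equals 3"
} else {
  msg <- "4 does not equal 3"
}
print(msg)

# Inspect the logical value that if() received
print(x)
```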
Use an if() statement to print a suitable message reporting whether there are any records from 2002 in the gapminder dataset.
Now do the same for 2012.
Hint: Think about how to check if any values match a condition!
First, let’s load the data:
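The loading step is not shown here, so the following is only a sketch under assumptions: it takes the data from the gapminder package if that package happens to be installed, and otherwise builds a tiny hypothetical stand-in with the same column names. The stand-in rows are invented for illustration and are not real gapminder values.

```r
if (requireNamespace("gapminder", quietly = TRUE)) {
  # Assumption: the gapminder package is one possible source of the data.
  # Convert to a plain data frame so that [rows, "column"] returns a vector.
  gapminder <- as.data.frame(gapminder::gapminder)
} else {
  # Hypothetical stand-in rows (invented values, illustration only)
  gapminder <- data.frame(
    country   = c("Country A", "Country A", "Country B", "Country B"),
    continent = c("Continent X", "Continent X", "Continent Y", "Continent Y"),
    year      = c(2002, 2007, 2002, 2007),
    lifeExp   = c(48, 49, 71, 72)
  )
}

# Quick look at the structure
str(gapminder)
```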
Check for records from 2002:
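A sketch of the check, using any() to collapse the vector of comparisons into a single TRUE or FALSE. To keep the example self-contained it uses a hypothetical stand-in data frame with only a year column; with the real data loaded, the if() statement is the same.

```r
# Hypothetical stand-in for the gapminder data (illustration only)
gapminder <- data.frame(year = c(1997, 2002, 2007))

# any() returns one logical value, which is what if() expects
if (any(gapminder$year == 2002)) {
  print("Record(s) for the year 2002 found.")
} else {
  print("No records for the year 2002.")
}
```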
Did you try this for 2012?
You may have received a warning or error!
The if() function only accepts singular (length 1) inputs:
Error in `if (gapminder$year == 2012) ...`:
! the condition has length > 1
In R 4.2.0 and later, passing a condition of length greater than 1 to if() is an error. Earlier versions issued a warning and evaluated only the first element of the vector.
To use if(), make sure your input is singular (length 1)!
R’s built-in ifelse() function accepts both singular and vector inputs:
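- The first argument is the condition (or vector of conditions) to test
- The second argument is the value returned where the condition is TRUE
- The third argument is the value returned where the condition is FALSE

It works with vectors too!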
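A short sketch of ifelse(test, yes, no) on a single value and on a vector. The variable names and values are assumptions for illustration.

```r
# Single value
y <- -3
single_label <- ifelse(y < 0, "y is a negative number", "y is either positive or zero")
print(single_label)

# Vector input: the condition is tested element by element
y_vec <- -3:3
vector_labels <- ifelse(y_vec < 0, "negative", "positive or zero")
print(vector_labels)
```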
Two useful functions for checking conditions on vectors:
- any() - returns TRUE if at least one TRUE value is found
- all() - returns TRUE only if all values are TRUE

Check if any records exist for a year:
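A sketch comparing any(), all(), and %in% on a small vector. The years vector is a hypothetical stand-in for a year column.

```r
# Hypothetical stand-in for a column of years
years <- c(1997, 2002, 2007)

has_2002 <- any(years == 2002)   # TRUE: at least one element matches
only_2002 <- all(years == 2002)  # FALSE: not every element matches
in_2002 <- 2002 %in% years       # TRUE: same answer as any() here

print(c(has_2002, only_2002, in_2002))
```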
Similar to the %in% operator!
If you want to iterate over a set of values and perform the same operation on each, use a for() loop:
Print numbers 1 through 10:
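A minimal sketch of the loop. The iterator name i is a common convention, not a requirement.

```r
# i takes each value of 1:10 in turn
for (i in 1:10) {
  print(i)
}
```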
The 1:10 creates a vector on the fly.
You can iterate over any vector:
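For instance, a sketch that loops over a character vector. The vector contents and the name item are assumptions for illustration.

```r
# Any vector works as the sequence, not just numbers
items <- c("a", "b", "c")

for (item in items) {
  cat("Current item:", item, "\n")
}
```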
Use a for() loop nested within another to iterate over two things:
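A sketch of a nested loop, with assumed index names i and j and assumed values for each.

```r
# The inner loop runs in full for every value of the outer loop
for (i in 1:5) {
  for (j in c("a", "b", "c", "d", "e")) {
    print(paste(i, j))
  }
}
```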
When the first index (i) is set to 1:
- The second index (j) iterates through its full set
- Once j is complete, i is incremented

Write loop output to a new object:
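A sketch that collects the loop output in a vector named output_vector, the name used later in this lesson. The values looped over are assumptions.

```r
# Start with an empty vector and append one element per iteration
output_vector <- c()

for (i in 1:5) {
  for (j in c("a", "b", "c", "d", "e")) {
    temp_output <- paste(i, j)
    output_vector <- c(output_vector, temp_output)
  }
}

print(output_vector)
```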
“Growing” results (building the result object incrementally) is computationally inefficient.
The problem:
Better approach: Define an empty results object with appropriate dimensions before filling in values.
Define your output object before filling values:
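A sketch of the pre-allocated version, assuming a 5 x 5 matrix of results. The names output_matrix and j_vector are illustrative; output_vector2 is the name the comparison in this lesson refers to.

```r
# Allocate the full result object up front
output_matrix <- matrix(nrow = 5, ncol = 5)
j_vector <- c("a", "b", "c", "d", "e")

for (i in 1:5) {
  for (j in 1:5) {
    # Fill one cell per iteration instead of growing the object
    output_matrix[i, j] <- paste(i, j_vector[j])
  }
}

# Flatten the matrix into a vector (as.vector() reads column by column)
output_vector2 <- as.vector(output_matrix)
print(output_vector2)
```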
Compare the objects output_vector and output_vector2.
Is output_vector2 the same as output_vector?
Check if they’re identical:
They’re not the same! But all elements exist in both:
The elements are sorted in different order because as.vector() outputs elements by column.
Fix: Transpose the output matrix:
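The whole comparison can be sketched in one self-contained block. It rebuilds both objects under the same assumptions as before (a 5 x 5 grid of pasted labels), then compares them before and after transposing.

```r
j_vector <- c("a", "b", "c", "d", "e")
output_vector <- c()
output_matrix <- matrix(nrow = 5, ncol = 5)

for (i in 1:5) {
  for (j in 1:5) {
    value <- paste(i, j_vector[j])
    output_vector <- c(output_vector, value)  # filled row by row
    output_matrix[i, j] <- value
  }
}

# as.vector() reads the matrix column by column, so the order differs
output_vector2 <- as.vector(output_matrix)
same_before <- identical(output_vector, output_vector2)
same_elements <- all(output_vector %in% output_vector2)

# Transposing first makes the column-wise read match the row-wise fill
output_vector2 <- as.vector(t(output_matrix))
same_after <- identical(output_vector, output_vector2)

print(c(same_before, same_elements, same_after))
```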
Sometimes you need to repeat an operation as long as a condition is met:
R interprets a condition being met as TRUE.
Generate random numbers until you get one less than 0.1:
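A sketch using runif() to draw uniform random numbers between 0 and 1. The call to set.seed() is an assumption added only to make the run reproducible.

```r
set.seed(1)  # assumption: fixed seed for reproducibility

z <- 1
while (z > 0.1) {
  # Draw a new number; the loop stops once it is no longer above 0.1
  z <- runif(1)
  cat(z, "\n")
}
```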
Be careful with while() loops!
Make sure the condition eventually becomes FALSE, otherwise the loop never terminates.

Write a script that loops through the gapminder data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.
Hint: Use unique() to get continent names, then subset and calculate means.
Step 1: Get unique continents:
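A sketch of what unique() does, using a hypothetical stand-in for the continent column. With the real data this would be unique(gapminder$continent).

```r
# Hypothetical stand-in for gapminder$continent (illustration only)
continent_column <- c("Asia", "Europe", "Asia", "Africa", "Europe")

# unique() keeps the first occurrence of each value, in order of appearance
continents <- unique(continent_column)
print(continents)
```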
Step 2: Loop and calculate:
thresholdValue <- 50
for (iContinent in unique(gapminder$continent)) {
  tmp <- mean(gapminder[gapminder$continent == iContinent, "lifeExp"])
  if (tmp < thresholdValue) {
    cat("Average Life Expectancy in", iContinent, "is less than", thresholdValue, "\n")
  } else {
    cat("Average Life Expectancy in", iContinent, "is greater than", thresholdValue, "\n")
  }
  rm(tmp)
}

Modify the script from Challenge 3 to loop over each country.
This time print out whether the life expectancy is smaller than 50, between 50 and 70, or greater than 70.
Add two thresholds and extend the if-else statements:
lowerThreshold <- 50
upperThreshold <- 70
for (iCountry in unique(gapminder$country)) {
  tmp <- mean(gapminder[gapminder$country == iCountry, "lifeExp"])
  if (tmp < lowerThreshold) {
    cat("Average Life Expectancy in", iCountry, "is less than", lowerThreshold, "\n")
  } else if (tmp < upperThreshold) {
    # Reaching this branch already means tmp >= lowerThreshold
    cat("Average Life Expectancy in", iCountry, "is between", lowerThreshold, "and", upperThreshold, "\n")
  } else {
    cat("Average Life Expectancy in", iCountry, "is greater than", upperThreshold, "\n")
  }
  rm(tmp)
}

Write a script that loops over each country in the gapminder dataset, tests whether the country name starts with “B”, and plots life expectancy over time if the mean life expectancy is under 50 years.
Hint: Use grep() with value = TRUE to find countries starting with “B”.
Find countries starting with “B”:
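A sketch of the pattern match on a small vector of country names chosen for illustration. With the real data the second argument would be unique(gapminder$country).

```r
# Illustrative country names (not taken from the dataset)
countries <- c("Belgium", "Brazil", "Canada", "Zimbabwe")

# "^B" matches names that begin with B; value = TRUE returns the names
# themselves rather than their positions
candidates <- grep("^B", countries, value = TRUE)
print(candidates)
```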
Complete solution:
thresholdValue <- 50
candidateCountries <- grep("^B", unique(gapminder$country), value = TRUE)
for (iCountry in candidateCountries) {
  tmp <- mean(gapminder[gapminder$country == iCountry, "lifeExp"])
  if (tmp < thresholdValue) {
    cat("Average Life Expectancy in", iCountry, "is less than", thresholdValue,
        "- plotting life expectancy graph...\n")
    with(subset(gapminder, country == iCountry),
         plot(year, lifeExp,
              type = "o",
              main = paste("Life Expectancy in", iCountry, "over time"),
              ylab = "Life Expectancy",
              xlab = "Year"
         )
    )
  }
}

The advice of many R users:
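Learn about for() loops, but prefer them when the order of iteration matters, that is, when each step depends on the result of a previous one.
If order doesn’t matter, consider vectorized alternatives like the purrr package for better computational efficiency.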
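As a rough illustration of the idea, the continent challenge can be written without an explicit loop. This sketch swaps in base R's tapply() and ifelse() rather than purrr, and it runs on a hypothetical stand-in data frame with invented values.

```r
# Hypothetical stand-in data (invented values, illustration only)
demo <- data.frame(
  continent = c("X", "X", "Y", "Y"),
  lifeExp   = c(40, 50, 70, 80)
)

# Mean life expectancy per continent, computed in one call
means <- tapply(demo$lifeExp, demo$continent, mean)

# Vectorized comparison against the threshold
labels <- ifelse(means < 50, "less than 50", "50 or greater")
print(labels)
```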
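- Use if and else to make choices
- Use for to repeat operations
- Use while for condition-based repetition
- if() expects a single logical value (length 1)
- Use any() or all() to summarize logical vectors
- Use ifelse() for vectorized conditional operations
- Be careful with while loops
- For help, see ?"if", ?"for", ?"while" (the quotes are needed because these are reserved words)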