Create a logical vector that is TRUE for Southeast Asian countries.
Come up with 3 approaches:
Wrong way (using only ==)
Clunky way (using == and |)
Elegant way (using %in%)
Challenge 3 Solution
# Wrong - gives warning
countries == seAsia
# Clunky - works but tedious
countries == "Myanmar" | countries == "Thailand" | countries == "Cambodia" | countries == "Vietnam" | countries == "Laos"
# Elegant - best approach
countries %in% seAsia
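To see why the first approach is wrong rather than merely noisy, here is a minimal sketch. The `countries` vector below is a hypothetical stand-in (the lesson's own vector is not shown in this section); `seAsia` holds the five countries named in the clunky solution.

```r
# Hypothetical stand-in for the lesson's `countries` vector
countries <- c("Thailand", "France", "Laos", "Brazil", "Vietnam", "Japan", "Cambodia")
seAsia <- c("Myanmar", "Thailand", "Cambodia", "Vietnam", "Laos")

# == recycles seAsia and compares position by position,
# so a match is found only if the names happen to line up
wrong <- suppressWarnings(countries == seAsia)

# %in% asks, for each country, whether it appears anywhere in seAsia
right <- countries %in% seAsia

print(wrong)
print(right)
print(countries[right])
```

With this stand-in data, `==` finds no matches at all even though four of the seven countries are in `seAsia`, because none of them sit at the same position in both vectors.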
Handling Special Values
R has special functions for dealing with missing/invalid data:
is.na() - finds NA or NaN
is.nan() - finds NaN only
is.infinite() - finds Inf and -Inf
is.finite() - finds normal values (excludes NA, NaN, Inf, and -Inf)
na.omit() - removes all missing values
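The functions listed above can be compared side by side on one small vector. This is a sketch using made-up values chosen to include every special case.

```r
# Made-up vector containing each kind of special value
x <- c(1, NA, NaN, Inf, -Inf, 5)

print(is.na(x))        # TRUE for NA and NaN
print(is.nan(x))       # TRUE for NaN only
print(is.infinite(x))  # TRUE for Inf and -Inf
print(is.finite(x))    # TRUE for 1 and 5 only

# na.omit() drops NA and NaN but keeps the infinite values
cleaned <- as.numeric(na.omit(x))
print(cleaned)

# Logical subsetting with is.finite() keeps only ordinary numbers
finite_only <- x[is.finite(x)]
print(finite_only)
```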
Factor Subsetting
Factors work like vectors:
f <- factor(c("a", "a", "b", "c", "c", "d"))
f[f == "a"]
f[f %in% c("b", "c")]
Factors Keep All Levels
Skipping elements doesn’t remove levels:
f[-3] # Removed "b" value
Notice: Levels still shows all 4 levels (a, b, c, d)
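If the unused level is not wanted, base R's `droplevels()` removes it. This goes slightly beyond what the slide shows, so treat it as an optional sketch.

```r
f <- factor(c("a", "a", "b", "c", "c", "d"))

# Dropping the third element removes the only "b" value...
g <- f[-3]

# ...but "b" is still listed as a level
print(levels(g))

# droplevels() discards levels that no longer occur in the data
h <- droplevels(g)
print(levels(h))
```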
A. m[2,4,2,5]
B. m[2:5]
C. m[4:5,2]
D. m[2,c(4,5)]
Challenge 4 Solution
Answer: D
m[2, c(4,5)] # Row 2, columns 4 and 5
Row 2, column 4 = 11
Row 2, column 5 = 14
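The definition of `m` is not shown in this section. The sketch below assumes a 3 x 6 matrix filled column by column with 1 to 18, which is consistent with the values 11 and 14 quoted in the solution.

```r
# Assumed definition of m (not given in this section)
m <- matrix(1:18, nrow = 3, ncol = 6)

# Option D: row 2, columns 4 and 5
picked <- m[2, c(4, 5)]
print(picked)

# Option B treats the matrix as one long vector and returns elements 2 to 5
print(m[2:5])

# Options A and C fail under this assumption:
# A supplies four indices to a two-dimensional object, C asks for rows 4 and 5 of a 3-row matrix
a_result <- tryCatch(m[2, 4, 2, 5], error = function(e) "error")
c_result <- tryCatch(m[4:5, 2], error = function(e) "error")
```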
List Subsetting with []
[ returns a list:
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(mtcars))
xlist[1] # Returns list with one element
List Subsetting Multiple Elements
xlist[1:2] # Returns list with two elements
Extracting Elements with [[]]
[[]] extracts the actual element:
xlist[[1]] # Returns the vector itself
Now the result is a character vector, not a list!
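The difference between the two operators is easiest to see by checking the class of each result. A short sketch using the same `xlist`:

```r
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(mtcars))

# Single brackets keep the list wrapper
one_bracket <- xlist[1]
print(class(one_bracket))   # "list"
print(names(one_bracket))   # "a"

# Double brackets return the element itself
two_brackets <- xlist[[1]]
print(class(two_brackets))  # "character"
```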
[[]] Limitations
Can’t extract multiple elements:
xlist[[1:2]] # Error!
Error: subscript out of bounds
Can’t skip elements:
xlist[[-1]] # Error!
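The "subscript out of bounds" message arises because R reads a vector inside `[[ ]]` as a path into nested elements, not as a request for several elements. The sketch below catches both errors with `tryCatch()` so the script keeps running; the `"error"` marker string is just an illustrative placeholder.

```r
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(mtcars))

# xlist[[1:2]] means xlist[[1]][[2]]; element 1 has length 1, so this fails
multi <- tryCatch(xlist[[1:2]], error = function(e) "error")

# Negative indices are not allowed here for a three-element list
skip <- tryCatch(xlist[[-1]], error = function(e) "error")

# The same path syntax works when the path exists:
# element 2 is the vector 1:10, and its third value is 3
nested <- xlist[[c(2, 3)]]
print(nested)

# Use single brackets to select several elements or to skip one
print(names(xlist[1:2]))
print(names(xlist[-1]))
```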
Extracting by Name
Use names with [[]]:
xlist[["a"]]
The $ Shortcut
$ is shorthand for extracting by name:
xlist$data
Equivalent to xlist[["data"]]
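A quick check of that equivalence, plus one practical difference: `$` takes the name literally, while `[[ ]]` can use a name stored in a variable. The variable `item` below is introduced only for illustration.

```r
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(mtcars))

# Both forms return the same object
same <- identical(xlist$data, xlist[["data"]])
print(same)

# [[ ]] evaluates its argument, so a stored name works
item <- "b"
via_variable <- xlist[[item]]

# $ looks for an element literally called "item", which does not exist
via_dollar <- xlist$item
print(is.null(via_dollar))
```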
Challenge 5
Given:
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(mtcars))
Extract the number 2 from xlist.
Hint: The number 2 is in the “b” item.
Challenge 5 Solution
xlist[[2]][2] # or xlist[["b"]][2] or xlist$b[2]
First [[2]] extracts the vector, then [2] gets the second element.
Challenge 6
Given a linear model:
mod <- aov(pop ~ lifeExp, data = gapminder)
Extract the residual degrees of freedom.
Hint: attributes() will help you!
Challenge 6 Solution
attributes(mod) # See all available attributes
mod$df.residual
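The same pattern can be tried without downloading anything. The sketch below swaps in the built-in `mtcars` data set and a different formula, so the numbers differ from the gapminder model; only the extraction technique is the same.

```r
# Stand-in model on built-in data (not the lesson's gapminder model)
mod <- aov(mpg ~ wt, data = mtcars)

# The fitted model is a list, and its element names show what can be extracted
print(names(mod))

# 32 observations minus 2 estimated coefficients leaves 30 residual degrees of freedom
resid_df <- mod$df.residual
print(resid_df)

# The stats package also provides an accessor function for this value
print(df.residual(mod))
```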
Data Frame Subsetting: Single Argument
[ with one argument acts on columns:
gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/main/episodes/data/gapminder_data.csv")
head(gapminder[3]) # Returns data frame with column 3
Data Frame Subsetting: [[]]
[[]] extracts a column as a vector:
head(gapminder[["lifeExp"]])
Data Frame Subsetting: $
$ is the convenient shorthand:
head(gapminder$year)
Data Frame: Two Arguments
With [row, column], acts like a matrix:
gapminder[1:3, ] # First 3 rows, all columns
Single Row Subsetting
Single row returns a data frame:
gapminder[3, ]
A row mixes types (character and numeric columns), so the result stays a data frame instead of being simplified to a vector.
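The return types of the four data frame forms can be checked directly. The sketch below uses the built-in `iris` data set as a stand-in for gapminder, since it also mixes numeric columns with a non-numeric one and needs no download.

```r
# Built-in stand-in for gapminder: four numeric columns and one factor column
df <- iris

one_arg <- df[3]               # single argument: selects a column, keeps the data frame
as_vector <- df[["Species"]]   # [[ ]] extracts the column itself
by_dollar <- df$Sepal.Length   # $ does the same by name
one_row <- df[3, ]             # a row spans several types, so it stays a data frame

print(class(one_arg))
print(class(as_vector))
print(class(by_dollar))
print(class(one_row))
```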
Challenge 7
Fix these common data frame subsetting errors:
Extract observations from 1957:
gapminder[gapminder$year =1957,]
Extract all columns except 1 through 4:
gapminder[, -1:4]
Challenge 7 Continued
Extract rows where life expectancy > 80:
gapminder[gapminder$lifeExp >80]
Extract first row, columns 4 and 5:
gapminder[1, 4, 5]
Extract rows for years 2002 and 2007:
gapminder[gapminder$year ==2002|2007,]
Challenge 7 Solution
# 1. Use == not =
gapminder[gapminder$year == 1957, ]
# 2. Wrap range in parentheses
gapminder[, -(1:4)]
# 3. Need comma for rows
gapminder[gapminder$lifeExp > 80, ]
# 4. Use c() for multiple columns
gapminder[1, c(4, 5)]
# 5. Complete both comparisons
gapminder[gapminder$year == 2002 | gapminder$year == 2007, ]
# or better:
gapminder[gapminder$year %in% c(2002, 2007), ]
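Two of these fixes are worth seeing on a small vector, because the broken versions either fail or silently give wrong answers. The `year` values below are made up for illustration.

```r
# Made-up years for illustration
year <- c(1997, 2002, 2007, 1952)

# year == 2002 | 2007 is parsed as (year == 2002) | 2007;
# a non-zero number counts as TRUE, so every element comes out TRUE
broken <- year == 2002 | 2007
print(broken)

# %in% tests each year against the whole set
fixed <- year %in% c(2002, 2007)
print(fixed)

# Unary minus binds tighter than the colon:
# -1:4 runs from -1 up to 4, while -(1:4) negates each of 1 to 4
print(-1:4)
print(-(1:4))
```

Mixing the negative and positive indices produced by `-1:4` is what makes the original `gapminder[, -1:4]` fail.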
Challenge 8
Why does gapminder[1:20] return an error? How does it differ from gapminder[1:20, ]?
Create a new data frame called gapminder_small that only contains rows 1 through 9 and 19 through 23. You can do this in one or two steps.
Challenge 8 Solution
# 1. gapminder[1:20] tries to get columns 1-20, but the data frame has only 6 columns
#    gapminder[1:20, ] gets rows 1-20 (correct)
# 2. One step:
gapminder_small <- gapminder[c(1:9, 19:23), ]
# Two steps:
gapminder_small <- gapminder[1:9, ]
gapminder_small <- rbind(gapminder_small, gapminder[19:23, ])
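Both routes can be checked against each other on any data frame. The sketch below uses the built-in `iris` data as a stand-in for gapminder; it has 5 columns, so asking for columns 1 to 20 fails in the same way.

```r
# Built-in stand-in for gapminder
df <- iris

# A single index selects columns; there are not 20 of them, so this is an error
cols_attempt <- tryCatch(df[1:20], error = function(e) "error")

# One step: combine both row ranges with c()
one_step <- df[c(1:9, 19:23), ]

# Two steps: take the first range, then append the second with rbind()
two_step <- df[1:9, ]
two_step <- rbind(two_step, df[19:23, ])

# 9 rows plus 5 rows gives 14 rows either way
print(nrow(one_step))
print(nrow(two_step))
```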