Writing Data in R
Adapted from Software Carpentry
Overview
Today we will learn to:
- Save plots from R to files
- Control plot size and resolution
- Write data frames to CSV and other formats
- Handle formatting options for data output
- Create reproducible data-cleaning workflows
Questions
- How can I save plots and data created in R?
- What formats can I export to?
- How do I control the output format?
Why Write Data and Plots?
- Share results with colleagues
- Archive your work
- Create reports with multiple plots
- Automate workflows without manual clicking
- Batch process multiple files
Writing output is essential for reproducible research!
Saving Plots: Quick Method
The easiest way is to use ggsave() from ggplot2:
ggsave("My_most_recent_plot.pdf")
This saves your most recent ggplot to a file.
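ggsave() also accepts an explicit plot object and an output size, which avoids relying on whichever plot happened to be drawn last. A minimal sketch, assuming ggplot2 is installed; the built-in mtcars data, the object name p, and the temporary output path are placeholders chosen for illustration:

```r
library(ggplot2)

# A small example plot built from a dataset that ships with R
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary file so the example leaves no clutter behind
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")

# Pass the plot explicitly and set the page size in inches
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Naming the plot explicitly makes the script give the same result even if other plots are drawn in between.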
When to Use Manual Methods
The Export button in RStudio's Plot pane works great when:
- You want a quick save
- Making one or two plots
- Interactively exploring
But… what if you need to:
- Create a PDF with multiple pages?
- Loop through subsets of data?
- Generate many plots automatically?
You need programmatic control!
Using the pdf() Device
Create a PDF file programmatically:
pdf("Life_Exp_vs_time.pdf", width=12, height=4)
# Your plotting code here
ggplot(data=gapminder, aes(x=year, y=lifeExp,
                           colour=country)) +
  geom_line() +
  theme(legend.position = "none")
# Important: Turn off the PDF device!
dev.off()
Understanding Devices
In R, a “device” is where your plot goes:
- Screen device: Default (shows in Plot pane)
- PDF device: Writes to PDF file
- PNG device: Writes to PNG file
- JPEG device: Writes to JPEG file
- Other devices: SVG, BMP, TIFF, etc.
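To see which device is currently receiving plots, base R provides dev.cur() and dev.list(). A small sketch using a temporary file (the file name is a placeholder):

```r
# Open a PDF device that writes to a temporary file
out_file <- file.path(tempdir(), "device_demo.pdf")
pdf(out_file)

# The active device is now the PDF device, not the screen
active <- names(dev.cur())
print(active)  # "pdf"

# Draw something; it goes to the file instead of the plot pane
plot(1:10)

# Close the device so later plots return to the previous device
dev.off()
```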
The pdf() Function
Parameters for control:
pdf("filename.pdf",
    width = 12,       # Width in inches
    height = 4,       # Height in inches
    pointsize = 12,   # Font size
    family = "sans")  # Font family
Creating Multi-Page PDFs
Each plotting command creates a new page:
pdf("multi_page.pdf", width=10, height=6)
# Page 1
plot(x = 1:10, y = 1:10)
# Page 2
plot(x = 1:10, y = (1:10)^2)
# Page 3
plot(x = 1:10, y = sqrt(1:10))
dev.off() # Close the device
Each call to plot() adds a new page!
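The same idea extends to loops: one page per subset of the data. A minimal sketch using base graphics and the built-in iris data as a stand-in for your own data frame (the output file name is a placeholder). If you use ggplot2 inside a loop, wrap each plot in print(), because plots are not drawn automatically there.

```r
out_file <- file.path(tempdir(), "iris_by_species.pdf")
pdf(out_file, width = 6, height = 4)

# One page per species: each plot() call starts a new page
for (sp in levels(iris$Species)) {
  one_species <- iris[iris$Species == sp, ]
  plot(one_species$Sepal.Length, one_species$Sepal.Width,
       xlab = "Sepal length (cm)", ylab = "Sepal width (cm)",
       main = sp)
}

# Close the device once, after the loop, to finalize all pages
dev.off()
```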
Important: dev.off()
Always remember to call dev.off()!
- Closes the output device
- Finalizes the file
- Frees up the device
- File won’t be complete without it
If you forget, the file is incomplete or corrupted!
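One way to make sure the device is always closed, even when the plotting code fails partway, is to register dev.off() with on.exit() inside a function. A sketch only; save_scatter_pdf is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: draws into a PDF and always closes the device
save_scatter_pdf <- function(x, y, file) {
  pdf(file, width = 6, height = 4)
  # Registered right after opening, so it runs even if plot() fails
  on.exit(dev.off(), add = TRUE)
  plot(x, y)
  invisible(file)
}

out_file <- file.path(tempdir(), "safe_close.pdf")
devices_before <- length(dev.list())
save_scatter_pdf(1:10, (1:10)^2, out_file)
devices_after <- length(dev.list())

# No device is left open by the helper
devices_before == devices_after
```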
Other Image Formats
Use similar functions for different formats:
png("plot.png", width=800, height=600)
# Your plot code
dev.off()
jpeg("plot.jpg", width=800, height=600)
# Your plot code
dev.off()
svg("plot.svg", width=10, height=8)
# Your plot code
dev.off()
Same structure, different output formats!
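One difference to watch for: png() and jpeg() measure width and height in pixels by default, while pdf() and svg() use inches. A sketch of sizing a PNG in inches instead, through the units and res arguments of png(); it assumes your R build has PNG support, and the file name is a placeholder:

```r
out_file <- file.path(tempdir(), "plot_print_quality.png")

# 6 x 4 inches at 300 pixels per inch gives an 1800 x 1200 pixel image
png(out_file, width = 6, height = 4, units = "in", res = 300)
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
dev.off()

file.exists(out_file)
```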
PNG and JPEG Parameters
For bitmap formats, use pixels for size:
png("plot.png", width=800, height=600, res=72)
# res = resolution in DPI (dots per inch)
jpeg("plot.jpg", width=800, height=600, quality=90)
# quality = 0-100 (higher = better quality, larger file)
Challenge 1
Write a command to create a multi-page PDF showing:
- Life expectancy vs. time (line plot) - first page
- Same data with facets by continent (facet_grid) - second page
Make the PDF 12 inches wide and 4 inches tall.
Challenge 1 Solution
pdf("Life_Exp_vs_time.pdf", width=12, height=4)
# Page 1: Line plot
ggplot(data=gapminder, aes(x=year, y=lifeExp,
                           colour=country)) +
  geom_line() +
  theme(legend.position = "none")
# Page 2: Faceted plot
ggplot(data=gapminder, aes(x=year, y=lifeExp,
                           colour=country)) +
  geom_line() +
  facet_grid(~continent) +
  theme(legend.position = "none")
dev.off()
Writing Data: Introduction
Now let’s save data (not plots)!
Use the write.table() function:
aust_subset <- gapminder[gapminder$country == "Australia", ]
write.table(aust_subset,
            file = "cleaned-data/gapminder-aus.csv",
            sep = ","
)
This saves a data frame to a file.
First Look at Output
Let’s check what was written, using head from the shell:
head cleaned-data/gapminder-aus.csv
Output:
"country","year","pop","continent","lifeExp","gdpPercap"
"61","Australia",1952,8691212,"Oceania",69.12,10039.59564
"62","Australia",1957,9712569,"Oceania",70.33,10949.64959
Problem: Lots of unwanted quotation marks and row numbers!
write.table() Default Behavior
By default, write.table():
- Wraps character and factor columns in quotes
- Includes column names as a header line
- Includes row names (the row numbers, unless you set others)
- Uses a single space as the separator
This is usually NOT what we want for CSV files!
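These defaults are easy to see on a tiny data frame. A minimal sketch; the data frame and the temporary file name are placeholders:

```r
df <- data.frame(name = c("a", "b"), value = c(1, 2))
out_file <- file.path(tempdir(), "defaults.txt")

# No formatting arguments: quotes, row names, and a space separator
write.table(df, file = out_file)

written <- readLines(out_file)
print(written)
# [1] "\"name\" \"value\"" "\"1\" \"a\" 1"      "\"2\" \"b\" 2"
```

Note that the header has one entry fewer than the data rows, because the row-name column gets no heading.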
Fixing the Output
Add parameters to control formatting:
write.table(
  gapminder[gapminder$country == "Australia", ],
  file = "cleaned-data/gapminder-aus.csv",
  sep = ",",
  quote = FALSE,    # Don't quote strings
  row.names = FALSE # Don't include row numbers
)
Checking the Fixed Output
head cleaned-data/gapminder-aus.csv
Output:
country,year,pop,continent,lifeExp,gdpPercap
Australia,1952,8691212,Oceania,69.12,10039.59564
Australia,1957,9712569,Oceania,70.33,10949.64959
Australia,1962,10794968,Oceania,70.93,12217.22686
Australia,1967,11872264,Oceania,71.1,14526.12465
Perfect! Clean and readable CSV!
write.table() Parameters
Key parameters:
- file: Output filename
- sep: Column separator ("," for CSV)
- quote: Wrap strings in quotes? (default TRUE)
- row.names: Include row numbers? (default TRUE)
- col.names: Include column names? (default TRUE)
- na: How to represent NA values (default “NA”)
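A short sketch that sets several of these parameters at once, including na. The scores data frame and file name are made up for illustration:

```r
scores <- data.frame(id = 1:3, score = c(1.5, NA, 3))
out_file <- file.path(tempdir(), "scores.csv")

# Comma-separated, unquoted, no row names, missing values as empty fields
write.table(scores, file = out_file, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = TRUE, na = "")

written <- readLines(out_file)
print(written)
# [1] "id,score" "1,1.5"    "2,"       "3,3"
```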
Alternative: write.csv()
Use write.csv() as a shortcut:
write.csv(aust_subset,
          file = "cleaned-data/gapminder-aus.csv",
          row.names = FALSE
)
write.csv() automatically sets:
- sep = ","
- quote = TRUE (quotes the header and any character or factor columns)
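A quick way to confirm this behaviour: write.csv() fixes the separator, so an attempt to change sep is ignored with a warning. A sketch with a placeholder data frame and temporary file:

```r
df <- data.frame(name = c("a", "b"), value = c(1, 2))
out_file <- file.path(tempdir(), "shortcut.csv")

# Trying to override sep only produces a warning; the comma is kept
suppressWarnings(
  write.csv(df, file = out_file, row.names = FALSE, sep = ";")
)

written <- readLines(out_file)
print(written)
# [1] "\"name\",\"value\"" "\"a\",1"            "\"b\",2"
```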
Choosing Between write.table() and write.csv()
Use write.table() when:
- You need custom separators
- You want quote = FALSE
- Working with tab-delimited files
- Want precise control
Use write.csv() when:
- Creating standard CSV files
- Want quotes around text
- Don’t need to customize separator
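Quotes matter when a text field contains the separator itself. The sketch below uses a made-up two-row data frame (the pop values are placeholders, not real figures) and counts the fields on each line of the unquoted and quoted files:

```r
countries <- data.frame(country = c("Korea, Rep.", "Chile"),
                        pop = c(1, 2))
unquoted_file <- file.path(tempdir(), "unquoted.csv")
quoted_file <- file.path(tempdir(), "quoted.csv")

write.table(countries, unquoted_file, sep = ",",
            quote = FALSE, row.names = FALSE)
write.csv(countries, quoted_file, row.names = FALSE)

# Fields per line: the unquoted "Korea, Rep." row gains an extra field
fields_unquoted <- count.fields(unquoted_file, sep = ",")
fields_quoted <- count.fields(quoted_file, sep = ",")
print(fields_unquoted)  # 2 3 2
print(fields_quoted)    # 2 2 2

# The quoted file reads back with the country names intact
round_trip <- read.csv(quoted_file)
```

So quote = FALSE is safe only when no field contains the separator.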
Challenge 2
Create a data-cleaning script that:
- Loads the gapminder data
- Subsets it to include only data since 1990
- Writes it to cleaned-data/gapminder-1990-onwards.csv
Name the script data-cleaning.R and place it in the root directory.
Challenge 2 Solution
Create data-cleaning.R:
# Load the data
gapminder <- read.csv(
"data/gapminder_data.csv"
)
# Subset data from 1990 onwards
gapminder_recent <- gapminder[gapminder$year >= 1990, ]
# Write to file
write.csv(gapminder_recent,
          file = "cleaned-data/gapminder-1990-onwards.csv",
          row.names = FALSE
)
print("Data cleaning complete!")
Run it with: source("data-cleaning.R")
Workflow: From Raw to Clean Data
Typical data processing workflow:
- Load raw data with read.csv()
- Explore structure with str(), head()
- Clean by subsetting, transforming
- Check results with summary(), tail()
- Write cleaned data with write.csv()
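The five steps above can be sketched end to end. To stay self-contained, this example first writes a tiny made-up raw file to a temporary directory; the column names, values, and file names are all placeholders:

```r
# Set up a small raw file so the sketch is self-contained (placeholder data)
raw_file <- file.path(tempdir(), "raw_measurements.csv")
write.csv(data.frame(year = c(1985, 1990, 1995, 2000),
                     value = c(10.2, 11.5, NA, 13.1)),
          raw_file, row.names = FALSE)

# 1. Load raw data
raw <- read.csv(raw_file)

# 2. Explore structure
str(raw)
head(raw)

# 3. Clean: keep 1990 onwards and drop rows with missing values
clean <- raw[raw$year >= 1990 & !is.na(raw$value), ]

# 4. Check results
summary(clean)
tail(clean)

# 5. Write cleaned data
clean_file <- file.path(tempdir(), "clean_measurements.csv")
write.csv(clean, clean_file, row.names = FALSE)
```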
Why This Workflow?
This approach ensures:
- Reproducibility: Script documents all steps
- Traceability: Can see what changed
- Reusability: Run again with new data
- Sharing: Others can repeat your process
- Auditability: Archive of transformations
Data Output Formats
Beyond CSV, R can write:
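# Tab-separated values
write.table(df, "file.tsv", sep = "\t", row.names = FALSE)
# Fixed-width format (write.fwf() comes from the gdata package, not base R)
library(gdata)
write.fwf(df, "file.txt")
# Other formats (with packages)
library(writexl) # readxl only reads Excel files; writexl writes them
write_xlsx(df, "file.xlsx")
library(haven)
write_xpt(df, "file.xpt") # SAS transport file; write_sas() is deprecated in recent haven releases
Creating Output Directories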
Before writing files, ensure directory exists:
# Check if directory exists
if (!dir.exists("cleaned-data")) {
  dir.create("cleaned-data")
}
# Then write your file
write.csv(df, "cleaned-data/output.csv")
Prevents “directory not found” errors!
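For nested output folders, dir.create() needs recursive = TRUE, and file.path() builds the path portably. A minimal sketch; the folder layout under the temporary directory is only an example:

```r
# Nested output folder under a temporary directory (placeholder path)
out_dir <- file.path(tempdir(), "cleaned-data", "2026")

# recursive = TRUE creates missing parent folders as well;
# showWarnings = FALSE keeps reruns quiet when the folder already exists
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

out_file <- file.path(out_dir, "output.csv")
write.csv(data.frame(x = 1:3), out_file, row.names = FALSE)

file.exists(out_file)
```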
Best Practices for Output
- Use descriptive filenames: gapminder-1990-onwards.csv
- Include dates: results-2026-01-30.csv
- Document transformations: Comment your code
- Save intermediate steps: For debugging
- Use consistent separators: CSV or TSV
- Turn off row numbers: Unless needed
- Quote only when needed: Smaller files, but keep quotes if any field contains the separator
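Dated filenames like the one in the list above can be generated rather than typed. A small sketch; dated_filename is a hypothetical helper written for this example:

```r
# Hypothetical helper: builds names such as "results-2026-01-30.csv"
dated_filename <- function(prefix, date = Sys.Date(), ext = "csv") {
  paste0(prefix, "-", format(date, "%Y-%m-%d"), ".", ext)
}

fixed_name <- dated_filename("results", as.Date("2026-01-30"))
print(fixed_name)  # "results-2026-01-30.csv"

# With no date given, today's date is used
todays_name <- dated_filename("results")
```

The year-month-day order makes the files sort chronologically in a directory listing.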
Combining Saving Plots and Data
Complete workflow:
# Subset data
subset_data <- gapminder[gapminder$year >= 1990, ]
# Create and save plot
pdf("plot_recent.pdf")
# print() makes sure the plot is drawn when the script is run with source()
print(ggplot(subset_data, aes(x=year, y=lifeExp)) + geom_point())
dev.off()
# Save the subset
write.csv(subset_data, "recent-data.csv", row.names = FALSE)
print("Analysis complete!")
Verification After Saving
Always verify your output:
# Read it back in
check_data <- read.csv("cleaned-data/gapminder-aus.csv")
# Verify structure
str(check_data)
head(check_data)
# Compare with original (if needed)
nrow(check_data) # Should match what we wrote
Common Mistakes
- Forgetting to call dev.off() after plotting
- Including row numbers when you don’t want them
- Using wrong separators (comma vs tab)
- Not creating output directories first
- Forgetting quotes around filenames
- Not verifying the output
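The last item, verifying the output, can be automated so that a script stops as soon as the written file does not match the data. A sketch using three rows from the Australia output shown earlier; the temporary file name is a placeholder:

```r
original <- data.frame(year = c(1952L, 1957L, 1962L),
                       lifeExp = c(69.12, 70.33, 70.93))
out_file <- file.path(tempdir(), "verify.csv")
write.csv(original, out_file, row.names = FALSE)

# Read the file back in
check_data <- read.csv(out_file)

# Stop with an error if the file does not match what was written
stopifnot(
  nrow(check_data) == nrow(original),
  identical(names(check_data), names(original)),
  isTRUE(all.equal(check_data, original))
)
```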
File Naming Conventions
Good filenames:
- gapminder-cleaned.csv
- results_2026-01-30.csv
- Africa_subset_1990-2007.csv
Bad filenames:
- data.csv (too generic)
- my file.csv (spaces problematic)
- FINAL_FINAL_v3.csv (version chaos)
Be descriptive and version-controlled!
Performance Considerations
Writing large data frames:
# For large files, skip row names so R does not write an extra column.
# write.csv() always writes the header; setting col.names is ignored with a warning.
write.csv(large_df, "output.csv",
          row.names = FALSE)
For even faster writing, use data.table::fwrite().
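To judge whether the speed difference matters for your data, time both functions on a frame of realistic size. A rough sketch with made-up random data; timings depend on your machine, and the fwrite() part runs only if the data.table package is installed:

```r
# Placeholder data: 100,000 rows of random numbers
large_df <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

csv_file <- file.path(tempdir(), "large_base.csv")
base_time <- system.time(write.csv(large_df, csv_file, row.names = FALSE))
print(base_time["elapsed"])

# fwrite() needs the data.table package, so only run it when available
if (requireNamespace("data.table", quietly = TRUE)) {
  fast_file <- file.path(tempdir(), "large_fast.csv")
  fast_time <- system.time(data.table::fwrite(large_df, fast_file))
  print(fast_time["elapsed"])
}
```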
Key Points
- Save plots using pdf(), png(), or jpeg() devices
- Always call dev.off() after saving plots
- Use write.table() or write.csv() to save data
- Set row.names = FALSE and quote = FALSE for clean CSV output
- Create output directories before writing files
Key Points (Continued)
- Use descriptive filenames for clarity
- Verify output after saving
- Combine data cleaning and output in scripts
- Document transformations with comments
- Use version control for reproducibility
Organizing Your Project
Typical structure:
project/
├── data/ # Raw data
│ └── gapminder_data.csv
├── cleaned-data/ # Output data
│ └── gapminder-clean.csv
├── figures/ # Output plots
│ └── plot.pdf
├── scripts/ # Analysis scripts
│ ├── functions.R
│ └── data-cleaning.R
└── README.md # Documentation
Resources
- Software Carpentry: r-novice-gapminder
- R Documentation: ?write.table, ?write.csv, ?pdf
- ggplot2 Documentation: ?ggsave
- Project organization best practices