Writing Data in R

Adapted from Software Carpentry

Overview

Today we will learn to:

Save plots from R to files
Control plot size and resolution
Write data frames to CSV and other formats
Handle formatting options for data output
Create reproducible data-cleaning workflows

Questions

How can I save plots and data created in R?
What formats can I export to?
How do I control the output format?

Why Write Data and Plots?

Share results with colleagues
Archive your work
Create reports with multiple plots
Automate workflows without manual clicking
Batch process multiple files

Writing output is essential for reproducible research!

Saving Plots: Quick Method

The easiest way - use ggsave() from ggplot2:

ggsave("My_most_recent_plot.pdf")

Saves the most recently displayed ggplot to a file. The format is taken from the file extension.
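A minimal sketch of a more explicit call: ggsave() also accepts the plot object and the output size, so the result does not depend on whichever plot was drawn last. The mtcars data set, the name p, and the temporary output path are stand-ins chosen for illustration.

```r
library(ggplot2)

# Build a plot and keep it in a variable (mtcars ships with R)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary folder so the example leaves no files behind
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")

# Pass the plot explicitly; width and height are in inches by default
ggsave(out_file, plot = p, width = 6, height = 4)

file.exists(out_file)
```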

Saving Plots: RStudio Export Button

Within RStudio:

Go to the Plot window
Click Export
Choose format: PDF, PNG, JPG, etc.
Set size and resolution
Click Save

Quick and interactive!

When to Use Manual Methods

The Export button works great when:

You want a quick save
Making one or two plots
Interactively exploring

But… what if you need to:

Create a PDF with multiple pages?
Loop through subsets of data?
Generate many plots automatically?

You need programmatic control!

Using the pdf() Device

Create a PDF file programmatically:

pdf("Life_Exp_vs_time.pdf", width=12, height=4)

# Your plotting code here
ggplot(data=gapminder, aes(x=year, y=lifeExp, 
                           colour=country)) +
  geom_line() +
  theme(legend.position = "none")

# Important: Turn off the PDF device!
dev.off()

Understanding Devices

In R, a “device” is where your plot goes:

Screen device: Default (shows in Plot pane)
PDF device: Writes to PDF file
PNG device: Writes to PNG file
JPEG device: Writes to JPEG file
Other devices: SVG, BMP, TIFF, etc.
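One way to see devices in action is to ask R which one is active. This sketch uses only base R; the file name and the variable active_name are illustrative.

```r
out_file <- file.path(tempdir(), "device_demo.pdf")

pdf(out_file)                     # open a PDF device; it becomes the active one
active_name <- names(dev.cur())   # name of the currently active device
plot(1:10)                        # drawn into the PDF, not the Plot pane
dev.off()                         # close it; the previous device is active again

active_name
```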

The pdf() Function

Parameters for control:

pdf("filename.pdf", 
    width = 12,      # Width in inches
    height = 4,      # Height in inches
    pointsize = 12,  # Font size
    family = "sans") # Font family

Creating Multi-Page PDFs

Each plotting command creates a new page:

pdf("multi_page.pdf", width=10, height=6)

# Page 1
plot(x = 1:10, y = 1:10)

# Page 2
plot(x = 1:10, y = (1:10)^2)

# Page 3
plot(x = 1:10, y = sqrt(1:10))

dev.off()  # Close the device

Each call to plot() adds a new page!
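The same idea extends to looping over subsets of data. The sketch below assumes ggplot2 is installed and uses the built-in mtcars data as a stand-in; note that inside a loop (or a function, or a script run with source()) a ggplot object is only drawn when it is passed to print().

```r
library(ggplot2)

out_file <- file.path(tempdir(), "mpg_by_cylinders.pdf")
cyl_values <- sort(unique(mtcars$cyl))   # one page per cylinder count

pdf(out_file, width = 6, height = 4)
for (n_cyl in cyl_values) {
  one_group <- mtcars[mtcars$cyl == n_cyl, ]
  p <- ggplot(one_group, aes(x = wt, y = mpg)) +
    geom_point() +
    ggtitle(paste(n_cyl, "cylinders"))
  print(p)   # explicit print: auto-printing does not happen inside a loop
}
dev.off()
```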

Important: dev.off()

Always remember to call dev.off()!

Closes the output device
Finalizes the file
Frees up the device
File won’t be complete without it

If you forget, the file may be empty, incomplete, or unreadable, and later plots keep going to that file instead of the screen!
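One possible safeguard, sketched here with a hypothetical helper called save_scatter(), is to open the device inside a function and register dev.off() with on.exit(), so the device is closed even if the plotting code stops with an error.

```r
# Hypothetical helper: draws a simple scatter plot into a PDF at `path`
save_scatter <- function(path) {
  pdf(path, width = 6, height = 4)
  on.exit(dev.off(), add = TRUE)   # runs when the function exits, even on error
  plot(x = 1:10, y = (1:10)^2)
  invisible(path)
}

devices_before <- dev.list()
out_file <- save_scatter(file.path(tempdir(), "scatter.pdf"))
devices_after <- dev.list()   # same as before: nothing was left open
```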

Other Image Formats

Use similar functions for different formats:

png("plot.png", width=800, height=600)
# Your plot code
dev.off()

jpeg("plot.jpg", width=800, height=600)
# Your plot code
dev.off()

svg("plot.svg", width=10, height=8)
# Your plot code
dev.off()

Same structure, different output formats!

PNG and JPEG Parameters

For bitmap formats, use pixels for size:

png("plot.png", width=800, height=600, res=72)
# res = resolution in DPI (dots per inch)

jpeg("plot.jpg", width=800, height=600, quality=90)
# quality = 0-100 (higher = better quality, larger file)
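Pixel size and resolution are linked: pixels = inches x DPI. As a rough sketch (the 6 x 4 inch size and 300 DPI target are example values, not a recommendation from the lesson), png() can also take the size in inches directly through its units argument.

```r
width_in  <- 6     # desired printed width in inches
height_in <- 4     # desired printed height in inches
dpi       <- 300   # target resolution

# Equivalent pixel dimensions: inches multiplied by dots per inch
width_px  <- width_in * dpi
height_px <- height_in * dpi

out_file <- file.path(tempdir(), "print_quality.png")

# units = "in" lets png() do the same conversion itself
png(out_file, width = width_in, height = height_in, units = "in", res = dpi)
plot(1:10)
dev.off()
```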

Challenge 1

Write a command to create a multi-page PDF showing:

Life expectancy vs. time (line plot) - first page
Same data with facets by continent (facet_grid) - second page

Make the PDF 12 inches wide and 4 inches tall.

Challenge 1 Solution

pdf("Life_Exp_vs_time.pdf", width=12, height=4)

# Page 1: Line plot
ggplot(data=gapminder, aes(x=year, y=lifeExp, 
                           colour=country)) +
  geom_line() +
  theme(legend.position = "none")

# Page 2: Faceted plot
ggplot(data=gapminder, aes(x=year, y=lifeExp, 
                           colour=country)) +
  geom_line() +
  facet_grid(~continent) +
  theme(legend.position = "none")

dev.off()

Writing Data: Introduction

Now let’s save data (not plots)!

Use the write.table() function:

aust_subset <- gapminder[gapminder$country == "Australia", ]

write.table(aust_subset,
  file = "cleaned-data/gapminder-aus.csv",
  sep = ","
)

This saves a data frame to a file.

First Look at Output

Let’s check what was written (run this in a shell terminal, not the R console):

head cleaned-data/gapminder-aus.csv

Output:

"country","year","pop","continent","lifeExp","gdpPercap"
"61","Australia",1952,8691212,"Oceania",69.12,10039.59564
"62","Australia",1957,9712569,"Oceania",70.33,10949.64959

Problem: Lots of unwanted quotation marks and row numbers!

write.table() Default Behavior

By default, write.table():

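Wraps character and factor values in double quotes
Writes the column names as a header line
Writes the row names (row numbers, unless you set others) as an extra first column
Uses a single space as the separator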

This is usually NOT what we want for CSV files!
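To see these defaults without the gapminder file, here is a small sketch on a made-up two-row data frame (the values are illustrative only). Reading the file back with readLines() shows the quotes, the row names, and the space separator.

```r
# Tiny made-up data frame, just to inspect the default output
df <- data.frame(country = c("Australia", "Fiji"), year = c(1952L, 1957L))

tmp <- tempfile(fileext = ".txt")
write.table(df, file = tmp)      # all defaults

default_lines <- readLines(tmp)
default_lines
# "\"country\" \"year\""
# "\"1\" \"Australia\" 1952"
# "\"2\" \"Fiji\" 1957"
```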

Fixing the Output

Add parameters to control formatting:

write.table(
  gapminder[gapminder$country == "Australia", ],
  file = "cleaned-data/gapminder-aus.csv",
  sep = ",",
  quote = FALSE,      # Don't quote strings
  row.names = FALSE   # Don't include row numbers
)

Checking the Fixed Output

head cleaned-data/gapminder-aus.csv

Output:

country,year,pop,continent,lifeExp,gdpPercap
Australia,1952,8691212,Oceania,69.12,10039.59564
Australia,1957,9712569,Oceania,70.33,10949.64959
Australia,1962,10794968,Oceania,70.93,12217.22686
Australia,1967,11872264,Oceania,71.1,14526.12465

Perfect! Clean and readable CSV!

write.table() Parameters

Key parameters:

file: Output filename
sep: Column separator ("," for CSV)
quote: Wrap strings in quotes? (default TRUE)
row.names: Include row numbers? (default TRUE)
col.names: Include column names? (default TRUE)
na: How to represent NA values (default “NA”)
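As an illustration of the na parameter, this sketch writes a made-up data frame with one missing value and leaves the missing cell empty instead of writing NA. The column names are hypothetical.

```r
# Made-up scores with one missing value
df <- data.frame(id = 1:3, score = c(2.5, NA, 4))

tmp <- tempfile(fileext = ".csv")
write.table(df, file = tmp,
            sep = ",",
            quote = FALSE,
            row.names = FALSE,
            na = "")             # write missing values as empty cells

csv_lines <- readLines(tmp)
csv_lines
# "id,score" "1,2.5" "2," "3,4"
```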

Alternative: write.csv()

Use write.csv() as a shortcut:

write.csv(aust_subset,
  file = "cleaned-data/gapminder-aus.csv",
  row.names = FALSE
)

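write.csv() is a wrapper around write.table() that fixes:

sep = ","
dec = "."
qmethod = "double"

quote keeps its default of TRUE (strings, factors, and headers are quoted), but unlike the settings above you can change it with quote = FALSE.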

Choosing Between write.table() and write.csv()

Use write.table() when:

You need custom separators
You need to append to a file or drop the header line
Working with tab-delimited files
Want precise control

Use write.csv() when:

Creating standard CSV files
Default quoting of text is fine (quote = FALSE is also accepted)
Don’t need to customize separator
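Quoting matters as soon as a text field contains the separator. The sketch below uses a made-up data frame with a comma inside one value: without quotes that row appears to have three fields, while the quoted file reads back intact.

```r
# Made-up data: one city name contains a comma
df <- data.frame(city = c("Washington, DC", "Sydney"), rank = c(1L, 2L))

f_unquoted <- tempfile(fileext = ".csv")
f_quoted   <- tempfile(fileext = ".csv")

write.table(df, f_unquoted, sep = ",", quote = FALSE, row.names = FALSE)
write.csv(df, f_quoted, row.names = FALSE)   # quote = TRUE by default

# Fields per line: the unquoted row with the comma is split in two
fields_unquoted <- count.fields(f_unquoted, sep = ",")
fields_unquoted
# 2 3 2

# The quoted file round-trips cleanly
round_trip <- read.csv(f_quoted)
```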

Challenge 2

Create a data-cleaning script that:

Loads the gapminder data
Subsets it to include only data since 1990
Writes it to cleaned-data/gapminder-1990-onwards.csv

Name the script data-cleaning.R and place it in the root directory.

Challenge 2 Solution

Create data-cleaning.R:

# Load the data
gapminder <- read.csv(
  "data/gapminder_data.csv"
)

# Subset data from 1990 onwards
gapminder_recent <- gapminder[gapminder$year >= 1990, ]

# Write to file
write.csv(gapminder_recent,
  file = "cleaned-data/gapminder-1990-onwards.csv",
  row.names = FALSE
)

print("Data cleaning complete!")

Run it with: source("data-cleaning.R")

Workflow: From Raw to Clean Data

Typical data processing workflow:

Load raw data with read.csv()
Explore structure with str(), head()
Clean by subsetting, transforming
Check results with summary(), tail()
Write cleaned data with write.csv()
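The five steps can be sketched end to end. To stay self-contained, this example uses R's built-in airquality data in place of a file loaded with read.csv(); the TempC column and the output file name are illustrative choices.

```r
# 1. Load raw data (built-in data set standing in for read.csv())
raw <- airquality

# 2. Explore structure
str(raw)
head(raw)

# 3. Clean: drop rows with missing ozone, add temperature in Celsius
clean <- raw[!is.na(raw$Ozone), ]
clean$TempC <- round((clean$Temp - 32) * 5 / 9, 1)

# 4. Check results
summary(clean)
tail(clean)

# 5. Write cleaned data (temporary folder used for the example)
out_file <- file.path(tempdir(), "airquality-clean.csv")
write.csv(clean, file = out_file, row.names = FALSE)
```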

Why This Workflow?

This approach ensures:

Reproducibility: Script documents all steps
Traceability: Can see what changed
Reusability: Run again with new data
Sharing: Others can repeat your process
Auditability: Archive of transformations

Data Output Formats

Beyond CSV, R can write:

# Tab-separated values
write.table(df, "file.tsv", sep = "\t", row.names = FALSE)

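# Fixed-width format (write.fwf() comes from the gdata package, not base R)
gdata::write.fwf(df, "file.txt")

# Excel workbooks (the writexl package writes; readxl only reads)
writexl::write_xlsx(df, "file.xlsx")

# SAS transport files (haven recommends write_xpt(); write_sas() is deprecated)
haven::write_xpt(df, "file.xpt")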

Creating Output Directories

Before writing files, ensure directory exists:

# Check if directory exists
if (!dir.exists("cleaned-data")) {
  dir.create("cleaned-data")
}

# Then write your file
write.csv(df, "cleaned-data/output.csv", row.names = FALSE)

Prevents “directory not found” errors!
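For nested output folders, a possible variation is to let dir.create() build the whole path in one call. The folder names below are hypothetical and live under tempdir() so the example cleans up after itself.

```r
# Hypothetical nested output folder
out_dir <- file.path(tempdir(), "results", "cleaned-data")

# recursive = TRUE creates missing parent folders;
# showWarnings = FALSE keeps it quiet if the folder already exists
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

out_file <- file.path(out_dir, "output.csv")
write.csv(data.frame(x = 1:3), out_file, row.names = FALSE)
```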

Best Practices for Output

Use descriptive filenames: gapminder-1990-onwards.csv
Include dates: results-2026-01-30.csv
Document transformations: Comment your code
Save intermediate steps: For debugging
Use consistent separators: CSV or TSV
Turn off row numbers: Unless needed
Drop quotes only when safe: quote = FALSE gives smaller files but breaks fields that contain the separator
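Dated filenames do not have to be typed by hand. A small sketch, assuming the "results-" prefix is just an example:

```r
# Build a filename such as results-2026-01-30.csv from today's date
today_stamp <- format(Sys.Date(), "%Y-%m-%d")
out_name <- paste0("results-", today_stamp, ".csv")

out_name
```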

Combining Saving Plots and Data

Complete workflow:

# Subset data
subset_data <- gapminder[gapminder$year >= 1990, ]

# Create and save plot
# print() makes the plot render when the script is run with source()
pdf("plot_recent.pdf")
print(ggplot(subset_data, aes(x=year, y=lifeExp)) + geom_point())
dev.off()

# Save the subset
write.csv(subset_data, "recent-data.csv", row.names = FALSE)

print("Analysis complete!")

Verification After Saving

Always verify your output:

# Read it back in
check_data <- read.csv("cleaned-data/gapminder-aus.csv")

# Verify structure
str(check_data)
head(check_data)

# Compare with original (if needed)
nrow(check_data)  # Should match what we wrote
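A stricter check compares the file against the object that was written. This sketch uses a made-up three-row data frame (values are illustrative only) and all.equal(), which tolerates tiny floating-point differences.

```r
# Made-up data standing in for a cleaned data frame
original <- data.frame(
  country = c("Australia", "Fiji", "Samoa"),
  year    = c(1952L, 1957L, 1962L),
  lifeExp = c(69.12, 52.5, 55.0)
)

out_file <- tempfile(fileext = ".csv")
write.csv(original, out_file, row.names = FALSE)

# Read it back and compare shape, column names, and values
check_data  <- read.csv(out_file)
same_shape  <- identical(dim(original), dim(check_data))
same_names  <- identical(names(original), names(check_data))
same_values <- isTRUE(all.equal(original, check_data))
```

If the data frame is a subset with non-default row names, reset them first (for example with rownames(x) <- NULL) so the comparison is not thrown off by row labels.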

Common Mistakes

Forgetting to call dev.off() after plotting
Including row numbers when you don’t want them
Using wrong separators (comma vs tab)
Not creating output directories first
Forgetting quotes around filenames
Not verifying the output

File Naming Conventions

Good filenames:

gapminder-cleaned.csv
results_2026-01-30.csv
Africa_subset_1990-2007.csv

Bad filenames:

data.csv (too generic)
my file.csv (spaces problematic)
FINAL_FINAL_v3.csv (version chaos)

Be descriptive and version-controlled!

Performance Considerations

Writing large data frames:

# Drop row names so R does not write an extra column.
# Do not pass col.names to write.csv(): it is ignored with a warning.

write.csv(large_df, "output.csv",
          row.names = FALSE)

For much faster writing of large data, use data.table::fwrite().
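A hedged sketch of the fwrite() route: it assumes the data.table package may or may not be installed, and falls back to write.csv() if it is not. The data frame here is randomly generated for illustration.

```r
# Generated example data (100,000 rows)
large_df <- data.frame(id = seq_len(100000), value = rnorm(100000))

out_file <- tempfile(fileext = ".csv")

if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::fwrite(large_df, out_file)           # fast CSV writer, no row names
} else {
  write.csv(large_df, out_file, row.names = FALSE) # base R fallback
}

# Confirm that every row reached the file
n_written <- nrow(read.csv(out_file))
```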

Key Points

Save plots using pdf(), png(), or jpeg() devices
Always call dev.off() after saving plots
Use write.table() or write.csv() to save data
Set row.names = FALSE for clean CSV output; add quote = FALSE only when no field contains a comma
Create output directories before writing files

Key Points (Continued)

Use descriptive filenames for clarity
Verify output after saving
Combine data cleaning and output in scripts
Document transformations with comments
Use version control for reproducibility

Organizing Your Project

Typical structure:

project/
├── data/                    # Raw data
│   └── gapminder_data.csv
├── cleaned-data/            # Output data
│   └── gapminder-clean.csv
├── figures/                 # Output plots
│   └── plot.pdf
├── scripts/                 # Analysis scripts
│   ├── functions.R
│   └── data-cleaning.R
└── README.md               # Documentation

Resources

Software Carpentry: r-novice-gapminder
R Documentation: ?write.table, ?write.csv, ?pdf
ggplot2 Documentation: ?ggsave
Project organization best practices