Import Data into R

R is capable of reading data from most formats, including files created in other statistical packages. Whether the data was prepared using Excel (in CSV, XLSX, or TXT format), SAS, Stata, SPSS, or others, R can read and load the data into memory.

R also has two native data formats: RData (sometimes shortened to Rda) and RDS. These formats are used to save R objects for later use. An RData file can hold multiple R objects, while an RDS file holds a single R object. The sections below show how to load both.

Choose File

Using file.choose() opens a file dialog on your local machine so you can select the file you wish to import. For many users this is the easiest option. However, it requires manual input each time, so it cannot be used in automated scripts.

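path <- file.choose()  # stores the full path of the selected file as a character string

data <- read.csv(file = file.choose())  # opens the dialog and reads the selected CSV file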

Set Working Directory

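Before reading data by file name alone, set the R working directory to the folder that contains the data (alternatively, give the full path to each file). Setting the working directory eliminates path confusion. When specifying a path, note that Windows displays paths with backslashes, but inside an R string a backslash is an escape character. Write Windows paths with forward slashes ("C:/mydata") or doubled backslashes ("C:\\mydata").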

setwd() will set the current working directory to a specific location

setwd("C:/") # Windows

setwd("~/") # Mac

getwd() will print out the current directory

setwd("~/Desktop")

getwd()
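A minimal sketch of why the working directory matters: once it is set, plain file names resolve against it. The example uses R's session temporary directory, tempdir(), as a stand-in for a real data folder, and the file name example.csv is hypothetical. It also relies on setwd() invisibly returning the previous directory, so a script can restore it afterward.

```r
# Use R's session temporary directory as a stand-in data folder
data_dir <- tempdir()

# setwd() invisibly returns the directory that was current before the change
old_dir <- setwd(data_dir)

# A plain file name is now resolved against data_dir
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
print(file.exists(file.path(data_dir, "example.csv")))

# file.path() joins path pieces with forward slashes, which R accepts on every OS
full_path <- file.path(data_dir, "example.csv")
example <- read.csv(full_path)

# Restore the original working directory
setwd(old_dir)
```

Restoring the old directory keeps a script from changing the working directory for code that runs after it.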

CSV Files

A CSV is a comma-separated values file: a plain-text file that stores tabular data, one row per line, with fields separated by commas. When opened in a spreadsheet program, a CSV looks like an ordinary spreadsheet, but it has a .csv extension. CSV files can be used with almost any spreadsheet program, such as Microsoft Excel or Google Sheets.


Read CSV File

data <- read.csv(file = "data.csv")

Write CSV File

write.csv(data, "save.csv", row.names = FALSE)  # row.names = FALSE omits R's row-number column
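Since data.csv and save.csv may not exist on your machine, here is a self-contained round trip: a small hypothetical data frame is written to a temporary file with write.csv() and read back with read.csv(). It assumes R 4.0 or later, where text columns are read as character rather than factor by default.

```r
# A small hypothetical data frame of survey responses
survey <- data.frame(id = 1:3,
                     gender = c("male", "female", "female"),
                     score = c(4.5, 3.0, 5.0))

# Write to a temporary CSV; row.names = FALSE avoids an extra unnamed column
csv_path <- tempfile(fileext = ".csv")
write.csv(survey, csv_path, row.names = FALSE)

# Read it back; read.csv() treats the first row as column names by default
data <- read.csv(file = csv_path)
str(data)
```

Without row.names = FALSE, write.csv() writes the row names as an extra first column, which read.csv() reads back as a column named X.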

Read TXT File

data <- read.delim("test.txt", stringsAsFactors = FALSE)  # read.delim() expects tab-separated fields
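To see what stringsAsFactors controls, the sketch below writes a tiny tab-delimited file (a hypothetical stand-in for test.txt) and reads it twice. With stringsAsFactors = FALSE, text columns stay character vectors; with TRUE, they become factors.

```r
# Create a small tab-delimited file to stand in for test.txt
txt_path <- tempfile(fileext = ".txt")
writeLines(c("name\tgroup", "Ana\tA", "Ben\tB", "Cy\tA"), txt_path)

# read.delim() expects tab-separated fields and a header row by default
as_text   <- read.delim(txt_path, stringsAsFactors = FALSE)
as_factor <- read.delim(txt_path, stringsAsFactors = TRUE)

class(as_text$group)     # "character"
class(as_factor$group)   # "factor"
levels(as_factor$group)  # "A" "B"
```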

Reading R Data Files

RData Files

Function: load(). Notice that the result of this function is not assigned to an object name. When R calls load(), all of the R objects saved in the file are restored into the current session under the names they had when they were saved. The command ls() prints the names of all objects currently loaded into R.

load("survey.rdata")

load("survey.rda")
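A short sketch of the save()/load() cycle, using two hypothetical objects and a temporary file in place of survey.rdata. It shows that load() restores objects under their saved names and invisibly returns those names. It also shows loading into a separate environment, one way to avoid overwriting existing objects with the same names.

```r
# Two hypothetical objects to save together
survey  <- data.frame(id = 1:3, score = c(4, 5, 3))
weights <- c(0.2, 0.5, 0.3)

rdata_path <- tempfile(fileext = ".rdata")
save(survey, weights, file = rdata_path)

# Remove both objects, then restore them under their original names
rm(survey, weights)
loaded <- load(rdata_path)  # load() invisibly returns the restored names
print(loaded)

# Load into a separate environment to avoid overwriting objects of the same name
env <- new.env()
load(rdata_path, envir = env)
ls(env)
```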

RDS Files

Function: readRDS(). The readRDS function restores a single R object. Because readRDS() returns the object, you choose its name when assigning it; in this example, it is named dataRDS.

dataRDS <- readRDS("survey.rds")
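A minimal round trip, assuming a hypothetical data frame and a temporary file in place of survey.rds. saveRDS() writes one object, and readRDS() returns it so it can be assigned to any name.

```r
survey <- data.frame(id = 1:3, score = c(4, 5, 3))

rds_path <- tempfile(fileext = ".rds")
saveRDS(survey, rds_path)

# readRDS() returns the object, so it can be given any name
dataRDS <- readRDS(rds_path)
identical(dataRDS, survey)  # TRUE
```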

Reading Delimited Data Files

Space-Delimited

Function: read.table()

Common Parameters:

header: TRUE when the first row contains variable names. The default is FALSE.

sep: A string giving the character that separates fields. The default is "", which treats any run of white space (spaces, tabs, or newlines) as one separator.

dataSPACE <- read.table("C:/mydata/survey.dat", header = TRUE, sep = " ")

With the working directory set to C:/mydata, this is equivalent to:

dataSPACE <- read.table("survey.dat", header = TRUE, sep = " ")
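The choice of sep matters for space-delimited files. With sep = " ", every individual space is a separator, so it suits data with exactly one space between fields. With the default sep = "", any run of spaces counts as one separator, which also handles column-aligned files. This sketch uses read.table()'s text argument with hypothetical inline data instead of survey.dat.

```r
# Column-aligned data: fields separated by one or more spaces
aligned <- c("id  score  group",
             "1   4.5    A",
             "2   3.0    B")

# sep = "" (the default) treats each run of white space as one separator
dataSPACE <- read.table(text = aligned, header = TRUE)

# Data with exactly one space between fields can also use sep = " "
single <- c("id score group",
            "1 4.5 A",
            "2 3.0 B")
dataSINGLE <- read.table(text = single, header = TRUE, sep = " ")

all.equal(dataSPACE, dataSINGLE)
```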

Tab-Delimited

Function: read.table()

Common Parameters:

header: TRUE when the first row contains variable names. The default is FALSE.

sep: A string giving the character that separates fields. Use "\t" for tabs.

dataTAB <- read.table("survey.dat", header = TRUE, sep = "\t")
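Setting sep = "\t" splits fields only on tabs, so values that contain spaces stay intact. The sketch uses hypothetical inline data via the text argument and compares the result with read.delim(), which uses a tab separator and header = TRUE by default.

```r
# Tab-separated lines; the city values contain spaces
lines <- c("name\tcity",
           "Ana\tNew York",
           "Ben\tSan Diego")

# sep = "\t" splits only on tabs, so "New York" stays one value
dataTAB <- read.table(text = lines, header = TRUE, sep = "\t")

# read.delim() defaults to sep = "\t" and header = TRUE
dataDELIM <- read.delim(text = lines)

all.equal(dataTAB, dataDELIM)
```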

Comma-Delimited

Function: read.csv()

Common Parameters:

header: TRUE when the first row contains variable names. Unlike read.table(), the default for read.csv() is TRUE.

dataCOMMA <- read.csv("survey.csv", header = TRUE)
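A quick check of the header default, using hypothetical inline CSV text. By default read.csv() treats the first line as column names. With header = FALSE, that line becomes a data row and the columns are named V1, V2, and so on.

```r
lines <- c("id,score", "1,4.5", "2,3.0")

# read.csv() assumes a header row by default (header = TRUE)
with_header <- read.csv(text = lines)
names(with_header)    # "id" "score"

# header = FALSE treats the first line as data and names columns V1, V2
without_header <- read.csv(text = lines, header = FALSE)
nrow(without_header)  # 3
```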

Fixed-Width Formats

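Function: read.fwf()

Common Parameters:

widths: A vector giving the width, in characters, of each field. This parameter is required.

header: TRUE when the first row contains variable names. The default is FALSE. If TRUE, the names on the first line must be separated by sep (a tab by default) rather than laid out in fixed widths.

dataFW <- read.fwf("survey.txt", widths = c(3, 2, 2), header = TRUE)  # hypothetical widths; match your file's layout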
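A self-contained sketch of read.fwf(), assuming a hypothetical record layout: a 3-character id, a 2-character state code, and a 2-character age, with no header line. The file is written to a temporary location, and col.names supplies the variable names.

```r
# Hypothetical fixed-width records: id (3 chars), state (2), age (2)
fw_path <- tempfile(fileext = ".txt")
writeLines(c("001NY34",
             "002CA29",
             "003TX41"), fw_path)

# widths gives the number of characters in each field, left to right
dataFW <- read.fwf(fw_path,
                   widths = c(3, 2, 2),
                   col.names = c("id", "state", "age"))
str(dataFW)
```

Numeric-looking fields such as "001" are converted to numbers, so leading zeros are dropped. To keep them, pass colClasses = "character" for that column.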

Reading SPSS, Stata, and SAS Data Files

The foreign R package can read data stored as SPSS SAV files, Stata DTA files, and SAS XPORT libraries. foreign is a recommended package that ships with most R installations. If it is not installed on your computer, install it and load it into R with:

install.packages("foreign")

library(foreign)

SPSS

Function: read.spss()

Common Parameters:

to.data.frame: TRUE if R should return the loaded data as a data frame. The default is FALSE, which returns a list.

use.value.labels: TRUE if R should convert variables with value labels into R factors with those levels. The default is TRUE.

dataSPSS <- read.spss("C:/mydata/survey.sav", to.data.frame = TRUE)

By default, R treats any value labels recorded in the SPSS file as factors (R's version of a categorical variable) and stores the labels rather than the original numbers. For example, a variable named "gender" may be coded 0=male and 1=female, with the labels saved in the SAV file. When R reads the data from SPSS, the values of the variable will be "male" and "female" instead of "0" and "1". To keep the original codes, change this in the call to read.spss:

dataSPSS <- read.spss(file.choose(), use.value.labels = FALSE)

Stata

Function: read.dta()

Common Parameters:

convert.dates: Convert Stata dates to R's Date class. The default is TRUE.

convert.factors: TRUE to convert value labels into factors. The default is TRUE.

dataStata <- read.dta("survey.dta")

The created object is always a data frame. By default, value labels are converted into factor levels. This can be turned off with:

dataStata <- read.dta("survey.dta", convert.factors = FALSE)

NOTE: Stata sometimes changes how it stores data files from one version to the next, and read.dta() supports only the file formats of Stata versions 5 to 12. If read.dta() returns an error, save the data in Stata using the saveold command (in Stata 14 and later, add the version(12) option). This creates a DTA file in an older format that read.dta() is more likely to recognize.
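To see the effect of convert.factors without a real Stata file, the sketch below uses foreign's write.dta() to save a hypothetical data frame with a labelled factor to a temporary DTA file, then reads it back both ways. It assumes the foreign package is installed. There is no comparable SPSS example here because foreign does not write SAV files.

```r
library(foreign)  # recommended package shipped with most R installations

# Hypothetical data with a categorical variable
survey <- data.frame(id = 1:4,
                     gender = factor(c("male", "female", "female", "male")))

# Write a Stata file; factors are stored as integer codes with value labels
dta_path <- tempfile(fileext = ".dta")
write.dta(survey, dta_path)

# Default: value labels become factor levels
dataStata <- read.dta(dta_path)
levels(dataStata$gender)

# convert.factors = FALSE keeps the underlying numeric codes instead
dataCodes <- read.dta(dta_path, convert.factors = FALSE)
dataCodes$gender
```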
SAS

Function: read.xport()

dataSAS <- read.xport("C:/mydata/survey")

The function returns a data frame if there is a single dataset in the library, or a list of data frames if there are multiple datasets.

Reading Excel Data Files (XLSX or XLS)

Excel

Function: read_excel() from the readxl package

Common Parameters:

sheet: The name of the sheet or its position number. The default is the first sheet.

It may be easier to use Excel to save individual sheets as CSV files and then read the CSV files into R. However, R can read XLSX and XLS files directly:

install.packages("readxl")

library(readxl)

dataEXCEL <- read_excel("survey.xlsx", sheet = 1)

dataEXCEL <- read_excel("survey.xlsx", sheet = "sheetname")

This creates a tibble, a modern variant of the R data frame. If you prefer a plain data frame, convert it with:

dfEXCEL <- as.data.frame(dataEXCEL)
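A runnable sketch of reading Excel sheets, assuming the readxl package is installed. Instead of survey.xlsx, it uses readxl_example("datasets.xlsx"), a sample workbook bundled with readxl. excel_sheets() lists the sheet names so you can pick one by position or by name.

```r
library(readxl)  # install.packages("readxl") if needed

# readxl bundles sample workbooks; datasets.xlsx has one sheet per dataset
path <- readxl_example("datasets.xlsx")

# List sheet names before choosing one
excel_sheets(path)

# Select a sheet by position or by name
first_sheet <- read_excel(path, sheet = 1)
mtcars_xl   <- read_excel(path, sheet = "mtcars")

# Convert the tibble to a plain data frame if preferred
dfEXCEL <- as.data.frame(mtcars_xl)
class(dfEXCEL)  # "data.frame"
```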

