Reading and Writing Data to and from R

Functions for Reading Data into R:

There are a few very useful functions for reading data into R.

  1. read.table() and read.csv() are two popular functions for reading tabular data into R.
  2. readLines() reads lines from a text file.
  3. source() reads in R code files, for example from another R program.
  4. dget() also reads in R code files; it reads R objects that were written out as text with dput().
  5. load() reads in saved workspaces.
  6. unserialize() reads single R objects in binary format.

Functions for Writing Data to Files:

There are analogous functions for writing data to files.

  1. write.table() writes tabular data to text files (e.g. CSV).
  2. writeLines() writes character data line-by-line to a file or connection.
  3. dump() writes a textual representation of multiple R objects.
  4. dput() writes a textual representation of a single R object.
  5. save() saves an arbitrary number of R objects in binary format to a file.
  6. serialize() converts an R object into a binary format for output to a connection (or file).

Reading Data Files with read.table():

The read.table() function is one of the most commonly used functions for reading data into R. To open its help file, type ?read.table in the R console.

The read.table() function has a few important arguments:

  • file, the name of a file, or a connection
  • header, a logical value indicating whether the file has a header line
  • sep, a string indicating how the columns are separated
  • colClasses, a character vector indicating the class of each column in the dataset
  • nrows, the maximum number of rows to read. By default read.table() reads the entire file.
  • comment.char, a character string indicating the comment character. This defaults to “#”. If there are no commented lines in your file, it’s worth setting this to the empty string “”.
  • skip, the number of lines to skip from the beginning
  • stringsAsFactors, should character variables be coded as factors? Before R 4.0.0 this defaulted to TRUE because, back in the old days, data stored as strings usually represented levels of a categorical variable. Now a lot of data is free text that doesn’t represent categories, and since R 4.0.0 the default is FALSE. On older versions of R you may want to set this to FALSE explicitly, or set it globally via options(stringsAsFactors = FALSE). Few R function arguments have generated as much heat on discussion forums as stringsAsFactors.
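
To see how several of these arguments work together, here is a minimal sketch. It writes a small made-up file to a temporary location and reads it back; the file contents and column names are invented for this illustration only.

```r
# Write a small, made-up data file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("exported from a hypothetical instrument",
             "id,name,score",
             "1,Ann,9.5",
             "2,Bob,7.25",
             "3,Cy,8"),
           path)

# skip = 1 drops the first (non-data) line,
# colClasses fixes the type of each column instead of letting R guess,
# nrows = 2 stops after two data rows,
# comment.char = "" turns off comment handling
df <- read.table(path, header = TRUE, sep = ",", skip = 1,
                 colClasses = c("integer", "character", "numeric"),
                 nrows = 2, comment.char = "")
str(df)
unlink(path)  # remove the temporary file
```

Specifying colClasses and nrows is optional, but on large files it can spare R the work of guessing column types.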

The following example shows how to work with read.table() in R. It uses the Wine data set, which was originally taken from the UCI Repository, and reads the file directly from its URL.

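# Read the comma-separated file from its URL; the first line holds the column names
w <- read.table("http://makemeanalyst.com/wp-content/uploads/2017/05/wine.txt", sep = ",", header = TRUE)
head(w)  # show the first six rows
View(w)  # open the data in a spreadsheet-style viewer (interactive sessions only)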

Writing Data Files with write.table():

To write an R object to a file, use code like the following.

write.table(w, "E:/MakeMeAnalyst/wine.txt")  # Give your own path here.

To learn more about data output with write.table(), see its help page (?write.table).
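
By default write.table() separates fields with spaces and also writes the row names. A minimal sketch of writing a comma-separated file without row names and reading it back is shown here; the data frame and the temporary file are made up for this example.

```r
# A small invented data frame
scores <- data.frame(id = 1:3, score = c(9.5, 7.25, 8))

path <- tempfile(fileext = ".csv")
# sep = "," gives comma-separated output; row.names = FALSE omits the row names
write.table(scores, path, sep = ",", row.names = FALSE)

# Read the file back and compare it with the original
scores2 <- read.table(path, header = TRUE, sep = ",")
all.equal(scores, scores2)
unlink(path)  # remove the temporary file
```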

readLines() and writeLines() Functions in R:

The readLines() function is mainly used for reading lines from a text file, and writeLines() is useful for writing character data line-by-line to a file or connection. The following example reads a sample text file into R directly from its URL.

con <- file("http://makemeanalyst.com/wp-content/uploads/2017/05/Sample.txt", "r")
w<-readLines(con)
close(con)
w[1]
w[2]
w[3]

Output:

> w[1]
[1] "This is a sample text file."
> w[2]
[1] "Read this file using readLines() function."
> w[3]
[1] "And you can wrtie a file using writeLines() function."

You can also write content to a file with the writeLines() function. The following example shows how to do that.

sample <- c("Class,Alcohol,Malic acid,Ash", "1,14.23,1.71,2.43", "1,13.2,1.78,2.14")
writeLines(sample, "F:/sample.csv")  # Give your own path here.

You can also write them to a TSV (tab-separated) file using the code below.

sample <- c("Class,Alcohol,Malic acid,Ash", "1,14.23,1.71,2.43", "1,13.2,1.78,2.14")
# Replace every comma with a tab
tsv_lines <- gsub(",", "\t", sample)
writeLines(tsv_lines, "F:/Sample.tsv")
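
To check that tab-separated lines written this way form a valid table, you can read them back with read.table(). The sketch below uses a temporary file rather than a fixed drive path, and sets check.names = FALSE (a choice made for this example) so that the column name "Malic acid" keeps its space.

```r
lines <- c("Class,Alcohol,Malic acid,Ash", "1,14.23,1.71,2.43", "1,13.2,1.78,2.14")

path <- tempfile(fileext = ".tsv")
writeLines(gsub(",", "\t", lines), path)

# sep = "\t" tells read.table() that the columns are tab-separated
wine_small <- read.table(path, header = TRUE, sep = "\t", check.names = FALSE)
names(wine_small)
wine_small
unlink(path)  # remove the temporary file
```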

dput() and dget() Functions in R:

You can create a more descriptive representation of an R object by using the dput() or dump() functions. Unlike writing out a table or CSV file, dump() and dput() preserve the metadata, so that another user doesn’t have to specify it all over again. For example, we can preserve the class of each column of a table or the levels of a factor variable.

# Create a data frame
x <- data.frame(Name = "Mr. A", Gender = "Male", Age = 35)
# Print 'dput' output to your R console
dput(x)
# Write the 'dput' output to a file
dput(x, file = "F:/w.R")
# Now read in 'dput' output from the file
y <- dget("F:/w.R")
y
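
The point about metadata can be seen with a factor that has a level not present in the data. In this sketch (the data frame and the temporary files are invented for the illustration), a round trip through CSV loses the unused level, while a round trip through dput() and dget() keeps it.

```r
# A factor with a level ("high") that does not occur in the data
f <- data.frame(dose = factor(c("low", "mid"), levels = c("low", "mid", "high")))

csv_path  <- tempfile(fileext = ".csv")
dput_path <- tempfile(fileext = ".R")

write.csv(f, csv_path, row.names = FALSE)
dput(f, file = dput_path)

from_csv  <- read.csv(csv_path, stringsAsFactors = TRUE)
from_dput <- dget(dput_path)

levels(from_csv$dose)   # only the levels that appear in the file
levels(from_dput$dose)  # all three original levels
unlink(c(csv_path, dput_path))  # remove the temporary files
```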

dump() Function in R:

You can dump() several R objects to a file by passing their names as a character vector.

x <- 1:10
d <- data.frame(Name = "Mr. A", Gender = "Male", Age = 35)
dump(c("x", "d"), file = "F://dump_data.R")

rm(x, d)  # After dumping, remove the variables from the environment.

source() Function in R:

The source() function is the inverse of dump(). You can now import dump_data.R into R using the following code.

source("F://dump_data.R")
x
d
str(d)

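Output:

> x
 [1]  1  2  3  4  5  6  7  8  9 10
> d
   Name Gender Age
1 Mr. A   Male  35
> str(d)
'data.frame':	1 obs. of  3 variables:
 $ Name  : Factor w/ 1 level "Mr. A": 1
 $ Gender: Factor w/ 1 level "Male": 1
 $ Age   : num 35

This output is from a version of R older than 4.0.0. In R 4.0.0 and later, Name and Gender show up as chr instead of Factor, because data.frame() no longer converts strings to factors by default.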
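
By default source() evaluates the file in your workspace, so it overwrites any existing objects that have the same names. One way around this, sketched here with a temporary file and an environment named restored (both chosen just for this example), is to pass an environment as the local argument.

```r
x <- 1:10
path <- tempfile(fileext = ".R")
dump("x", file = path)

x <- "changed after dumping"

# Evaluate the dumped code inside a fresh environment instead of the workspace
restored <- new.env()
source(path, local = restored)

restored$x  # the value that was dumped
x           # the current value is left untouched
unlink(path)  # remove the temporary file
```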

Binary Formats in R:

The complement to the textual format is the binary format. Binary formats are sometimes useful for efficiency. In other cases there is simply no useful way to represent your data as text, and a binary format is then the way to import and export it in R. The main functions for converting R objects into a binary format are save(), save.image(), and serialize(). Individual R objects can be saved to a file using the save() function.

x <- data.frame(col1 = rep(10, 10), col2 = runif(10, min = 0, max = 10))
y <- rnorm(10)
z <- 100:110
# Save 'x', 'y' and 'z' to a file
save(x, y, z, file = "F:/testdata.rda")
# OR
save(x, y, z, file = "F:/testdata.rData")
# Load 'x', 'y' and 'z' into your workspace
load("F:/testdata.rda")
# OR
load("F:/testdata.rData")
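
load() restores objects under their saved names and invisibly returns those names. If you are not sure what a file contains, one cautious approach is to load it into a separate environment first. This is a minimal sketch; the objects a and b, the environment e, and the temporary file are made up for the example.

```r
a <- 1:3
b <- letters[1:2]
path <- tempfile(fileext = ".rda")
save(a, b, file = path)

# Load into a separate environment and capture the names that were restored
e <- new.env()
loaded <- load(path, envir = e)

loaded  # names of the objects stored in the file
e$a     # access a restored object without touching the workspace copy
unlink(path)  # remove the temporary file
```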

If you have a lot of objects that you want to save to a file in one run, you can save all objects in your workspace using the save.image() function.


# Save everything to a file
save.image(file = "F://mydata.RData")
#load all objects in this file
load("F://mydata.RData")

serialize() and unserialize() Functions in R:

The serialize() function is used to convert individual R objects into a binary format that can be communicated across an arbitrary connection. When you call serialize() on an R object with the connection set to NULL, the output is a raw vector, which R prints as hexadecimal bytes. The benefit of the serialize() function is that it is the only way to perfectly represent an R object in an exportable format, without losing precision or any metadata. If that is what you need, then serialize() is the function for you.

x <- list(1, 2, 3)
s <- serialize(x, NULL)  # NULL connection: return the result as a raw vector
s
save(s, file = "F:/test_serialization.rda")
load("F:/test_serialization.rda")
unserialize(s)  # recover the original list
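
As a rough sketch of the round trip, the example below serializes a small invented list to a raw vector and also writes it through a binary file connection, then checks that both copies are identical to the original. The temporary file path is an assumption of this example.

```r
x <- list(1, "two", c(a = 3.5))

# In memory: connection = NULL returns the bytes as a raw vector
s <- serialize(x, NULL)
class(s)
identical(unserialize(s), x)

# Through a binary file connection
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
serialize(x, con)
close(con)

con <- file(path, "rb")
y <- unserialize(con)
close(con)

identical(x, y)
unlink(path)  # remove the temporary file
```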

To read more, see the Simple Serialization Interface help page (?serialize).

saveRDS() and readRDS() in R:

You are now familiar with the save() and load() functions in R. They let you save named R objects to a file or other connection and restore them later. When loaded, each object is restored to the current environment under the same name it had when saved. This is annoying when, for example, you have a saved model object from a previous fit and want to compare it with the model object returned when the R code is rerun. Unless you change the name of the model fit object in your script, you can’t have both the saved object and the newly created one available in the same environment at the same time. saveRDS() provides a far better solution to this problem, and to the general one of saving and loading objects created with R. saveRDS() serializes an R object into a format that can be saved.
save() does the same thing, but with one important difference: saveRDS() doesn’t save both the object and its name; it saves only a representation of the object. As a result, the saved object can be read back into an object whose name differs from the one it had when it was serialized. The other main difference is that save() can save many objects to a file in a single call, whilst saveRDS(), being a lower-level function, works with a single object at a time.

# save a single object to file
women
saveRDS(women, "F://women.rds")
# restore it under a different name
women2 <- readRDS("F://women.rds")
identical(women, women2)

Output:

> women
   height weight
1      58    115
2      59    117
3      60    120
4      61    123
5      62    126
6      63    129
7      64    132
8      65    135
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
> identical(women, women2)
[1] TRUE
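
The model-comparison case mentioned above might look like the following sketch. It fits a simple linear model to the built-in women data set; the model formula, the object names fit, fit_old and fit_new, and the temporary file are choices made for this example.

```r
# Fit a model and save it to a temporary .rds file
fit <- lm(weight ~ height, data = women)
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)

# Later: rerun the fit and restore the earlier one under a different name
fit_new <- lm(weight ~ height, data = women)
fit_old <- readRDS(path)

# Both objects now live side by side in the same environment
all.equal(coef(fit_old), coef(fit_new))
unlink(path)  # remove the temporary file
```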

To read more, see the Serialization Interface for Single Objects help page (?readRDS).
