Statistics with R
Data Frames in R
In R, a data frame is a two-dimensional table-like data structure that allows you to store and manipulate data in rows and columns. Each column of a data frame can contain data of a different type (numeric, character, factor, etc.) but all columns must have the same length. Data frames are one of the most commonly used data structures in R because they allow you to work with datasets that have different types of data.
Data frames can be created using the data.frame() function in R. You can also read data from external files, such as CSV or Excel files, straight into data frames using functions like read.csv() in base R or read_excel() from the readxl package (installed as part of the tidyverse). Once you have a data frame, you can manipulate and analyze the data using functions like subset(), aggregate(), and merge().
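As a rough sketch of two of the manipulation functions just mentioned, the snippet below summarizes one table with aggregate() and joins it to another with merge(). The data frames staff and sites, and all of their values, are hypothetical and exist only for illustration.

```r
# Hypothetical data, for illustration only
staff <- data.frame(
  id   = 1:4,
  dept = c("ops", "ops", "dev", "dev"),
  pay  = c(100, 120, 150, 170)
)
sites <- data.frame(
  dept = c("dev", "ops"),
  city = c("Pune", "Delhi")
)

# Mean pay per department, using the formula interface of aggregate()
mean_pay <- aggregate(pay ~ dept, data = staff, FUN = mean)
print(mean_pay)

# Join the two tables on their shared column, dept
joined <- merge(staff, sites, by = "dept")
print(joined)
```

Here merge() performs an inner join by default, so only departments present in both tables appear in the result.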
One of the benefits of working with data frames in R is that they are compatible with many statistical and visualization packages. This makes it easy to perform data analysis and create visualizations of your data.
The following example builds a data frame from three vectors, prints it, and opens it in R's spreadsheet-style viewer with View():
employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)
Output:
> employ_data
  employee salary  startdate
1      Ram  21000 2016-11-01
2     Sham  23400 2015-03-25
3     Jadu  26800 2017-03-14
> View(employ_data)
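If you look at the structure of the data frame with str(), you see how each column is stored. The output below comes from an R version older than 4.0.0, where data.frame() converted character vectors to factors by default, so employee appears as a factor. In R 4.0.0 and later, employee stays a character vector (chr) unless you pass stringsAsFactors = TRUE.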
str(employ_data)
Output:
> str(employ_data)
'data.frame': 3 obs. of 3 variables:
 $ employee : Factor w/ 3 levels "Jadu","Ram","Sham": 2 3 1
 $ salary   : num 21000 23400 26800
 $ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"
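Whether a text column ends up as a factor or as a character vector can be set explicitly rather than left to the R version's default. The following is a minimal sketch using the stringsAsFactors argument of data.frame(); the names as_chr and as_fct are illustrative and not part of the original example.

```r
employee <- c("Ram", "Sham", "Jadu")
salary <- c(21000, 23400, 26800)

# Keep text as character (the default since R 4.0.0)
as_chr <- data.frame(employee, salary, stringsAsFactors = FALSE)

# Convert text to a factor (the default before R 4.0.0)
as_fct <- data.frame(employee, salary, stringsAsFactors = TRUE)

print(class(as_chr$employee))
print(class(as_fct$employee))
print(levels(as_fct$employee))
```

Stating the argument explicitly makes a script behave the same way on old and new R installations.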
You can also inspect the size of the data frame. dim() returns the number of rows and columns together, while nrow() and ncol() return each count separately.
dim(employ_data)
nrow(employ_data)
ncol(employ_data)
Output:
> dim(employ_data)
[1] 3 3
> nrow(employ_data)
[1] 3
> ncol(employ_data)
[1] 3
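Because the example above has three rows and three columns, it does not show which number in the dim() result is which. A small sketch with a two-column data frame (employ_small is a hypothetical name) makes the order visible: rows come first, then columns.

```r
employee <- c("Ram", "Sham", "Jadu")
salary <- c(21000, 23400, 26800)
employ_small <- data.frame(employee, salary)

# dim() reports rows first, then columns
shape <- dim(employ_small)
print(shape)

# The two counts match nrow() and ncol()
same <- identical(shape, c(nrow(employ_small), ncol(employ_small)))
print(same)

# names() lists the column names
print(names(employ_small))
```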
Examples of Data Frames in R
Creating a data frame manually:
# Create a data frame with three columns
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  city = c("New York", "Los Angeles", "Chicago")
)
# Print the data frame
df
Output:
     name age        city
1   Alice  25    New York
2     Bob  30 Los Angeles
3 Charlie  35     Chicago
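The introduction notes that columns may differ in type but must agree in length. The sketch below, built on hypothetical data, checks both points: sapply() reports the class of each column, and tryCatch() captures the error that data.frame() raises when given vectors of length 3 and 2.

```r
# Hypothetical example data; the names are illustrative only
people <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  age    = c(31, 45, 27),
  member = c(TRUE, FALSE, TRUE)
)

# Each column keeps its own type
col_types <- sapply(people, class)
print(col_types)

# Lengths 3 and 2 cannot be reconciled, so data.frame() signals an error;
# tryCatch() stores the error message instead of stopping the script
result <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Note that data.frame() recycles a shorter vector when its length divides the longer one exactly, so the length rule is enforced only when recycling is not possible.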
Reading a CSV file into a data frame:
# Read a CSV file; read.csv() already returns a data frame,
# so no further conversion is needed
df <- read.csv("mydata.csv")
# Print the data frame
df
Output:
  id    name age gender
1  1    John  23      M
2  2   Sarah  28      F
3  3 Michael  35      M
4  4  Lauren  19      F
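The file mydata.csv is not supplied with this tutorial, so the call above fails unless you provide one. A self-contained way to try it, sketched here under the assumption that the file holds the four rows shown in the output, is to write those rows to a temporary file first and read them back.

```r
# Sample rows copied from the output above
original <- data.frame(
  id     = 1:4,
  name   = c("John", "Sarah", "Michael", "Lauren"),
  age    = c(23, 28, 35, 19),
  gender = c("M", "F", "M", "F")
)

# Write to a temporary file so the example does not depend on mydata.csv
csv_path <- tempfile(fileext = ".csv")
write.csv(original, csv_path, row.names = FALSE)

# read.csv() returns a data frame directly
loaded <- read.csv(csv_path)
print(is.data.frame(loaded))
print(dim(loaded))

unlink(csv_path)  # remove the temporary file
```

Passing row.names = FALSE to write.csv() keeps the row numbers from being written as an extra column.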
Subsetting a data frame:
# Subset the data frame to include only rows where age is greater than 25
subset_df <- subset(df, age > 25)
# Print the subsetted data frame
subset_df
Output:
  id    name age gender
2  2   Sarah  28      F
3  3 Michael  35      M
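The same row filter can also be written with bracket indexing, and subset() can drop columns at the same time through its select argument. The following sketch rebuilds the four sample rows so that it runs on its own; the names by_subset, by_bracket, and names_only are illustrative.

```r
df <- data.frame(
  id     = 1:4,
  name   = c("John", "Sarah", "Michael", "Lauren"),
  age    = c(23, 28, 35, 19),
  gender = c("M", "F", "M", "F")
)

# Two ways to keep rows where age is greater than 25
by_subset  <- subset(df, age > 25)
by_bracket <- df[df$age > 25, ]
print(all.equal(by_subset, by_bracket))

# Keep only the name and age columns for the same rows
names_only <- subset(df, age > 25, select = c(name, age))
print(names_only)
```

One difference to keep in mind: subset() drops rows where the condition evaluates to NA, whereas bracket indexing returns a row of NA values for each of them.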