What is a Data Frame?
A data frame is more general than a matrix in that different columns can contain different modes of data (numeric, character, etc.). It’s similar to the datasets you typically see and use in statistical packages like IBM SPSS, SAS, and Stata. Data frames are the most common data structure you will deal with in R. A data frame is created with the data.frame() function. The general format is:
mydata <- data.frame(col1, col2, col3,…)
Here col1, col2, col3, … are column vectors of any type (such as character, numeric, or logical). Names for each column can be provided with the names() function. The following example will help you understand better.
Let us create a students database:
> rollno <- c(1201,1202,1203,1204,1205)
> age <- c(19,20,22,19,20)
> results <- c("Pass","Pass","Fail","Pass","Fail")
> gender <- c("Male","Female","Male","Male","Female")
> students <- data.frame(rollno,age,gender,results)
> students
  rollno age gender results
1   1201  19   Male    Pass
2   1202  20 Female    Pass
3   1203  22   Male    Fail
4   1204  19   Male    Pass
5   1205  20 Female    Fail
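The names() function mentioned earlier can both read and replace column names. The sketch below rebuilds the same students data frame and then works on a copy; the copy's name (students2) and the replacement column names are made up purely for illustration.

```r
# Rebuild the students data frame from the example above
rollno  <- c(1201, 1202, 1203, 1204, 1205)
age     <- c(19, 20, 22, 19, 20)
results <- c("Pass", "Pass", "Fail", "Pass", "Fail")
gender  <- c("Male", "Female", "Male", "Male", "Female")
students <- data.frame(rollno, age, gender, results)

# Read the current column names (taken from the vector names)
names(students)

# Rename the columns on a copy so the original stays unchanged
# (students2 and the new names are illustrative assumptions)
students2 <- students
names(students2) <- c("roll_no", "age_years", "gender", "result")
names(students2)
```

Working on a copy keeps the original students data frame available for the examples that follow.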
Points to remember: each column must contain only one data type, but you can put columns of different data types together to form a data frame. Because data frames are close to what analysts typically think of as datasets, we’ll use the terms columns and variables interchangeably when discussing data frames.
There are several ways to identify the elements of a data frame. You can use subscript notation, or you can specify column names. The following illustration uses the students data frame from the example above.
Three Ways to Subset a Data Frame in R: Examples
In this example, let's subset the rollno and results columns from the data frame above.
> students[c(1,4)]
  rollno results
1   1201    Pass
2   1202    Pass
3   1203    Fail
4   1204    Pass
5   1205    Fail
> students[c("rollno","results")]
  rollno results
1   1201    Pass
2   1202    Pass
3   1203    Fail
4   1204    Pass
5   1205    Fail
> students$age
[1] 19 20 22 19 20
The $ notation in the third example is used to indicate a particular variable from a given data frame. For example, if you want to cross-tabulate students' gender by results, you could use the following code:
> table(students$gender,students$results)
         Fail Pass
  Female    1    1
  Male      1    2
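As an optional variation on the cross-tabulation above, the arguments to table() can be given names so that the two dimensions are labelled in the result. This is only a sketch; the labels gender and results and the object name tab are choices made here, not part of the original example.

```r
# Rebuild the students data frame from the earlier example
students <- data.frame(
  rollno  = c(1201, 1202, 1203, 1204, 1205),
  age     = c(19, 20, 22, 19, 20),
  gender  = c("Male", "Female", "Male", "Male", "Female"),
  results = c("Pass", "Pass", "Fail", "Pass", "Fail")
)

# Naming the arguments labels the rows and columns of the table
tab <- table(gender = students$gender, results = students$results)
tab

# Individual counts can then be looked up by category label
tab["Male", "Pass"]
```

Storing the table in an object makes it possible to pull out single cells, as in the last line.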
You may get tired of typing students$ at the beginning of every variable name, so R has shortcuts for that too. You can use the with() function to simplify your code.
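A minimal sketch of with() follows, reusing the students data from the earlier example. Inside with(), column names are looked up in the data frame first, so the students$ prefix can be dropped. The object names tab_with and tab_dollar are introduced here only for comparison.

```r
# Rebuild the students data frame from the earlier example
students <- data.frame(
  rollno  = c(1201, 1202, 1203, 1204, 1205),
  age     = c(19, 20, 22, 19, 20),
  gender  = c("Male", "Female", "Male", "Male", "Female"),
  results = c("Pass", "Pass", "Fail", "Pass", "Fail")
)

# Cross-tabulation without repeating students$ for each variable
tab_with <- with(students, table(gender, results))
tab_with

# The same table written with the $ notation, for comparison
tab_dollar <- table(students$gender, students$results)
all(tab_with == tab_dollar)

# Any expression can be evaluated this way, for example a summary of age
with(students, mean(age))
```

Assignments made inside with() are not kept once the call returns, so the result is assigned outside the call, as with tab_with above.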