Data structures in R – Part 2

R offers a wide range of structures for holding data: scalars, vectors, matrices, arrays, data frames, and lists. In Data structures in R – Part 1 we covered scalars, vectors, matrices, and arrays. Now let’s look at data frames and lists.

Data frames

A data frame is a table, or a two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column. Unlike a matrix, different columns can contain different modes of data (numeric, character, etc.). It’s similar to the datasets that we see in IBM SPSS, SAS and Stata. Data frames are the most commonly used data structure in R.

Characteristics of a data frame

  1. Column names should not be empty.
  2. Row names should be unique.
  3. Data stored in a data frame can be of numeric, factor or character type.
  4. Each column should contain the same number of data items.
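
Two of these characteristics can be checked directly in the console. The following is a minimal sketch, not a complete test of every rule; the object names (unequal, df, dup) are made up for illustration. It relies on the fact that data.frame() raises an error when column lengths cannot be matched up, and that assigning duplicate row names is also rejected.

```r
# Columns of length 3 and 2 cannot be lined up, so data.frame() raises an error.
# (Shorter columns are recycled only when their length divides the longest evenly.)
unequal <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
print(unequal)

# Duplicate row names are rejected too; df keeps its original row names.
df <- data.frame(x = 1:3)
dup <- suppressWarnings(tryCatch(
  {
    rownames(df) <- c("r1", "r1", "r2")
    "ok"
  },
  error = function(e) conditionMessage(e)
))
print(dup)
print(rownames(df))
```

In both cases tryCatch() captures the error message instead of stopping the script, so the messages can simply be printed and read.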

Let’s look at an example:

> marklist <- data.frame(
+   rollno = c(1001:1006),
+   name = c("Abdul","Balu","Charlie","Daniel","Elisa","Fathima"),
+   marks = c(87,91,66,57,83,72)
+ )
> marklist
  rollno    name marks
1   1001   Abdul    87
2   1002    Balu    91
3   1003 Charlie    66
4   1004  Daniel    57
5   1005   Elisa    83
6   1006 Fathima    72

As the example shows, every value within a single column has the same data type, but different columns of the data frame can have different data types.
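
One way to confirm the type of each column is to ask R for the class of every column. The sketch below rebuilds the marklist data frame so it runs on its own; stringsAsFactors = FALSE is added here as an assumption so that name stays a character column in older R versions as well, and col_types is just an illustrative variable name.

```r
# Rebuild the example data frame so this snippet is self-contained.
marklist <- data.frame(
  rollno = c(1001:1006),
  name = c("Abdul", "Balu", "Charlie", "Daniel", "Elisa", "Fathima"),
  marks = c(87, 91, 66, 57, 83, 72),
  stringsAsFactors = FALSE  # keep names as character in every R version
)

# class() applied to each column: one type per column, different types across columns.
col_types <- sapply(marklist, class)
print(col_types)

# str() gives a compact overview of the same information.
str(marklist)
```

Here rollno is integer, name is character and marks is numeric.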

We can subscript a data frame in much the same way as we subscript a matrix. Let’s see this using the marklist data frame created above.

> marklist[1,3]
[1] 87
> marklist[1:3]
  rollno    name marks
1   1001   Abdul    87
2   1002    Balu    91
3   1003 Charlie    66
4   1004  Daniel    57
5   1005   Elisa    83
6   1006 Fathima    72
> marklist[c(1,3)]
  rollno marks
1   1001    87
2   1002    91
3   1003    66
4   1004    57
5   1005    83
6   1006    72
> marklist[c("rollno","marks")]
  rollno marks
1   1001    87
2   1002    91
3   1003    66
4   1004    57
5   1005    83
6   1006    72
> marklist$name
[1] "Abdul"   "Balu"    "Charlie" "Daniel" 
[5] "Elisa"   "Fathima"
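
The examples above mostly select columns. As a complementary sketch, rows can be selected with the matrix-style [row, column] form, including by a logical condition. The snippet rebuilds marklist so it runs on its own; the threshold of 80 and the variable name top are arbitrary choices for illustration.

```r
# Rebuild the example data frame so this snippet is self-contained.
marklist <- data.frame(
  rollno = c(1001:1006),
  name = c("Abdul", "Balu", "Charlie", "Daniel", "Elisa", "Fathima"),
  marks = c(87, 91, 66, 57, 83, 72),
  stringsAsFactors = FALSE
)

# Second row, all columns (note the trailing comma).
print(marklist[2, ])

# A single column by name, returned as a plain vector.
print(marklist[, "marks"])

# Rows that satisfy a condition: students who scored more than 80.
top <- marklist[marklist$marks > 80, ]
print(top)
```

Leaving the column position empty after the comma keeps every column for the selected rows.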

Factors

Factors are used to categorize data and store it as levels. A factor can be built from either strings or integers. This is useful for columns that have a limited number of unique values, for example Male, Female, Neutral, or True, False. Factors are widely used in data analysis for statistical modelling. They are created with the factor() function, which takes a vector as input.
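
As a small illustration of factor(), the sketch below turns a character vector into a factor and inspects its levels. The gender values are made-up sample data and the variable names are illustrative only.

```r
# A character vector with a limited number of unique values.
gender <- c("Male", "Female", "Female", "Neutral", "Male")

# Convert it to a factor; by default the levels are the sorted unique values.
gender_f <- factor(gender)
print(levels(gender_f))

# Internally each value is stored as an integer code pointing at a level.
print(as.integer(gender_f))

# Count how many observations fall into each level.
print(table(gender_f))
```

Each value is stored as the position of its level, which is why factors are compact and convenient for grouping.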

We will look at factors in more practical detail when we discuss statistical methods.

A.Sulthan, Ph.D.,
Author and Assistant Professor in Finance, Ardent fan of Arsenal FC. Always believe "The only good is knowledge and the only evil is ignorance - Socrates"