Part 2: Types of Objects in R

1. Overview: Types of Objects

The main purpose of R is to manipulate ‘objects’ to accomplish tasks: you assign values to objects and then use functions to manipulate them.
There are many types (or classes) of objects, and many functions are tailored to work with particular types. It is therefore critical that you understand the distinctions between the different types of objects and how best to use each. Some packages generate special types of objects, which can then be manipulated or analyzed in special ways. Here, we will cover the most common types of objects you will encounter.

Object Type	Detail
Numeric	Numbers
Character	Text
Factor	A categorical variable: text values restricted to a fixed set of levels
Logical	TRUE or FALSE
Date	Dates and times, which are stored in special formats
Vector	A variable with multiple values of the same type (e.g., numeric, character, factor, logical)
Matrix	A two-dimensional array of values of a single type (usually numbers)
Array	A set of values arranged in any number of dimensions. For example, a three-dimensional array is essentially a stack of matrices.
Data frame	A two-dimensional object in which each column is a vector (numeric, character, factor, etc.) and all columns have the same length. This is what you typically think of as a spreadsheet.
List	A bundle of any set of components. Each element in a list can be any type of object. Once you get used to them, lists are very useful.

Other types of objects

Aside from these common types, there are all sorts of specialized objects that are the outputs of specific functions, such as the result of a statistical analysis (say, a linear model fitted with the function lm()). But at the end of the day, even these are typically customized lists composed of the objects described above.
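To see this for yourself, here is a minimal sketch that fits a linear model with lm() and inspects the result. It uses the built-in cars dataset (stopping distance vs. speed), chosen only for illustration; the name fit is arbitrary:

```r
# Fit a simple linear model to the built-in cars data (an arbitrary example dataset)
fit = lm(dist ~ speed, data = cars)

class(fit)    # the special class that lm() assigns: "lm"
is.list(fit)  # underneath, it is a list: TRUE
names(fit)    # the components bundled inside, e.g., "coefficients" and "residuals"
```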

2. Vectors

Vectors are essentially a one-dimensional set of elements. The elements can be numbers (numeric vectors), characters, etc.

2.1 Vectors of different types

Let’s try making a numeric vector using a function called c() (for ‘combine’):

v=c(4,3,5,3,2,3,1)
v

## [1] 4 3 5 3 2 3 1

Objects can also be text. Text objects are called character strings. In R, all text must be enclosed in quotes (single or double quotes are both allowed). Otherwise, R will look for an object with that name and give an error if it does not find one.

We can combine multiple character strings into a vector. Each element can be a single letter, a word, a phrase, or an entire sentence.

chars=c("a", "word", "or a phrase")
chars

## [1] "a"           "word"        "or a phrase"

If you try to combine letters and numbers into a single vector, R will convert the whole thing into a character vector, with the numbers treated as text:

numbersletters=c(1,2,3, "one", "two", "three")
numbersletters

## [1] "1"     "2"     "3"     "one"   "two"   "three"
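This automatic conversion is called coercion: R picks the most flexible type needed to hold every element. Among the types covered here, the order is logical, then numeric, then character. The sketch below checks the results with class(); the object names mixed1 and mixed2 are only illustrative:

```r
# Mixing logical and numeric values: logicals become numbers (TRUE -> 1, FALSE -> 0)
mixed1 = c(1, TRUE, FALSE)
mixed1          # 1 1 0
class(mixed1)   # "numeric"

# Mixing anything with text: every element becomes character
mixed2 = c(TRUE, 2, "three")
mixed2          # "TRUE" "2" "three"
class(mixed2)   # "character"
```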

Factors are different from characters in that they have levels (a fixed set of possible values). This will become more important later when we start working with dataframes.

factors=as.factor(numbersletters) #convert the vector above to factors
factors

## [1] 1     2     3     one   two   three
## Levels: 1 2 3 one three two
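Two functions that are handy for inspecting factors are levels() and nlevels(). By default, the levels are sorted alphabetically, as in the output above. You can set your own order with the levels argument of factor(). In this sketch the name factors2 and the chosen order are just examples:

```r
numbersletters = c(1, 2, 3, "one", "two", "three")
factors = as.factor(numbersletters)

levels(factors)   # the distinct categories, sorted: "1" "2" "3" "one" "three" "two"
nlevels(factors)  # how many levels there are: 6

# Specify the level order explicitly instead of accepting the alphabetical default
factors2 = factor(numbersletters,
                  levels = c("1", "one", "2", "two", "3", "three"))
levels(factors2)  # "1" "one" "2" "two" "3" "three"
```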

Objects can also be logical, i.e., TRUE or FALSE (note: all capitals). Logical values are especially useful for testing conditions and selecting subsets of data.

logic=c(TRUE, TRUE, FALSE, FALSE)
logic

## [1]  TRUE  TRUE FALSE FALSE

One cool thing to note is that we can convert logical objects into numerics by adding 0:

logic+0

## [1] 1 1 0 0

You can see that TRUE becomes 1 and FALSE becomes 0.
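Because TRUE counts as 1 and FALSE as 0, sum() on a logical vector counts the TRUE values and mean() gives their proportion. This becomes especially handy with comparisons such as >, which return logical vectors. A short sketch using v and logic from above (recreated so it runs on its own):

```r
v = c(4, 3, 5, 3, 2, 3, 1)
logic = c(TRUE, TRUE, FALSE, FALSE)

sum(logic)    # number of TRUE values: 2
mean(logic)   # proportion of TRUE values: 0.5

v > 3         # element-by-element comparison returns a logical vector
sum(v > 3)    # how many values of v are greater than 3: 2
```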

2.2 Vector Functions

You can measure various attributes of a vector. For example, let’s find out how many numbers there are in the vector v and add them all up. Try:

length(v)

## [1] 7

sum(v)

## [1] 21

From this, we can calculate the mean.

sum(v)/length(v)

## [1] 3

Of course, there is a pre-packaged function that calculates the mean of a vector, so this is simpler:

mean(v)

## [1] 3

Here are some more mathematical functions. Try typing these, and also look at the details of each function using ?functionname (e.g., ?median):

function	meaning
`max()`	maximum value
`min()`	minimum value
`sum()`	sum
`mean()`	average
`median()`	median
`range()`	returns vector of min and max values
`var()`	sample variance
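The sketch below applies a few of these functions to v. As a check on what var() computes, the last line rebuilds the sample variance from its definition: the sum of squared deviations from the mean, divided by n - 1. The values in the comments are what these calls should return for this v:

```r
v = c(4, 3, 5, 3, 2, 3, 1)

max(v)      # 5
min(v)      # 1
median(v)   # 3
range(v)    # 1 5
var(v)      # 1.666667

# Sample variance by hand: sum of squared deviations divided by (n - 1)
sum((v - mean(v))^2) / (length(v) - 1)   # 1.666667, same as var(v)
```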

We can also manipulate vectors as a whole. For example, let’s multiply the vector by 10:

v*10

## [1] 40 30 50 30 20 30 10
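Arithmetic between two vectors of the same length also works element by element: the first elements are combined, then the second elements, and so on. For example, subtracting the mean from every value centers the vector around zero. A minimal sketch:

```r
v = c(4, 3, 5, 3, 2, 3, 1)

v + 1:7      # element-wise addition: 5 5 8 7 7 9 8
v - mean(v)  # subtract the mean (3) from every element: 1 0 2 0 -1 0 -2
```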

2.3 Indexing: The importance of [ ]

For multi-element objects (i.e., anything that is a combination of numbers, letters, etc.), we can locate specific elements within objects using square brackets []. For example, we can ask what is the 6th number in the numeric vector v, or the second element in the character vector chars from above.

v[6]

## [1] 3

chars[2]

## [1] "word"
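The value inside the brackets can also be a vector of positions, a negative number (meaning “everything except”), or a logical vector (meaning “keep only the TRUE positions”). A few variations on v and chars:

```r
v = c(4, 3, 5, 3, 2, 3, 1)
chars = c("a", "word", "or a phrase")

v[c(1, 3)]      # 1st and 3rd elements: 4 5
v[2:4]          # 2nd through 4th elements: 3 5 3
v[-1]           # everything except the 1st element: 3 5 3 2 3 1
v[v > 3]        # only the elements greater than 3: 4 5
chars[c(1, 3)]  # "a" "or a phrase"
```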

3. Matrices

Ok, now let’s try a matrix. This is a two-dimensional set of numbers, so when we create a matrix, we also need to specify its dimensions. Let’s demonstrate the difference between vectors and matrices:

1:9 # the colon operator creates a sequence of integers

## [1] 1 2 3 4 5 6 7 8 9

vec=1:9
mat=matrix(1:9,nrow=3)

Now look at the objects vec and mat:

vec

## [1] 1 2 3 4 5 6 7 8 9

mat

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Note that by default R fills the matrix column by column, going from top to bottom. This is important to remember when you are creating matrices. You can make R fill the matrix row by row instead (which is more intuitive to me) with the byrow argument:

mat2=matrix(1:9,nrow=3,byrow=TRUE)
mat2

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

Now, try a slight variation:

mat3=matrix(1:10,nrow=2,byrow=TRUE)
rownames(mat3)=c("row1","row2")
colnames(mat3)=c("A","B","C","D","E")
mat3

##      A B C D  E
## row1 1 2 3 4  5
## row2 6 7 8 9 10

You can see that matrices can be “rectangular”, and also you can name the dimensions (rows & columns) of the matrix using rownames() and colnames().
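To check the dimensions of a matrix, you can use dim(), nrow(), and ncol(). The function t() transposes a matrix, swapping rows and columns along with their names. A short sketch with mat3 (recreated so it runs on its own):

```r
mat3 = matrix(1:10, nrow = 2, byrow = TRUE)
rownames(mat3) = c("row1", "row2")
colnames(mat3) = c("A", "B", "C", "D", "E")

dim(mat3)    # rows, then columns: 2 5
nrow(mat3)   # 2
ncol(mat3)   # 5
t(mat3)      # transposed: a 5 x 2 matrix with rows named A to E
```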

3.1. Indexing with a matrix

Indexing in a matrix requires two values inside the square brackets: [row, column]. You can also use this to look at entire rows or columns. For example:

mat3[2,3] #what is the number in row 2, column 3?

## [1] 8

mat3[2,] #what are the values of row 2?

##  A  B  C  D  E 
##  6  7  8  9 10

mat3[,4] #what are the values of column 4?

## row1 row2 
##    4    9

You can conduct mathematical operations on matrices:

mat3*10 #multiply all values in mat3 by 10

##       A  B  C  D   E
## row1 10 20 30 40  50
## row2 60 70 80 90 100
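Note that * works element by element. True matrix multiplication uses the %*% operator, and rowSums() and colSums() add up values across rows or columns. A brief sketch (the row and column names are left off here for simplicity):

```r
mat3 = matrix(1:10, nrow = 2, byrow = TRUE)

rowSums(mat3)      # totals for each row: 15 40
colSums(mat3)      # totals for each column: 7 9 11 13 15
mat3 %*% t(mat3)   # a 2 x 5 matrix times a 5 x 2 matrix gives a 2 x 2 matrix
```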

Arrays
Technically, a matrix is simply a two-dimensional array (and a vector is, conceptually, a one-dimensional array). More generally, an array can have any number of dimensions. A three-dimensional array would be a stack of matrices, a four-dimensional array would be yet another stack of those, and so on. Arrays can be very useful for fast computing, but they can also be very confusing, so I’m going to avoid the issue here. We may come back to the idea of three-dimensional arrays later in the course.
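Just to make the idea concrete, here is a minimal three-dimensional array: a stack of four 2 x 3 matrices (the name arr and the dimensions are arbitrary). Indexing uses one position per dimension, i.e., [row, column, layer]:

```r
arr = array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers; filled column by column

dim(arr)      # 2 3 4
arr[, , 1]    # the first "layer" is an ordinary 2 x 3 matrix
arr[2, 3, 4]  # row 2, column 3 of layer 4: 24
```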

4. Dataframes

In most cases, your data will be organized in the form of a dataframe. A dataframe is an object with rows and columns in which each row represents an observation (sometimes called a case), and each column is a measurement of a variable (sometimes called a field). Whereas all values in a matrix must be of the same type (usually numbers), each column of a dataframe can have its own type: numeric, character, factor, or other formats (e.g., dates, or logical variables such as TRUE and FALSE).

Let’s try creating a dataframe by combining a character vector (a categorical variable) and a numeric vector.

sex=c(rep("M",5), rep("F",5))
size=c(9,8,8,9,7,5,4,4,3,4)
dat=data.frame(sex, size)
dat

##    sex size
## 1    M    9
## 2    M    8
## 3    M    8
## 4    M    9
## 5    M    7
## 6    F    5
## 7    F    4
## 8    F    4
## 9    F    3
## 10   F    4

Notice that the columns already have names: the data.frame function uses the object names as the default column names. However, you can also assign column names using arguments inside the function:

dat=data.frame(Sex=sex, Size=size) #Notice the capitalization
dat

##    Sex Size
## 1    M    9
## 2    M    8
## 3    M    8
## 4    M    9
## 5    M    7
## 6    F    5
## 7    F    4
## 8    F    4
## 9    F    3
## 10   F    4

4.1. Indexing in dataframes

We can refer to rows or columns in the dataframe using square brackets, just as with the other objects we have learned about.

dat[1,] #first row

##   Sex Size
## 1   M    9

dat[,2] #second column

##  [1] 9 8 8 9 7 5 4 4 3 4

You can also get the columns of the dataframe using the $ operator:

dat$Sex

##  [1] "M" "M" "M" "M" "M" "F" "F" "F" "F" "F"

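Notice that no “levels” are shown here, because Sex is a character vector, not a factor. (Since R 4.0.0, data.frame() no longer converts text columns to factors by default; if Sex were a factor, the output would also list its levels.)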

You can find out the type of variable in each column using the function class():

class(dat$Sex)

## [1] "character"

class(dat$Size)

## [1] "numeric"
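If you want Sex to be a factor (for example, so that summary() reports counts per category), you can convert it explicitly with as.factor(). This sketch builds a separate copy called dat2 (a hypothetical name) so that dat itself stays unchanged:

```r
dat2 = data.frame(Sex = c(rep("M", 5), rep("F", 5)),
                  Size = c(9, 8, 8, 9, 7, 5, 4, 4, 3, 4))

dat2$Sex = as.factor(dat2$Sex)  # overwrite the character column with a factor
class(dat2$Sex)    # "factor"
levels(dat2$Sex)   # "F" "M" (alphabetical order)
table(dat2$Sex)    # counts per level: 5 of each
```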

Two more useful functions: str() gives you the structure of the object, and summary() gives you some basic info on each column.

str(dat)

## 'data.frame':    10 obs. of  2 variables:
##  $ Sex : chr  "M" "M" "M" "M" ...
##  $ Size: num  9 8 8 9 7 5 4 4 3 4

summary(dat)

##      Sex                 Size    
##  Length:10          Min.   :3.0  
##  Class :character   1st Qu.:4.0  
##  Mode  :character   Median :6.0  
##                     Mean   :6.1  
##                     3rd Qu.:8.0  
##                     Max.   :9.0
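Square brackets and $ can also be combined with a logical condition to pull out a subset of rows. For example, to get the sizes of only the females (this sketch recreates dat so it runs on its own):

```r
sex = c(rep("M", 5), rep("F", 5))
size = c(9, 8, 8, 9, 7, 5, 4, 4, 3, 4)
dat = data.frame(Sex = sex, Size = size)

dat[dat$Sex == "F", ]           # all columns, but only the rows where Sex is "F"
dat[dat$Sex == "F", "Size"]     # columns can also be picked by name: 5 4 4 3 4
mean(dat$Size[dat$Sex == "F"])  # mean female size: 4
```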

4.2. Built-in data sets

Base R comes with a number of built-in datasets. To load a specific data set, use the function data(). For example, to load the data set called ‘iris’:

data("iris")

Now let’s look at this dataset. Here, I’m going to use the function head(), which displays only the first 6 rows of the dataset:

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
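The Species column of iris is a factor, which connects back to the factors introduced earlier. Here are a few quick ways to get oriented in a new dataset:

```r
data("iris")

dim(iris)              # 150 rows, 5 columns
class(iris$Species)    # "factor"
levels(iris$Species)   # "setosa" "versicolor" "virginica"
table(iris$Species)    # 50 observations of each species
```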

Built-in datasets are often useful for learning how functions work. Examples within help files often use built-in data sets to demonstrate how something works, and some R packages include their own built-in data sets for the same reason.

5. Lists

A list object is a powerful and flexible tool in R. Dataframes, matrices, and arrays have many constraints: for example, each row must have the same number of columns. In contrast, you can combine any set of objects together into a list.
As an example, let’s create three vectors that are of different lengths with different types of elements (number, logical, and character).

apples=c(1,2,3,4,5)
oranges=c(TRUE, FALSE)
grapes=c("grape", "Grape", "GRAPE")

We can try to combine these objects into a dataframe, but we won’t be able to because the vectors are different lengths:

data.frame(apples, oranges, grapes)

## Error in data.frame(apples, oranges, grapes): arguments imply differing number of rows: 5, 2, 3

However, we can combine these into a list:

mylist=list(apples, oranges, grapes) 
mylist

## [[1]]
## [1] 1 2 3 4 5
## 
## [[2]]
## [1]  TRUE FALSE
## 
## [[3]]
## [1] "grape" "Grape" "GRAPE"

Lists are structured differently than other objects. In a list, each component or item is indexed using a double bracket [[]]. So the first item in the list (i.e., apples) is:

mylist[[1]]

## [1] 1 2 3 4 5

… and the second element within the third item (i.e., grapes) would be:

mylist[[3]][2]

## [1] "Grape"
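The difference between single and double brackets matters. Using [ ] on a list returns a smaller list (the item is still wrapped in a list), while [[ ]] extracts the item itself. You can see this with class():

```r
apples = c(1, 2, 3, 4, 5)
oranges = c(TRUE, FALSE)
grapes = c("grape", "Grape", "GRAPE")
mylist = list(apples, oranges, grapes)

class(mylist[1])    # "list": a list containing one item
class(mylist[[1]])  # "numeric": the apples vector itself
sum(mylist[[1]])    # works on the extracted vector: 15
```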

You can name the items within a list when creating it, or afterwards:

#Either of these two lines names the items; you only need one of them
mylist=list(apples=apples, oranges=oranges, grapes=grapes) 
names(mylist)=c("apples", "oranges", "grapes")
mylist

## $apples
## [1] 1 2 3 4 5
## 
## $oranges
## [1]  TRUE FALSE
## 
## $grapes
## [1] "grape" "Grape" "GRAPE"

Once you name the items in a list, you can use the $ operator to call a specific item:

mylist$grapes

## [1] "grape" "Grape" "GRAPE"
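Double brackets also accept names, and you can add a new item to an existing list by assigning to a new name with $. In this sketch, kiwis is a hypothetical new item:

```r
mylist = list(apples = c(1, 2, 3, 4, 5),
              oranges = c(TRUE, FALSE),
              grapes = c("grape", "Grape", "GRAPE"))

mylist[["grapes"]]      # same result as mylist$grapes
mylist$kiwis = "kiwi"   # add a new (hypothetical) item called kiwis
length(mylist)          # the list now has 4 items
names(mylist)           # "apples" "oranges" "grapes" "kiwis"
```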

You can even combine different dataframes into a list. Let’s do this by loading several built-in data sets and then combining them into a list (output hidden):

data("iris")
data("trees")
data("Loblolly")
mydata=list(iris, trees, Loblolly)
mydata
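As a small preview of why this is convenient, naming the items lets you run one function over every dataframe in the list at once. The apply family is covered later, so sapply() is used here without much explanation:

```r
data("iris")
data("trees")
data("Loblolly")

# Name each item so that the results come out labeled
mydata = list(iris = iris, trees = trees, Loblolly = Loblolly)

sapply(mydata, nrow)   # number of rows in each dataframe: iris 150, trees 31, Loblolly 84
```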

Lists may not be intuitive to you yet, but you will see how convenient this type of object can be when we get around to more complex tasks such as batch processing and apply functions.