So, you want to learn R. If you are in this class, you probably want to learn R to do some SCIENCE. Typically, students that take this class have spent a long time learning biology, and how to do biological research, and it is now time to learn how to use R because you have to make plots, or because you have to do some statistical analyses.
But R is even MORE than software for graphics or stats (or a seemingly endless set of software packages): it is A LANGUAGE. And like learning any language, learning R (or any other programming language) will open up more opportunities than you might realize. It will increase your research capacity. It will also help you think more clearly about data. Also, like learning any language, you can’t just learn it by reading or hearing a lecture; you have to USE IT. During this semester, you should try to use it every day.
This will be a journey, but it will be a journey that is well worth it!
On a more practical note, becoming good at programming as part of science is a marketable skill. You can become a data scientist that actually understands data, their pitfalls and their potential. We can do this!
R is a language that allows you to do data manipulation, conduct any data analysis you can think of, produce beautiful graphs, put together and run simple models, simulations, randomizations… you name it.
In this class, we will be using open-source software called RStudio. RStudio is an IDE (Integrated Development Environment), a fancy term for software that organizes windows and provides a layout that helps make programming easier. Strictly speaking, you don’t really need RStudio or any other IDE. If you prefer, you can simply open the R program and use the R console and editor as separate windows. However, there are some benefits to using RStudio.
The main benefit to RStudio for this class is that it makes
R look the same across platforms. So it should make it easier for me to
communicate efficiently with Mac OS and Windows users. Another benefit
is access to other tools such as Rmarkdown, which we will learn
to use later for generating reports.
First, open up the RStudio program. You will get a window with 3 panels. Click on the little icon at the top left (the New File button, a white page with a green plus sign) and choose R Script.
Now you will have 4 panels.
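These are the 4 panels you will have:
- Script Editor (top left): where you write and save your code
- Console (bottom left): where R runs your commands
- Environment (top right): shows the objects you create
- Outputs (bottom right): files, plots, packages, and help pages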
You can set the panels up however you like by going to [Preferences]–[Pane Layout]. For this class, I recommend keeping the pane layout the same as mine so you don’t get confused.
Let’s start with something simple. Try typing the code that is shown in the shaded area into the Console (bottom left panel) and press [return]:
5*2
You should see an output like this:
## [1] 10
Note
Here and throughout this course, I will present code in the shaded box. This can be typed into the Console, or, as you will see next, you can copy and paste it into the Script Editor. The output of the code, if shown, will be displayed below it with hashtags (##) in front.
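If you want a few more calculations to try in the Console, here is a small sketch. It shows that R follows the usual order of operations (exponents, then multiplication and division, then addition and subtraction), and that parentheses change what gets computed first.

```r
2 + 3 * 4    # multiplication happens first: 14
(2 + 3) * 4  # parentheses are evaluated first: 20
10 / 4       # division gives decimals when needed: 2.5
2^3          # exponent, 2 to the power of 3: 8
```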
Back to the R language: Just performing calculations isn’t
that useful–you could just use a calculator.
R is often described as an object-oriented language. In practice, what this means is that we can assign almost anything (numbers, text, matrices, data, functions, etc.) to an entity called an object, and then we can combine these objects to do tasks. Try typing this into the R Console (bottom left):
a = 5*2
You will note that there is no output after typing this in. R simply registered the fact that you have assigned the output of the expression 5*2 to an object called a.
You can now display the object by simply typing a:
a
## [1] 10
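To see what “combining objects to do tasks” looks like, here is a small sketch that builds on a. The names b and total are just illustrative choices; any names would work.

```r
a = 5*2        # a holds 10
b = a + 3      # use a to make a new object: 13
total = a * b  # combine two objects: 10 * 13 = 130
total
```

Note that if you later change a, b and total will not update on their own. They store the values that were computed when you ran those lines.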
Note that you will also see whatever objects you create in the “environment” window (top right panel).
Objects are the building blocks of the tasks you will perform in R, and thus assigning and manipulating objects is the essence of the R language. Here, we have used an extremely simple example of an object: a number, or numeric in R lingo. You will see later that objects can be almost anything: a set of numbers, characters, matrices, datasets, lists, outputs of statistical analyses, and any number of special formats. This simple concept can be scaled up to accomplish very complex tasks efficiently.
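As a quick preview, here is a minimal sketch of a few different kinds of objects. It uses the function class() to ask R what kind of object each one is. The object names num, word, and answer are arbitrary examples.

```r
num = 10           # a number
word = "sparrow"   # text (a character string) goes inside quotes
answer = 3 > 2     # the result of a comparison is TRUE or FALSE
class(num)         # "numeric"
class(word)        # "character"
class(answer)      # "logical"
```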
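Some things to know:
- > is the prompt from R. It means that R is waiting for you to enter something.
- R is case-sensitive: a and A are two different objects.
- Extra spaces between the parts of a command are generally ignored (5*2 and 5 * 2 give the same result).
- If the console gets stuck (for example, showing a + prompt while it waits for the rest of a command), press [esc].
- Pressing [return] in the Console will run the command.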
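You can check the case-sensitivity and spacing points for yourself with a few lines like these. The object names x and X are only for illustration.

```r
x = 3
X = 7          # a different object, because R is case-sensitive
x == X         # FALSE: x and X are not the same object
5*2 == 5 * 2   # TRUE: extra spaces around * make no difference
```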
Operators are symbols that have special meaning in R. These are critical to know.
| Operator | Meaning |
|---|---|
| # | Comment. R ignores everything on a line after this symbol |
| + , - , * , / , ^ | Arithmetic operators (plus, minus, multiply, divide, exponent) |
| > , >= , < , <= | Relational operators meaning “greater than”, “greater than or equal to”, “less than”, “less than or equal to” |
| == | Relational operator meaning “is equal to” |
| != | Relational operator meaning “is not equal to” |
| <- or = | Both used to assign values to objects |
| ! , & , \| | Logical operators meaning “not” (often used to exclude), “and”, “or”; used for indexing |
| % | Used in several operators, including matrix multiplication (%*%), integer division (%/%), and value matching (%in%) |
| ~ | Used for model formulae |
| $ | List indexing (element name) |
| : | Create a sequence |
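We will get to the last rows of the table later in the course, but a quick sketch of a few of them may make the table more concrete. %/%, %% and %in% are just examples of the several operators built from the % symbol.

```r
1:5                 # sequence from 1 to 5: 1 2 3 4 5
7 %/% 2             # integer division: 3
7 %% 2              # remainder (modulo): 1
3 %in% 1:5          # value matching: is 3 among 1 to 5? TRUE
!(3 > 2)            # "not" flips TRUE to FALSE
(3 > 2) & (1 > 2)   # "and" is TRUE only if both sides are TRUE: FALSE
(3 > 2) | (1 > 2)   # "or" is TRUE if at least one side is TRUE: TRUE
```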
We will be using most if not all of these operators in due time. For
now, let’s get oriented with the first 6 rows of the table above.
First, it is important to know that R ignores everything on a line that comes after a hashtag (#). Thus, hashtags are really useful for making comments on your code.
# You can type anything after the hashtag and R will ignore it.
Second, it’s important to know the difference between <-, = and ==.
<- and = do the same thing: they both assign values to objects.
a <- 5 #this is the same as...
a = 5
Some experienced programmers prefer <- due to occasional ambiguity in using the equals sign (= is also used to name the arguments inside a function, as in seq(1, 10, by = 0.7)). In this class, I will use =, which I prefer due to its simplicity.
Third, whereas the single equals sign = is used to assign objects, the double equals sign == is a relational operator asking “is something equal to something?”
For example, type in these lines and hit return (you can skip the parts after the #):
# assign some values
a = 5
b = 10
c = 5
a == b # is a equal to b?
## [1] FALSE
a == c # is a equal to c?
## [1] TRUE
Let’s play with some other relational operators:
a < b #is a less than b?
## [1] TRUE
a + c == b # is a + c equal to b?
## [1] TRUE
a != b # is a not equal to b?
## [1] TRUE
a != c # is a not equal to c?
## [1] FALSE
Functions are commands that you use to manipulate objects in R. Functions are followed by (), and each function comes with specific arguments or syntax that goes inside the parentheses. Function names are like the verbs that you have to learn to master this language.
For example, rep(x,n) is a function that says, “repeat the value x n times”. Try it:
rep(a,5)
## [1] 5 5 5 5 5
Try another simple function, seq(), which creates a sequence of numbers. Here’s an example:
seq(1,10,1)
## [1] 1 2 3 4 5 6 7 8 9 10
Here, the syntax is important. Generically, seq(x, y, z) says “create a sequence of numbers from x to y in increments of z”.
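Here is a sketch of seq() with a few different values of x, y, and z. The last line uses the : operator from the operator table, which is a shortcut for sequences that go up by 1.

```r
seq(0, 100, 25)   # from 0 to 100 in steps of 25: 0 25 50 75 100
seq(10, 1, -3)    # steps can be negative: 10 7 4 1
1:10              # same values as seq(1, 10, 1)
```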
But how do you find out what the syntax for a function is? This is a really important point about using R: you have to learn how to use each function. Luckily, there is a help file associated with each function. To look at the help file, you simply put ? in front of the function name:
?seq
This should open the help file in the bottom right ‘outputs’ panel.
Some important elements of the help file:
- Upper left corner: shows the function, then curly braces with the name of the package that contains the function. seq {base} indicates that the function seq() is in the “base” package, which is pre-loaded so you can always use it. Some functions require certain packages to be loaded. We will talk about loading & using packages in a later module.
- Usage: shows the syntax. What you should focus on are the different arguments that can be included; these help specify how the function performs and what outputs are shown.
- Arguments: this section provides more detail about what goes inside the parentheses. This is probably the most useful part of the help file.
- Details: this section can be very informative for statistical functions or other complex functions. Read it carefully for new functions.
- Value: this section tells you what the outputs of the function are. This can also be very useful for more complicated functions. We will likely refer to this section in some cases.
- Examples: this section often gives you a self-contained example of usage. You can copy and paste code from here and run it to see what it does.
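Rather than copying the Examples section by hand, you can also run it directly with example(). Here is a small sketch; the two commented lines show other standard ways to reach help pages, including ?? for searching all help pages when you don’t know a function’s exact name.

```r
# help("seq") opens the same help page as ?seq
# ??sequence searches every help page for the word "sequence"
example(seq)   # runs all the code in the Examples section of ?seq
```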
Ok, now that we’re oriented with the syntax of seq(), let’s play around with the function a bit.
You can see from the help file that the third argument for this function is “by”, which defines the interval that you want to use for the sequence of numbers. You can change this:
seq(1,10,by=0.7)
## [1] 1.0 1.7 2.4 3.1 3.8 4.5 5.2 5.9 6.6 7.3 8.0 8.7 9.4
You can also see that there is an optional argument called “length.out”. It is set to NULL by default, meaning that if you don’t specify it, it will be ignored. However, you can choose to specify the length of the output:
seq(1,10,length.out=19)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
## [16] 8.5 9.0 9.5 10.0
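Because each argument has a name, you can spell the names out and then give the arguments in any order. Here is a small sketch using from, to, by, and length.out, the argument names listed in the Usage section of ?seq:

```r
seq(from = 1, to = 10, by = 3)          # 1 4 7 10
seq(by = 3, to = 10, from = 1)          # same result, different order
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```

Spelling out the argument names makes your code easier to read later, especially for functions with many arguments.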
The ability to see the inner workings of each function and specify some aspects of how functions work is one of the strengths of R. Hopefully you will come to appreciate this flexibility and detail as you learn how to work with a programming language.
One of the huge benefits of learning R is the enormous and ever-growing corpus of packages that members of the user community build to aid in different tasks (over 22,500 packages as of August 28, 2025).
Packages are collections of functions and data in a well-defined format (e.g., each function has a help file) designed around a specific functionality (e.g., a particular analysis approach).
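R automatically comes with a set of base packages that provide its core functionality.
To use a new package, you have to go through two steps: first install it (you only need to do this once on each computer), and then load it (you need to do this in every R session where you want to use it).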
install.packages("vioplot")
To install a package from CRAN (the main online repository of R packages), use install.packages() with the name of the package in quotes inside the parentheses.
One way to load a package is to use the function library():
library(vioplot)
## Loading required package: sm
## Package 'sm', version 2.2-6.0: type help(sm) for summary information
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
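Before installing something, you may want to check which packages are already loaded or installed. Here is a minimal sketch using base R functions. vioplot is just the example package from above, and the install step is commented out so the code runs even without an internet connection.

```r
search()   # packages currently loaded (attached) in this session

# requireNamespace() returns TRUE if a package is installed and FALSE if not,
# without attaching it to your session
has_vioplot = requireNamespace("vioplot", quietly = TRUE)
has_vioplot

# A common pattern: install only if missing, then load
# if (!has_vioplot) install.packages("vioplot")
# library(vioplot)
```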
Once you’ve loaded the package, you now have access to all the functions included in it. You can see what is in the package like this:
library(help="vioplot")
This should pop up a window in one of your panels with all the information about the package, including the functions in it.
You can type commands directly into the R Console and hit [return] to run the command (as we have done above).
However, it is best practice to type your code into the Editor, and then hit [command]+[return] (or [ctrl]+[enter] on Windows) while the cursor is on the line of the command you want to run. There are several advantages to running your commands from the Editor rather than typing directly into the Console:
- You can save your code as a script, so you can come back to it and run it again later.
- You can run a single line, or several highlighted lines at once, with [command]+[return].
- Anything that comes after a # will show up as a different color in your Editor, and R will ignore this text when running your commands. This allows you to keep notes that explain what different sets of code do.
Try typing these three lines in your Script Editor (top left), and then highlight all of them and run them by hitting [command]+[return], or by clicking the little Run button at the top of the Script Editor:
a=5*2
b=4
a/b
## [1] 2.5
You have now written a script! Now try annotating the script by adding comments preceded by a hashtag:
a=5*2 #This is the same as before
b=4
a/b #The answer should be 2.5
## [1] 2.5
See what happens if you remove the hashtags and run the script again.
Now that we have built a simple script, we should save it. But to save a script, you need to be familiar with the working directory. The working directory is the location on your computer where R will save things, or look for things if you ask it to.
The default working directory can be set by going to [Preferences]. It will be the first item at the top of the preferences window. You can set the default working directory by clicking Browse. Go ahead and set the working directory to a folder for this course.
Now, if you save the script file, it will be saved in the default working directory. You can save the script by clicking [File]–[Save] or the little floppy disk icon at the top of the RStudio window.
However, it is often good practice to set the working directory for each project.
To do this, you will use a function called setwd(). To use this function, you will have to get familiar with the concept of file or folder paths. A path name is the “address” of a specific file or folder on your computer. Paths typically look something like /Users/dshizuka/folder on a Mac, or C:/Users/dshizuka/folder on Windows.
Once you have the path name, you can set the working directory. For example, if I wanted to set my working directory to be my Documents folder, I can set it this way:
setwd("/Users/daishizuka/Documents") #fill in the path to your working directory folder
Make sure your path name is inside the quotes!
You can always check what the current working directory is by typing:
getwd()
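Here is a small sketch of moving between working directories safely: remember the current one, switch, and then switch back. tempdir() (a temporary folder that R creates for each session) stands in for a real project folder here, and list.files() shows what R can see in the current working directory.

```r
old_wd = getwd()   # remember the current working directory
setwd(tempdir())   # switch to a temporary folder
getwd()            # confirm where R is now looking
list.files()       # files and folders in the current working directory
setwd(old_wd)      # switch back to where we started
```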
Now, save the script you have written so far by going to File–Save and giving it a name like “session_1”. The script file should show up in the folder that is designated as the working directory, with the file extension “.R”.
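Because a saved script is just a plain text file of R commands, you can also run the whole thing at once with source(). In this sketch, a tiny script is written to a temporary file (tempfile() is used only to keep the example self-contained); with your own script you would instead use something like source("session_1.R"). Note that running it will overwrite any existing objects named a, b, or result.

```r
# Write a three-line script to a temporary file
script_path = tempfile(fileext = ".R")
writeLines(c("a = 5*2", "b = 4", "result = a/b"), script_path)

# source() runs every line of the script, top to bottom
source(script_path)
result   # 2.5
```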
In a couple of modules, we will tackle a more elegant way to set up your working directory, using what are called RStudio Projects.
After you’ve saved your R script, let’s try quitting R.
You will be asked if you want to save the ‘workspace image’.
Saving the workspace image isn’t terrible, but I find it very annoying. Saving it means that RStudio will automatically reload the working environment (i.e., all the objects you created in the current session) the next time you start it up. This turns out to be a nuisance a lot of the time. For example, if you are switching between projects, you may end up with previously loaded objects that you don’t remember creating. Trust me, you are better off NOT saving the workspace.
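If you want a clean slate without restarting R, you can list and remove the objects in your working environment. Here is a minimal sketch. Be careful: rm(list = ls()) deletes every object in the environment, and there is no undo.

```r
# Create a couple of objects in the current session
a = 5*2
b = 4

# ls() lists every object in the working environment
# (the same objects shown in the Environment panel)
ls()

# rm(list = ls()) removes all of them at once
rm(list = ls())
ls()   # character(0): the environment is now empty
```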
Quick Review of this module:
- You can assign values to objects, which you can manipulate using operators and functions.
- One key to success in R is to learn what each operator and function does. This is the part that is like learning a language.
- Each function comes with a help file, which you can get by running ?functionname (replace ‘functionname’ with the name of the function).
- The console is where the code runs. But use the script editor to write and save your code as a script.