##1. The Study
Today, we will recreate some aspects of a study (Wales et al. 2013)1 on the effect of anthropogenic noise on the metabolic rate of the shore crab, Carcinus maenas. You can download the pdf here.
The study is summarized by these two figures:
Figure 1, on the left, shows the relationship between mass and oxygen consumption for individuals exposed to 15 minutes of ambient noise (open circles, dotted line) and individual exposed to ship noise (filled diamonds, solid line).
Figure 2, on the right, shows the relationship for sets of individuals that were exposed to repeated playbacks (8 trials of 15 minutes each, each trial separated by 48 hours) of ambient noise (open circles, dotted line) and ship noise (filled diamonds, solid line). Sample sizes were n=11 individuals for each treatment.
##2 The Data The data are publicly accessible on Dryad: http://datadryad.org/resource/doi:10.5061/dryad.36f65 The data are saved in two separate “worksheets” within .xlsx file. We will need to save the two sheets separately as individual .csv files. Follow these steps:
Let’s load the datasets into R and take a quick a look to check that the dataframes look correct.
single=read.csv("data/single_exposure.csv")
head(single)
mult=read.csv("data/multiple_exposure.csv")
head(mult)
## Playback.treatment Crab.mass Oxygen.consumed
## 1 Ambient 50.73 142.4711
## 2 Ambient 59.72 168.6700
## 3 Ambient 54.51 205.1131
## 4 Ambient 57.32 168.3071
## 5 Ambient 84.90 209.4783
## 6 Ambient 49.58 154.7141
## Playback.treatment Day1 Day3 Day5 Day7 Day9 Day11
## 1 Ambient 1.921936 0.9849995 0.9842713 0.9828148 0.9828148 0.9791737
## 2 Ambient 1.857020 0.8899608 1.6646716 2.0493675 2.0493675 2.0405021
## 3 Ambient 1.723259 1.1569210 1.1565645 1.1558514 3.4240576 1.7211203
## 4 Ambient 1.943196 0.6500865 1.2966410 3.2363590 5.8227945 3.8826962
## 5 Ambient 1.630638 0.8660895 1.6321760 1.8849740 2.9084740 2.9007815
## 6 Ambient 1.798980 0.5874803 0.3429576 2.0403275 2.0403275 2.5182593
## Day13 Day15
## 1 1.456381 1.453468
## 2 2.254127 1.858793
## 3 3.426197 1.723616
## 4 3.883294 2.589859
## 5 2.405955 2.655676
## 6 1.806918 1.800568
##3 Testing the relationships between mass, noise, and physiology
####(Testing the interaction between a continuous variable and a
factor)
A central question here is whether the sound environment affects the
physiology of crabs. However, we also know that physiological measures
are often affected by body size. Thus, we would like to know whether the
relationship between mass and oxygen consumption differ across
treatments.
Let’s break this down. We are interested in the relationships between 3 measures:
###3.1 Plotting Figure 1
We can plot the relationships between these three things this way:
# Run two regression analyses: one for each treatment
mod1=lm(Oxygen.consumed~Crab.mass,
data=subset(single, single$Playback.treatment=="Ambient"))
mod2=lm(Oxygen.consumed~Crab.mass,
data=subset(single, single$Playback.treatment=="Ship"))
# set up the x and y coordinates for the regression lines
xv=c(min(single$Crab.mass), max(single$Crab.mass))
yv1=predict(mod1, list(Crab.mass=xv))
yv2=predict(mod2, list(Crab.mass=xv))
#Figure 1
plot(Oxygen.consumed~Crab.mass, data=single,
pch=c(1, 18)[Playback.treatment], cex=1.5, las=1, xlim=c(0,90), ylim=c(0,375))
lines(xv, yv1, lty=2, lwd=1.5)
lines(xv, yv2, lty=1, lwd=1.5)
Figure 1: The relationship between body mass and oxygen consumption in shore crabs. Individuals in the ‘Ambient’ treatment shown in open circles, individuals in ‘Ship’ treatment shown in filled diamonds.
So, what should we take away from this figure? There are three relationships shown here:
There seems to be a positive relationship between mass and oxygen consumption
Crabs that were exposed to ship noise (filled diamonds) seem to have higher oxygen consumption than those exposed to ambient noise (open circles)
The crabs that were exposed to ship noise seem to have a steeper relationship between mass and oxygen consumption
How do we test these relationships statistically? When we have a continuous dependent variable (Oxygen consumption) and two independent variables–one continuous (mass) and one discrete (playback treatment), we conduct an Analysis of Covariance, or ANCOVA.2
###3.2 Running an ANCOVA
You can run an ANCOVA in R in two different ways. First, you could
build a linear model and then compute the analysis of variance table.
Second, you could run use the aov()
wrapper for a linear
model and then just look at the output using a summary function.
####ANCOVA Option 1: lm()
and anova()
mod=lm(Oxygen.consumed~Playback.treatment*Crab.mass, data=single)
anova(mod)
####ANCOVA Option 2: aov()
and
summary()
aov.mod=aov(Oxygen.consumed~Playback.treatment*Crab.mass, data=single)
summary(aov.mod)
## Df Sum Sq Mean Sq F value Pr(>F)
## Playback.treatment 1 39355 39355 34.342 2.06e-06 ***
## Crab.mass 1 33456 33456 29.195 7.46e-06 ***
## Playback.treatment:Crab.mass 1 7828 7828 6.831 0.0139 *
## Residuals 30 34379 1146
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####So what does this result tell us?
This model has a significant interaction effect (F =
6.8, P = 0.014). In this case, this means that the slopes of
the relationship between mass and oxygen consumption are different
between treatments. We can see this in the figure above: In both
treatments, oxygen consumption increases with body mass, but heavier
individuals increase their consumption rate more when they are exposed
to ship noise.
Technically speaking, you don’t really want to interpret the other single-variable results (i.e., the effect of mass and treatment on oxygen consumption independently) because they are NOT independent. If you want to look at the effect of mass per se on oxygen consumption, you need to do that within each treatment (because we know that the relationship is different depending on the treatment!). Likewise, if you want to look at the relationship between treatment and oxygen consumption, you should do a simple ANOVA to compare differences between treatments. You can do those like this (outputs not shown):
#The effects of mass on oxygen consumption within each treatment
summary(lm(Oxygen.consumed~Crab.mass, data=subset(single, single$Playback.treatment=="Ambient")))
summary(lm(Oxygen.consumed~Crab.mass, data=subset(single, single$Playback.treatment=="Ship")))
#The effect of playback treatment, regardless of mass
summary(aov(Oxygen.consumed~Playback.treatment, data=single))
###3.3: Table of basic statistical tests Here is a table with some of the most common statistical tests
Dependent Variable | Independent Variable(s) | Test | function |
---|---|---|---|
continuous | continuous | Linear Regression | lm() |
continuous | categorical | ANOVA or t-test | aov() or t.test() |
categorical | categorical | Fisher’s exact test or contingency table | fisher.test() or chisq.test() |
continuous | continuous + categorical | ANCOVA | lm() or aov() |
continuous | categorical + categorical | two-way ANOVA | aov() |
continuous | continuous + continuous | multiple regression | lm() |
binary (e.g., yes/no) | continuous | logistic regression | glm() |
counts or proportions | continuous | poisson regression | glm() |
##4 Examining the effect of repeated exposure to noise The second part of the study uses a repeated playback of ambient or ship noise to look at cumulative effects of noise on the physiology of shore crabs.
In this part of the study, each individual was exposed to repeated trials of noise playbacks, and their oxygen consumption was measured each time.
We have already import the data for this part of the study at the beginning of this module. Let’s refresh our memory about what this looks like.
head(mult)
## Playback.treatment Day1 Day3 Day5 Day7 Day9 Day11
## 1 Ambient 1.921936 0.9849995 0.9842713 0.9828148 0.9828148 0.9791737
## 2 Ambient 1.857020 0.8899608 1.6646716 2.0493675 2.0493675 2.0405021
## 3 Ambient 1.723259 1.1569210 1.1565645 1.1558514 3.4240576 1.7211203
## 4 Ambient 1.943196 0.6500865 1.2966410 3.2363590 5.8227945 3.8826962
## 5 Ambient 1.630638 0.8660895 1.6321760 1.8849740 2.9084740 2.9007815
## 6 Ambient 1.798980 0.5874803 0.3429576 2.0403275 2.0403275 2.5182593
## Day13 Day15
## 1 1.456381 1.453468
## 2 2.254127 1.858793
## 3 3.426197 1.723616
## 4 3.883294 2.589859
## 5 2.405955 2.655676
## 6 1.806918 1.800568
Ok, what you can see here is that the data is stored with each individual as a row, and each trial as a separate column. We will refer to this format as the “wide-format”. To conduct a repeated-measures analysis on this data, we actually need to reformat this data so that each row represents a trial–i.e., each individual measured on a given day of the experiment. We will call this the “long-format”. Let’s see how to use R to manipulate the format of the data:
###8.4.1: Re-formatting the data using the tidyr
package
The ‘tidyr’ package is an R package designed for re-arranging data in a flexible way. It can be a really nice way to reshape your data or First, if you have not installed the package on your computer yet:
install.packages('tidyr')
We will proceed in two steps:
gather()
the data using these attributes:Let’s try this with our dataset! We will name the “key” variable as “day_txt” (for reasons that will become clearer below), and the “value” variable as “oxy” (for oxygen consumption).
library(tidyr) #remember to load the package
mult$ID=1:nrow(mult)
mult_long=gather(mult, key=day_txt, value=oxy, -ID, -Playback.treatment)
head(mult_long)
## Playback.treatment ID day_txt oxy
## 1 Ambient 1 Day1 1.921936
## 2 Ambient 2 Day1 1.857020
## 3 Ambient 3 Day1 1.723259
## 4 Ambient 4 Day1 1.943196
## 5 Ambient 5 Day1 1.630638
## 6 Ambient 6 Day1 1.798980
#mult_long$day_txt=as.ordered(mult_long$day_txt)
You can use the spread()
function to re-arrange the data
back to a “wide-format”.
mult_wide=spread(mult_long, key=day_txt, value=oxy)
head(mult_wide)
## Playback.treatment ID Day1 Day11 Day13 Day15 Day3
## 1 Ambient 1 1.921936 0.9791737 1.456381 1.453468 0.9849995
## 2 Ambient 2 1.857020 2.0405021 2.254127 1.858793 0.8899608
## 3 Ambient 3 1.723259 1.7211203 3.426197 1.723616 1.1569210
## 4 Ambient 4 1.943196 3.8826962 3.883294 2.589859 0.6500865
## 5 Ambient 5 1.630638 2.9007815 2.405955 2.655676 0.8660895
## 6 Ambient 6 1.798980 2.5182593 1.806918 1.800568 0.5874803
## Day5 Day7 Day9
## 1 0.9842713 0.9828148 0.9828148
## 2 1.6646716 2.0493675 2.0493675
## 3 1.1565645 1.1558514 3.4240576
## 4 1.2966410 3.2363590 5.8227945
## 5 1.6321760 1.8849740 2.9084740
## 6 0.3429576 2.0403275 2.0403275
However, note that this will sort the days incorrectly because “Day11” comes after “Day1” when sorting these values alphabetically. This is another reason to keep your data in long-format and keep days as a separate numerical vector…
In any case, we will only be dealing with the long-format data from here on out.
###4.2 Conducting the repeated-measures ANOVA
Now, we are ready to conduct our repeat-measures two-way ANOVA analysis.
Running a two-way ANOVA is easy: you simply add two dependent variables and an interaction term using “*” as with the ANCOVA. To add a repeated-measures term, you add what is called an “Error()” term, which indicates which variable is the “ID” or “subjects” that is being repeatedly sampled.
aov.mod=aov(oxy~Playback.treatment*day_txt + Error(ID), data=mult_long)
summary(aov.mod)
##
## Error: ID
## Df Sum Sq Mean Sq
## Playback.treatment 1 141.2 141.2
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Playback.treatment 1 34.64 34.64 33.933 3.07e-08 ***
## day_txt 7 6.03 0.86 0.844 0.5527
## Playback.treatment:day_txt 7 15.57 2.22 2.179 0.0388 *
## Residuals 159 162.32 1.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This gives us the result that there is a significant interaction between playback treatment and the changes in the crab’s response to the playback experiment across days. Graphically, we can see from Figure 2 in the paper that oxygen consumption increases for the control treatment (“Ambient”“) but not the”Ship” treatment. As with the single-exposure treatment, we really shouldn’t be interpreting the “general” relationship between day and oxygen consumption nor the treatment and oxygen consumption…
###4.3: A little bit of a correction on the paper:
Ok, thus far we have re-created the analyses that were conducted and presented in the paper. Here, I would like to point out a couple of minor but significant-enough errors in this analysis.
First, in the two-way repeated-measures ANOVA presented above, the “ID” variable needs to be assigned as a factor, i.e., a categorical variable that has no order. Currently, this variable is an integer, and as such it is interpreted as something that has rank-order (i.e., individual with ID = 1 has some quantitity that is lower than an individual with ID = 10). This is not correct.
The days are currently not interpreted as factors–categorical variables without any order. Clearly, days are ordered, and we are interested in the change in values across days. One could argue about whether “days” should be treated as a continuous variable.
Let’s fix the first problem first. This is easy to do–we just need to
tell R that “ID” is a factor within the aov()
function:
aov.mod=aov(oxy~Playback.treatment*day_txt
+ Error(factor(ID)), data=mult_long)
summary(aov.mod)
##
## Error: factor(ID)
## Df Sum Sq Mean Sq F value Pr(>F)
## Playback.treatment 1 175.2 175.21 30.61 2.04e-05 ***
## Residuals 20 114.5 5.72
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## day_txt 7 6.03 0.8613 2.485 0.0196 *
## Playback.treatment:day_txt 7 15.57 2.2242 6.416 1.48e-06 ***
## Residuals 140 48.53 0.3467
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can see that this immediately increases the all of the effects observed here. This is because when “ID” was coded as an integer, it was creating “noise” that is irrelevant. Thus, the interaction effect is actually much stronger than is reported in the paper.
To fix the second problem (Days are not being interpreted as an
ordered variable), we need to first re-code the “day” variable so that
they are numbers rather than text. But we don’t have to re-type this
data or anything–the information is within the character strings. We
just need a way to change “Day1” into “1” and “Day3” to “3”, etc. Here,
we can use a function called substr()
(for substring) to
extract certain characters from a character string. In this case, we
just want the 4th through 5th character of the “day_txt” column. Try
this (output not shown):
substr(mult_long$day_txt, 4, 5)
So we want to use this code to make a new column in the dataframe and make sure that it is interpreted as a number. Let’s do that and look at the first few lines of the revised dataset.
mult_long$day_num=as.numeric(substr(mult_long$day_txt, 4, 5))
head(mult_long)
## Playback.treatment ID day_txt oxy day_num
## 1 Ambient 1 Day1 1.921936 1
## 2 Ambient 2 Day1 1.857020 1
## 3 Ambient 3 Day1 1.723259 1
## 4 Ambient 4 Day1 1.943196 1
## 5 Ambient 5 Day1 1.630638 1
## 6 Ambient 6 Day1 1.798980 1
Now, we can use this new “day_num” variable as the variable of interest in our two-way repeated-measures ANOVA to see how the results might differ when days are treated as a continuous variable:
aov.mod2=aov(oxy~Playback.treatment*day_num + Error(factor(ID)), data=mult_long)
summary(aov.mod2)
##
## Error: factor(ID)
## Df Sum Sq Mean Sq F value Pr(>F)
## Playback.treatment 1 175.2 175.21 30.61 2.04e-05 ***
## Residuals 20 114.5 5.72
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## day_num 1 1.67 1.672 4.215 0.0418 *
## Playback.treatment:day_num 1 8.15 8.151 20.543 1.17e-05 ***
## Residuals 152 60.31 0.397
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now you see that the interaction effect is even stronger, although the results don’t really change qualitatively. I suspect that most of the difference comes simply from the change in degrees of freedom when days are treated as a continuous variable rather than categorical variables.
###4.4: Accounting for repeated measures using a linear mixed-effects model
An alternative to using the repeated-measures ANOVA here is to use
assign “ID” as a random effect. In very
general terms, a random effect is a way to assign a hierarchical model
in which you are exploring the relationships between a dependent
variable and a set of fixed effects within
categories (assigned as the random effect). It is a complicated concept,
but it is widely applicable in ecological and biological statistics.
There are number of R packages that can handle mixed-effects models,
including nlme
, lme4
, MCMCglmm
and others. Here is an example using the nlme
package: If
you don’t have this installed on your computer yet, run:
install.packages('nlme')
Here is the syntax to use to structure a mixed-effects model with “oxy” (oxygen consumption) as the dependent variable, “Playback.treatment” and “day_num” as fixed effects, and “ID” as the random effect:
#mixed-model using ML
library(nlme)
lme.mod=lme(oxy~Playback.treatment*day_num, random=~1|ID, data=mult_long)
anova(lme.mod)
## numDF denDF F-value p-value
## (Intercept) 1 152 257.66516 <.0001
## Playback.treatment 1 20 30.61482 <.0001
## day_num 1 152 4.21465 0.0418
## Playback.treatment:day_num 1 152 20.54278 <.0001
You should see that the result is actually the same as the last
version of the two-way repeated-measures ANOVA that we ran (with days as
continuous variable and ID as factor). Thus, these are two viable
alternative methods for doing the same thing.
More generally, linear mixed-effects models, and its extension, the
Generalized Linear Mixed Model (GLMM) can be powerful tools for
statistical analysis.
Mixed-models are very complicated, and different packages deal with them in different ways. You need to sit down and really learn how to deal with them. Here are a couple important references:
Bolker et al. (2009) Generalized linear mixed models: a practical guide for ecology and evolution, TREE, 24: 127-135.link
Zurr et al. (2009) Mixed Effects Models and Extensions in Ecology with R. Springer.
##5 Figure 2
Let’s play around with different versions of Figure 2. For all of these, we will need to calculate the standard error, so first, the custom function for calculating standard error:
se=function(x) sqrt(var(x)/length(x))
###5.1 The old way
#step 1: create a summarized dataset with means and standard errors
d1=aggregate(mult_long$oxy, list(mult_long$day_num,mult_long$Playback.treatment), mean)
ses=aggregate(mult_long$oxy, list(mult_long$day_num,mult_long$Playback.treatment), se)
d1$se=ses[,3]
names(d1)=c("day", "treatment", "mean", "se")
#plot blank plot, populate with dots & lines & standard error bars
plot(1:15, seq(0,5,length=15), type="n", xlab="", ylab="", las=1)
points(mean~day, data=subset(d1, treatment=="Ambient"), type="b",pch=21, bg="white")
points(mean~day, data=subset(d1, treatment=="Ship"), type="b",pch=21, bg="black")
arrows(d1$day, d1$mean-d1$se, d1$day, d1$mean+d1$se, code=3, angle=90, length=0.1)
###5.2 A fancier way to summarize data using dplyr
There
are a couple of packages that help us accomplish complex tasks in fewer
keystrokes or in a more organized way. Hadley Wickham http://hadley.nz/ is an R whiz
that has created a series of groundbreaking R packages including
tidyr
, dplyr
and ggplot2
. These
packages enhance your R skills. However, the downside is that it takes a
little bit of additional learning curve to understand how to use these
tools. We have already touched on tidyr
. Here, I will
introduce you very briefly to dplyr
, which is a package for
manipulating data, and ggplot2
, an popular package for
graphics.
Here is how to make the summarized dataset (mean and standard error)
using dplyr
. You can find a more complete guide on this
package here: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
library(dplyr)
d2 = mult_long %>%
group_by(Playback.treatment, day_num) %>%
summarise(mean=mean(oxy), se=se(oxy))
## `summarise()` has grouped output by 'Playback.treatment'. You can override
## using the `.groups` argument.
head(d2)
## # A tibble: 6 × 4
## # Groups: Playback.treatment [1]
## Playback.treatment day_num mean se
## <chr> <dbl> <dbl> <dbl>
## 1 Ambient 1 1.85 0.0795
## 2 Ambient 3 1.16 0.159
## 3 Ambient 5 1.29 0.159
## 4 Ambient 7 1.68 0.221
## 5 Ambient 9 2.42 0.442
## 6 Ambient 11 2.28 0.292
###5.3 Plotting the figure with ggplot2
See this page for details on how to use ggplot2
: http://docs.ggplot2.org/current/
##make figure 2 with ggplot2
library(ggplot2)
p = ggplot(d2)
p = p + geom_errorbar(aes(x=day_num, ymax=mean+se, ymin=mean-se, width=0.5))
p = p + geom_point(aes(x=day_num, y=mean, colour=Playback.treatment), size=5)
p
##5.4 Figure 2 as bar plots
We can display Figure 2 as a series of paired bar plots.
#make it bar plots
tab.mean=xtabs(mean~Playback.treatment+day_num, data=d2)
tab.se=xtabs(se~Playback.treatment+day_num, data=d2)
bp=barplot(tab.mean, beside=T, ylim=c(0, max(tab.mean)+max(tab.se)))
arrows(bp, tab.mean-tab.se, bp, tab.mean+tab.se, angle=90, code=3, length=0.05)
We can do the same using ggplot2
p = ggplot(d2, aes(x=day_num, y=mean, group=Playback.treatment))
p = p + geom_bar(aes(fill=Playback.treatment), stat="identity", position='dodge')
p = p + geom_errorbar(aes(ymax=mean+se, ymin=mean-se), position='dodge', stat="identity")
p
Wale, M.A., Simpson, S.D., & Radford, A.N. (2013) Size-dependent physiological responses of shore crabs to single and repeated playback of ship noise. Biology Letters 9: 20121194↩︎
The paper calls this analysis a “Two-way ANOVA” (in Results section), but this is technically wrong. A two-way ANOVA is when there is a continuous dependent variable and two discrete independent variables.↩︎