This document provides the guide to the R codes for the workflow for constructing social networks from automated telemetry data, as described in Shizuka et al., Constructing social networks from automated telemetry data: a worked example using within- and across-group associations in cooperatively breeding birds. This worked example uses one month (October 2018) worth of data from a study of acorn woodpeckers (Melanerpes formicivorus) at Hasting Reserve, California.
Set up the options and required packages:
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60), tidy=TRUE)
library(knitr)
library(tidyverse)
library(lubridate)
library(igraph)
library(assortnet)
library(rgdal)
library(REdaS)
library(geosphere)
source("ACWO_masterfunctions.R", local = knitr::knit_global())
For the first steps, we are going to work with the raw data. Here, we will use the data from October 2018.
load("alloctober.rdata")
Here is what a snippet of the raw data would look like.
head(rawdat)
## ST DATE GROUP HOME BIRD SEX STATUS ZTIME
## 1 -96 2018 Oct 12 CAVI CAVI B4935 M B 43394
## 2 -96 2018 Oct 12 CAVI CAVI B4935 M B 43398
## 3 -100 2018 Oct 16 CAVI CAVI B4935 M B 36678
## 4 -91 2018 Oct 11 CAVI CAVI B4935 M B 60377
## 5 -96 2018 Oct 12 CAVI CAVI B4935 M B 39875
## 6 -95 2018 Oct 12 CAVI CAVI B4935 M B 43832
There are few data cleanup tasks:
make sure dates are in “date object” format
make sure time stamp is read as numeric
make sure group, home group, and bird ID are read as character
make sure the signal strength data (“ST”) is a number. This is tricky because they are coded as ‘factor’ right now, so if you just use , it will convert it wrong. To get around this, first convert to character and then convert to number by
rawdat = rawdat[-which(rawdat$DATE == "DATE"), ]
rawdat$date.obj = ymd(rawdat$DATE)
rawdat$ztime.num = as.numeric(as.character(rawdat$ZTIME))
rawdat$GROUP = as.character(rawdat$GROUP)
rawdat$HOME = as.character(rawdat$HOME)
rawdat$BIRD = as.character(rawdat$BIRD)
rawdat$ST = as.numeric(as.character(rawdat$ST))
Now, we will load the individual attributes data, which includes the bird ID, their home group (GROUP), sex (M = male, F = female) and status (B = breeder, H = helper). Note that “GROUP” is confusing because when we look at the telemetry data, “GROUP” there indicates the base station where the tag was detected. This is unfortunate, but since this is what our raw data look like, we’ll keep it that way.
Note also that, in this data, there is a group called “A1-RE10”. We will simplify this to “A1”, which is where the closest base station is here.
id.dat = read.csv("RadioTagIDs_Final.csv")
id.dat$GROUP = as.character(id.dat$GROUP)
# change 'A1-RE10' to 'A1' so that we can match up groups
id.dat$GROUP[which(id.dat$GROUP == "A1-RE10")] = "A1"
# set up an object that has just the ids
ids = unique(as.character(id.dat$BIRD))
We will also import the locations of ALL group territories (including those that do not have base stations), as well as the locations for all receivers used in the whole study.
# all known group locations
group.dat = read.csv("gnames.csv")
The first major task is to convert the raw detections into an estimate of time windows during which a bird is present within the range of a base station.
The key here is to figure out what is the probability that an individual that stays within a given area is detected X seconds apart. An individual tag can be detected 2 seconds apart, but we expect there to be variation in this time difference between detection to detection due to error, etc. So, the question is, how much delay in detection can we have and still consider the individual to be continuously within the signal range?
Preview of what we are trying to do
To illustrate what we are trying to do, let’s start with a visualization of the detections of individuals at MACR for just two hours: 9-11am on October 25, 2018
You can see that the data is very dense. Each tick is a data point, with varying intervals between detections. Our goal here is to convert this data into time windows during which we consider the bird to be “present” at the station. If we can do that, then we can convert this data into something that looks like this:
Why?
Consolidating the individual detection data into these time windows has several advantages:
It accounts for variation in detection rate. Not every tag that is present will be detected every 2 seconds (minimum time delay between detections) due to various sources of detection error. Sources of detection error could include: (a) interference between tags (though this is probably not a huge problem here given the system?), (b) topography/vegetation obscurring tags, (c) inconsistent detection of birds near edge of detection range, (d) tag power failure leading to differences in temporal delay in tag detection, (e) problem with base station scanning (we would know this because all tags will be undetected). So we need to be able to “fill in the gaps” to account for moments when a bird is present but not detected.
It vastly reduces the file sizes and makes the dataset more manageable
Most importantly, we can calculate association indices between every pair of individuals using the overlaps in these time windows
How to do it:
Here’s the general workflow for converting the raw detection data to time windows during which a bird is considered to be present at a station.
Calculate time delays between successive detections for a given individual at a given station.
Figure out the threshold value of time delay that we believe captures the pattern of presence/absence (e.g., for foraging or group visit forays).
Group detections that are separated by time delays below the threshold value into a time window
Now, we are going to filter this data to just a single day’s worth of data. This routine uses piping %>%
, which is available in tidyverse
As a first step towards generating the time window presence data, we can visualize the cumulative distribution of time differences between detections for an individual at its home base station. We will look at the home base station because this is likely to be where individuals are relatively stationary when they are present. That means that the majority of the time they are there, the time difference detection should be very short (i.e., close to 2 seconds). To do this, we will first create a list of dataframes…
# set a date
choose.date = "2018-10-25"
# filter the data to that date, then sort by 'ztime.num', and
# then group by social group and bird ID. We then filter the
# data down further to only cases where the individual is
# seen at their home group, then create a new column called
# 'timediff', which is the time difference between the
# previous detection and the current detection.
timediff = rawdat %>% filter(date.obj == choose.date) %>% arrange(ztime.num) %>%
group_by(GROUP, BIRD) %>% filter(GROUP == HOME) %>% mutate(timediff = ztime.num -
lag(ztime.num))
# now split this into a list where each element is a
# different bird
timediff = split(timediff, list(timediff$BIRD, timediff$GROUP),
drop = T)
We then plot the accumulation curve of tag detections across a range of inter-detection intervals.
plot(c(0, 300), c(0, 1), type = "n", ylab = "Cumulative Frequency",
xlab = "Time delay between detections (seconds)")
for (i in 1:length(timediff)) {
if (nrow(timediff[[i]]) < 100)
next else {
t = table(timediff[[i]]$timediff)
p = cumsum(t)/max(cumsum(t))
points(sort(unique(timediff[[i]]$timediff)), p, type = "l",
col = gray(0.3, 0.5))
}
}
This figure shows a set of empirical cumulative distribution functions of time delays between detections (in seconds) for each bird that was detected AT ITS HOME GROUP on one day (2018-10-25).
Now take a closer look: Here I zoom in a bit to 0-100 seconds, and highlight four individuals here that have a weird pattern…
You can see that the 4 individuals shown in red (“B4113”,“B5227”, “B5274” and “B5571”) exhibit a flat line (of varying duration) between about 3-20 seconds of delay between detections. From what I can tell, these are caused by tags that must frequently pause for about 12-20 seconds or somehow not get detected by the base station.
One way to confirm that this problem comes from the tag and not the base station is to check the patterns of detections of these birds at basestations away from the home group–i.e., during visits to other groups. Here is what that looks like:
We can also confirm that this is not a base station issue but rather a tag issue. (1) The same individuals show this pattern on multiple days (2) Only these individuals show this pattern at a given base station on a given day. For example, here are birds detected at 100+ times at SHAN. The red lines are 5274 and 5571.
Let’s take a closer look at the data in another way. Here, I show the actual detections across time on the morning of 2018-10-25.
What you can see is a difference in the temporal pattern of detections between, say 6119 and 5274. Whereas 6119 (e.g., look at blue box) displays dense clusters of detections, 5274 (e.g., look at red box) has constant detections that are spaced apart. If you look closer, you can actually see that 5274 is either detected 2-3 seconds apart or about 50 seconds apart, but rarely if ever in between those values. That explains the cumulative frequency patterns above.
The most likely scenario is that, when tags might sometimes run low on power due to their solar panel being covered or something like that. When this happens, it seems that the tag exhibits some kind of irregular “rhythm”.
Takeaway on tag errors
Accounting for possibility of unpredictable delays in detection intervals will be important for coming up with a good way to convert point detection into time windows of presence. This is not insurmountable, but it will require some decision making, ideally with some knowledge of the natural history of the organism. Here are couple of things we should think about doing:
Develop a way to detect which tags show delays in detection interval and when
As long as the threshold time delay we decide on is reliably longer than the detection interval, you should still be capturing all of these individuals.
The main problem will be in terms of capturing very brief visits to territories. If, say, a typical visit lasts less than the interval of detections of tags when they are glitchy (e.g., when they are low on power), then we may end up underestimating the frequency of visitations.
Currently, we have used 60 seconds as the threshold time delay for creating time windows (shown in red dashed line below). For the time being, I think this is ok because it exceeds the weird patterns generated by any tag in this dataset, and it seems biological plausible to assume that forays to other territories would take more than a minute most of the time.
We’ve created a function called generate_timewindow
to convert raw data for a given date and station into time window data. The output for this function is a list, where each item is the set of time windows of presence for a given bird and the average signal strength for that time window.
generate_timewindows=function(rawdat, choose.date, pick.station, threshold=60){
dat.t=rawdat %>% filter(date.obj==choose.date) %>% filter(GROUP==pick.station) %>% arrange(ztime.num) %>% group_by(BIRD)
dat.t.list=split(dat.t, list(dat.t$BIRD), drop=T)
timewindows.list=list()
for(j in 1:length(dat.t.list)){
tempd=dat.t.list[[j]]
tempd$timelag=lead(tempd$ztime.num)-tempd$ztime.num
starts.list=list()
stops.list=list()
st.list=list()
counter=1
open.count=1
## if there is only one line, open and close time window and end
if (nrow(tempd)==1) {
starts.list[[1]]=stops.list[[1]]=tempd$ztime.num
st.list[[1]]=tempd$ST
}
else {
for(i in 1:(nrow(tempd)-1)){
##for first line, if next detection is > threshold, open and close time window and add counter
if (i==1 & tempd$timelag[i]>threshold){
starts.list[counter]=tempd$ztime.num[i]
stops.list[counter]=tempd$ztime.num[i]
st.list[counter]=tempd$ST[i]
counter=counter+1
next}
##otherwise, if first line, open time window
else if(i==1) {
starts.list[counter]=tempd$ztime.num[i]
open.count=i
next}
#after first line, if next detection is > threshold, and there is an open time window, close it and add to counter
else if(tempd$timelag[i]>threshold & length(starts.list)>length(stops.list)) {
stops.list[counter]=tempd$ztime.num[i]
st.list[counter]=mean(tempd$ST[open.count:i])
counter=counter+1
next}
#if next detection is > threshold and there is no open time window, open and close one and add to counter
else if(tempd$timelag[i]>threshold & length(starts.list)==length(stops.list)) {
starts.list[counter]=tempd$ztime.num[i]
stops.list[counter]=tempd$ztime.num[i]
st.list[counter]=tempd$ST[i]
counter=counter+1
next}
#if next detection is <= threshold and there is an open time window, just keep it open
else if(tempd$timelag[i]<=threshold & length(starts.list)>length(stops.list)) {
next}
#if next detection is <= threshold and there is no open time window, open one.
else if(tempd$timelag[i]<=threshold & length(starts.list)==length(stops.list) ) {
starts.list[[counter]]=tempd$ztime.num[i]
}
else next
}
}
counter
if(length(starts.list)>length(stops.list)) {
stops.list[counter]=last(tempd$ztime.num)
st.list[counter]=last(tempd$ST)
}
timewindows=data.frame(start=unlist(starts.list), stop=unlist(stops.list), st=unlist(st.list))
#check and make sure time windows are non-negative.
length(which(timewindows[,2]-timewindows[,1]<0))==0
timewindows$date=rep(tempd$date.obj[1], nrow(timewindows))
timewindows$group=rep(tempd$GROUP[1], nrow(timewindows))
timewindows$bird=rep(tempd$BIRD[1], nrow(timewindows))
timewindows.list[[j]]=timewindows
}
names(timewindows.list)=sapply(timewindows.list, function(x) x$bird[1])
return(timewindows.list)
}
Let’s try running this and seeing the output:
gate.10.25.18 = generate_timewindows(rawdat, choose.date = "2018-10-25",
pick.station = "GATE", threshold = 60)
lapply(gate.10.25.18, head)
## $B4113
## start stop st date group bird
## 1 31718 31718 -95.00000 2018-10-25 GATE B4113
## 2 31785 43759 -94.88367 2018-10-25 GATE B4113
## 3 43846 44860 -94.90997 2018-10-25 GATE B4113
## 4 44922 45019 -94.91250 2018-10-25 GATE B4113
## 5 45110 50161 -95.04039 2018-10-25 GATE B4113
## 6 50225 51339 -95.04936 2018-10-25 GATE B4113
##
## $B5495
## start stop st date group bird
## 1 39350 39350 -98 2018-10-25 GATE B5495
## 2 39746 39746 -100 2018-10-25 GATE B5495
##
## $B6028
## start stop st date group bird
## 1 27723 27753 -98.11111 2018-10-25 GATE B6028
## 2 27880 28009 -96.83333 2018-10-25 GATE B6028
## 3 28276 28305 -96.82353 2018-10-25 GATE B6028
## 4 28427 28427 -97.00000 2018-10-25 GATE B6028
## 5 28937 28937 -101.00000 2018-10-25 GATE B6028
## 6 29236 29236 -99.00000 2018-10-25 GATE B6028
##
## $B6029
## start stop st date group bird
## 1 37449 37449 -99.00 2018-10-25 GATE B6029
## 2 39636 39651 -97.25 2018-10-25 GATE B6029
##
## $B6077
## start stop st date group bird
## 1 26115 26115 -81 2018-10-25 GATE B6077
## 2 26293 26293 -80 2018-10-25 GATE B6077
## 3 26462 26462 -80 2018-10-25 GATE B6077
## 4 26589 26589 -81 2018-10-25 GATE B6077
## 5 26727 26727 -82 2018-10-25 GATE B6077
## 6 26837 26837 -82 2018-10-25 GATE B6077
##
## $B6078
## start stop st date group bird
## 1 28080 28084 -79.00000 2018-10-25 GATE B6078
## 2 29016 29234 -78.00000 2018-10-25 GATE B6078
## 3 29408 29420 -77.88542 2018-10-25 GATE B6078
## 4 29574 29576 -77.91837 2018-10-25 GATE B6078
## 5 29668 29760 -79.46364 2018-10-25 GATE B6078
## 6 29914 29916 -79.54464 2018-10-25 GATE B6078
We’ve also written a function to plot the time window data called plot_timewindows
. The function takes the output of the generate_timewindows
function. Time windows are shown with lines with ticks on the ends, and if a time window is too short to be shown (e.g., just a few seconds of consecutive detections), it is shown as a small dot. This plot shades the lines based on relative signal strength, with the color being darker when the signal strength is relatively stronger. This can be overridden by using the col=
argument within the function.
plot_timewindows = function(data, first.hour = 7, last.hour = 19,
col = NULL) {
first.sec.hour = first.sec.hour = 1 + (60 * 60 * 0:24)
plot(c(first.sec.hour[first.hour], first.sec.hour[last.hour]),
c(1, length(data)), type = "n", las = 1, yaxt = "n",
ylab = "", xaxt = "n", xlab = "")
axis(1, at = first.sec.hour[first.hour:last.hour], labels = seq(first.hour,
last.hour, 1))
axis(2, at = 1:length(data), labels = substr(names(data),
start = 1, stop = 5), las = 1)
max.abs.st = abs(min(sapply(data, function(x) min(x$st, na.rm = T)),
na.rm = T))
min.abs.st = abs(max(sapply(data, function(x) max(x$st, na.rm = T)),
na.rm = T))
if (is.null(col) == T) {
for (k in 1:length(data)) {
for (l in 1:nrow(data[[k]])) {
if (data[[k]][l, 2] - data[[k]][l, 1] <= 4)
points(data[[k]][l, 1], k, pch = 20, col = gray((abs(data[[k]][l,
3]) - min.abs.st)/(max.abs.st - min.abs.st)),
cex = 0.5) else {
arrows(data[[k]][l, 1], k, data[[k]][l, 2],
k, lwd = 1, angle = 90, code = 3, length = 0.03,
col = gray((abs(data[[k]][l, 3]) - min.abs.st)/(max.abs.st -
min.abs.st)))
}
}
}
} else {
for (k in 1:length(data)) {
for (l in 1:nrow(data[[k]])) {
if (data[[k]][l, 2] - data[[k]][l, 1] <= 4)
points(data[[k]][l, 1], k, pch = 20, col = col,
cex = 0.5) else {
arrows(data[[k]][l, 1], k, data[[k]][l, 2],
k, lwd = 1, angle = 90, code = 3, length = 0.03,
col = col)
}
}
}
}
}
plot_timewindows(gate.10.25.18, first.hour = 7, last.hour = 19)
… and you can zoom into a few hours and don’t use the shading
plot_timewindows(gate.10.25.18, first.hour = 13, last.hour = 15,
col = "black")
Despite the fact that we have filtered the data based on signal strength, this method is not foolproof because individual tags can differ slightly in signal strength, and receivers may differ in detection strength due to topography and vegetation. Thus, sometimes individuals can be detected at multiple receiver stations at the same time. Therefore, we take the additional step of filtering signals by comparing the signal strengths of any given tag detected at multiple stations at the same time.
To do this, we (1) determine if the time windows of detection of one individual across any two receiver stations overlaps at all, and (2) if there are overlaps, we compare the average signal strengths of each putative time window and only keep the one that has the strongest average signal strength. This ensures that we only have one location for each individual at any given time.
In this example, the cleaned data are stored as test.filter
.
### filter time windows for one individual on one day.
# choose all stations, with a specific date and individual.
allstations = unique(rawdat$GROUP)
choose.date = "2018-10-01"
use.birds = unique(rawdat$BIRD)
# pick out the data from a random bird that is in the dataset
dat1 = rawdat %>% filter(BIRD == use.birds[3])
# now we go through a loop where we generate time windows of
# detection for all stations on the chosen date.
allstations_oneday = list()
for (i in 1:length(allstations)) {
l = list()
if (length(which(dat1$date.obj == choose.date & dat1$GROUP ==
allstations[i])) > 0)
l = generate_timewindows(dat1, choose.date = choose.date,
pick.station = allstations[i], threshold = 60) else next
allstations_oneday[[i]] = l
# print(i)
}
# the resulting list, allstations_oneday, is a list of a
# list. so we need to use 'rbind' twice to make one clean
# dataframe. There is probably a more elegant way to do
# this, but this works.
a = list()
for (j in 1:length(allstations_oneday)) {
if (length(allstations_oneday[[j]]) > 0)
a[[j]] = do.call("rbind", allstations_oneday[[j]]) else next
}
b = do.call("rbind", a)
# now, we are going to iteratively find out, for any given
# time window, if there are any other time windows that
# overlap with it. This identifies overlapping time windows.
# This routine is currently inefficient because it will go
# through and identify each overlap twice (i.e, it identifies
# when time window A overlaps B, then again for B overlapping
# A). But again, it works. having found overlapping time
# windows, it will then compare the signal strength (column
# named 'st') and get the rownames of which is NOT the
# maximum signal strength among the overlapping windows. That
# will be designated as 'rows to remove'.
rows.to.remove = list()
for (i in 1:nrow(b)) {
s1 = b[c(i, which(b$start < b$start[i] & b$stop > b$start[i])),
]
rows.to.remove[[i]] = rownames(s1[-which.max(s1[, "st"]),
])
}
# consolidate the list of rows to remove, then remove them.
# save the resulting data as 'test.filter'.
all.rows.to.remove = unique(match(unlist(rows.to.remove), rownames(b)))
if (length(all.rows.to.remove) > 0) test.filter = b[-all.rows.to.remove,
] else test.filter = b
head(test.filter[order(test.filter$start), ])
## start stop st date group bird
## B5961.157 22797 22885 -81.80000 2018-10-01 Hilltop B5961
## B5961.227 23038 23064 -81.50000 2018-10-01 Hilltop B5961
## B5961.327 23185 23249 -80.00000 2018-10-01 Hilltop B5961
## B5961.427 23455 23600 -82.11538 2018-10-01 Hilltop B5961
## B5961.527 23684 23684 -86.00000 2018-10-01 Hilltop B5961
## B5961.626 23827 23829 -82.10345 2018-10-01 Hilltop B5961
Now, here is the code to run this whole routine with all individuals and all days in the dataset. Here, we run the code with the threshold set at 60 seconds and the resulting data is saved as all.timewindows.list.oct_60.rdata
all.timewindows.list = list()
for (k in 1:length(all.days)) {
allstations = unique(rawdat$GROUP)
allstations_oneday = list()
choose.date = all.days[k]
for (i in 1:length(allstations)) {
l = list()
if (length(which(rawdat$date.obj == choose.date & rawdat$GROUP ==
allstations[i])) > 0)
l = generate_timewindows(rawdat, choose.date = choose.date,
pick.station = allstations[i], threshold = 60) else next
allstations_oneday[[i]] = l
}
a = list()
for (j in 1:length(allstations_oneday)) {
if (length(allstations_oneday[[j]]) > 0)
a[[j]] = do.call("rbind", allstations_oneday[[j]]) else next
}
b = do.call("rbind", a)
c = split(b, b$bird)
f = lapply(c, function(x) {
test = x
rows.to.remove = list()
for (i in 1:nrow(test)) {
s1 = test[c(i, which(test$start < test$start[i] &
test$stop > test$start[i])), ]
rows.to.remove[[i]] = rownames(s1[-which.max(s1[,
"st"]), ])
}
all.rows.to.remove = unique(match(unlist(rows.to.remove),
rownames(test)))
if (length(all.rows.to.remove) > 0)
test.filter = test[-all.rows.to.remove, ] else test.filter = test
test.filter
})
h = do.call("rbind", f)
all.timewindows.list[[k]] = h
print(k)
}
save(all.timewindows.list, file = "all.timewindows.list.oct_60.rdata")
This part of the workflow takes the time window data and converts the data into a two sets of matrices that together can be used to create social networks for any combo of territories and days.
In Part 1, we created a dataset ‘time windows’ of presence of individuals at base stations in which we filtered detections based on signal strength–i.e., if an individual was detected at multiple locations, we only keep the data for the location that had the strongest signal strength–i.e., the station that the individual was closest to. Let’s load that datset here:
load("all.timewindows.list.oct_60.rdata")
This R data file contains a list called all.timewindows.list
, which is a list with the data for each day saved as separate items. We can see this because it has 31 items (one for each day).
class(all.timewindows.list)
## [1] "list"
length(all.timewindows.list)
## [1] 31
Let’s make it easier and combine this all into one comprehensive dataframe.
all.timewindows = do.call("rbind", all.timewindows.list)
Here, we will walk you through the idea for constructing two different matrices:
a co-presence matrix, which summarizes the number of seconds that two individuals were present at the same receiver station.
a simultaneous detection matrix, which summarizes the number of seconds that two individuals were detected at the same time ANYWHERE (i.e., at any receiver station, including cases where they are at the same station). This gives us information about instances where we knew where both individuals were at the same time.
We will show later how the information from these two matrices can be combined to calculate association indices in Section 2.3.
We will work through how these matrices are constructed using a small subset of the data, then introduce a routine that generates the matrices across multiple days (Figure S11, S12).
The first step involves converting the time windows data into a \(x\) x \(n\) “presence matrix” in which \(x\) rows is time–here, every second of the day between 5am and 9pm–and \(n\) columns is all possible individuals. The reason to set this up with all possible seconds and all possible individuals is that we can later create arrays with equal-sized matrices no matter which station on which day.
I’ve made this routine into a function called build.mat()
in the master functions file. Note that this function takes time window data with columns called “start2” and “stop2”, which simply takes the seconds in the ‘start’ and ‘stop’ columns and converts them to time starting from 5am (to save some memory space).
build.mat = function(d) {
max.secs.day = 54001 #number of seconds starting from 5am up to 8pm
m = matrix(0, nrow = max.secs.day, ncol = length(ids), dimnames = list(1:max.secs.day,
ids))
for (i in 1:nrow(d)) {
a = seq(d[i, "start2"], d[i, "stop2"], 1)
bird = d[i, "bird"]
c = match(bird, ids)
m[a, c] = 1
}
return(m)
}
Using this function, we can make a presence matrix for all individuals seen at one station on one day:
sec.daystart = 60 * 60 * 5
max.secs.day = 54001
timewindows.dat = all.timewindows.list[[1]] #one day's worth of time windows data
pick.station = allstations[1]
daydat = try.data = timewindows.dat[timewindows.dat$group ==
pick.station, ] #data for one station for one day
daydat$date.num = as.numeric(as.Date(daydat$date[1]) - as.Date("2017-07-01"))
# make reduced dataset
d = daydat %>% select(bird, group, date.num, start, stop) %>%
arrange(bird)
d = as.data.frame(d)
# convert seconds to time starting at 5am and call these
# 'start2' and 'stop2'
d$start2 = d$start - sec.daystart
d$stop2 = d$stop - sec.daystart
# use the build.mat() function to generate presence matrix
m = build.mat(d)
The resulting matrix, m
is \(x\) seconds long and \(n\) individuals wide
dim(m)
## [1] 54001 131
This presence matrix can then be used to calculate a “co-presence matrix”, which is an \(n\) x \(n\) matrix with bird ID as both row and column, and the cell value is the number of seconds both of those birds were present at the station. Making this matrix is easy once you have the \(x\) x \(n\) matrix above. You simply have to multiply the transpose of the matrix to the matrix itself:
copresence = t(m) %*% m
This matrix is \(n\) x \(n\) matrix that counts up the number of seconds that any two individual were present at the same receiver station at the same time. The diagonal of the copresence matrix is simply the number of seconds that an individual was present at that station on that day, and is equivalent to the column sums of the presence matrix.
# dimensions of the copresence matrix
dim(copresence)
## [1] 131 131
# the column sums of presence matrix
as.numeric(colSums(m))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0
## [13] 0 9 0 0 0 0 0 0 1 0 0 0
## [25] 0 0 574 0 0 0 0 0 0 0 0 0
## [37] 0 0 0 0 0 0 0 0 0 0 0 0
## [49] 23044 0 0 15 0 0 0 0 0 0 0 0
## [61] 0 0 0 0 0 0 0 738 0 0 0 0
## [73] 0 0 0 0 0 541 0 0 0 0 0 0
## [85] 0 0 0 0 0 0 0 0 0 0 584 0
## [97] 0 0 0 0 0 0 219 10412 0 0 0 0
## [109] 0 0 21 0 0 0 0 0 0 0 0 0
## [121] 0 0 0 0 0 0 0 0 0 73 0
# the diagonal of the co-presence matrix
as.numeric(diag(copresence))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0
## [13] 0 9 0 0 0 0 0 0 1 0 0 0
## [25] 0 0 574 0 0 0 0 0 0 0 0 0
## [37] 0 0 0 0 0 0 0 0 0 0 0 0
## [49] 23044 0 0 15 0 0 0 0 0 0 0 0
## [61] 0 0 0 0 0 0 0 738 0 0 0 0
## [73] 0 0 0 0 0 541 0 0 0 0 0 0
## [85] 0 0 0 0 0 0 0 0 0 0 584 0
## [97] 0 0 0 0 0 0 219 10412 0 0 0 0
## [109] 0 0 21 0 0 0 0 0 0 0 0 0
## [121] 0 0 0 0 0 0 0 0 0 73 0
Now, we can work up to calculating the copresence matrices for all \(S\) stations in one day. We will save the results in a 3-dimensional array (\(n\) x \(n\) x \(S\)). But we will still save the presence matrix for each station (we will call it result.array1
) because we will use it in the next step.
# now do this for all stations in one day
# set up the arrays.
#'result.array1' is the presence matrices across all stations.
#'copresence.array1' is the co-presence matrices across all stations
result.array1 = array(dim = c(max.secs.day, length(ids), length(allstations)))
copresence.array1 = array(dim = c(length(ids), length(ids), length(allstations)))
# using a loop, pick one station, restrict the data to that
# station, and go through the same routine as shown above.
for (j in 1:length(allstations)) {
pick.station = allstations[j]
try.data = timewindows.dat[timewindows.dat$group == pick.station,
]
if (nrow(try.data) == 0)
next else daydat = try.data
daydat$date.num = as.numeric(as.Date(try.data$date[1]) -
as.Date("2017-07-01"))
# make reduced dataset
d = daydat %>% select(bird, group, date.num, start, stop) %>%
arrange(bird)
d = as.data.frame(d)
d$start2 = d$start - sec.daystart #convert into seconds from 5am
d$stop2 = d$stop - sec.daystart #convert into seconds starting from 5am
# Make daily presence matrix
m = build.mat(d) #so the columns here should be sorted by ids
result.array1[, , j] = m
copresence.array1[, , j] = t(m) %*% m
}
Now we will make the “simultaneous detection matrix”–i.e., matrix of seconds when A+B are both detected across all stations. To do this, we first sum up the detection array across all stations. This results in a matrix that shows whether an individual was detected in any station at a given second. Then, we multiply the transpose of that matrix with itself to get a \(n\) x \(n\) matrix that shows the number of seconds that any two individuals were both detected within the study site.
# first, sum up the detection array across all stations.
# This results in a matrix that shows whether an individual
# was detected in any station at a given second.
aa = apply(result.array1, c(1, 2), sum, na.rm = T)
# the resulting matrix should just have 1s and 0s, but just
# in case convert any >1 to 1.
aa[aa > 1] = 1
# the simultaneous detection array is the transpose of this
# matrix multiplied to itself.
simul.detection.matrix = t(aa) %*% aa
The result is two matrices: (1) an \(n\) x \(n\) x \(S\) 3-dimensional array copresence.array1
containg copresence matrices for each of \(S\) stations, and (2) an \(n\) x \(n\) matrix (simul.detection.matrix
) that counts the number of seconds any two individuals were both detected somewhere in the study site.
dim(copresence.array1)
## [1] 131 131 42
dim(simul.detection.matrix)
## [1] 131 131
Now, we have to make the presence matrix for each day, and calculate the simultaneous detection and co-presence matrices for each day. My computer can’t save the array with the raw presence matrix (54,001 seconds x 131 individuals x 42 stations x 31 days) because it will run out of memory. So, we will generate two arrays: (1) the co-presence array (\(n\) x \(n\) x \(S\) x \(T\) copresence.array
) and (2) the simultaneous detection array (\(n\) x \(n\) x \(T\) detection.array
).
I then save this into a .Rdata file, which we will use in Part 3 to calculate association indices and construct the network.
sec.daystart = 60 * 60 * 5
max.secs.day = 72000
detection.array = array(dim = c(length(ids), length(ids), length(all.timewindows.list)),
dimnames = list(ids, ids, 1:length(all.timewindows.list)))
copresence.array = array(dim = c(length(ids), length(ids), length(allstations),
length(all.timewindows.list)), dimnames = list(ids, ids,
allstations, 1:length(all.timewindows.list)))
#
for (k in 1:length(all.timewindows.list)) {
result.array1 = array(dim = c(max.secs.day, length(ids),
length(allstations)))
timewindows.dat = all.timewindows.list[[k]] #one day's worth of time windoes data
for (j in 1:length(allstations)) {
pick.station = allstations[j]
try.data = timewindows.dat[timewindows.dat$group == pick.station,
]
if (nrow(try.data) == 0)
next else daydat = try.data
daydat$date.num = as.numeric(as.Date(try.data$date[1]) -
as.Date("2017-07-01"))
# make reduced dataset
d = daydat %>% select(bird, group, date.num, start, stop) %>%
arrange(bird)
d = as.data.frame(d)
d$start2 = d$start - sec.daystart
d$stop2 = d$stop - sec.daystart
# Make daily presence matrix
m = build.mat(d) #so the columns here should be sorted by ids
result.array1[, , j] = m
# make co-presence matrix and add to array
copresence.mat = t(m) %*% m
copresence.array[, , j, k] = copresence.mat
}
# make matrix of seconds when A+B are both detected across
# all stations and add to detection.array
aa = apply(result.array1, c(1, 2), sum, na.rm = T)
bb = aa
bb[bb > 1] = 1
detection.array[, , k] = t(bb) %*% bb
print(k)
}
save(detection.array, copresence.array, file = "result_arrays_60sec.rdata")
In Part 2, we created an .rdata file that contained two arrays: the copresence array and the simultaneous detection array. We will load that here.
load("result_arrays_60sec.rdata")
This R data file contains a list called all.timewindows.list
, which is a list with the data for each day saved as separate items. We can see this because it has 31 items (one for each day).
dim(copresence.array)
## [1] 131 131 42 31
dim(detection.array)
## [1] 131 131 31
Here, I will provide a quick background on two potential association indices we could use: the Simple Ratio Index (SRI: Cairns & Schwager 1984) and the Very Simple Ratio Index (vSRI: Hoppitt & Farine 2018).
The SRI has been one of the most widely used association index in studies of animal social networks. It takes the frequency of observations in which two individuals (A and B) were seen together and divides it by the number of observations that A and B were observed in total:
\(SRI = \frac{x}{x + y_A + y_B + y_{AB}}\)
where \(x\) is the number of observation periods that individual A and B are observed together, \(y_A\) and \(y_B\) are the number of observation periods that A is observed without B and B is observed without A, respectively. Finally, \(y_{AB}\) is the number of observation periods where both A and B are observed but not together.
In the case of automated telemetry data, each observation period can be any discrete temporal scale. In our case, we use each second of the day as the observation period, so the SRI captures the association rate based on the number of seconds that A and B are observed together over the study period.
Hoppitt & Farine (2018) conducted a simulation study to test the robustness of SRI and other association indices under two common sources of sampling error, group location error (i.e., not all groups are observed) and individual identification error (not all individuals in a group are identified). In this study, they show that a modified version of the SRI, which they term vSRI, is a robust, unbiased estimator of association rates when the only source of error is individual identification error (i.e., all groups are located, but some individuals within those groups are missed):
\(vSRI = \frac{x}{x + y_{AB}}\)
This simpler association index is unbiased in the absence of group location error because we simply calculate the rate at which A and B are observed together during the observation periods when the locations of both individuals are known. With automated telemetry systems, the \(vSRI\) may be a useful association index when two criteria are met: (1) we suppose that all sites of relevance to the social behavior under study is monitored by a functioning receiver station, and (2) all receiver stations are fully functioning for the duration of the study period (main text Figure 2). In our worked example, we know that neither of these criteria are met: not all territories contained receiver stations, and there were suspected bouts of receiver failure (i.e., days when a receiver station did not log any tags). However, we provide the codes for vSRI here to illustrate the possibility of using this index.
Instead, we primarily use SRI as our main association index despite the potential biases that could have arisen from incomplete sampling (Hoppitt & Farine 2018). Future studies could attempt to collect validation data needed to calculate association indices that are robust to group location error as well, such as GRECI and CECI (see Hoppitt & Farine 2018).
Here, we outline how to calculate the SRI and vSRI using the two arrays we have generated in Part 2.
Recall that we have two arrays: (1) the \(n\) x \(n\) x \(S\) x \(T\) copresence array, which is a stack of copresence matrices across all stations across all days. (2) the \(n\) x \(n\) x \(T\) simultaneous detection array, which is a stack of the matrices of number of seconds that each pair of birds were detected at the same time anywhere, and stacked across days.
From here, we can get all of the values needed to calculate each association index:
So, the denominator of SRI (\(x + y_{AB} + y_A + y_B\)) is (3) - (2) - (1)
## Calculate Simple Ratio Index from co-presence matrix
get_SRI = function(copresence, detections) {
# total #seconds A and B were seen anywhere, with
# double-counting y_ab and x
sum1 = outer(diag(detections), diag(detections), "+")
diag(sum1) = 0
# number of secs a and b were both seen at the same time, but
# not in the same place
y_ab = detections - copresence
diag(y_ab) = 0
# denominator is the total number of times A and B were seen
# total but subtracting x and y_ab because these are
# double-counted.
adj = copresence/(sum1 - copresence - y_ab)
diag(adj) = 0
adj[is.nan(adj)] = 0
adj
}
For vSRI, the denominator is simply \(x + y_{AB}\)
get_vSRI = function(copresence, detections) {
# total #seconds A and B were seen anywhere, with
# double-counting y_ab and x
sum1 = outer(diag(detections), diag(detections), "+")
diag(sum1) = 0
# number of secs a and b were both seen at the same time, but
# not in the same place
y_ab = detections - copresence
diag(y_ab) = 0
adj = copresence/(copresence + y_ab)
diag(adj) = 0
adj[is.nan(adj)] = 0
adj
}
To calculate the association index across all individuals for one day, we can sum the copresence array across all stations for that day, and simply take the simultaneous detection matrix for that day. We then plug in the resulting matrices into the get_SRI function:
# getting SRI for day 1 only
cpm1 = apply(copresence.array[, , , 1], c(1, 2), sum, na.rm = T)
sdm1 = detection.array[, , 1]
sri1 = get_SRI(cpm1, sdm1)
This matrix then becomes our adjacency matrix of the network. We can use the igraph
package to generate the network. Here, we delete the “isolates” i.e., the nodes that are not connected to any edges.
# convert adjacency matrix into network
g.sri1 = graph_from_adjacency_matrix(sri1, "undirected", weighted = T)
# remove isolates
filt.g.sri1 = delete_vertices(g.sri1, which(degree(g.sri1) ==
0))
# plot network
plot(filt.g.sri1, vertex.label = "", edge.width = E(filt.g.sri1)$weight *
30, vertex.size = 8, edge.color = "black")
array_to_net()
function to make it easierWe can make the routine easier by making the routine into a function:
# dates takes values 1 through 31; ai (association index)
# options are 'SRI' or 'vSRI'
array_to_net = function(copresence_array, detections_array, stations = dimnames(copresence_array)[[3]],
dates = as.numeric(dimnames(copresence_array)[[4]]), ai = "SRI") {
if (length(stations) > 0)
subdat.cp = copresence_array[, , which(dimnames(copresence_array)[[3]] %in%
stations), ] else subdat.cp = copresence_array
if (length(stations) == 1)
subdat.cp = subdat.cp[, , match(dates, dimnames(copresence_array)[[4]])] else subdat.cp = subdat.cp[, , , match(dates, dimnames(copresence_array)[[4]])]
sumstations.cp = apply(subdat.cp, c(1, 2), sum, na.rm = T)
cpm = sumstations.cp[which(rowSums(sumstations.cp) > 0),
which(rowSums(sumstations.cp) > 0)]
detections.use = apply(detections_array[, , match(dates,
dimnames(detections_array)[[3]])], c(1, 2), sum, na.rm = T)
sdm = detections.use[which(rowSums(sumstations.cp) > 0),
which(rowSums(sumstations.cp) > 0)]
if (ai == "SRI")
adj = get_SRI(cpm, sdm) else if (ai == "vSRI")
adj = get_vSRI(cpm, sdm) else print("Association index not recognized")
g = graph_from_adjacency_matrix(adj, "undirected", weighted = T)
V(g)$count = diag(sdm)
g
}
An extra feature of this function is that the output graph will contain a vertex attribute called “count”, which tabulates the number of seconds that each individual is detected during the designated days. We will use this later in Part 3.4.1 when we filter by observation.
We can use this routine to quickly make a network from the first day of data using the SRI index.
g.trial = array_to_net(copresence.array, detection.array, dates = 1,
ai = "SRI")
We can show that this the same network as what we created above (filt.g.sri1
) by plotting the two using the same layout.
par(mfrow = c(1, 2))
layout.trial = layout_with_fr(g.trial)
plot(g.trial, vertex.label = "", edge.width = E(g.trial)$weight *
30, vertex.size = 8, edge.color = "black", layout = layout.trial)
plot(filt.g.sri1, vertex.label = "", edge.width = E(filt.g.sri1)$weight *
30, vertex.size = 8, edge.color = "black", layout = layout.trial)
*put the graphics paramter back to one row, one column
par(mfrow = c(1, 1))
To plot the total aggregated network, we can sum the copresence array across all stations and days, and also sum the simultaneous detection array across all days. This is done automatically in the array_to_net()
function if we just leave the “stations” and dates" arguments null.
g.sri = array_to_net(copresence.array, detection.array, ai = "SRI")
We can add vertex attributes. We’ll take the vertex names from the margin name of the array, and we’ll also add a GROUP attribute, which is the home group of the individual (we get this from the individual attributes file using a match()
function).
V(g.sri)$GROUP = as.character(id.dat[match(V(g.sri)$name, id.dat$BIRD),
"GROUP"])
Then, we filter out the isolates. We will also filter out individuals that do not have a receiver station at their home territory. This way, we restrict the network to those individuals that could be observed both at home and while visiting other territories.
Here, we’ll plot the vertex colors by home group membership (though there will be overlap in group colors right now).
# filter out isolates
filt.g.sri = delete.vertices(g.sri, which(degree(g.sri) == 0))
# filter out individuals whose home group doesn't have a base
# station
filt.g.sri = delete.vertices(filt.g.sri, is.na(match(V(filt.g.sri)$GROUP,
allstations)))
V(filt.g.sri)$color = as.numeric(as.factor(V(filt.g.sri)$GROUP))
plot(filt.g.sri, vertex.label = "", vertex.size = 8, edge.width = E(filt.g.sri)$weight *
50)
Part of the challenge of visualizing social networks is figuring out a way to display both the social and spatial relationships of elements (here, individuals). In the case of acorn woodpeckers, there are associations between each individual, but the social groups that individuals belong to also occupy discrete territories in space. How can we capture both of these factors at the same time?
Here, we will use a color gradient to assign colors to individuals based on the 2-D spatial location of their home territory. We can do this by using a color scheme where the hue and saturation of colors depend on the distance and direction of the territory from a particular location (center of the study site).
hsv()
The hsv()
function allows you to specify colors using hue, saturation and value.
We can use this function to create a color scheme that acts like a color wheel. For example, we can have the color hue of a point represent the angle around a center point, and the saturation of the color represent the distance from the center.
## demo of color scheme using hsv()
# a vector of angles dividing a circle 36 times (in radians),
# repeated 10 times
angles = rep(seq(2 * pi/36, 2 * pi, length.out = 36), 10)
# a vector of lengths (1 through 10), repeated 36 times
lengths = as.numeric(sapply(1:10, function(x) rep(x, 36)))
# use trigonometry to figure out x and y positions of each
# point
ys = -(lengths * cos(angles))
xs = -(lengths * sin(angles))
# convert angles and lengths into values between 0 and 1
angles.norm = angles/(2 * pi)
lengths.norm = lengths/10
# now plot all points, with hues representing the angle and
# the saturation of the color representing the distance from
# the center.
plot(xs, ys, pch = 21, bg = hsv(h = angles.norm, s = lengths.norm,
v = 1), xlab = "x", ylab = "y")
Using the color scheme above, we can assign colors to base stations based on angle and distance from the center of the study site. This becomes useful when we compare between different network layouts.
First, we’ll take the GPS easting and northing values of all group territories from the group data and convert it to latitutde/longitude coordinates.
#### set up territory colors using hsv
utmcoord = SpatialPoints(cbind(group.dat$easting, group.dat$northing),
proj4string = CRS("+proj=utm +zone=10"))
longlatcoord = as.data.frame(spTransform(utmcoord, CRS("+proj=longlat")))
group.dat$long = longlatcoord$coords.x1
group.dat$lat = longlatcoord$coords.x2
We’ll now restrict the data to the group territories of birds that are included in this data. Then, we’ll get the coordinates for the center of this cluster of points.
use.groups = group.dat[match(unique(V(filt.g.sri)$GROUP), group.dat$GROUP),
]
center.lat = mean(use.groups$lat)
center.long = mean(use.groups$long)
Then, we can get the angles and distances of each group territory from this ‘center point’.
home.angles = bearing(matrix(rep(c(center.long, center.lat),
nrow(use.groups)), ncol = 2, byrow = T), matrix(c(use.groups$long,
use.groups$lat), ncol = 2))
distances = distHaversine(matrix(rep(c(center.long, center.lat),
nrow(use.groups)), ncol = 2, byrow = T), matrix(c(use.groups$long,
use.groups$lat), ncol = 2))
Now, we can normalize these angle and distance values to be between 0 and 1, and we’ll add these values to the use.groups dataframe.
## set the colors that we'll use in the hsv() function.
use.groups$hue = (home.angles + 180)/360
use.groups$saturation = distances/max(distances)
use.groups$saturation[which(use.groups$saturation > 1)] = 1
We now organize the data into a dataframe where each row is social group name, with the color we are going to use for that group, generated by the hsv()
function, with our hue and saturation values, and also get the latitude and longitude for the group territory by looking it up in the use.groups
dataframe above.
col.dat.home = data.frame(home = unique(V(filt.g.sri)$GROUP),
color = hsv(use.groups[match(unique(V(filt.g.sri)$GROUP),
use.groups$GROUP), "hue"], use.groups[match(unique(V(filt.g.sri)$GROUP),
use.groups$GROUP), "saturation"], v = 1))
Now we use this dataframe as reference and plot all group territories in space.
# plot a blank plot with bounds
plot(-125.54, 36.38, xlim = c(-121.57, -121.54), ylim = c(36.355,
36.4), type = "n", ylab = "", xlab = "", las = 1, main = "All receiver locations for this study")
points(use.groups$long, use.groups$lat, pch = 21, cex = 1.5,
bg = hsv(use.groups$hue, use.groups$saturation, v = 1))
Now, we can assign each node a color based on this territory color scheme and replot the network.
V(filt.g.sri)$color = hsv(use.groups[match(V(filt.g.sri)$GROUP,
use.groups$GROUP), "hue"], use.groups[match(V(filt.g.sri)$GROUP,
use.groups$GROUP), "saturation"], v = 1)
plot(filt.g.sri, vertex.label = "", vertex.size = 8, edge.width = E(filt.g.sri)$weight *
30, edge.color = "black", main = "Figure 5b")
As discussed above, SRI is but one method of calculating the association index. Another association index, the vSRI, may be more unbiased for datasets in which the receivers were fully operational but individuals were not consistently detected (e.g., due to tag failure or imperfect detection). Here, we demonstrate how we can build the social network of acorn woodpeckers using the vSRI and compare the results to what we see with the SRI.
First, let’s build two networks–one using SRI and other using vSRI–from one day of data. We will also figure out which receiver stations had at least one detection of any bird that day, as we can infer that those receivers were functional during that day.
# getting SRI and vSRI using matrices from day 1 only
g.sri1 = array_to_net(copresence.array, detection.array, dates = 1,
ai = "SRI")
g.vsri1 = array_to_net(copresence.array, detection.array, dates = 1,
ai = "vSRI")
# which stations had at least one detection (i.e., was known
# to be functional)
used.stations = which(apply(copresence.array[, , , 1], 3, sum,
na.rm = T) > 0)
Now, we will assign group membership to individuals using the id.dat data, filter out the isolates, and also filter out individuals who came from groups that did not have a functional receiver station that day.
# assign groups
# with SRI
V(g.sri1)$GROUP = as.character(id.dat[match(V(g.sri1)$name, id.dat$BIRD),
"GROUP"])
# with vSRI
V(g.vsri1)$GROUP = as.character(id.dat[match(V(g.vsri1)$name,
id.dat$BIRD), "GROUP"])
# remove isolates
filt.g.sri1 = delete_vertices(g.sri1, which(degree(g.sri1) ==
0))
filt.g.vsri1 = delete_vertices(g.vsri1, which(degree(g.vsri1) ==
0))
# filter out individuals whose home group receiver station
# wasn't functioning that day.
filt.g.sri1 = delete.vertices(filt.g.sri1, is.na(match(V(filt.g.sri1)$GROUP,
names(used.stations))))
filt.g.vsri1 = delete.vertices(filt.g.vsri1, is.na(match(V(filt.g.vsri1)$GROUP,
names(used.stations))))
We then add color to the vertices using the same procedure as before and plot the result. This network uses the Fruchterman-Reingold force-directed layout (by default). To illustrate the difference between the indices, we will use the same scale for edge width here.
# add color to vertices
V(filt.g.sri1)$color = hsv(use.groups[match(V(filt.g.sri1)$GROUP,
use.groups$GROUP), "hue"], use.groups[match(V(filt.g.sri1)$GROUP,
use.groups$GROUP), "saturation"], v = 1)
V(filt.g.vsri1)$color = hsv(use.groups[match(V(filt.g.vsri1)$GROUP,
use.groups$GROUP), "hue"], use.groups[match(V(filt.g.vsri1)$GROUP,
use.groups$GROUP), "saturation"], v = 1)
# plot network
par(mfrow = c(1, 2), mar = c(2, 2, 2, 2))
plot(filt.g.sri1, vertex.label = "", edge.width = E(filt.g.sri1)$weight *
30, vertex.size = 8, edge.color = "black", main = "with SRI")
plot(filt.g.vsri1, vertex.label = "", edge.width = E(filt.g.vsri1)$weight *
30, vertex.size = 8, edge.color = "black", main = "with vSRI")
As you can see, the vSRI produces much thicker edges because the association indices tend to be larger. In fact, vSRI will always be larger than SRI because its denominator is smaller. When vSRI = 1, it means that whenever both A and B were detected at the same time, they were in the same place.
par(mfrow = c(1, 1))
Let’s plot the vSRI against the SRI for the same set of edges to see their relationship.
plot(E(filt.g.sri1)$weight, E(filt.g.vsri1)$weight, pch = 21,
bg = gray(0.5, 0.5), ylab = "vSRI", xlab = "SRI", xlim = c(0,
1), ylim = c(0, 1))
As you can see, the vSRI is always larger than SRI. Some edges that have very low SRI have very high vSRI–these are likely pairs of individuals in which one of the birds was seen more frequently than the other, but when they both were seen at the same time, they were always at the same location.
Biologically speaking, which association index is more appropriate–vSRI or SRI–likely depends on both the sources of sampling error (as described in Hoppitt & Farine 2018), but also the biological question of interest and the behavioral variation of individuals. If some birds foray a lot but do not spend much time within the core area of their home territory, then their vSRI with members of other group members end up being extremely high–more than is biologically warranted. But ongoing debate and fine-tuning of association indices can potentially help generate better-resolved social networks in the long run.
Here, we demonstrate a couple of other filtering techniques we use to quality-control the data in the paper.
One thing we can do is remove the birds that are rarely seen–let’s say we put the cutoff at 31 times (i.e., once per day on average):
# figure out which birds are rarely seen in the whole data.
rarely.seen = V(g.sri)$name[which(V(g.sri)$count < 31)]
# restrict the arrays to those individuals that are not in
# the 'rarely.seen' string. This part could be made simpler
# with a function.
copresence.array.reduce = copresence.array[-which(dimnames(copresence.array)[[1]] %in%
rarely.seen), -which(dimnames(copresence.array)[[1]] %in%
rarely.seen), , ]
detection.array.reduce = detection.array[-which(dimnames(detection.array)[[1]] %in%
rarely.seen), -which(dimnames(detection.array)[[1]] %in%
rarely.seen), ]
# make the network
g.sri2 = array_to_net(copresence.array.reduce, detection.array.reduce,
ai = "SRI")
# assign group membership
V(g.sri2)$GROUP = as.character(id.dat[match(V(g.sri2)$name, id.dat$BIRD),
"GROUP"])
# filter out isolates
filt.g.sri2 = delete.vertices(g.sri2, which(degree(g.sri2) ==
0))
# filter out individuals whose home group doesn't have a base
# station
filt.g.sri2 = delete.vertices(filt.g.sri2, is.na(match(V(filt.g.sri2)$GROUP,
allstations)))
filt.g.sri2 = delete_edges(filt.g.sri2, which(E(filt.g.sri2)$weight <
quantile(E(filt.g.sri2)$weight, prob = 0.1)))
V(filt.g.sri2)$color = hsv(use.groups[match(V(filt.g.sri2)$GROUP,
use.groups$GROUP), "hue"], use.groups[match(V(filt.g.sri2)$GROUP,
use.groups$GROUP), "saturation"], v = 1)
plot(filt.g.sri2, edge.width = E(filt.g.sri2)$weight * 50, vertex.size = 8,
vertex.label = "", edge.color = gray(0.3), layout = layout_with_fr(filt.g.sri2),
edge.curved = F)
assortment.discrete(as_adjacency_matrix(filt.g.sri2, attr = "weight",
sparse = F), V(filt.g.sri2)$GROUP)$r
## [1] 0.5737661
To do this, we have to go back to Part 2.1.4, where we create the copresence and simultaneous detection arrays. Here, we need to restrict the data to just the detections of individuals away from their home group. So that routine looks like this, and will save an .Rdata file called visitor_result_arrays_60sec.Rdata
, which will contain two arrays: visitor.copresence.array
and visitor.detection.array
.
*Note this routine needs to be run within the code script in Part 2, since some of the object names refer to ones we established there.
visitor.detection.array = array(dim = c(length(ids), length(ids),
length(all.timewindows.list)))
visitor.copresence.array = array(dim = c(length(ids), length(ids),
length(allstations), length(all.timewindows.list)))
#
for (k in 1:length(all.timewindows.list)) {
result.array1 = array(dim = c(max.secs.day, length(ids),
length(allstations)))
timewindows.dat = all.timewindows.list[[k]]
for (j in 1:length(allstations)) {
pick.station = allstations[j]
# here is where we remove the data where the individual is at
# its home territory.
try.data = timewindows.dat[!timewindows.dat$bird %in%
id.dat[which(id.dat$GROUP == pick.station), "BIRD"] &
timewindows.dat$group == pick.station, ]
if (nrow(try.data) == 0)
next else daydat = try.data
daydat$date.num = as.numeric(as.Date(try.data$date[1]) -
as.Date("2017-07-01"))
# make reduced dataset
d = daydat %>% select(bird, group, date.num, start, stop) %>%
arrange(bird)
d = as.data.frame(d)
d$start2 = d$start - sec.daystart
d$stop2 = d$stop - sec.daystart
# Make daily presence matrix
m = build.mat(d)
result.array1[, , j] = m
# make co-presence matrix and add to array
copresence.mat = t(m) %*% m
visitor.copresence.array[, , j, k] = copresence.mat
}
# make matrix of seconds when A+B are both detected across
# all stations and add to detection.array
aa = apply(result.array1, c(1, 2), sum, na.rm = T)
bb = aa
bb[bb > 1] = 1
visitor.detection.array[, , k] = t(bb) %*% bb
print(k)
}
dimnames(visitor.detection.array) = list(ids, ids, 1:length(all.timewindows.list))
dimnames(visitor.copresence.array) = list(ids, ids, allstations,
1:length(all.timewindows.list))
save(visitor.detection.array, visitor.copresence.array, file = "visitor_result_arrays_60sec.rdata")
Now, we can go through the same routine as 3.2 to plot the visitor network
load("visitor_result_arrays_60sec.rdata")
Make the visitor network
visit.g.sri = array_to_net(visitor.copresence.array, visitor.detection.array,
ai = "SRI")
V(visit.g.sri)$GROUP = as.character(id.dat[match(V(visit.g.sri)$name,
id.dat$BIRD), "GROUP"])
# filter out isolates
filt.visit.g.sri = delete.vertices(visit.g.sri, which(degree(visit.g.sri) ==
0))
# filter out rarely seen birds
filt.visit.g.sri = delete.vertices(filt.visit.g.sri, which(V(visit.g.sri)$count <
31))
visitor.group.dat = group.dat[match(unique(V(visit.g.sri)$GROUP),
group.dat$GROUP), ]
all.home.angles = bearing(matrix(rep(c(center.long, center.lat),
nrow(visitor.group.dat)), ncol = 2, byrow = T), matrix(c(visitor.group.dat$long,
visitor.group.dat$lat), ncol = 2))
all.distances = distHaversine(matrix(rep(c(center.long, center.lat),
nrow(visitor.group.dat)), ncol = 2, byrow = T), matrix(c(visitor.group.dat$long,
visitor.group.dat$lat), ncol = 2))
visitor.group.dat$hue = (all.home.angles + 180)/360
visitor.group.dat$saturation = all.distances/max(all.distances,
na.rm = T)
visitor.group.dat$saturation[which(visitor.group.dat$saturation >
1)] = 1
V(filt.visit.g.sri)$color = hsv(visitor.group.dat[match(V(filt.visit.g.sri)$GROUP,
visitor.group.dat$GROUP), "hue"], visitor.group.dat[match(V(filt.visit.g.sri)$GROUP,
visitor.group.dat$GROUP), "saturation"], v = 1)
spat.layout.visit = make_spatial_layout(net = filt.visit.g.sri,
id_data = id.dat, group_data = visitor.group.dat)
plot(filt.visit.g.sri, edge.width = E(filt.visit.g.sri)$weight *
30, vertex.size = 5, vertex.label = "", layout = spat.layout.visit,
edge.curved = T, edge.color = "black", main = "Figure 6a")
In Figures 6b-d in the main text, we show how the structure of the visitor-only network is very different depending on the territory where you are monitoring the visitors. This is because territories vary in ‘how interesting’ they are to visitors. We do not yet know exactly what factors lead to this variability–e.g., is it the resources (granaries) that are there, or the specific group members that are there, etc? That will be an ongoing area of research. For now, we just demonstrate how we can generate site-specific social networks. To do this, we generate separate networks by receiver station by using the “stations” argument in the array_to_net()
function.
g.MLF2 = array_to_net(visitor.copresence.array, visitor.detection.array,
stations = "MLF2")
g.HORS = array_to_net(visitor.copresence.array, visitor.detection.array,
stations = "HORS")
g.1800 = array_to_net(visitor.copresence.array, visitor.detection.array,
stations = "1800")
g.SHAN = array_to_net(visitor.copresence.array, visitor.detection.array,
stations = "SHAN")
Then, we organize these networks into a list.
nets.list = list(MLF2 = g.MLF2, HORS = g.HORS, `1800` = g.1800,
SHAN = g.SHAN)
Then, we can just use a loop (or you can use an apply function if you like) to plot those networks independently into different panels.
par(mfrow = c(2, 2), mar = c(1, 1, 1, 1))
for (i in 1:length(nets.list)) {
g = nets.list[[i]]
spl = make_spatial_layout(net = g, id_data = id.dat, group_data = visitor.group.dat)
V(g)$status = as.character(id.dat[match(V(g)$name, id.dat$BIRD),
"SEXSTATUS"])
V(g)$GROUP = as.character(id.dat[match(V(g)$name, id.dat$BIRD),
"GROUP"])
V(g)$color = hsv(visitor.group.dat$hue[match(V(g)$GROUP,
visitor.group.dat$GROUP)], visitor.group.dat$saturation[match(V(g)$GROUP,
visitor.group.dat$GROUP)], v = 1)
plot(g, edge.width = E(g)$weight * 100, vertex.size = 0.1,
vertex.label = "", edge.color = "black", layout = spl,
edge.curved = T, rescale = F, xlim = c(-121.57, -121.54),
ylim = c(36.36, 36.395), asp = 0, frame = T, main = paste(names(nets.list)[[i]]))
points(visitor.group.dat$long[which(visitor.group.dat$GROUP ==
names(nets.list)[[i]])], visitor.group.dat$lat[which(visitor.group.dat$GROUP ==
names(nets.list)[[i]])], pch = "*", col = "red", cex = 2)
}
Gaussian Mixture Modelling (GMM) is now commonly used in the workflow to build social networks from datastreams, primarily in the context of RFID-based studies of social association (Psorakis et al. 2012, 2015). In the main text, we briefly mention why Gaussian Mixture Models are not appropriate for the type of data we use here (i.e., detection of individuals at territories). We elaborate on the reasoning here.
Psorakis et al. (2012, 2015) first applied GMM to a data stream that represented how RFID-tagged birds visit bird feeders in Wytham Woods, UK. It is important to note that this method was designed to solve a particular solution–how do we develop robust criteria for identifying ‘foraging groups’ from temporal data streams alone? That is, if we can’t know anything about spatial proximity of individuals, how can we figure out what is a group just based on the time stamps of visits to a particular location (i.e., an antenna that is placed in front of a bird feeder)?
Prior to the GMM approach, people imposed arbitrary thresholds to define associations. So, let’s say, our threshold is 10 seconds. Then, we would say that any pair of birds that were logged at the antenna within 10 seconds of each other are associated. But what do we know that birds that visit the feeder 11 seconds apart are NOT part of the same group? And are birds that visit within 9 seconds of each other always in the same group?
The GMM approach avoids this arbitrariness by using machine learning to detect ‘bursts’ of activity at the bird feeder, which we call ‘gathering events’. Birds that appear in the same gathering event are then considered part of the same group.
In summary, the GMM approach is geared towards finding bursts of activity that indicate time periods where a group of individuals came to forage at a feeder collectively. In theory, automated telemetry systems can be used to accomplish a similar task–i.e., to detect bursts of activity within range of a receiver station.
But there are at least two critical differences between automated telemetry data and RFID data. First, individuals could ‘reside’ within range of a receiver in automated telemetry systems, which extends to hundreds of meters, whereas this would likely not happen with RFID systems where the detection range is less than one meter. This is especially true when automated telemetry receivers are placed in territories–like our woodpecker study. Another difference is that multiple individuals can be detected nearly simultaneously at a single automated telemetry receiver, whereas RFID data will only detect one individual (because only one individual can fit within the antenna’s detection range). This changes the density of the data stream we expect to get from automated telemetry and RFID.
Basically, the reason we do not use GMM approaches to construct social networks here is because of these and primarily the former reason–i.e., the receivers are set up to detect ‘residency’ rather than ‘gathering events’. When an automated telemetry system is designed to geared towards detecting such gathering events, it may be perfectly fine to use GMM methods (as seen in Jacoby et al. 2016). But we suspect that there are many situations in which GMM approach is not well-suited for detecting associations in automated telemetry.
In summary, the Gaussian Mixture Model approach for detecting gathering events is not designed to delineate association patterns when some individuals simply stay at a location for extended periods of time regardless of group structure. This is why we do not believe it is appropriate to use this approach for building socical networks with automated telemetry systems embedded in territories.
As an illustration, the Figure S25 below shows what the ‘gathering events’ look like when applied to our woodpecker data. The figure shows detections at MACR over 2 hours (as shown above), but with the time ranges of ‘gathering events’ detected using the gmmevents()
function in the asnipe
package.
You can see that almost every second of the 2 hour period is considered a gathering event, and many of the events are defined by gaps in the tag bursts, which could simply be detection error.