This document provides a guide to the R code for the workflow for constructing social networks from automated telemetry data, as described in Shizuka et al., Constructing social networks from automated telemetry data: a worked example using within- and across-group associations in cooperatively breeding birds. This worked example uses one month's worth of data (October 2018) from a study of acorn woodpeckers (Melanerpes formicivorus) at Hastings Reserve, California.

Set up the options and required packages:

knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60), tidy=TRUE)
library(knitr)
library(tidyverse)
library(lubridate)
library(igraph)
library(assortnet)
library(rgdal)
library(REdaS)
library(geosphere)
source("ACWO_masterfunctions.R", local = knitr::knit_global())

Part 0: Load the master data and clean up

For the first steps, we are going to work with the raw data. Here, we will use the data from October 2018.

load("alloctober.rdata")

Here is what a snippet of the raw data would look like.

head(rawdat)
##     ST        DATE GROUP HOME  BIRD SEX STATUS ZTIME
## 1  -96 2018 Oct 12  CAVI CAVI B4935   M      B 43394
## 2  -96 2018 Oct 12  CAVI CAVI B4935   M      B 43398
## 3 -100 2018 Oct 16  CAVI CAVI B4935   M      B 36678
## 4  -91 2018 Oct 11  CAVI CAVI B4935   M      B 60377
## 5  -96 2018 Oct 12  CAVI CAVI B4935   M      B 39875
## 6  -95 2018 Oct 12  CAVI CAVI B4935   M      B 43832

There are a few data cleanup tasks:

# remove rows where DATE == 'DATE' (stray header rows)
rawdat = rawdat[-which(rawdat$DATE == "DATE"), ]
# parse the date strings into date objects
rawdat$date.obj = ymd(rawdat$DATE)
# convert the remaining columns to numeric/character types
rawdat$ztime.num = as.numeric(as.character(rawdat$ZTIME))
rawdat$GROUP = as.character(rawdat$GROUP)
rawdat$HOME = as.character(rawdat$HOME)
rawdat$BIRD = as.character(rawdat$BIRD)
rawdat$ST = as.numeric(as.character(rawdat$ST))

Now, we will load the individual attributes data, which include the bird ID, home group (GROUP), sex (M = male, F = female) and status (B = breeder, H = helper). Note that "GROUP" is a confusing name because, in the telemetry data, "GROUP" indicates the base station where the tag was detected. This is unfortunate, but since this is what our raw data look like, we'll keep it that way.

Note also that, in these data, there is a group called "A1-RE10". We will simplify this to "A1", which is the location of the closest base station.

id.dat = read.csv("RadioTagIDs_Final.csv")
id.dat$GROUP = as.character(id.dat$GROUP)

# change 'A1-RE10' to 'A1' so that we can match up groups
id.dat$GROUP[which(id.dat$GROUP == "A1-RE10")] = "A1"

# set up an object that has just the ids
ids = unique(as.character(id.dat$BIRD))

We will also import the locations of ALL group territories (including those that do not have base stations), as well as the locations for all receivers used in the whole study.

# all known group locations
group.dat = read.csv("gnames.csv")

Part 1: Converting raw detections into time windows of presence

The first major task is to convert the raw detections into an estimate of time windows during which a bird is present within the range of a base station.

The key here is to figure out the probability that an individual that stays within a given area is detected X seconds apart. An individual tag can be detected as little as 2 seconds apart, but we expect variation in this time difference from detection to detection due to error, etc. So the question is: how much delay between detections can we allow and still consider the individual to be continuously within the signal range?

Preview of what we are trying to do

To illustrate what we are trying to do, let’s start with a visualization of the detections of individuals at MACR for just two hours: 9-11am on October 25, 2018

Figure S1: Raw detections data from 2 hours (9am - 11am) at territory MACR

You can see that the data is very dense. Each tick is a data point, with varying intervals between detections. Our goal here is to convert this data into time windows during which we consider the bird to be “present” at the station. If we can do that, then we can convert this data into something that looks like this:

Figure S2: Detections over 2 hours (9am - 11am) at territory MACR converted to time windows of presence

Why?

Consolidating the individual detection data into these time windows has several advantages: it greatly reduces the amount of data we have to handle (important for memory, as we will see in Part 2), it smooths over short stochastic gaps between detections, and it yields start and stop times of presence that we can later use to quantify co-presence of individuals (Part 2).

How to do it:

Here’s the general workflow for converting the raw detection data to time windows during which a bird is considered to be present at a station.

  1. Calculate time delays between successive detections for a given individual at a given station.

  2. Figure out the threshold value of time delay that we believe captures the pattern of presence/absence (e.g., for foraging or group visit forays).

  3. Group detections that are separated by time delays below the threshold value into a single time window (a minimal sketch of this step follows below).
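
To make step 3 concrete, here is a minimal sketch of the grouping logic, assuming a sorted vector of detection times (in seconds) for one bird at one station: a new window starts whenever the gap to the previous detection exceeds the threshold. The helper group_windows() here is purely illustrative; the full generate_timewindows() function in Section 1.2 does this while also tracking signal strength.

# sketch: group sorted detection times into windows using a 60 s threshold
group_windows = function(times, threshold = 60) {
    # a new window begins wherever the gap to the previous detection exceeds the threshold
    window.id = cumsum(c(TRUE, diff(times) > threshold))
    data.frame(start = tapply(times, window.id, min),
        stop = tapply(times, window.id, max))
}
group_windows(c(100, 102, 104, 300, 302, 500))
##   start stop
## 1   100  104
## 2   300  302
## 3   500  500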

1.1 Identifying patterns of detection intervals

Now, we are going to filter the data down to a single day's worth. This routine uses the pipe operator %>%, which is available in tidyverse.

As a first step towards generating the time window presence data, we can visualize the cumulative distribution of time differences between detections for an individual at its home base station. We will look at the home base station because this is likely to be where individuals are relatively stationary when they are present. That means that, for the majority of the time they are there, the time difference between detections should be very short (i.e., close to 2 seconds). To do this, we will first create a list of dataframes…

# set a date
choose.date = "2018-10-25"

# filter the data to that date, then sort by 'ztime.num', and
# then group by social group and bird ID.  We then filter the
# data down further to only cases where the individual is
# seen at their home group, then create a new column called
# 'timediff', which is the time difference between the
# previous detection and the current detection.
timediff = rawdat %>% filter(date.obj == choose.date) %>% arrange(ztime.num) %>% 
    group_by(GROUP, BIRD) %>% filter(GROUP == HOME) %>% mutate(timediff = ztime.num - 
    lag(ztime.num))

# now split this into a list where each element is a
# different bird
timediff = split(timediff, list(timediff$BIRD, timediff$GROUP), 
    drop = T)

We then plot the accumulation curve of tag detections across a range of inter-detection intervals.

plot(c(0, 300), c(0, 1), type = "n", ylab = "Cumulative Frequency", 
    xlab = "Time delay between detections (seconds)")
for (i in 1:length(timediff)) {
    if (nrow(timediff[[i]]) < 100) 
        next else {
        t = table(timediff[[i]]$timediff)
        p = cumsum(t)/max(cumsum(t))
        points(sort(unique(timediff[[i]]$timediff)), p, type = "l", 
            col = gray(0.3, 0.5))
    }
}

Figure S3: Accumulation curve of inter-detection interval for each individual on one day

This figure shows a set of empirical cumulative distribution functions of time delays between detections (in seconds) for each bird that was detected AT ITS HOME GROUP on one day (2018-10-25).

Now take a closer look: here I zoom in a bit to 0-100 seconds, and highlight four individuals that have a weird pattern…

Figure S4: Accumulation curve of inter-detection interval for each individual on one day, with 4 individuals highlighted in red.

You can see that the 4 individuals shown in red (“B4113”, “B5227”, “B5274” and “B5571”) exhibit a flat line (of varying duration) between about 3-20 seconds of delay between detections. From what I can tell, these are caused by tags that frequently pause for about 12-20 seconds or somehow go undetected by the base station.

One way to confirm that this problem comes from the tag and not the base station is to check the patterns of detections of these birds at base stations away from the home group–i.e., during visits to other groups. Here is what that looks like:

Figure S5: Accumulation curve of 4 individuals highlighted in red in Figure S4, as observed at stations away from their home group.

We can also confirm that this is not a base station issue but rather a tag issue: (1) the same individuals show this pattern on multiple days, and (2) only these individuals show this pattern at a given base station on a given day. For example, here are birds detected 100+ times at SHAN. The red lines are 5274 and 5571.

Figure S6: Accumulation curve of inter-detection interval for individuals observed at one territory (SHAN) on one day.

Let’s take a closer look at the data in another way. Here, I show the actual detections across time on the morning of 2018-10-25.

Figure S7: Zoomed in view of detections of individuals at SHAN over two hours. Blue box highlights typical detections with short bursts of detections. Red box highlights an individual detected at regular intervals at a slow rate.

What you can see is a difference in the temporal pattern of detections between, say, 6119 and 5274. Whereas 6119 (e.g., look at the blue box) displays dense clusters of detections, 5274 (e.g., look at the red box) has constant detections that are spaced apart. If you look closer, you can see that 5274 is either detected 2-3 seconds apart or about 50 seconds apart, but rarely if ever in between those values. That explains the cumulative frequency patterns above.

The most likely scenario is that tags sometimes run low on power, for example when their solar panel is covered. When this happens, the tag seems to exhibit some kind of irregular “rhythm”.

Takeaway on tag errors

Accounting for the possibility of unpredictable delays in detection intervals will be important for coming up with a good way to convert point detections into time windows of presence. This is not insurmountable, but it will require some decision making, ideally with some knowledge of the natural history of the organism. Here are a couple of things we should think about doing:

  • Develop a way to detect which tags show delays in detection interval, and when (see the sketch after this list).

  • As long as the threshold time delay we decide on is reliably longer than these delayed detection intervals, we should still capture all of these individuals.

  • The main problem will be capturing very brief visits to territories. If a typical visit lasts less than the detection interval of a glitchy tag (e.g., one low on power), then we may end up underestimating the frequency of visitations.
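
For the first point, one simple heuristic (a sketch, not part of the original workflow; the 10-second and 25% cutoffs are assumed values for illustration) is to flag bird-days where an unusually large share of home-station inter-detection intervals is long, using the timediff list built in Section 1.1:

# flag tags where >25% of home-station inter-detection intervals exceed 10 s
long.frac = sapply(timediff, function(x) mean(x$timediff > 10, na.rm = TRUE))
names(which(long.frac > 0.25))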

Currently, we have used 60 seconds as the threshold time delay for creating time windows (shown as the red dashed line in the figure below). For the time being, I think this is ok because it exceeds the weird patterns generated by any tag in this dataset, and it seems biologically plausible to assume that forays to other territories would take more than a minute most of the time.

Figure S8: Accumulation curve of inter-detection interval for each individual on one day. Red dashed line shows 60 seconds.
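
As a quick check on this choice (a small sketch using the timediff list from Section 1.1), we can compute, for each bird, the fraction of home-station inter-detection intervals that falls at or below 60 seconds; values near 1 indicate that the threshold comfortably exceeds each tag's detection rhythm.

# per-bird fraction of home-station inter-detection intervals <= 60 s
sort(sapply(timediff, function(x) mean(x$timediff <= 60, na.rm = TRUE)))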

1.2 Functions to consolidate detections into time windows and plot time windows for a given station on a given day.

We’ve created a function called generate_timewindows to convert raw data for a given date and station into time window data. The output of this function is a list, where each item is the set of time windows of presence for a given bird along with the average signal strength for each time window.

generate_timewindows=function(rawdat, choose.date, pick.station, threshold=60){
  dat.t=rawdat %>% filter(date.obj==choose.date) %>% filter(GROUP==pick.station) %>% arrange(ztime.num) %>% group_by(BIRD)
  dat.t.list=split(dat.t, list(dat.t$BIRD), drop=T)
  timewindows.list=list()
  for(j in 1:length(dat.t.list)){
    tempd=dat.t.list[[j]]
    tempd$timelag=lead(tempd$ztime.num)-tempd$ztime.num
    starts.list=list()
    stops.list=list()
    st.list=list()
    
    counter=1
    open.count=1
  ## if there is only one line, open and close time window and end
    if (nrow(tempd)==1) {
      starts.list[[1]]=stops.list[[1]]=tempd$ztime.num
      st.list[[1]]=tempd$ST
    }
    else {
      for(i in 1:(nrow(tempd)-1)){
  ##for first line, if next detection is > threshold, open and close time window and add counter
        if (i==1 & tempd$timelag[i]>threshold){
          starts.list[counter]=tempd$ztime.num[i]
          stops.list[counter]=tempd$ztime.num[i]
          st.list[counter]=tempd$ST[i]
          counter=counter+1
          next}
  ##otherwise, if first line, open time window
        else if(i==1) {
          starts.list[counter]=tempd$ztime.num[i]
          open.count=i
          next}
  #after first line, if next detection is > threshold, and there is an open time window, close it and add to counter
        else if(tempd$timelag[i]>threshold & length(starts.list)>length(stops.list)) {
          stops.list[counter]=tempd$ztime.num[i]
          st.list[counter]=mean(tempd$ST[open.count:i])
          counter=counter+1
          next}
  #if next detection is > threshold and there is no open time window, open and close one and add to counter
        else if(tempd$timelag[i]>threshold & length(starts.list)==length(stops.list)) {
          starts.list[counter]=tempd$ztime.num[i]
          stops.list[counter]=tempd$ztime.num[i]
          st.list[counter]=tempd$ST[i]
          counter=counter+1
          next}
  #if next detection is <= threshold and there is an open time window, just keep it open
        else if(tempd$timelag[i]<=threshold & length(starts.list)>length(stops.list)) {
          next}
  #if next detection is <= threshold and there is no open time window, open one.
        else if(tempd$timelag[i]<=threshold & length(starts.list)==length(stops.list) ) {
          starts.list[[counter]]=tempd$ztime.num[i]
          open.count=i  #record where this window opened, so the mean signal strength spans the correct rows
        }
        else next
      }
    }
    
    if(length(starts.list)>length(stops.list)) {
      stops.list[counter]=last(tempd$ztime.num)
      st.list[counter]=last(tempd$ST)
    }
    
timewindows=data.frame(start=unlist(starts.list), stop=unlist(stops.list), st=unlist(st.list))
  #check and make sure time windows are non-negative.
    stopifnot(all(timewindows$stop - timewindows$start >= 0))
    
    timewindows$date=rep(tempd$date.obj[1], nrow(timewindows))
    timewindows$group=rep(tempd$GROUP[1], nrow(timewindows))
    timewindows$bird=rep(tempd$BIRD[1], nrow(timewindows))
    
    timewindows.list[[j]]=timewindows
  }
  names(timewindows.list)=sapply(timewindows.list, function(x) x$bird[1])
  return(timewindows.list)
}

Let’s try running this and seeing the output:

gate.10.25.18 = generate_timewindows(rawdat, choose.date = "2018-10-25", 
    pick.station = "GATE", threshold = 60)
lapply(gate.10.25.18, head)
## $B4113
##   start  stop        st       date group  bird
## 1 31718 31718 -95.00000 2018-10-25  GATE B4113
## 2 31785 43759 -94.88367 2018-10-25  GATE B4113
## 3 43846 44860 -94.90997 2018-10-25  GATE B4113
## 4 44922 45019 -94.91250 2018-10-25  GATE B4113
## 5 45110 50161 -95.04039 2018-10-25  GATE B4113
## 6 50225 51339 -95.04936 2018-10-25  GATE B4113
## 
## $B5495
##   start  stop   st       date group  bird
## 1 39350 39350  -98 2018-10-25  GATE B5495
## 2 39746 39746 -100 2018-10-25  GATE B5495
## 
## $B6028
##   start  stop         st       date group  bird
## 1 27723 27753  -98.11111 2018-10-25  GATE B6028
## 2 27880 28009  -96.83333 2018-10-25  GATE B6028
## 3 28276 28305  -96.82353 2018-10-25  GATE B6028
## 4 28427 28427  -97.00000 2018-10-25  GATE B6028
## 5 28937 28937 -101.00000 2018-10-25  GATE B6028
## 6 29236 29236  -99.00000 2018-10-25  GATE B6028
## 
## $B6029
##   start  stop     st       date group  bird
## 1 37449 37449 -99.00 2018-10-25  GATE B6029
## 2 39636 39651 -97.25 2018-10-25  GATE B6029
## 
## $B6077
##   start  stop  st       date group  bird
## 1 26115 26115 -81 2018-10-25  GATE B6077
## 2 26293 26293 -80 2018-10-25  GATE B6077
## 3 26462 26462 -80 2018-10-25  GATE B6077
## 4 26589 26589 -81 2018-10-25  GATE B6077
## 5 26727 26727 -82 2018-10-25  GATE B6077
## 6 26837 26837 -82 2018-10-25  GATE B6077
## 
## $B6078
##   start  stop        st       date group  bird
## 1 28080 28084 -79.00000 2018-10-25  GATE B6078
## 2 29016 29234 -78.00000 2018-10-25  GATE B6078
## 3 29408 29420 -77.88542 2018-10-25  GATE B6078
## 4 29574 29576 -77.91837 2018-10-25  GATE B6078
## 5 29668 29760 -79.46364 2018-10-25  GATE B6078
## 6 29914 29916 -79.54464 2018-10-25  GATE B6078

We’ve also written a function to plot the time window data, called plot_timewindows. It takes the output of the generate_timewindows function. Time windows are shown as lines with ticks on the ends; if a time window is too short to be shown as a line (e.g., just a few seconds of consecutive detections), it is shown as a small dot. The plot shades the lines by relative signal strength, with darker lines indicating relatively stronger signals. This can be overridden using the col= argument of the function.

plot_timewindows = function(data, first.hour = 7, last.hour = 19, 
    col = NULL) {
    first.sec.hour = 1 + (60 * 60 * 0:24)
    plot(c(first.sec.hour[first.hour], first.sec.hour[last.hour]), 
        c(1, length(data)), type = "n", las = 1, yaxt = "n", 
        ylab = "", xaxt = "n", xlab = "")
    axis(1, at = first.sec.hour[first.hour:last.hour], labels = seq(first.hour, 
        last.hour, 1))
    axis(2, at = 1:length(data), labels = substr(names(data), 
        start = 1, stop = 5), las = 1)
    max.abs.st = abs(min(sapply(data, function(x) min(x$st, na.rm = T)), 
        na.rm = T))
    min.abs.st = abs(max(sapply(data, function(x) max(x$st, na.rm = T)), 
        na.rm = T))
    if (is.null(col) == T) {
        for (k in 1:length(data)) {
            for (l in 1:nrow(data[[k]])) {
                if (data[[k]][l, 2] - data[[k]][l, 1] <= 4) 
                  points(data[[k]][l, 1], k, pch = 20, col = gray((abs(data[[k]][l, 
                    3]) - min.abs.st)/(max.abs.st - min.abs.st)), 
                    cex = 0.5) else {
                  arrows(data[[k]][l, 1], k, data[[k]][l, 2], 
                    k, lwd = 1, angle = 90, code = 3, length = 0.03, 
                    col = gray((abs(data[[k]][l, 3]) - min.abs.st)/(max.abs.st - 
                      min.abs.st)))
                }
            }
        }
    } else {
        for (k in 1:length(data)) {
            for (l in 1:nrow(data[[k]])) {
                if (data[[k]][l, 2] - data[[k]][l, 1] <= 4) 
                  points(data[[k]][l, 1], k, pch = 20, col = col, 
                    cex = 0.5) else {
                  arrows(data[[k]][l, 1], k, data[[k]][l, 2], 
                    k, lwd = 1, angle = 90, code = 3, length = 0.03, 
                    col = col)
                }
            }
        }
    }
}
plot_timewindows(gate.10.25.18, first.hour = 7, last.hour = 19)

Figure S9: Time windows of presence at territory GATE on October 25, 2018

… and you can zoom in to a few hours and turn off the shading:

plot_timewindows(gate.10.25.18, first.hour = 13, last.hour = 15, 
    col = "black")

Figure S10: Same data as Figure S9, zoomed in to 1pm-3pm and shown without signal-strength shading.

1.3 Comparing signal strengths from multiple stations to locate individuals during any given time window.

Even after the processing above, this method is not foolproof: individual tags can differ slightly in signal strength, and receivers may differ in detection strength due to topography and vegetation. Thus, individuals can sometimes be detected at multiple receiver stations at the same time. Therefore, we take the additional step of filtering signals by comparing the signal strengths of any given tag detected at multiple stations at the same time.

To do this, we (1) determine whether the time windows of detection of one individual at any two receiver stations overlap at all, and (2) if there are overlaps, compare the average signal strengths of each putative time window and keep only the one with the strongest average signal strength. This ensures that we have only one location for each individual at any given time.
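
To make the rule concrete, here is a toy illustration with hypothetical values: two stations log overlapping windows for the same bird, and we keep the window with the stronger (less negative) mean signal strength.

# toy example: two overlapping time windows for one bird at two stations
toy.win = data.frame(start = c(100, 150), stop = c(200, 250),
    st = c(-85, -95), group = c("GATE", "MACR"))
# the windows overlap (150 < 200); GATE has the stronger mean signal
# (-85 > -95), so the MACR window would be removed
toy.win[which.max(toy.win$st), ]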

In this example, the cleaned data are stored as test.filter.

### filter time windows for one individual on one day.

# choose all stations, with a specific date and individual.
allstations = unique(rawdat$GROUP)
choose.date = "2018-10-01"
use.birds = unique(rawdat$BIRD)

# pick out the data from a random bird that is in the dataset
dat1 = rawdat %>% filter(BIRD == use.birds[3])

# now we go through a loop where we generate time windows of
# detection for all stations on the chosen date.
allstations_oneday = list()
for (i in 1:length(allstations)) {
    l = list()
    if (length(which(dat1$date.obj == choose.date & dat1$GROUP == 
        allstations[i])) > 0) 
        l = generate_timewindows(dat1, choose.date = choose.date, 
            pick.station = allstations[i], threshold = 60) else next
    allstations_oneday[[i]] = l
    # print(i)
}

# the resulting list, allstations_oneday, is a list of
# lists, so we need to use 'rbind' twice to make one clean
# dataframe.  There is probably a more elegant way to do
# this, but this works.
a = list()
for (j in 1:length(allstations_oneday)) {
    if (length(allstations_oneday[[j]]) > 0) 
        a[[j]] = do.call("rbind", allstations_oneday[[j]]) else next
}
b = do.call("rbind", a)

# now, we are going to iteratively find out, for any given
# time window, if there are any other time windows that
# overlap with it.  This identifies overlapping time windows.
# This routine is currently inefficient because it will go
# through and identify each overlap twice (i.e., it identifies
# when time window A overlaps B, then again for B overlapping
# A). But again, it works. Having found overlapping time
# windows, it will then compare the signal strength (column
# named 'st') and get the rownames of which is NOT the
# maximum signal strength among the overlapping windows. That
# will be designated as 'rows to remove'.

rows.to.remove = list()
for (i in 1:nrow(b)) {
    s1 = b[c(i, which(b$start < b$start[i] & b$stop > b$start[i])), 
        ]
    rows.to.remove[[i]] = rownames(s1[-which.max(s1[, "st"]), 
        ])
}

# consolidate the list of rows to remove, then remove them.
# save the resulting data as 'test.filter'.
all.rows.to.remove = unique(match(unlist(rows.to.remove), rownames(b)))
if (length(all.rows.to.remove) > 0) test.filter = b[-all.rows.to.remove, 
    ] else test.filter = b
head(test.filter[order(test.filter$start), ])
##           start  stop        st       date   group  bird
## B5961.157 22797 22885 -81.80000 2018-10-01 Hilltop B5961
## B5961.227 23038 23064 -81.50000 2018-10-01 Hilltop B5961
## B5961.327 23185 23249 -80.00000 2018-10-01 Hilltop B5961
## B5961.427 23455 23600 -82.11538 2018-10-01 Hilltop B5961
## B5961.527 23684 23684 -86.00000 2018-10-01 Hilltop B5961
## B5961.626 23827 23829 -82.10345 2018-10-01 Hilltop B5961

1.4 Putting it all together and generating the “time windows list”

Now, here is the code to run this whole routine for all individuals and all days in the dataset. Here, we run the code with the threshold set at 60 seconds, and the resulting data are saved as all.timewindows.list.oct_60.rdata

all.days = sort(unique(rawdat$date.obj))  #all dates in the dataset
all.timewindows.list = list()
for (k in 1:length(all.days)) {
    allstations = unique(rawdat$GROUP)
    allstations_oneday = list()
    choose.date = all.days[k]
    for (i in 1:length(allstations)) {
        l = list()
        if (length(which(rawdat$date.obj == choose.date & rawdat$GROUP == 
            allstations[i])) > 0) 
            l = generate_timewindows(rawdat, choose.date = choose.date, 
                pick.station = allstations[i], threshold = 60) else next
        allstations_oneday[[i]] = l
    }
    
    a = list()
    for (j in 1:length(allstations_oneday)) {
        if (length(allstations_oneday[[j]]) > 0) 
            a[[j]] = do.call("rbind", allstations_oneday[[j]]) else next
    }
    b = do.call("rbind", a)
    c = split(b, b$bird)
    
    f = lapply(c, function(x) {
        test = x
        rows.to.remove = list()
        for (i in 1:nrow(test)) {
            s1 = test[c(i, which(test$start < test$start[i] & 
                test$stop > test$start[i])), ]
            rows.to.remove[[i]] = rownames(s1[-which.max(s1[, 
                "st"]), ])
        }
        all.rows.to.remove = unique(match(unlist(rows.to.remove), 
            rownames(test)))
        if (length(all.rows.to.remove) > 0) 
            test.filter = test[-all.rows.to.remove, ] else test.filter = test
        test.filter
    })
    
    h = do.call("rbind", f)
    all.timewindows.list[[k]] = h
    print(k)
}

save(all.timewindows.list, file = "all.timewindows.list.oct_60.rdata")

Part 2: Going from time windows to co-presence arrays

This part of the workflow takes the time window data and converts them into two sets of matrices that together can be used to create social networks for any combination of territories and days.

In Part 1, we created a dataset of ‘time windows’ of presence of individuals at base stations, in which we filtered detections based on signal strength: if an individual was detected at multiple locations, we kept only the data for the location with the strongest signal strength, i.e., the station the individual was closest to. Let’s load that dataset here:

load("all.timewindows.list.oct_60.rdata")

This R data file contains a list called all.timewindows.list, which is a list with the data for each day saved as separate items. We can see this because it has 31 items (one for each day).

class(all.timewindows.list)
## [1] "list"
length(all.timewindows.list)
## [1] 31

Let’s make it easier and combine this all into one comprehensive dataframe.

all.timewindows = do.call("rbind", all.timewindows.list)

Here, we will walk you through the idea for constructing two different matrices:

  1. a co-presence matrix, which summarizes the number of seconds that two individuals were present at the same receiver station.

  2. a simultaneous detection matrix, which summarizes the number of seconds that two individuals were detected at the same time ANYWHERE (i.e., at any receiver station, including cases where they are at the same station). This gives us information about instances where we knew where both individuals were at the same time.

We will show later, in Part 3.1.2, how the information from these two matrices can be combined to calculate association indices.

We will work through how these matrices are constructed using a small subset of the data, then introduce a routine that generates the matrices across multiple days (Figure S11, S12).

Figure S11: Converting time windows data to copresence array. Time windows data (a) is converted to a presence matrix (b) in which individuals are either absent (0) or present (1) for each second of the day at a receiver station. This matrix can be easily converted to a co-presence matrix (c) that shows the number of seconds that each pair of individuals were present at a territory on a given day.

Figure S12: The co-presence matrices (Figure S11) are stacked across stations into a 3-dimensional co-presence array, and arranged as a list with each element of the list containing the co-presence arrays for each day.

2.1. Create a matrix of presence/absence of every possible individual for every possible second of the day

The first step involves converting the time windows data into an \(x\) x \(n\) “presence matrix” in which the \(x\) rows are time (here, every second of the day between 5am and 8pm) and the \(n\) columns are all possible individuals. The reason to set this up with all possible seconds and all possible individuals is that we can later create arrays with equal-sized matrices no matter which station or which day.

I’ve made this routine into a function called build.mat() in the master functions file. Note that this function takes time window data with columns called “start2” and “stop2”, which simply take the seconds in the ‘start’ and ‘stop’ columns and convert them to time starting from 5am (to save some memory space).

build.mat = function(d) {
    max.secs.day = 54001  #number of seconds starting from 5am up to 8pm
    m = matrix(0, nrow = max.secs.day, ncol = length(ids), dimnames = list(1:max.secs.day, 
        ids))
    for (i in 1:nrow(d)) {
        a = seq(d[i, "start2"], d[i, "stop2"], 1)
        bird = d[i, "bird"]
        c = match(bird, ids)
        m[a, c] = 1
    }
    return(m)
}

Using this function, we can make a presence matrix for all individuals seen at one station on one day:

sec.daystart = 60 * 60 * 5
max.secs.day = 54001
timewindows.dat = all.timewindows.list[[1]]  #one day's worth of time windows data
pick.station = allstations[1]
daydat = try.data = timewindows.dat[timewindows.dat$group == 
    pick.station, ]  #data for one station for one day
daydat$date.num = as.numeric(as.Date(daydat$date[1]) - as.Date("2017-07-01"))
# make reduced dataset
d = daydat %>% select(bird, group, date.num, start, stop) %>% 
    arrange(bird)
d = as.data.frame(d)
# convert seconds to time starting at 5am and call these
# 'start2' and 'stop2'
d$start2 = d$start - sec.daystart
d$stop2 = d$stop - sec.daystart
# use the build.mat() function to generate presence matrix
m = build.mat(d)

The resulting matrix, m, is \(x\) seconds long and \(n\) individuals wide:

dim(m)
## [1] 54001   131

2.2 Co-presence matrix

This presence matrix can then be used to calculate a “co-presence matrix”, which is an \(n\) x \(n\) matrix with bird IDs as both rows and columns, where the cell value is the number of seconds both of those birds were present at the station. Making this matrix is easy once you have the \(x\) x \(n\) matrix above. You simply multiply the transpose of the matrix by the matrix itself:

copresence = t(m) %*% m

This is an \(n\) x \(n\) matrix that counts up the number of seconds that any two individuals were present at the same receiver station at the same time. The diagonal of the copresence matrix is simply the number of seconds that an individual was present at that station on that day, and is equivalent to the column sums of the presence matrix.
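
To see why this works, here is a toy example with hypothetical values: three birds over five seconds.

# toy presence matrix: 5 seconds (rows) x 3 birds (columns)
toy = matrix(c(1, 1, 1, 0, 0,
               0, 1, 1, 1, 0,
               0, 0, 0, 1, 1), nrow = 5, ncol = 3,
    dimnames = list(1:5, c("A", "B", "C")))
t(toy) %*% toy
# A and B overlap for 2 seconds, B and C for 1 second, A and C never;
# the diagonal (3, 3, 2) gives each bird's total seconds of presence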

# dimensions of the copresence matrix
dim(copresence)
## [1] 131 131
# the column sums of presence matrix
as.numeric(colSums(m))
##   [1]     0     0     0     0     0     0     0     0     0     0     0     0
##  [13]     0     9     0     0     0     0     0     0     1     0     0     0
##  [25]     0     0   574     0     0     0     0     0     0     0     0     0
##  [37]     0     0     0     0     0     0     0     0     0     0     0     0
##  [49] 23044     0     0    15     0     0     0     0     0     0     0     0
##  [61]     0     0     0     0     0     0     0   738     0     0     0     0
##  [73]     0     0     0     0     0   541     0     0     0     0     0     0
##  [85]     0     0     0     0     0     0     0     0     0     0   584     0
##  [97]     0     0     0     0     0     0   219 10412     0     0     0     0
## [109]     0     0    21     0     0     0     0     0     0     0     0     0
## [121]     0     0     0     0     0     0     0     0     0    73     0
# the diagonal of the co-presence matrix
as.numeric(diag(copresence))
##   [1]     0     0     0     0     0     0     0     0     0     0     0     0
##  [13]     0     9     0     0     0     0     0     0     1     0     0     0
##  [25]     0     0   574     0     0     0     0     0     0     0     0     0
##  [37]     0     0     0     0     0     0     0     0     0     0     0     0
##  [49] 23044     0     0    15     0     0     0     0     0     0     0     0
##  [61]     0     0     0     0     0     0     0   738     0     0     0     0
##  [73]     0     0     0     0     0   541     0     0     0     0     0     0
##  [85]     0     0     0     0     0     0     0     0     0     0   584     0
##  [97]     0     0     0     0     0     0   219 10412     0     0     0     0
## [109]     0     0    21     0     0     0     0     0     0     0     0     0
## [121]     0     0     0     0     0     0     0     0     0    73     0

2.3. Calculate co-presence matrix and simultaneous detection matrix for one day

Now, we can work up to calculating the copresence matrices for all \(S\) stations in one day. We will save the results in a 3-dimensional array (\(n\) x \(n\) x \(S\)). But we will still save the presence matrix for each station (we will call it result.array1) because we will use it in the next step.

# now do this for all stations in one day

# set up the arrays.
#'result.array1' is the presence matrices across all stations.
#'copresence.array1' is the co-presence matrices across all stations
result.array1 = array(dim = c(max.secs.day, length(ids), length(allstations)))
copresence.array1 = array(dim = c(length(ids), length(ids), length(allstations)))

# using a loop, pick one station, restrict the data to that
# station, and go through the same routine as shown above.
for (j in 1:length(allstations)) {
    pick.station = allstations[j]
    try.data = timewindows.dat[timewindows.dat$group == pick.station, 
        ]
    if (nrow(try.data) == 0) 
        next else daydat = try.data
    daydat$date.num = as.numeric(as.Date(try.data$date[1]) - 
        as.Date("2017-07-01"))
    # make reduced dataset
    d = daydat %>% select(bird, group, date.num, start, stop) %>% 
        arrange(bird)
    d = as.data.frame(d)
    d$start2 = d$start - sec.daystart  #convert into seconds from 5am
    d$stop2 = d$stop - sec.daystart  #convert into seconds starting from 5am
    
    # Make daily presence matrix
    m = build.mat(d)  #so the columns here should be sorted by ids
    result.array1[, , j] = m
    copresence.array1[, , j] = t(m) %*% m
}

Now we will make the “simultaneous detection matrix”–i.e., matrix of seconds when A+B are both detected across all stations. To do this, we first sum up the detection array across all stations. This results in a matrix that shows whether an individual was detected in any station at a given second. Then, we multiply the transpose of that matrix with itself to get a \(n\) x \(n\) matrix that shows the number of seconds that any two individuals were both detected within the study site.

# first, sum up the detection array across all stations.
# This results in a matrix that shows whether an individual
# was detected in any station at a given second.
aa = apply(result.array1, c(1, 2), sum, na.rm = T)

# the resulting matrix should just have 1s and 0s, but just
# in case convert any >1 to 1.
aa[aa > 1] = 1

# the simultaneous detection array is the transpose of this
# matrix multiplied to itself.
simul.detection.matrix = t(aa) %*% aa

The result is two objects: (1) an \(n\) x \(n\) x \(S\) 3-dimensional array (copresence.array1) containing copresence matrices for each of the \(S\) stations, and (2) an \(n\) x \(n\) matrix (simul.detection.matrix) that counts the number of seconds any two individuals were both detected somewhere in the study site.

dim(copresence.array1)
## [1] 131 131  42
dim(simul.detection.matrix)
## [1] 131 131

2.4. Calculate co-presence matrix and simultaneous detection matrix for each day and store all data in two arrays.

Now, we have to make the presence matrix for each day, and calculate the simultaneous detection and co-presence matrices for each day. My computer can’t hold the full array of raw presence matrices (54,001 seconds x 131 individuals x 42 stations x 31 days) because it will run out of memory. So, we will instead generate two arrays: (1) the co-presence array (\(n\) x \(n\) x \(S\) x \(T\) copresence.array) and (2) the simultaneous detection array (\(n\) x \(n\) x \(T\) detection.array).
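
A quick back-of-the-envelope calculation shows why:

# approximate size of the full presence array stored as 8-byte doubles
54001 * 131 * 42 * 31 * 8/1e9  # roughly 74 GB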

I then save this into a .Rdata file, which we will use in Part 3 to calculate association indices and construct the network.

sec.daystart = 60 * 60 * 5
max.secs.day = 54001  #must match the value hard-coded inside build.mat()

detection.array = array(dim = c(length(ids), length(ids), length(all.timewindows.list)), 
    dimnames = list(ids, ids, 1:length(all.timewindows.list)))
copresence.array = array(dim = c(length(ids), length(ids), length(allstations), 
    length(all.timewindows.list)), dimnames = list(ids, ids, 
    allstations, 1:length(all.timewindows.list)))
# 
for (k in 1:length(all.timewindows.list)) {
    result.array1 = array(dim = c(max.secs.day, length(ids), 
        length(allstations)))
    timewindows.dat = all.timewindows.list[[k]]  #one day's worth of time windows data
    for (j in 1:length(allstations)) {
        pick.station = allstations[j]
        try.data = timewindows.dat[timewindows.dat$group == pick.station, 
            ]
        if (nrow(try.data) == 0) 
            next else daydat = try.data
        daydat$date.num = as.numeric(as.Date(try.data$date[1]) - 
            as.Date("2017-07-01"))
        # make reduced dataset
        d = daydat %>% select(bird, group, date.num, start, stop) %>% 
            arrange(bird)
        d = as.data.frame(d)
        d$start2 = d$start - sec.daystart
        d$stop2 = d$stop - sec.daystart
        
        # Make daily presence matrix
        m = build.mat(d)  #so the columns here should be sorted by ids
        result.array1[, , j] = m
        
        # make co-presence matrix and add to array
        copresence.mat = t(m) %*% m
        copresence.array[, , j, k] = copresence.mat
    }
    
    # make matrix of seconds when A+B are both detected across
    # all stations and add to detection.array
    aa = apply(result.array1, c(1, 2), sum, na.rm = T)
    bb = aa
    bb[bb > 1] = 1
    detection.array[, , k] = t(bb) %*% bb
    print(k)
}

save(detection.array, copresence.array, file = "result_arrays_60sec.rdata")

Part 3: Generating the network

3.0: Import data and Master Functions

In Part 2, we created an .rdata file that contained two arrays: the copresence array and the simultaneous detection array. We will load that here.

load("result_arrays_60sec.rdata")

This R data file contains the two arrays: the copresence array (copresence.array) and the simultaneous detection array (detection.array). The last dimension of each array has 31 elements (one for each day).

dim(copresence.array)
## [1] 131 131  42  31
dim(detection.array)
## [1] 131 131  31

3.1: The Simple Ratio Index and Very Simple Ratio Index

Here, I will provide a quick background on two potential association indices we could use: the Simple Ratio Index (SRI: Cairns & Schwager 1984) and the Very Simple Ratio Index (vSRI: Hoppitt & Farine 2018).

3.1.1: Intro to SRI and vSRI

The SRI has been one of the most widely used association indices in studies of animal social networks. It takes the number of observation periods in which two individuals (A and B) were seen together and divides it by the total number of observation periods in which A and/or B were observed:

\(SRI = \frac{x}{x + y_A + y_B + y_{AB}}\)

where \(x\) is the number of observation periods that individual A and B are observed together, \(y_A\) and \(y_B\) are the number of observation periods that A is observed without B and B is observed without A, respectively. Finally, \(y_{AB}\) is the number of observation periods where both A and B are observed but not together.

In the case of automated telemetry data, each observation period can be any discrete temporal scale. In our case, we use each second of the day as the observation period, so the SRI captures the association rate based on the number of seconds that A and B are observed together over the study period.

Hoppitt & Farine (2018) conducted a simulation study to test the robustness of SRI and other association indices under two common sources of sampling error, group location error (i.e., not all groups are observed) and individual identification error (not all individuals in a group are identified). In this study, they show that a modified version of the SRI, which they term vSRI, is a robust, unbiased estimator of association rates when the only source of error is individual identification error (i.e., all groups are located, but some individuals within those groups are missed):

\(vSRI = \frac{x}{x + y_{AB}}\)

This simpler association index is unbiased in the absence of group location error because we simply calculate the rate at which A and B are observed together during the observation periods when the locations of both individuals are known. With automated telemetry systems, the \(vSRI\) may be a useful association index when two criteria are met: (1) all sites of relevance to the social behavior under study are monitored by a functioning receiver station, and (2) all receiver stations are fully functional for the duration of the study period (main text Figure 2). In our worked example, we know that neither of these criteria is met: not all territories contained receiver stations, and there were suspected bouts of receiver failure (i.e., days when a receiver station did not log any tags). However, we provide the code for vSRI here to illustrate the possibility of using this index.

Instead, we primarily use SRI as our main association index despite the potential biases that could have arisen from incomplete sampling (Hoppitt & Farine 2018). Future studies could attempt to collect validation data needed to calculate association indices that are robust to group location error as well, such as GRECI and CECI (see Hoppitt & Farine 2018).

3.1.2. Calculating SRI and vSRI from copresence and simultaneous detection arrays.

Here, we outline how to calculate the SRI and vSRI using the two arrays we have generated in Part 2.

Recall that we have two arrays: (1) the \(n\) x \(n\) x \(S\) x \(T\) copresence array, which is a stack of copresence matrices across all stations across all days. (2) the \(n\) x \(n\) x \(T\) simultaneous detection array, which is a stack of the matrices of number of seconds that each pair of birds were detected at the same time anywhere, and stacked across days.

From here, we can get all of the values needed to calculate each association index:

  1. copresence matrix = \(x\)
  2. simultaneous detection matrix - copresence matrix = \(y_{AB}\) (i.e., cases where A and B were both detected, but not copresent at one station)
  3. total number of times A was seen + total number of times B was seen = \(y_A + y_B + 2x + 2y_{AB}\)

So, the denominator of SRI (\(x + y_{AB} + y_A + y_B\)) is (3) - (2) - (1)
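
As a quick numeric check with hypothetical counts for one pair of birds:

# hypothetical counts (in seconds) for a pair of birds A and B
x = 600  # A and B co-present at the same station
y_ab = 200  # both detected, but at different stations
y_a = 100  # only A detected anywhere
y_b = 300  # only B detected anywhere

sum1 = (x + y_a + y_ab) + (x + y_b + y_ab)  # item (3) above: 2000
x/(sum1 - y_ab - x)  # SRI = 600/1200 = 0.5
x/(x + y_ab)  # vSRI = 600/800 = 0.75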

## Calculate Simple Ratio Index from co-presence matrix

get_SRI = function(copresence, detections) {
    # total #seconds A and B were seen anywhere, with
    # double-counting y_ab and x
    sum1 = outer(diag(detections), diag(detections), "+")
    diag(sum1) = 0
    # number of secs a and b were both seen at the same time, but
    # not in the same place
    y_ab = detections - copresence
    diag(y_ab) = 0
    # denominator is the total number of times A and B were seen
    # total but subtracting x and y_ab because these are
    # double-counted.
    adj = copresence/(sum1 - copresence - y_ab)
    diag(adj) = 0
    adj[is.nan(adj)] = 0
    adj
}

For vSRI, the denominator is simply \(x + y_{AB}\)

get_vSRI = function(copresence, detections) {
    # total #seconds A and B were seen anywhere, with
    # double-counting y_ab and x
    sum1 = outer(diag(detections), diag(detections), "+")
    diag(sum1) = 0
    # number of secs a and b were both seen at the same time, but
    # not in the same place
    y_ab = detections - copresence
    diag(y_ab) = 0
    adj = copresence/(copresence + y_ab)
    diag(adj) = 0
    adj[is.nan(adj)] = 0
    adj
}

3.2 Plot the network using SRI

3.2.1 Plot the network for one day

To calculate the association index across all individuals for one day, we can sum the copresence array across all stations for that day, and simply take the simultaneous detection matrix for that day. We then plug in the resulting matrices into the get_SRI function:

# getting SRI for day 1 only
cpm1 = apply(copresence.array[, , , 1], c(1, 2), sum, na.rm = T)
sdm1 = detection.array[, , 1]
sri1 = get_SRI(cpm1, sdm1)

This matrix then becomes the adjacency matrix of our network. We can use the igraph package to generate the network. Here, we delete the “isolates”, i.e., the nodes that are not connected to any edges.

# convert adjacency matrix into network
g.sri1 = graph_from_adjacency_matrix(sri1, "undirected", weighted = T)

# remove isolates
filt.g.sri1 = delete_vertices(g.sri1, which(degree(g.sri1) == 
    0))


# plot network
plot(filt.g.sri1, vertex.label = "", edge.width = E(filt.g.sri1)$weight * 
    30, vertex.size = 8, edge.color = "black")

Figure S13: An unlabeled social network of acorn woodpecker associations on October 1st, 2018.

3.2.2 array_to_net() function to make it easier

We can make this easier by wrapping the routine in a function:

# dates takes values 1 through 31; ai (association index)
# options are 'SRI' or 'vSRI'
array_to_net = function(copresence_array, detections_array, stations = dimnames(copresence_array)[[3]], 
    dates = as.numeric(dimnames(copresence_array)[[4]]), ai = "SRI") {
    
    if (length(stations) > 0) 
        subdat.cp = copresence_array[, , which(dimnames(copresence_array)[[3]] %in% 
            stations), ] else subdat.cp = copresence_array
    if (length(stations) == 1) 
        subdat.cp = subdat.cp[, , match(dates, dimnames(copresence_array)[[4]])] else subdat.cp = subdat.cp[, , , match(dates, dimnames(copresence_array)[[4]])]
    sumstations.cp = apply(subdat.cp, c(1, 2), sum, na.rm = T)
    cpm = sumstations.cp[which(rowSums(sumstations.cp) > 0), 
        which(rowSums(sumstations.cp) > 0)]
    detections.use = apply(detections_array[, , match(dates, 
        dimnames(detections_array)[[3]])], c(1, 2), sum, na.rm = T)
    sdm = detections.use[which(rowSums(sumstations.cp) > 0), 
        which(rowSums(sumstations.cp) > 0)]
    if (ai == "SRI") 
        adj = get_SRI(cpm, sdm) else if (ai == "vSRI") 
        adj = get_vSRI(cpm, sdm) else print("Association index not recognized")
    g = graph_from_adjacency_matrix(adj, "undirected", weighted = T)
    V(g)$count = diag(sdm)
    g
}

An extra feature of this function is that the output graph will contain a vertex attribute called “count”, which tabulates the number of seconds that each individual is detected during the designated days. We will use this later in Part 3.4.1 when we filter by observation.

We can use this routine to quickly make a network from the first day of data using the SRI index.

g.trial = array_to_net(copresence.array, detection.array, dates = 1, 
    ai = "SRI")

We can show that this is the same network as the one we created above (filt.g.sri1) by plotting the two using the same layout.

par(mfrow = c(1, 2))
layout.trial = layout_with_fr(g.trial)
plot(g.trial, vertex.label = "", edge.width = E(g.trial)$weight * 
    30, vertex.size = 8, edge.color = "black", layout = layout.trial)
plot(filt.g.sri1, vertex.label = "", edge.width = E(filt.g.sri1)$weight * 
    30, vertex.size = 8, edge.color = "black", layout = layout.trial)

Figure S14: A visual check to see that the array_to_net() function produces the same network as was shown in Figure S13.

# put the graphics parameters back to one row, one column
par(mfrow = c(1, 1))

3.2.3 Plot the network for all days in October

To plot the total aggregated network, we can sum the copresence array across all stations and days, and also sum the simultaneous detection array across all days. This is done automatically in the array_to_net() function if we just leave the “stations” and “dates” arguments at their default values.

g.sri = array_to_net(copresence.array, detection.array, ai = "SRI")

We can add vertex attributes. We’ll take the vertex names from the margin names of the array, and we’ll also add a GROUP attribute, which is the home group of the individual (we get this from the individual attributes file using the match() function).

V(g.sri)$GROUP = as.character(id.dat[match(V(g.sri)$name, id.dat$BIRD), 
    "GROUP"])

Then, we filter out the isolates. We will also filter out individuals that do not have a receiver station at their home territory. This way, we restrict the network to those individuals that could be observed both at home and while visiting other territories.

Here, we’ll plot the vertex colors by home group membership (though there will be overlap in group colors right now).

# filter out isolates
filt.g.sri = delete.vertices(g.sri, which(degree(g.sri) == 0))

# filter out individuals whose home group doesn't have a base
# station
filt.g.sri = delete.vertices(filt.g.sri, is.na(match(V(filt.g.sri)$GROUP, 
    allstations)))

V(filt.g.sri)$color = as.numeric(as.factor(V(filt.g.sri)$GROUP))

plot(filt.g.sri, vertex.label = "", vertex.size = 8, edge.width = E(filt.g.sri)$weight * 
    50)

Figure S15: Social network of acorn woodpeckers for October 2018. The node colors represent membership in social groups (there may be multiple social groups with the same color due to redundancy in the default color scheme).


3.3 Plotting the network based on spatial layout of group territories

Part of the challenge of visualizing social networks is figuring out a way to display both the social and spatial relationships of elements (here, individuals). In the case of acorn woodpeckers, there are associations between each individual, but the social groups that individuals belong to also occupy discrete territories in space. How can we capture both of these factors at the same time?

Here, we will use a color gradient to assign colors to individuals based on the 2-D spatial location of their home territory. We can do this by using a color scheme where the hue and saturation of colors depend on the distance and direction of the territory from a particular location (center of the study site).

3.3.1 Setting the color scheme using hsv()

The hsv() function allows you to specify colors using hue, saturation and value.

We can use this function to create a color scheme that acts like a color wheel. For example, we can have the color hue of a point represent the angle around a center point, and the saturation of the color represent the distance from the center.

## demo of color scheme using hsv()

# a vector of angles dividing a circle 36 times (in radians),
# repeated 10 times
angles = rep(seq(2 * pi/36, 2 * pi, length.out = 36), 10)

# a vector of lengths (1 through 10), repeated 36 times
lengths = as.numeric(sapply(1:10, function(x) rep(x, 36)))

# use trigonometry to figure out x and y positions of each
# point
ys = -(lengths * cos(angles))
xs = -(lengths * sin(angles))

# convert angles and lengths into values between 0 and 1
angles.norm = angles/(2 * pi)
lengths.norm = lengths/10
# now plot all points, with hues representing the angle and
# the saturation of the color representing the distance from
# the center.
plot(xs, ys, pch = 21, bg = hsv(h = angles.norm, s = lengths.norm, 
    v = 1), xlab = "x", ylab = "y")

Figure S16: Demonstration of spatial color scheme in which the hue and saturation of color is a function of the distance and angle from the center.

3.3.2 Getting the group coordinates and the corresponding colors.

Using the color scheme above, we can assign colors to base stations based on angle and distance from the center of the study site. This becomes useful when we compare between different network layouts.

First, we’ll take the GPS easting and northing values of all group territories from the group data and convert them to latitude/longitude coordinates.

#### set up territory colors using hsv
utmcoord = SpatialPoints(cbind(group.dat$easting, group.dat$northing), 
    proj4string = CRS("+proj=utm +zone=10"))
longlatcoord = as.data.frame(spTransform(utmcoord, CRS("+proj=longlat")))
group.dat$long = longlatcoord$coords.x1
group.dat$lat = longlatcoord$coords.x2

We’ll now restrict the data to the group territories of birds that are included in this data. Then, we’ll get the coordinates for the center of this cluster of points.

use.groups = group.dat[match(unique(V(filt.g.sri)$GROUP), group.dat$GROUP), 
    ]
center.lat = mean(use.groups$lat)
center.long = mean(use.groups$long)

Then, we can get the angles and distances of each group territory from this ‘center point’.

home.angles = bearing(matrix(rep(c(center.long, center.lat), 
    nrow(use.groups)), ncol = 2, byrow = T), matrix(c(use.groups$long, 
    use.groups$lat), ncol = 2))
distances = distHaversine(matrix(rep(c(center.long, center.lat), 
    nrow(use.groups)), ncol = 2, byrow = T), matrix(c(use.groups$long, 
    use.groups$lat), ncol = 2))

Now, we can normalize these angle and distance values to be between 0 and 1 (bearing() returns degrees between -180 and 180, so adding 180 and dividing by 360 maps the angles onto [0, 1]), and we’ll add these values to the use.groups dataframe.

## set the colors that we'll use in the hsv() function.
use.groups$hue = (home.angles + 180)/360
use.groups$saturation = distances/max(distances)
use.groups$saturation[which(use.groups$saturation > 1)] = 1

We now organize the data into a dataframe where each row is a social group name, along with the color we are going to use for that group (generated by the hsv() function with our hue and saturation values). We also get the latitude and longitude for each group territory by looking it up in the use.groups dataframe above.

col.dat.home = data.frame(home = unique(V(filt.g.sri)$GROUP), 
    color = hsv(use.groups[match(unique(V(filt.g.sri)$GROUP), 
        use.groups$GROUP), "hue"], use.groups[match(unique(V(filt.g.sri)$GROUP), 
        use.groups$GROUP), "saturation"], v = 1))

Now we use this dataframe as reference and plot all group territories in space.

# plot a blank plot with bounds
plot(-125.54, 36.38, xlim = c(-121.57, -121.54), ylim = c(36.355, 
    36.4), type = "n", ylab = "", xlab = "", las = 1, main = "All receiver locations for this study")

points(use.groups$long, use.groups$lat, pch = 21, cex = 1.5, 
    bg = hsv(use.groups$hue, use.groups$saturation, v = 1))

Figure S17: By applying the color scheme from Figure S16 to acorn woodpecker territories, we can color the territories as distinct color that gives some sense of their spatial location. These territory colors can be applied to individuals in the social network to indicate their home territory.

3.3.3 Plot the network with node colors based on territory location

Now, we can assign each node a color based on this territory color scheme and replot the network.

V(filt.g.sri)$color = hsv(use.groups[match(V(filt.g.sri)$GROUP, 
    use.groups$GROUP), "hue"], use.groups[match(V(filt.g.sri)$GROUP, 
    use.groups$GROUP), "saturation"], v = 1)

plot(filt.g.sri, vertex.label = "", vertex.size = 8, edge.width = E(filt.g.sri)$weight * 
    30, edge.color = "black", main = "Figure 5b")

Figure S18: The acorn woodpecker social network for October 2018, with node colors representing each individual’s home territory using the color scheme shown in Figure S17.

3.3.4 Plot the social network using a spatial layout (Figure 5a)

We can also plot the social network using a spatial layout in which the nodes are shown clustered in a circle around their respective group territories. This uses a custom function called make_spatial_layout(), which is in the ‘ACWO_masterfunctions.R’ script (it’s quite complicated, so I won’t walk through it all here).

spat.layout = make_spatial_layout(net = filt.g.sri, id_data = id.dat, 
    group_data = use.groups)
plot(filt.g.sri, edge.width = E(filt.g.sri)$weight * 30, vertex.size = 5, 
    vertex.label = "", layout = spat.layout, edge.curved = T, 
    edge.color = "black", main = "Figure 5a")

Figure S19: The same social network as Figure S18, but using the ‘spatial layout’ in which nodes are shown next to the spatial location of their home territory.

3.3.5 Measure assortment of associations by social group

We can measure the degree to which associations between individuals are assorted by social group membership. The assortment coefficient ranges from -1 to 1: it is -1 if associations are purely disassortative (associations ONLY occur between nodes of different types), 0 if associations are random with respect to node type, and 1 if associations are purely assortative (associations ONLY occur between nodes of the same type). Here, we assign group membership as the “node type”. The assortment.discrete() function from the package assortnet automatically includes edge weights in the calculation (see Farine 2014). Note that the function takes an adjacency matrix as input, so we have to convert the igraph object into an adjacency matrix first.

assortment.discrete(as_adjacency_matrix(filt.g.sri, attr = "weight", 
    sparse = F), V(filt.g.sri)$GROUP)$r
## [1] 0.5737379

The assortment coefficient is approximately 0.57, which means that associations are strongly assortative: on average, members of the same group associate with each other more than with members of other groups. This is not surprising, and it serves as a useful proof of concept for the method.


3.4 Using the vSRI to build networks and comparing it to SRI

As discussed above, the SRI is only one way of calculating an association index. Another association index, the vSRI, may be less biased for datasets in which the receivers were fully operational but individuals were not consistently detected (e.g., due to tag failure or imperfect detection). Here, we demonstrate how we can build the social network of acorn woodpeckers using the vSRI and compare the results to what we see with the SRI.

First, let’s build two networks, one using the SRI and the other using the vSRI, from one day of data. We will also figure out which receiver stations had at least one detection of any bird that day, since we can infer that those receivers were functional during that day.

# getting SRI and vSRI using matrices from day 1 only
g.sri1 = array_to_net(copresence.array, detection.array, dates = 1, 
    ai = "SRI")
g.vsri1 = array_to_net(copresence.array, detection.array, dates = 1, 
    ai = "vSRI")

# which stations had at least one detection (i.e., was known
# to be functional)
used.stations = which(apply(copresence.array[, , , 1], 3, sum, 
    na.rm = T) > 0)

Now, we will assign group membership to individuals using the id.dat data, filter out the isolates, and also filter out individuals who came from groups that did not have a functional receiver station that day.

# assign groups

# with SRI
V(g.sri1)$GROUP = as.character(id.dat[match(V(g.sri1)$name, id.dat$BIRD), 
    "GROUP"])

# with vSRI
V(g.vsri1)$GROUP = as.character(id.dat[match(V(g.vsri1)$name, 
    id.dat$BIRD), "GROUP"])

# remove isolates
filt.g.sri1 = delete_vertices(g.sri1, which(degree(g.sri1) == 
    0))
filt.g.vsri1 = delete_vertices(g.vsri1, which(degree(g.vsri1) == 
    0))

# filter out individuals whose home group receiver station
# wasn't functioning that day.
filt.g.sri1 = delete_vertices(filt.g.sri1, is.na(match(V(filt.g.sri1)$GROUP, 
    names(used.stations))))
filt.g.vsri1 = delete_vertices(filt.g.vsri1, is.na(match(V(filt.g.vsri1)$GROUP, 
    names(used.stations))))

We then add colors to the vertices using the same procedure as before and plot the results. These plots use the Fruchterman-Reingold force-directed layout (by default). To illustrate the difference between the indices, we use the same edge-width scale for both networks.

# add color to vertices
V(filt.g.sri1)$color = hsv(use.groups[match(V(filt.g.sri1)$GROUP, 
    use.groups$GROUP), "hue"], use.groups[match(V(filt.g.sri1)$GROUP, 
    use.groups$GROUP), "saturation"], v = 1)
V(filt.g.vsri1)$color = hsv(use.groups[match(V(filt.g.vsri1)$GROUP, 
    use.groups$GROUP), "hue"], use.groups[match(V(filt.g.vsri1)$GROUP, 
    use.groups$GROUP), "saturation"], v = 1)

# plot network
par(mfrow = c(1, 2), mar = c(2, 2, 2, 2))
plot(filt.g.sri1, vertex.label = "", edge.width = E(filt.g.sri1)$weight * 
    30, vertex.size = 8, edge.color = "black", main = "with SRI")
plot(filt.g.vsri1, vertex.label = "", edge.width = E(filt.g.vsri1)$weight * 
    30, vertex.size = 8, edge.color = "black", main = "with vSRI")

Figure S20: The same data can be used to construct a social network using either the SRI or the vSRI as the association index. The data for October 1, 2018 are restricted to individuals whose home territory had a receiver that was known to be functional that day.

As you can see, the vSRI produces much thicker edges because the association indices tend to be larger. In fact, the vSRI is always at least as large as the SRI because its denominator can never be larger. When vSRI = 1, it means that whenever both A and B were detected at the same time, they were in the same place.
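To make the relationship concrete, here is the arithmetic with made-up counts, using the standard notation for these indices as we understand the definitions here (x = sampling periods in which A and B were detected together; yAB = periods in which both were detected but apart; yA and yB = periods in which only one of the pair was detected):

# made-up counts for one dyad
x = 2; yAB = 3; yA = 10; yB = 5

# SRI counts every period in which either bird was detected
x/(x + yAB + yA + yB)  # 0.1

# vSRI only counts periods in which BOTH birds were detected,
# so its denominator can never be larger than the SRI's
x/(x + yAB)  # 0.4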

par(mfrow = c(1, 1))

Let’s plot the vSRI against the SRI for the same set of edges to see their relationship.

plot(E(filt.g.sri1)$weight, E(filt.g.vsri1)$weight, pch = 21, 
    bg = gray(0.5, 0.5), ylab = "vSRI", xlab = "SRI", xlim = c(0, 
        1), ylim = c(0, 1))

Figure S21: Comparing SRI and vSRI for individuals using October 1, 2018 data.

As you can see, the vSRI is always at least as large as the SRI. Some edges with very low SRI have very high vSRI; these are likely pairs in which one bird was detected much more frequently than the other, but whenever both were detected at the same time, they were at the same location.

Biologically speaking, which association index is more appropriate (vSRI or SRI) likely depends both on the sources of sampling error (as described in Hoppitt & Farine 2018) and on the biological question of interest and the behavioral variation among individuals. If some birds foray a lot but do not spend much time within the core area of their home territory, then their vSRI with members of other groups will end up being extremely high, more than is biologically warranted. Ongoing debate and fine-tuning of association indices can potentially help generate better-resolved social networks in the long run.


3.5 Other ways to filter the data

Here, we demonstrate a couple of other filtering techniques we use to quality-control the data in the paper.

3.5.1 Remove rarely seen birds

One thing we can do is remove the birds that are rarely seen. Let’s say we put the cutoff at 31 detections (i.e., once per day on average for October):

# figure out which birds are rarely seen in the whole dataset
rarely.seen = V(g.sri)$name[which(V(g.sri)$count < 31)]

# restrict the arrays to those individuals that are not in
# the 'rarely.seen' vector. This part could be made simpler
# with a function (see the sketch below).
copresence.array.reduce = copresence.array[-which(dimnames(copresence.array)[[1]] %in% 
    rarely.seen), -which(dimnames(copresence.array)[[1]] %in% 
    rarely.seen), , ]
detection.array.reduce = detection.array[-which(dimnames(detection.array)[[1]] %in% 
    rarely.seen), -which(dimnames(detection.array)[[1]] %in% 
    rarely.seen), ]
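
# As noted above, the reduction could be wrapped in a small helper.
# Here is one possible sketch; the function 'drop_birds' is ours,
# not part of 'ACWO_masterfunctions.R'. Using a logical index also
# avoids the edge case where nothing matches 'birds', in which case
# the -which(...) approach above would drop every row.
drop_birds = function(arr, birds) {
    keep = !(dimnames(arr)[[1]] %in% birds)
    if (length(dim(arr)) == 4) 
        arr[keep, keep, , , drop = FALSE] else arr[keep, keep, , drop = FALSE]
}
# equivalent to the two assignments above:
# copresence.array.reduce = drop_birds(copresence.array, rarely.seen)
# detection.array.reduce = drop_birds(detection.array, rarely.seen)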

# make the network
g.sri2 = array_to_net(copresence.array.reduce, detection.array.reduce, 
    ai = "SRI")

# assign group membership
V(g.sri2)$GROUP = as.character(id.dat[match(V(g.sri2)$name, id.dat$BIRD), 
    "GROUP"])

# filter out isolates
filt.g.sri2 = delete_vertices(g.sri2, which(degree(g.sri2) == 
    0))

# filter out individuals whose home group doesn't have a base
# station
filt.g.sri2 = delete_vertices(filt.g.sri2, is.na(match(V(filt.g.sri2)$GROUP, 
    allstations)))

# also remove the weakest edges (those below the 10th
# percentile of edge weights)
filt.g.sri2 = delete_edges(filt.g.sri2, which(E(filt.g.sri2)$weight < 
    quantile(E(filt.g.sri2)$weight, prob = 0.1)))

V(filt.g.sri2)$color = hsv(use.groups[match(V(filt.g.sri2)$GROUP, 
    use.groups$GROUP), "hue"], use.groups[match(V(filt.g.sri2)$GROUP, 
    use.groups$GROUP), "saturation"], v = 1)
plot(filt.g.sri2, edge.width = E(filt.g.sri2)$weight * 50, vertex.size = 8, 
    vertex.label = "", edge.color = gray(0.3), layout = layout_with_fr(filt.g.sri2), 
    edge.curved = F)

Figure S22: Social network after removing individuals that were rarely seen.

assortment.discrete(as_adjacency_matrix(filt.g.sri2, attr = "weight", 
    sparse = F), V(filt.g.sri2)$GROUP)$r
## [1] 0.5737661

3.5.2 Plot just the visitor network

To do this, we have to go back to Part 2.1.4, where we created the co-presence and simultaneous-detection arrays. Here, we need to restrict the data to just the detections of individuals away from their home group. The modified routine looks like this, and will save an .rdata file called visitor_result_arrays_60sec.rdata containing two arrays: visitor.copresence.array and visitor.detection.array.

*Note: this routine needs to be run within the code script in Part 2, since some of the object names refer to ones we established there.*

visitor.detection.array = array(dim = c(length(ids), length(ids), 
    length(all.timewindows.list)))
visitor.copresence.array = array(dim = c(length(ids), length(ids), 
    length(allstations), length(all.timewindows.list)))

for (k in 1:length(all.timewindows.list)) {
    result.array1 = array(dim = c(max.secs.day, length(ids), 
        length(allstations)))
    timewindows.dat = all.timewindows.list[[k]]
    for (j in 1:length(allstations)) {
        pick.station = allstations[j]
        # here is where we remove the data where the individual is at
        # its home territory.
        try.data = timewindows.dat[!timewindows.dat$bird %in% 
            id.dat[which(id.dat$GROUP == pick.station), "BIRD"] & 
            timewindows.dat$group == pick.station, ]
        if (nrow(try.data) == 0) 
            next
        daydat = try.data
        daydat$date.num = as.numeric(as.Date(try.data$date[1]) - 
            as.Date("2017-07-01"))
        
        # make reduced dataset
        d = daydat %>% select(bird, group, date.num, start, stop) %>% 
            arrange(bird)
        d = as.data.frame(d)
        d$start2 = d$start - sec.daystart
        d$stop2 = d$stop - sec.daystart
        
        # Make daily presence matrix
        
        m = build.mat(d)
        result.array1[, , j] = m
        
        # make co-presence matrix and add to array
        copresence.mat = t(m) %*% m
        visitor.copresence.array[, , j, k] = copresence.mat
    }
    
    # make matrix of seconds when A+B are both detected across
    # all stations and add to detection.array
    aa = apply(result.array1, c(1, 2), sum, na.rm = T)
    bb = aa
    bb[bb > 1] = 1
    visitor.detection.array[, , k] = t(bb) %*% bb
    print(k)
}

dimnames(visitor.detection.array) = list(ids, ids, 1:length(all.timewindows.list))
dimnames(visitor.copresence.array) = list(ids, ids, allstations, 
    1:length(all.timewindows.list))
save(visitor.detection.array, visitor.copresence.array, file = "visitor_result_arrays_60sec.rdata")

Now, we can go through the same routine as in Section 3.2 to plot the visitor network.

load("visitor_result_arrays_60sec.rdata")

Make the visitor network

visit.g.sri = array_to_net(visitor.copresence.array, visitor.detection.array, 
    ai = "SRI")

V(visit.g.sri)$GROUP = as.character(id.dat[match(V(visit.g.sri)$name, 
    id.dat$BIRD), "GROUP"])

# filter out isolates
filt.visit.g.sri = delete_vertices(visit.g.sri, which(degree(visit.g.sri) == 
    0))

# filter out rarely seen birds (note: we index into the
# filtered graph here, not the original)
filt.visit.g.sri = delete_vertices(filt.visit.g.sri, which(V(filt.visit.g.sri)$count < 
    31))
visitor.group.dat = group.dat[match(unique(V(visit.g.sri)$GROUP), 
    group.dat$GROUP), ]

# bearing (in degrees) and distance from the center of the
# study area to each group territory
all.home.angles = bearing(matrix(rep(c(center.long, center.lat), 
    nrow(visitor.group.dat)), ncol = 2, byrow = T), matrix(c(visitor.group.dat$long, 
    visitor.group.dat$lat), ncol = 2))
all.distances = distHaversine(matrix(rep(c(center.long, center.lat), 
    nrow(visitor.group.dat)), ncol = 2, byrow = T), matrix(c(visitor.group.dat$long, 
    visitor.group.dat$lat), ncol = 2))

# convert bearing to hue (0-1) and relative distance to
# saturation, capping saturation at 1
visitor.group.dat$hue = (all.home.angles + 180)/360
visitor.group.dat$saturation = all.distances/max(all.distances, 
    na.rm = T)
visitor.group.dat$saturation[which(visitor.group.dat$saturation > 
    1)] = 1
V(filt.visit.g.sri)$color = hsv(visitor.group.dat[match(V(filt.visit.g.sri)$GROUP, 
    visitor.group.dat$GROUP), "hue"], visitor.group.dat[match(V(filt.visit.g.sri)$GROUP, 
    visitor.group.dat$GROUP), "saturation"], v = 1)

spat.layout.visit = make_spatial_layout(net = filt.visit.g.sri, 
    id_data = id.dat, group_data = visitor.group.dat)

plot(filt.visit.g.sri, edge.width = E(filt.visit.g.sri)$weight * 
    30, vertex.size = 5, vertex.label = "", layout = spat.layout.visit, 
    edge.curved = T, edge.color = "black", main = "Figure 6a")

Figure S23: The co-visitation social network, using a spatial layout.

3.5.3 Plot the visitor network by territory (Figure 6b-d)

In Figures 6b-d in the main text, we show how the structure of the visitor-only network differs markedly depending on the territory where the visitors are monitored. This is because territories vary in ‘how interesting’ they are to visitors. We do not yet know exactly what factors drive this variability (e.g., the resources such as granaries that are there, the specific group members that are there, etc.); that remains an ongoing area of research. For now, we simply demonstrate how to generate site-specific social networks. To do this, we generate a separate network for each receiver station using the “stations” argument of the array_to_net() function.

g.MLF2 = array_to_net(visitor.copresence.array, visitor.detection.array, 
    stations = "MLF2")

g.HORS = array_to_net(visitor.copresence.array, visitor.detection.array, 
    stations = "HORS")

g.1800 = array_to_net(visitor.copresence.array, visitor.detection.array, 
    stations = "1800")

g.SHAN = array_to_net(visitor.copresence.array, visitor.detection.array, 
    stations = "SHAN")

Then, we organize these networks into a list.

nets.list = list(MLF2 = g.MLF2, HORS = g.HORS, `1800` = g.1800, 
    SHAN = g.SHAN)
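As an aside, the four calls above and the list construction can be collapsed into a single lapply() over the station names (the same array_to_net() arguments, just looped):

station.names = c("MLF2", "HORS", "1800", "SHAN")
nets.list = lapply(station.names, function(s) array_to_net(visitor.copresence.array, 
    visitor.detection.array, stations = s))
names(nets.list) = station.names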

Then, we can use a loop (or an apply function, if you prefer) to plot these networks independently in separate panels.

par(mfrow = c(2, 2), mar = c(1, 1, 1, 1))
for (i in 1:length(nets.list)) {
    g = nets.list[[i]]
    spl = make_spatial_layout(net = g, id_data = id.dat, group_data = visitor.group.dat)
    V(g)$status = as.character(id.dat[match(V(g)$name, id.dat$BIRD), 
        "SEXSTATUS"])
    V(g)$GROUP = as.character(id.dat[match(V(g)$name, id.dat$BIRD), 
        "GROUP"])
    V(g)$color = hsv(visitor.group.dat$hue[match(V(g)$GROUP, 
        visitor.group.dat$GROUP)], visitor.group.dat$saturation[match(V(g)$GROUP, 
        visitor.group.dat$GROUP)], v = 1)
    
    plot(g, edge.width = E(g)$weight * 100, vertex.size = 0.1, 
        vertex.label = "", edge.color = "black", layout = spl, 
        edge.curved = T, rescale = F, xlim = c(-121.57, -121.54), 
        ylim = c(36.36, 36.395), asp = 0, frame = T, main = paste(names(nets.list)[[i]]))
    
    points(visitor.group.dat$long[which(visitor.group.dat$GROUP == 
        names(nets.list)[[i]])], visitor.group.dat$lat[which(visitor.group.dat$GROUP == 
        names(nets.list)[[i]])], pch = "*", col = "red", cex = 2)
}

Figure S24: The co-visitation social network broken down by receiver station.

Part 4: Why we don’t use GMM events

Gaussian Mixture Modelling (GMM) is now commonly used in workflows that build social networks from data streams, primarily in the context of RFID-based studies of social association (Psorakis et al. 2012, 2015). In the main text, we briefly mention why Gaussian Mixture Models are not appropriate for the type of data we use here (i.e., detections of individuals at territories). We elaborate on the reasoning here.

Psorakis et al. (2012, 2015) first applied GMM to a data stream representing visits by RFID-tagged birds to bird feeders in Wytham Woods, UK. It is important to note that this method was designed to solve a particular problem: how do we develop robust criteria for identifying ‘foraging groups’ from temporal data streams alone? That is, if we cannot know anything about the spatial proximity of individuals, how can we figure out what constitutes a group based only on the time stamps of visits to a particular location (i.e., an antenna placed in front of a bird feeder)?

Prior to the GMM approach, researchers imposed arbitrary thresholds to define associations. Say our threshold is 10 seconds: we would then consider any pair of birds logged at the antenna within 10 seconds of each other to be associated. But how do we know that birds that visit the feeder 11 seconds apart are NOT part of the same group? And are birds that visit within 9 seconds of each other always in the same group?
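For concreteness, here is a minimal sketch of that threshold approach, using a made-up detection log (the data frame det and the 10-second threshold are purely illustrative):

# toy detection log at one antenna: bird IDs and detection times (s)
det = data.frame(bird = c("A", "B", "A", "C"), time = c(0, 6, 30, 
    42), stringsAsFactors = FALSE)
thresh = 10  # the arbitrary association threshold

# flag every pair of detections of different birds that fall within
# 'thresh' seconds of each other
pairs = which(abs(outer(det$time, det$time, "-")) <= thresh & 
    outer(det$bird, det$bird, "!="), arr.ind = TRUE)
pairs = pairs[pairs[, 1] < pairs[, 2], , drop = FALSE]  # each pair once
cbind(det$bird[pairs[, 1]], det$bird[pairs[, 2]])  # only A and B associate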

The GMM approach avoids this arbitrariness by using machine learning to detect ‘bursts’ of activity at the bird feeder, which we call ‘gathering events’. Birds that appear in the same gathering event are then considered part of the same group.

In short, the GMM approach is geared towards finding bursts of activity that indicate time periods when a group of individuals came to forage collectively at a feeder. In theory, automated telemetry systems could be used for a similar task, i.e., to detect bursts of activity within range of a receiver station.

But there are at least two critical differences between automated telemetry data and RFID data. First, individuals can ‘reside’ within range of an automated telemetry receiver, which extends to hundreds of meters, whereas this would rarely happen with RFID systems, whose detection range is less than one meter. This is especially true when automated telemetry receivers are placed in territories, as in our woodpecker study. Second, multiple individuals can be detected nearly simultaneously at a single automated telemetry receiver, whereas an RFID reader detects one individual at a time (because only one individual can fit within the antenna’s detection range). These differences change the density of the data stream we expect from automated telemetry relative to RFID.

Basically, the reason we do not use GMM approaches to construct social networks here comes down to these differences, and primarily the first one: our receivers are set up to detect ‘residency’ rather than ‘gathering events’. When an automated telemetry system is designed to detect such gathering events, it may be perfectly fine to use GMM methods (as in Jacoby et al. 2016). But we suspect there are many situations in which the GMM approach is not well suited for detecting associations in automated telemetry data.

In summary, the Gaussian Mixture Model approach for detecting gathering events is not designed to delineate association patterns when some individuals simply stay at a location for extended periods of time regardless of group structure. This is why we do not believe it is appropriate for building social networks with automated telemetry systems embedded in territories.

As an illustration, Figure S25 below shows what the ‘gathering events’ look like when the approach is applied to our woodpecker data. The figure shows detections at MACR over 2 hours (as shown above), together with the time ranges of ‘gathering events’ detected using the gmmevents() function in the asnipe package.
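For reference, the call involved looks roughly like this. This is a sketch, not the exact code used for the figure: macr2h is a hypothetical subset of rawdat holding the MACR detections for that 2-hour window, and the argument names follow asnipe’s gmmevents() (see ?gmmevents for details).

library(asnipe)

# 'macr2h' (hypothetical) holds the MACR detections for 9-11am on
# October 25, 2018, with numeric times, tag IDs, and station names
gmm.res = gmmevents(time = macr2h$ztime.num, identity = macr2h$BIRD, 
    location = macr2h$GROUP)

# gmm.res$metadata holds the start and end times of each detected
# gathering event, which give the time ranges shown in Figure S25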

Figure S25: Detections over 2 hours (9am - 11am) at territory MACR, and gathering events detected by Gaussian Mixture Modelling

You can see that almost every second of the 2-hour period is considered part of a gathering event, and that many of the events are delimited by gaps in the tag bursts, which could simply reflect detection error.