It’s tough to be a caribou

Tidy Tuesday caribou tracking data.

Author: David Fox

Published: Aug. 1, 2020

This week’s data is about caribou in British Columbia, Canada.

Data: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-23/readme.md

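library(tidyverse)  # dplyr, stringr, and the %>% pipe
library(janitor)    # tabyl() and adorn_pct_formatting()
library(gt)         # gt() tables
library(sf)         # st_as_sf()
library(tmap)       # tm_shape(), tm_dots()

tuesdata <- tidytuesdayR::tt_load('2020-06-23')

    Downloading file 1 of 2: `locations.csv`
    Downloading file 2 of 2: `individuals.csv`
#locations <- tuesdata$locations
individuals <- tuesdata$individuals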

There are 2 data sets, one on individuals and one on locations. What data is there about the individuals?

{skimr} is a great package for getting an overview of a data set. As a pro tip, you can combine {skimr} output with git diffs to quickly see what has changed in a data set.

skimr::skim(individuals)
Table 1: Data summary
Name individuals
Number of rows 286
Number of columns 14
_______________________
Column type frequency:
character 8
logical 2
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
animal_id 0 1.00 6 10 0 260 0
sex 0 1.00 1 1 0 2 0
life_stage 219 0.23 3 5 0 8 0
death_cause 232 0.19 7 97 0 16 0
study_site 0 1.00 5 11 0 8 0
deploy_on_comments 199 0.30 15 167 0 71 0
deploy_off_type 0 1.00 4 7 0 4 0
deploy_off_comments 230 0.20 4 82 0 8 0

Variable type: logical

skim_variable n_missing complete_rate mean count
pregnant 267 0.07 0.84 TRU: 16, FAL: 3
with_calf 202 0.29 0.21 FAL: 66, TRU: 18

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
deploy_on_longitude 133 0.53 -121.89 0.77 -123.36 -122.58 -122.11 -121.34 -120.22 ▂▇▂▆▁
deploy_on_latitude 133 0.53 55.04 2.21 28.13 54.94 55.14 55.33 55.99 ▁▁▁▁▇
deploy_off_longitude 230 0.20 -121.95 0.80 -123.62 -122.54 -122.27 -121.35 -119.80 ▁▇▃▃▁
deploy_off_latitude 230 0.20 55.21 0.36 54.62 54.91 55.14 55.40 55.95 ▅▇▅▂▅
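The n_missing and complete_rate columns above come from a simple count: complete_rate is 1 - n_missing / n_rows, so life_stage's 219 missing values out of 286 rows leave 67/286, or about 0.23. Here is a minimal base R sketch of just those two columns. The helper name `missing_summary()` is hypothetical and not part of {skimr}, and the toy data frame only mimics the life_stage counts.

```r
# Hypothetical helper: per-column missing counts and completion rates,
# mirroring skimr's n_missing and complete_rate columns.
missing_summary <- function(df) {
  n_missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  data.frame(
    skim_variable = names(df),
    n_missing = unname(n_missing),
    complete_rate = unname(1 - n_missing / nrow(df)),
    row.names = NULL
  )
}

# Toy data: 286 rows, with 219 missing life_stage values as in the table above
toy <- data.frame(
  sex = rep("f", 286),
  life_stage = c(rep(NA, 219), rep("adult", 67))
)
missing_summary(toy)
```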

Looks like a lot of missing data. What’s the sex ratio of collared individuals?

individuals %>%
  tabyl(sex) %>%
  adorn_pct_formatting() %>%
  gt()
sex n percent
f 282 98.6%
m 4 1.4%

They only collared 4 males – that is a very skewed sex ratio. Are male caribou tremendously hard to capture for radio collaring?
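The percentages follow directly from the counts. As a quick base R check, this rebuilds the sex column from the 282/4 counts in the table (not from the real data) and also expresses the skew as a ratio:

```r
# Rebuilt from the tabyl() counts above, not read from the raw data
sex <- c(rep("f", 282), rep("m", 4))

counts <- table(sex)
# Rounded to one decimal, like the adorn_pct_formatting() display
percent <- round(100 * prop.table(counts), 1)
females_per_male <- counts[["f"]] / counts[["m"]]

percent
females_per_male
```

That works out to roughly 70 collared females for every collared male.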

What are the deploy off types (one of the few complete variables)?

individuals %>%
  tabyl(deploy_off_type) %>%
  adorn_pct_formatting() %>%
  gt()
deploy_off_type n percent
dead 60 21.0%
other 45 15.7%
removal 82 28.7%
unknown 99 34.6%

So about 1/5 died, just under 1/3 were removed, over 1/3 are unknown, and the remaining ~16% are ‘other’. That is a lot of radio collars, which are not cheap, with an unknown status. Is there anything useful in the ‘deploy_off_comments’?

individuals %>%
  tabyl(deploy_off_comments) %>%
  adorn_pct_formatting() %>%
  gt()
deploy_off_comments n percent valid_percent
deployment end time set to exclude locations in Fort St John 1 0.3% 1.8%
Infometrics 4 1.4% 7.1%
Lost 17 5.9% 30.4%
Not Monitored 2 0.7% 3.6%
Recovered failed collar and replaced with a new one. 8 2.8% 14.3%
Recovered failed GPS collar and replaced with a new one. 1 0.3% 1.8%
Recovered non-working green GPS collar. Cow very skinny. 1 0.3% 1.8%
the tag remained deployed as of 7/31/2016 when data were sent to be added Movebank 22 7.7% 39.3%
NA 230 80.4% -

Not really. More missing data, and what the heck does “Infometrics” mean? Do these researchers have a graduate adviser?[1]
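Once NA values are present, tabyl() reports two percentages: percent divides by all 286 rows, while valid_percent divides only by the 56 rows that actually have a comment. For the ‘Lost’ row, the arithmetic from the table above works out like this:

```r
# Counts taken from the deploy_off_comments table above
n_total <- 286  # all individuals
n_na    <- 230  # rows with no comment
n_lost  <- 17   # rows commented "Lost"

percent       <- 100 * n_lost / n_total            # share of all rows
valid_percent <- 100 * n_lost / (n_total - n_na)   # share of non-missing rows

round(c(percent = percent, valid_percent = valid_percent), 1)
```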

Being somewhat morbid, I’m curious about how these caribou died.

individuals %>%
  filter(deploy_off_type == 'dead') %>%
  mutate(death_cause = str_to_lower(death_cause)) %>%
  tabyl(death_cause) %>%
  adorn_pct_formatting() %>%
  gt()
death_cause n percent valid_percent
accidental, collar still in field, inaccessible 1 1.7% 1.9%
accidental. caught in tree well 1 1.7% 1.9%
collar still in field 2 3.3% 3.8%
predation - grizzly 3 5.0% 5.7%
predation - grizzly bear 1 1.7% 1.9%
predation - unknown predator 5 8.3% 9.4%
predation - wolf 13 21.7% 24.5%
train collision 1 1.7% 1.9%
unknown 11 18.3% 20.8%
unknown. collar inaccessible and still in field. 1 1.7% 1.9%
unknown. found 50m from trend open pit mine. area littered with rock from blasts from the mine. 1 1.7% 1.9%
unknown. suspected predation 7 11.7% 13.2%
unknown. suspected wolf (or possibly wolverine) predation 1 1.7% 1.9%
unknown. suspected wolf predation 4 6.7% 7.5%
vehicle collision 1 1.7% 1.9%
NA 7 11.7% -

Let’s do a bit of recoding on these comments.

deceased <- individuals %>%
  mutate(death_cause = str_to_lower(death_cause)) %>%
  filter(deploy_off_type == 'dead') %>%
  # Collapse the free-text comments into a smaller set of categories.
  # The strings keep the double spaces that follow periods in the raw comments.
  mutate(cause_of_death = case_when(
    death_cause == 'accidental, collar still in field, inaccessible' ~ "accident",
    death_cause == 'accidental.  caught in tree well'                 ~ "accident",
    death_cause == 'predation - grizzly'                             ~ "predation - grizzly bear",
    death_cause == 'collar still in field'                           ~ "unknown",
    death_cause == 'unknown.  collar inaccessible and still in field.' ~ "unknown",
    death_cause == 'unknown.  found 50m from trend open pit mine.  area littered with rock from blasts from the mine.' ~ "explosion",
    death_cause == 'unknown.  suspected predation' ~ "suspected predation",
    death_cause == 'unknown.  suspected wolf (or possibly wolverine) predation' ~ "suspected predation",
    death_cause == 'unknown.  suspected wolf predation' ~ "suspected predation",
    is.na(death_cause) ~ 'unknown',
    TRUE ~ death_cause
  ))
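Exact string matching is brittle here, because the comparisons only work if they reproduce the raw comments' spacing character for character. One alternative, sketched in base R, is to collapse runs of whitespace first (similar in spirit to stringr::str_squish()) and then recode through a named lookup vector. The names `squish()`, `recode_lookup`, and `recode_death_cause()` are hypothetical, and the lookup covers only the comments seen in this data set.

```r
# Hypothetical helper: lowercase-agnostic whitespace cleanup
squish <- function(x) trimws(gsub("[[:space:]]+", " ", x))

# Lookup keyed on the squished, lowercased comment text
recode_lookup <- c(
  "accidental, collar still in field, inaccessible" = "accident",
  "accidental. caught in tree well" = "accident",
  "predation - grizzly" = "predation - grizzly bear",
  "collar still in field" = "unknown",
  "unknown. collar inaccessible and still in field." = "unknown",
  "unknown. found 50m from trend open pit mine. area littered with rock from blasts from the mine." = "explosion",
  "unknown. suspected predation" = "suspected predation",
  "unknown. suspected wolf (or possibly wolverine) predation" = "suspected predation",
  "unknown. suspected wolf predation" = "suspected predation"
)

recode_death_cause <- function(x) {
  x <- squish(tolower(x))
  out <- unname(recode_lookup[x])     # NA where there is no lookup entry
  out <- ifelse(is.na(out), x, out)   # pass unmatched causes through unchanged
  out[is.na(x)] <- "unknown"          # missing comments become "unknown"
  out
}

recode_death_cause(c("Accidental.  Caught in tree well", "predation - wolf", NA))
```

Unmatched causes pass through unchanged, mirroring the TRUE ~ death_cause fallback in the case_when() version.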

deceased %>%
  tabyl(cause_of_death) %>%
  adorn_pct_formatting() %>%
  gt()
cause_of_death n percent
accident 2 3.3%
explosion 1 1.7%
predation - grizzly bear 4 6.7%
predation - unknown predator 5 8.3%
predation - wolf 13 21.7%
suspected predation 12 20.0%
train collision 1 1.7%
unknown 21 35.0%
vehicle collision 1 1.7%

So these researchers need some QC on their data collection, but they’re busy scientists, so we’ll forgive them. It looks like wolf predation is an issue for caribou, and so are grizzly bears. One poor caribou was hit by a train, another by a vehicle, and one researcher seems to suggest that a caribou was possibly killed by debris from a blast at a mine! Being a caribou sounds rough.

Let’s look at where these caribou met their fates. While ggplot’s spatial functions are evolving rapidly, the {sf} package has become the default package for working with spatial data, and it plays nicely with tidyverse principles and tools. A note on coordinate precision: each additional decimal place in a latitude or longitude value makes it about 10 times more precise, and these locations are recorded precisely enough to pin them to within about 10 meters. For more on this, xkcd of course has the definitive summary.
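To put numbers on that rule of thumb, here is a small sketch. It assumes a spherical Earth on which one degree of latitude spans roughly 111,320 m and a degree of longitude shrinks by cos(latitude); `decimal_precision_m()` is a hypothetical helper, not from any package. Near 55°N, where the median collaring latitude sits, the fourth decimal place works out to about 11 m north-south and about 6.4 m east-west.

```r
# Approximate ground distance covered by one unit in the last decimal place
# of a latitude/longitude coordinate (spherical-Earth approximation).
decimal_precision_m <- function(decimals, latitude_deg = 0) {
  deg   <- 10^(-decimals)                        # size of one unit, in degrees
  lat_m <- deg * 111320                          # metres per degree of latitude (approx.)
  lon_m <- lat_m * cos(latitude_deg * pi / 180)  # longitude degrees shrink toward the poles
  data.frame(decimals = decimals, lat_m = lat_m, lon_m = lon_m)
}

decimal_precision_m(1:5, latitude_deg = 55)
```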

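deceased_sf <- deceased %>%
  # st_as_sf() errors on missing coordinates by default, so drop those rows first
  filter(!is.na(deploy_off_longitude), !is.na(deploy_off_latitude)) %>%
  # EPSG:4326 is WGS84 longitude/latitude
  st_as_sf(coords = c("deploy_off_longitude", "deploy_off_latitude"), crs = 4326)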

{tmap} is an excellent package for interactive mapping in R that goes a bit beyond ggplot. Other mapping packages worth checking out are {mapview} and {mapdeck}.

# "view" mode draws an interactive Leaflet map; "plot" mode draws a static one
tmap_mode("view")

tm_shape(deceased_sf) +
  tm_dots(col = "cause_of_death",
          palette = "plasma")
[Interactive map: locations where collared caribou died, colored by cause_of_death. Basemap tiles © Esri.]

Footnotes

  1. I’m being snarky, but I spent 5 years collecting biological field data. One of the big challenges is taking the time to document your research and be a good data steward. There may be mitigating factors I’m not aware of with this data set, but if I were managing this project as a researcher or a grantor, I might not be very happy with the quality of this data collection.