It’s tough to be a caribou

Tidy Tuesday caribou tracking data.

Author: David Fox

Published: Aug. 1, 2020

This week’s data is about caribou in British Columbia, Canada.

Data: (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-23/readme.md)

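library(tidyverse)  # dplyr, stringr and the %>% pipe
library(janitor)    # tabyl(), adorn_pct_formatting()
library(gt)         # gt() tables

tuesdata <- tidytuesdayR::tt_load('2020-06-23')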


    Downloading file 1 of 2: `locations.csv`
    Downloading file 2 of 2: `individuals.csv`

#locations <- tuesdata$locations
individuals <-  tuesdata$individuals

There are 2 data sets, one on individuals and one on locations. What data is there about the individuals?

{skimr} is a great package for data overview. As a pro tip, you can use {skimr} and git diffs to quickly see what has changed in a data set.
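One way the git-diff tip could work in practice is to write the summary to a plain-text file and commit it alongside the data. This is only a sketch: `write_summary()` and the file name are hypothetical names made up for illustration, the toy data frame stands in for `individuals`, and the helper falls back to base R `summary()` if {skimr} is not installed.

```r
# Hypothetical helper (not from the original post): write a plain-text
# summary of a data frame to a file that can be committed and diffed.
write_summary <- function(df, path) {
  if (requireNamespace("skimr", quietly = TRUE)) {
    # skim() prints a text summary; capture it as character lines
    lines <- capture.output(print(skimr::skim(df)))
  } else {
    # Fallback when {skimr} is not installed: base R summary()
    lines <- capture.output(print(summary(df)))
  }
  writeLines(lines, path)
  invisible(path)
}

# Toy data standing in for `individuals`
toy <- data.frame(
  sex       = c("f", "f", "m"),
  with_calf = c(TRUE, NA, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical file name; in a real project this would live in the repo
summary_file <- file.path(tempdir(), "individuals_summary.txt")
write_summary(toy, summary_file)
```

Re-running this after the data are updated and then running `git diff` on the summary file should show which counts, missing-value rates, or ranges have moved.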

skimr::skim(individuals)

Table 1: Data summary
Name	individuals
Number of rows	286
Number of columns	14
_______________________
Column type frequency:
character	8
logical	2
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
animal_id	0	1.00	6	10	260
sex	0	1.00	1	1	2
life_stage	219	0.23	3	5	8
death_cause	232	0.19	7	97	16
study_site	0	1.00	5	11	8
deploy_on_comments	199	0.30	15	167	71
deploy_off_type	0	1.00	4	7	4
deploy_off_comments	230	0.20	4	82	8

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
pregnant	267	0.07	0.84	TRU: 16, FAL: 3
with_calf	202	0.29	0.21	FAL: 66, TRU: 18

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
deploy_on_longitude	133	0.53	-121.89	0.77	-123.36	-122.58	-122.11	-121.34	-120.22	▂▇▂▆▁
deploy_on_latitude	133	0.53	55.04	2.21	28.13	54.94	55.14	55.33	55.99	▁▁▁▁▇
deploy_off_longitude	230	0.20	-121.95	0.80	-123.62	-122.54	-122.27	-121.35	-119.80	▁▇▃▃▁
deploy_off_latitude	230	0.20	55.21	0.36	54.62	54.91	55.14	55.40	55.95	▅▇▅▂▅
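The numeric summary above also shows a minimum `deploy_on_latitude` of 28.13, which is a long way south of British Columbia and looks like a data-entry problem. A rough sketch of how such rows could be flagged follows. The bounding box is my own approximate assumption for BC, not something from the data documentation, and the toy rows are made up.

```r
# Approximate bounding box for British Columbia (an assumption for
# illustration, not taken from the data set's documentation)
bc_lat <- c(48, 60)
bc_lon <- c(-139, -114)

# Toy rows; the second imitates the suspicious minimum latitude of 28.13
toy <- data.frame(
  animal_id           = c("A", "B", "C"),
  deploy_on_longitude = c(-122.1, -121.9, NA),
  deploy_on_latitude  = c(55.1, 28.13, NA),
  stringsAsFactors = FALSE
)

# TRUE when both coordinates are present but fall outside the bounding box
out_of_range <- with(
  toy,
  !is.na(deploy_on_latitude) & !is.na(deploy_on_longitude) &
    (deploy_on_latitude  < bc_lat[1] | deploy_on_latitude  > bc_lat[2] |
     deploy_on_longitude < bc_lon[1] | deploy_on_longitude > bc_lon[2])
)

# Rows worth a second look
toy[out_of_range, ]
```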

Looks like a lot of missing data. What’s the sex ratio of collared individuals?

individuals %>%
  tabyl(sex) %>%
  adorn_pct_formatting() %>%
  gt()

sex	n	percent
f	282	98.6%
m	4	1.4%

They only collared 4 males – that is a very skewed sex ratio. Are male caribou tremendously hard to capture for radio collaring?
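One caveat: the skim summary lists 286 rows but only 260 unique `animal_id` values, so the table above counts rows (presumably collar deployments, which is my assumption) rather than animals. A small sketch with made-up IDs shows how counting per animal could differ from counting per row.

```r
# Toy data with made-up IDs: animal "car001" appears twice
toy <- data.frame(
  animal_id = c("car001", "car001", "car002", "car003"),
  sex       = c("f", "f", "f", "m"),
  stringsAsFactors = FALSE
)

# Counting rows (one row per deployment)
per_deployment <- table(toy$sex)

# Counting animals: drop repeated animal_id/sex pairs first
animals    <- unique(toy[, c("animal_id", "sex")])
per_animal <- table(animals$sex)

per_deployment
per_animal
```

With {dplyr}, `distinct(animal_id, sex)` before `tabyl(sex)` would be the equivalent step.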

What are the deploy off types (one of the few complete variables)?

individuals %>%
  tabyl(deploy_off_type) %>%
  adorn_pct_formatting() %>%
  gt()

deploy_off_type	n	percent
dead	60	21.0%
other	45	15.7%
removal	82	28.7%
unknown	99	34.6%

So about 1/5 died, over 1/4 have been removed, about 1/3 are unknown, and the remaining 16% or so are ‘other’. That is a lot of radio collars, which are not cheap, with an unknown status. Is there anything useful in the ‘deploy_off_comments’?
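To see how the counts in the table translate into shares of all 286 rows, here is a quick base R check. The counts are copied from the table above; nothing is read from the actual data.

```r
# Counts copied from the deploy_off_type table above
counts <- c(dead = 60, other = 45, removal = 82, unknown = 99)

# Share of all rows for each deploy-off type
shares <- counts / sum(counts)
round(shares, 3)
```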

individuals %>%
  tabyl(deploy_off_comments) %>%
  adorn_pct_formatting() %>%
  gt()

deploy_off_comments	n	percent	valid_percent
deployment end time set to exclude locations in Fort St John	1	0.3%	1.8%
Infometrics	4	1.4%	7.1%
Lost	17	5.9%	30.4%
Not Monitored	2	0.7%	3.6%
Recovered failed collar and replaced with a new one.	8	2.8%	14.3%
Recovered failed GPS collar and replaced with a new one.	1	0.3%	1.8%
Recovered non-working green GPS collar. Cow very skinny.	1	0.3%	1.8%
the tag remained deployed as of 7/31/2016 when data were sent to be added Movebank	22	7.7%	39.3%
NA	230	80.4%	-

Not really: more missing data. And what the heck does “Infometrics” mean? Do these researchers have a graduate adviser?

I’m being snarky, but I spent 5 years collecting biological field data. One of the big challenges is taking the time to document your research and be a good data steward. There may be mitigating factors I’m not aware of with this data set, but if I were managing this project as a researcher or a grantor, I might not be very happy with the quality of this data collection.

Being somewhat morbid, I’m curious about how these caribou died.

individuals %>%
  filter(deploy_off_type == 'dead') %>%
  mutate(death_cause = str_to_lower(death_cause)) %>%
  tabyl(death_cause) %>%
  adorn_pct_formatting() %>%
  gt()

death_cause	n	percent	valid_percent
accidental, collar still in field, inaccessible	1	1.7%	1.9%
accidental. caught in tree well	1	1.7%	1.9%
collar still in field	2	3.3%	3.8%
predation - grizzly	3	5.0%	5.7%
predation - grizzly bear	1	1.7%	1.9%
predation - unknown predator	5	8.3%	9.4%
predation - wolf	13	21.7%	24.5%
train collision	1	1.7%	1.9%
unknown	11	18.3%	20.8%
unknown. collar inaccessible and still in field.	1	1.7%	1.9%
unknown. found 50m from trend open pit mine. area littered with rock from blasts from the mine.	1	1.7%	1.9%
unknown. suspected predation	7	11.7%	13.2%
unknown. suspected wolf (or possibly wolverine) predation	1	1.7%	1.9%
unknown. suspected wolf predation	4	6.7%	7.5%
vehicle collision	1	1.7%	1.9%
NA	7	11.7%	-

Let’s do a bit of recoding on these comments.

deceased <- individuals %>%
  filter(deploy_off_type == 'dead') %>%
  mutate(
    death_cause = str_to_lower(death_cause),
    # exact matches: note the double spaces after periods in these strings
    cause_of_death = case_when(
      death_cause == 'accidental, collar still in field, inaccessible'    ~ "accident",
      death_cause == 'accidental.  caught in tree well'                   ~ "accident",
      death_cause == 'predation - grizzly'                                ~ "predation - grizzly bear",
      death_cause == 'collar still in field'                              ~ "unknown",
      death_cause == 'unknown.  collar inaccessible and still in field.'  ~ "unknown",
      death_cause == 'unknown.  found 50m from trend open pit mine.  area littered with rock from blasts from the mine.' ~ "explosion",
      death_cause == 'unknown.  suspected predation'                      ~ "suspected predation",
      death_cause == 'unknown.  suspected wolf (or possibly wolverine) predation' ~ "suspected predation",
      death_cause == 'unknown.  suspected wolf predation'                 ~ "suspected predation",
      is.na(death_cause)                                                  ~ "unknown",
      TRUE                                                                ~ death_cause  # keep everything else as is
    )
  )
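Exact-match recoding like this depends on reproducing every space in the raw strings, which is easy to get wrong. A possible way to make it less brittle is to normalise whitespace first and then recode from a lookup vector. This is a base R sketch on toy values, not the method used in this post; `normalise_cause()` and `lookup` are hypothetical names.

```r
# Hypothetical helper (not part of the original post): trim, collapse runs
# of whitespace to a single space, and lower-case the text.
normalise_cause <- function(x) {
  tolower(gsub("\\s+", " ", trimws(x)))
}

# Toy values imitating the style of the death_cause column
raw <- c("Accidental.  Caught in tree well",
         "Unknown.  Suspected wolf predation",
         "Predation - Wolf",
         NA)

clean <- normalise_cause(raw)

# Named lookup vector: names are cleaned raw values, values are new labels
lookup <- c(
  "accidental. caught in tree well"   = "accident",
  "unknown. suspected wolf predation" = "suspected predation"
)

# Use the lookup where a match exists, otherwise keep the cleaned value;
# missing causes become "unknown"
recoded <- ifelse(clean %in% names(lookup), lookup[clean], clean)
recoded[is.na(clean)] <- "unknown"
recoded
```

In a tidyverse pipeline, `stringr::str_squish()` does the same trimming and collapsing as the `gsub()` call here.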

deceased %>%
  tabyl(cause_of_death) %>%
  adorn_pct_formatting() %>%
  gt()

cause_of_death	n	percent
accident	2	3.3%
explosion	1	1.7%
predation - grizzly bear	4	6.7%
predation - unknown predator	5	8.3%
predation - wolf	13	21.7%
suspected predation	12	20.0%
train collision	1	1.7%
unknown	21	35.0%
vehicle collision	1	1.7%

So these researchers need some QC on their data collection, but they’re busy scientists so we’ll forgive them. It looks like wolf predation is an issue for caribou, and so are grizzly bears. One poor caribou was hit by a train, another by a vehicle, and it seems a researcher is suggesting that one was possibly killed by debris from a blast at a mine! Being a caribou sounds rough.

Let’s look at where these caribou met their fates. While ggplot’s spatial functions are evolving rapidly, the {sf} package has become the default package for working with spatial data, and it plays nicely with tidyverse principles and tools. A note on coordinate precision: each additional decimal place in a decimal-degree coordinate improves precision by about a factor of 10. These locations are accurate to within about 10 meters. For more on this, xkcd of course has the definitive summary.
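The factor-of-10 rule can be made concrete with a little arithmetic. This sketch assumes roughly 111 km per degree of latitude (an approximation; the true value varies slightly) and uses 55°N as a representative latitude for the study area, based on the summary statistics earlier in the post.

```r
# Approximate length of one degree of latitude in metres (assumed constant;
# the true value varies slightly with latitude)
metres_per_degree <- 111320

# Ground distance represented by the last digit at each number of decimals
decimals    <- 0:5
precision_m <- metres_per_degree / 10^decimals

data.frame(decimal_places = decimals,
           approx_metres  = round(precision_m, 1))

# A degree of longitude shrinks with the cosine of the latitude;
# 55 degrees north is an assumed representative latitude
lon_scale <- cos(55 * pi / 180)
round(precision_m * lon_scale, 1)
```

Under these assumptions, four decimal places correspond to roughly 11 meters north-south, and somewhat less east-west at this latitude.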

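library(sf)

# keep rows with both deploy-off coordinates, then build point geometries (WGS 84)
deceased_sf <- deceased %>%
  filter(!is.na(deploy_off_longitude), !is.na(deploy_off_latitude)) %>%
  st_as_sf(coords = c("deploy_off_longitude", "deploy_off_latitude"), crs = 4326)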

{tmap} is an excellent package for interactive mapping in R that goes a bit beyond ggplot. Other good mapping packages worth checking out are {mapview} and {mapdeck}.

library(tmap)

tmap_mode("view")  # interactive map

tm_shape(deceased_sf) +
  tm_dots(col = "cause_of_death",
          palette = "plasma")
