Tidy Tuesday caribou tracking data.
This week’s data is about caribou in British Columbia, Canada.
Data: (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-23/readme.md)
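library(tidyverse)  # dplyr, stringr, and the %>% pipe
library(janitor)    # tabyl(), adorn_pct_formatting()
library(gt)         # gt() table rendering
library(sf)         # st_as_sf() for spatial data
tuesdata <- tidytuesdayR::tt_load('2020-06-23')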
Downloading file 1 of 2: `locations.csv`
Downloading file 2 of 2: `individuals.csv`
#locations <- tuesdata$locations
individuals <- tuesdata$individuals
There are 2 data sets, one on individuals and one on locations. What data is there about the individuals?
{skimr} is a great package for data overview. As a pro tip, you can use {skimr} and git diffs to quickly see what has changed in a data set.
skimr::skim(individuals)
Name | individuals |
Number of rows | 286 |
Number of columns | 14 |
_______________________ | |
Column type frequency: | |
character | 8 |
logical | 2 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
animal_id | 0 | 1.00 | 6 | 10 | 0 | 260 | 0 |
sex | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
life_stage | 219 | 0.23 | 3 | 5 | 0 | 8 | 0 |
death_cause | 232 | 0.19 | 7 | 97 | 0 | 16 | 0 |
study_site | 0 | 1.00 | 5 | 11 | 0 | 8 | 0 |
deploy_on_comments | 199 | 0.30 | 15 | 167 | 0 | 71 | 0 |
deploy_off_type | 0 | 1.00 | 4 | 7 | 0 | 4 | 0 |
deploy_off_comments | 230 | 0.20 | 4 | 82 | 0 | 8 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
pregnant | 267 | 0.07 | 0.84 | TRU: 16, FAL: 3 |
with_calf | 202 | 0.29 | 0.21 | FAL: 66, TRU: 18 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
deploy_on_longitude | 133 | 0.53 | -121.89 | 0.77 | -123.36 | -122.58 | -122.11 | -121.34 | -120.22 | ▂▇▂▆▁ |
deploy_on_latitude | 133 | 0.53 | 55.04 | 2.21 | 28.13 | 54.94 | 55.14 | 55.33 | 55.99 | ▁▁▁▁▇ |
deploy_off_longitude | 230 | 0.20 | -121.95 | 0.80 | -123.62 | -122.54 | -122.27 | -121.35 | -119.80 | ▁▇▃▃▁ |
deploy_off_latitude | 230 | 0.20 | 55.21 | 0.36 | 54.62 | 54.91 | 55.14 | 55.40 | 55.95 | ▅▇▅▂▅ |
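The git-diff tip mentioned earlier can be sketched roughly as follows. This is only one way to do it: the helper name `write_skim_snapshot()` and the file names are made up for illustration, and the toy data frame stands in for `individuals` so the snippet runs without downloading anything.

```r
library(skimr)

# Hypothetical helper (not from the original analysis): write a plain-text
# skim summary that git can diff between versions of a data set.
write_skim_snapshot <- function(df, path) {
  # skim_without_charts() drops the inline histograms, which would
  # otherwise show up as noisy Unicode changes in the diff.
  summary_tbl <- as.data.frame(skim_without_charts(df))
  write.csv(summary_tbl, path, row.names = FALSE)
  invisible(path)
}

# Toy stand-in data (invented values). With the real data this would be:
# write_skim_snapshot(individuals, "individuals_skim.csv")
toy <- data.frame(
  animal_id = c("A_01", "A_02", "A_03"),
  with_calf = c(TRUE, NA, FALSE),
  deploy_on_latitude = c(55.1, NA, 54.9)
)
snapshot_path <- write_skim_snapshot(toy, file.path(tempdir(), "toy_skim.csv"))
```

If the snapshot file is committed alongside the analysis, re-running the helper after a data update and calling `git diff` should show which columns gained rows, missing values, or new categories.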
Looks like a lot of missing data. What’s the sex ratio of collared individuals?
individuals %>%
  tabyl(sex) %>%
  adorn_pct_formatting() %>%
  gt()
sex | n | percent |
---|---|---|
f | 282 | 98.6% |
m | 4 | 1.4% |
They only collared 4 males – that is a very skewed sex ratio. Are male caribou tremendously hard to capture for radio collaring?
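One caveat: the skim output reports 260 unique `animal_id` values across 286 rows, so some animals appear more than once and the table above counts rows rather than animals. A minimal sketch of counting distinct animals per sex is below; `count_animals_by_sex()` is a hypothetical helper, the toy tibble is invented, and it assumes an animal's recorded sex is the same on all of its rows.

```r
library(dplyr)

# Hypothetical helper: count animals, not rows, per sex.
count_animals_by_sex <- function(df) {
  df %>%
    distinct(animal_id, sex) %>%      # one row per animal/sex combination
    count(sex, name = "n_animals")
}

# Toy stand-in data (invented): animal A_01 appears on two rows.
toy <- tibble(
  animal_id = c("A_01", "A_01", "A_02", "B_01"),
  sex       = c("f", "f", "f", "m")
)
animals_by_sex <- count_animals_by_sex(toy)

# With the real data: count_animals_by_sex(individuals)
```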
What are the deploy off types (one of the few complete variables)?
individuals %>%
tabyl(deploy_off_type) %>%
adorn_pct_formatting() %>%
gt()
deploy_off_type | n | percent |
---|---|---|
dead | 60 | 21.0% |
other | 45 | 15.7% |
removal | 82 | 28.7% |
unknown | 99 | 34.6% |
So about 1/5 died, a bit over 1/4 were removed, just over 1/3 are unknown, and the rest are ‘other’. That is a lot of radio collars, which are not cheap, with an unknown status. Is there anything useful in the ‘deploy_off_comments’?
individuals %>%
tabyl(deploy_off_comments) %>%
adorn_pct_formatting() %>%
gt()
deploy_off_comments | n | percent | valid_percent |
---|---|---|---|
deployment end time set to exclude locations in Fort St John | 1 | 0.3% | 1.8% |
Infometrics | 4 | 1.4% | 7.1% |
Lost | 17 | 5.9% | 30.4% |
Not Monitored | 2 | 0.7% | 3.6% |
Recovered failed collar and replaced with a new one. | 8 | 2.8% | 14.3% |
Recovered failed GPS collar and replaced with a new one. | 1 | 0.3% | 1.8% |
Recovered non-working green GPS collar. Cow very skinny. | 1 | 0.3% | 1.8% |
the tag remained deployed as of 7/31/2016 when data were sent to be added Movebank | 22 | 7.7% | 39.3% |
NA | 230 | 80.4% | - |
Not really: more missing data, and what the heck does “Infometrics” mean? Do these researchers have a graduate adviser?
Being somewhat morbid, I’m curious about how these caribou died.
individuals %>%
filter(deploy_off_type == 'dead') %>%
mutate(death_cause = str_to_lower(death_cause)) %>%
tabyl(death_cause) %>%
adorn_pct_formatting() %>%
gt()
death_cause | n | percent | valid_percent |
---|---|---|---|
accidental, collar still in field, inaccessible | 1 | 1.7% | 1.9% |
accidental. caught in tree well | 1 | 1.7% | 1.9% |
collar still in field | 2 | 3.3% | 3.8% |
predation - grizzly | 3 | 5.0% | 5.7% |
predation - grizzly bear | 1 | 1.7% | 1.9% |
predation - unknown predator | 5 | 8.3% | 9.4% |
predation - wolf | 13 | 21.7% | 24.5% |
train collision | 1 | 1.7% | 1.9% |
unknown | 11 | 18.3% | 20.8% |
unknown. collar inaccessible and still in field. | 1 | 1.7% | 1.9% |
unknown. found 50m from trend open pit mine. area littered with rock from blasts from the mine. | 1 | 1.7% | 1.9% |
unknown. suspected predation | 7 | 11.7% | 13.2% |
unknown. suspected wolf (or possibly wolverine) predation | 1 | 1.7% | 1.9% |
unknown. suspected wolf predation | 4 | 6.7% | 7.5% |
vehicle collision | 1 | 1.7% | 1.9% |
NA | 7 | 11.7% | - |
Let’s do a bit of recoding on these comments.
deceased <- individuals %>%
  mutate(death_cause = str_to_lower(death_cause)) %>%
  filter(deploy_off_type == "dead") %>%
  mutate(cause_of_death = case_when(
    death_cause == "accidental, collar still in field, inaccessible" ~ "accident",
    death_cause == "accidental. caught in tree well" ~ "accident",
    death_cause == "predation - grizzly" ~ "predation - grizzly bear",
    death_cause == "collar still in field" ~ "unknown",
    death_cause == "unknown. collar inaccessible and still in field." ~ "unknown",
    death_cause == "unknown. found 50m from trend open pit mine. area littered with rock from blasts from the mine." ~ "explosion",
    death_cause == "unknown. suspected predation" ~ "suspected predation",
    death_cause == "unknown. suspected wolf (or possibly wolverine) predation" ~ "suspected predation",
    death_cause == "unknown. suspected wolf predation" ~ "suspected predation",
    is.na(death_cause) ~ "unknown",
    TRUE ~ death_cause
  ))
deceased %>%
tabyl(cause_of_death) %>%
adorn_pct_formatting() %>%
gt()
cause_of_death | n | percent |
---|---|---|
accident | 2 | 3.3% |
explosion | 1 | 1.7% |
predation - grizzly bear | 4 | 6.7% |
predation - unknown predator | 5 | 8.3% |
predation - wolf | 13 | 21.7% |
suspected predation | 12 | 20.0% |
train collision | 1 | 1.7% |
unknown | 21 | 35.0% |
vehicle collision | 1 | 1.7% |
So these researchers need some QC on their data collection, but they’re busy scientists so we’ll forgive them. It looks like wolf predation is an issue for caribou, and grizzly bears are too. One poor caribou was hit by a train, another by a vehicle, and it seems a researcher is suggesting that one was possibly killed by debris from a blast at a mine! Being a caribou sounds rough.
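The exact-match recoding above breaks as soon as a comment is worded slightly differently. As an alternative sketch (not the approach used in this post), the same categories can be assigned by keyword with `str_detect()`. The function name `recode_death_cause()` and the keyword choices are my own assumptions; rule order matters, since "suspected" has to be checked before anything else that might match.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: keyword-based version of the recoding.
recode_death_cause <- function(x) {
  x <- str_to_lower(x)
  case_when(
    is.na(x)                     ~ "unknown",
    str_detect(x, "suspected")   ~ "suspected predation",
    str_detect(x, "grizzly")     ~ "predation - grizzly bear",
    str_detect(x, "^accidental") ~ "accident",
    str_detect(x, "mine")        ~ "explosion",
    str_detect(x, "collar")      ~ "unknown",   # collar-only notes give no cause
    TRUE                         ~ x            # keep everything else as-is
  )
}

# A few of the raw values from the table above, plus a missing value.
raw <- c(
  "Predation - Grizzly",
  "unknown. suspected wolf (or possibly wolverine) predation",
  "accidental, collar still in field, inaccessible",
  "collar still in field",
  "predation - wolf",
  NA
)
recoded <- recode_death_cause(raw)
```

With the real data this would slot in as `mutate(cause_of_death = recode_death_cause(death_cause))`. Keyword rules are more tolerant of typos but can also match more than intended, so it is worth tabulating the result against the raw values before trusting it.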
Let’s look at where these caribou met their fates. While ggplot’s spatial functions are evolving rapidly, the {sf} package has become the default package for working with spatial data and plays nicely with tidyverse principles and tools. Just a note on coordinate precision: each additional decimal place improves precision by about a factor of 10. These locations are accurate to within about 10 meters. For more on this, XKCD of course has the definitive summary.
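deceased_sf <- deceased %>%
  # st_as_sf() cannot build points from missing coordinates
  filter(!is.na(deploy_off_longitude)) %>%
  # coords are given as x (longitude), then y (latitude); EPSG:4326 is WGS 84
  st_as_sf(coords = c("deploy_off_longitude", "deploy_off_latitude"), crs = 4326)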
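To make the note on coordinate precision concrete, here is a rough back-of-the-envelope calculation in base R. It assumes the common rule of thumb of about 111 km per degree of latitude and uses 55°N, roughly the mean deploy latitude in the skim output, to scale longitude. The variable names are illustrative only.

```r
# Rule-of-thumb value: one degree of latitude is roughly 111 km.
metres_per_degree_lat <- 111000

decimal_places <- 0:5

# Ground distance represented by the last decimal place of a latitude.
lat_precision_m <- metres_per_degree_lat / 10^decimal_places

# Degrees of longitude shrink with cos(latitude); 55 degrees north is an
# assumption based on the mean deploy latitude reported by skim().
lon_precision_m <- lat_precision_m * cos(55 * pi / 180)

precision <- data.frame(decimal_places, lat_precision_m, lon_precision_m)
print(precision)
```

Under these assumptions, four decimal places correspond to about 11 m north-south, and somewhat less east-west at this latitude, which lines up with the "about 10 meters" figure.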
{tmap} is an excellent package for interactive mapping in R that goes a bit beyond ggplot. Other good mapping packages worth checking out are {mapview} and {mapdeck}.