Shelter usage in Toronto (2017-2021)

I look at Toronto shelter usage numbers between 2017 and January 2021. I document and adjust for a systematic error in the 2017 data. The data show that homelessness in Toronto is a large problem; essentially all shelters are almost always full. COVID changed the nature of the problem by reducing the number of people using shelters, however they must be sleeping somewhere. I compare January 2021 with January 2020 and estimate that following COVID there are now around an additional 3,500 people sleeping somewhere outside of shelters, possibly outdoors.

Author
Published

January 24, 2021

Introduction

The extent of homelessness in Toronto was highlighted for the R community in December 2020 when the dataset was used in TidyTuesday. That dataset contained data for 2017 to 2019, inclusive. In this post I expand the dataset through to January 2021 to see what has happened since COVID. I also document and adjust for a systematic error in the 2017 dataset. I compare the usage of shelters this year with last year. I estimate that on Friday, 22 January 2021, when it was -11C with winds of 17km/h, there were roughly 3,634 additional people sleeping somewhere other than a shelter, compared with this time last year.

Set up workspace and gather data

I’ll use the R statistical programming language (R Core Team 2020). The datasets are accessed via the opendatatoronto package (Gelfand 2020). This package wraps around the City of Toronto’s Open Data Portal and allows the direct import of data rather than the need to visit the website. The website is great, but using the package enhances reproducibility. I’ll use the tidyverse package to make data manipulation easier (Wickham et al. 2019).

library(opendatatoronto)
library(tidyverse)
── Attaching core tidyverse packages ──── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Get the data
# Based on https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-12-01/readme.md
all_data <- 
  opendatatoronto::search_packages("Daily Shelter Occupancy") %>% 
  opendatatoronto::list_package_resources() %>% 
  dplyr::filter(name %in% c("daily-shelter-occupancy-2017-csv",
                     "daily-shelter-occupancy-2018-csv", 
                     "daily-shelter-occupancy-2019-csv", 
                     "daily-shelter-occupancy-2020.csv")) %>% 
  group_split(name) %>% # Don't totally understand how this works
  map_dfr(get_resource, .id = "file")
write_csv(all_data, "inputs/raw_data.csv")

Let’s just have a quick look at the data.

all_data <- read_csv("inputs/raw_data.csv", 
                     col_types = c("iiccccccccccii")) 
# The col_types hieroglyphics above stand for integer, integer, character... etc.
head(all_data)
# A tibble: 6 × 14
   file `_id` OCCUPANCY_DATE  ORGANIZATION_NAME SHELTER_NAME
  <int> <int> <chr>           <chr>             <chr>       
1     1     1 2017-01-01T00:… COSTI Immigrant … COSTI Recep…
2     1     2 2017-01-01T00:… Christie Ossingt… Christie Os…
3     1     3 2017-01-01T00:… Christie Ossingt… Christie Os…
4     1     4 2017-01-01T00:… Christie Refugee… Christie Re…
5     1     5 2017-01-01T00:… City of Toronto   Birchmount …
6     1     6 2017-01-01T00:… City of Toronto   Birkdale Re…
# ℹ 9 more variables: SHELTER_ADDRESS <chr>,
#   SHELTER_CITY <chr>, SHELTER_PROVINCE <chr>,
#   SHELTER_POSTAL_CODE <chr>, FACILITY_NAME <chr>,
#   PROGRAM_NAME <chr>, SECTOR <chr>, OCCUPANCY <int>,
#   CAPACITY <int>

Data cleaning and preparation

Make the data easier to deal with

The column names aren’t overly nice to type, there are a few columns that we’re not really going to use much, and finally a few of the columns have data that are less obvious than they should be. For instance, the ‘file’ tells us the year of the data, but because of the import settings it’s 1, 2… instead of 2017, 2018….

toronto_shelters <-
  all_data %>% 
  janitor::clean_names() %>% # Make the column names easier to type. Thanks Sharla!
  mutate(file_year = 
           case_when(
             file == "1" ~ 2017,
             file == "2" ~ 2018,
             file == "3" ~ 2019,
             file == "4" ~ 2020,
             TRUE ~ -1)
  ) %>% # Just make the column easier to deal with
  select(-id, -file)

The main issue with the data is the dates. In 2017-2019 (inclusive) they appear to be year-month-day, but for 2020 it seems like month-day-year. The separator is also inconsistent between ‘-’ and ‘/’. I’ll first clean that up, check our guesses, and then get to the main issue. I’m going to draw on the lubridate package (Grolemund and Wickham 2011).

library(lubridate)
toronto_shelters <- 
  toronto_shelters %>% 
  # 1st line removes times (probs don't actually need to do) and 2nd makes the separation consistent
  mutate(occupancy_date = str_remove(occupancy_date, "T[:digit:]{2}:[:digit:]{2}:[:digit:]{2}"),
         occupancy_date = str_replace_all(occupancy_date, "/", "-")
  ) %>% 
  # Parsing differs between 2017-2019 and 2020. Last line is a catch-all - shouldn't get there.
  mutate(date = case_when(
    file_year == "2020" ~ mdy(occupancy_date, quiet = TRUE), 
    file_year %in% c("2017", "2018", "2019") ~ ymd(occupancy_date, quiet = TRUE),
    TRUE ~ NA_Date_
    )
    ) %>% 
  select(file_year, date, occupancy_date, organization_name:capacity)

Check content of day, month, and year

Let’s just check that my guess of the date orderings was at least plausible by looking at the distribution of year, month, and day bits.

toronto_shelters <- 
  toronto_shelters %>% 
  separate(occupancy_date, into = c('one', 'two', 'three'), sep = "-", remove = FALSE)
toronto_shelters %>% 
  filter(file_year %in% c(2017, 2018, 2019)) %>% 
  count(one) %>% 
  rename(Year = one, Number = n) %>% 
  kableExtra::kbl(caption = "Count of entries by year for 2017-2019") %>%
  kableExtra::kable_styling()
Count of entries by year for 2017-2019
Year Number
2017 38700
2018 37770
2019 39446
toronto_shelters %>% 
  filter(file_year %in% c(2017, 2018, 2019)) %>% 
  count(two) %>% 
  rename(Month = two, Number = n) %>% 
  kableExtra::kbl(caption = "Count of entries by month for 2017-2019") %>%
  kableExtra::kable_styling()
Count of entries by month for 2017-2019
Month Number
01 9747
02 8868
03 9912
04 9600
05 9950
06 9625
07 9850
08 9743
09 9430
10 9694
11 9497
12 10000
toronto_shelters %>% 
  filter(file_year %in% c(2017, 2018, 2019)) %>% 
  count(three) %>% 
  ggplot(aes(x = three, y = n)) +
  geom_point() +
  theme_minimal() +
  labs(x = "Day",
       y = "Number")

Distribution of days for 2017-2019

And again, but for 2020.

toronto_shelters %>% 
  filter(file_year == 2020) %>% 
  count(one) %>% 
  rename(Month = one, Number = n) %>% 
  kableExtra::kbl(caption = "Count of entries by month for 2020") %>%
  kableExtra::kable_styling()
Count of entries by month for 2020
Month Number
01 3503
02 3277
03 3555
04 3562
05 3671
06 3534
07 3601
08 3355
09 3210
10 3317
11 3195
12 3281
toronto_shelters %>% 
  filter(file_year == 2020) %>% 
  count(two) %>% 
  ggplot(aes(x = two, y = n)) +
  geom_point() +
  theme_minimal() +
  labs(x = "Day",
       y = "Number")

Distribution of days for 2020

toronto_shelters %>% 
  filter(file_year == 2020) %>% 
  count(three) %>% 
  rename(Year = three, Number = n) %>% 
  kableExtra::kbl(caption = "Count of entries by year for 2020") %>%
  kableExtra::kable_styling()
Count of entries by year for 2020
Year Number
2020 41061

That’s all looking fine. We’d know that we have issues if the distribution of the days wasn’t roughly uniform, or if we have values other than [1-12] in the month.

Check columns agree about the year

Let’s now also check that the year implied by the date matches the year implied by the file.

toronto_shelters %>% 
  mutate(check_year = year(date) == file_year) %>% 
  filter(check_year == FALSE)
# A tibble: 0 × 18
# ℹ 18 variables: file_year <dbl>, date <date>,
#   occupancy_date <chr>, one <chr>, two <chr>,
#   three <chr>, organization_name <chr>,
#   shelter_name <chr>, shelter_address <chr>,
#   shelter_city <chr>, shelter_province <chr>,
#   shelter_postal_code <chr>, facility_name <chr>,
#   program_name <chr>, sector <chr>, occupancy <int>, …
toronto_shelters <- 
  toronto_shelters %>% 
  select(-occupancy_date, -one, -two, -three, -file_year)

That’s also fine. And I’ll clean-up by removing the unnecessary columns.

One last thing - plot raw data

Everything seems fine, but it’s always important to ‘Plot. Your. Raw. Data.’, so before moving on, I should plot the raw data to see if there’s anything else going on. (Here, students seem to get confused what ‘raw’ means; I’m using it to refer to as close to the original dataset as possible, so no sums, or averages, etc, if possible. Not necessarily before any cleaning. Sometimes your data are too disperse for that so there will be an element of manipulation. The main point is that you, at the very least, need to plot the data that you’re going to be modelling.)

Let’s just plot the order. As this dataset has been put together by a human we’d expect that it’d be in order of date. Let’s just plot the date in the order it appears in the dataset (Figure @ref(fig:plotorder)).

toronto_shelters %>% 
  mutate(row_number = c(1:nrow(toronto_shelters))) %>% 
  ggplot(aes(x = row_number, y = date), alpha = 0.1) +
  geom_point() +
  theme_minimal() +
  labs(x = "Row number",
       y = "Date")

Comparison of row number with date

😱😱😱 This is a bit of a ‘hacky’ graph but it illustrates the point which is that the data are not in order in the dataset. If they were in order, then we’d expect them to be along the diagonal.

It’s super weird that they’re not in order in the raw data. Above, I checked by splitting them into pieces (day, month, year) and the counts were okay. But the ‘hacky’ graph was pretty hacky, so let’s try to summarise the data a little and then have another look. We’ll get a count by date and the sector of the shelter.

# Based on Lisa Lendway: 
# https://github.com/llendway/tidy_tuesday_in_thirty/blob/main/2020_12_01_tidy_tuesday.Rmd
toronto_shelters_by_day <- 
  toronto_shelters %>% 
  # We only want rows with both occupancy and capacity  
  tidyr::drop_na(occupancy, capacity) %>% 
  # We want to know the occupancy by date and sector
  group_by(date, sector) %>% 
  summarise(occupancy = sum(occupancy),
            capacity = sum(capacity),
            usage = occupancy / capacity, .groups = 'drop')

head(toronto_shelters_by_day)
# A tibble: 6 × 5
  date       sector   occupancy capacity usage
  <date>     <chr>        <int>    <int> <dbl>
1 2017-01-01 Co-ed          530      582 0.911
2 2017-01-01 Families      1030     1150 0.896
3 2017-01-01 Men           1595     1706 0.935
4 2017-01-01 Women          641      697 0.920
5 2017-01-01 Youth          499      518 0.963
6 2017-01-02 Co-ed          458      582 0.787

We are interested in availability of shelter spots in Toronto on the basis of sector for each day. Different sectors focus on different folks: Co-ed, Families, Men, Women, Youth. Now for each day for each sector we have a proportion (note: horrifyingly >1 is possible). In the notes to the data we’re told that the capacity in 2020 may not be accurate, so for this chart we’ll just focus on 2017-2019 (inclusive) (Figure @ref(fig:plotoccupancyrate)).

# Graph 2017-2019 (inc)
toronto_shelters_by_day %>% 
  filter(year(date) != "2020") %>% 
  ggplot(aes(x = date, y = usage, color = sector)) + 
  geom_point(aes(group = sector), alpha = 0.3) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Date",
       y = "Occupancy rate") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Occupancy rate per day in Toronto shelters

That one looks kind of okay, but we’ll again see the problem immediately when we plot the raw number occupied (we can bring this through to include 2020 as it’s not to do with capacity) (Figure @ref(fig:plotoccupancy)).

toronto_shelters_by_day %>% 
  ggplot(aes(x = date, y = occupancy, color = sector)) + 
  geom_point(aes(group = sector), alpha = 0.3) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Date",
       y = "Occupancy (number)") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Occupancy per day in Toronto shelters

We can see that using modified data hides the problem. Let’s focus on 2017, as that’s where the biggest issue is and facet by month (Figure @ref(fig:sheltersin2017)).

toronto_shelters_by_day %>% 
  filter(year(date) == 2017) %>% 
  ggplot(aes(x = day(date), y = occupancy, color = sector)) + 
  geom_point(aes(group = sector), alpha = 0.3) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Day",
       y = "Occupancy (number)",
       title = "Toronto shelters in 2017",
       subtitle = "Occupancy per day") +
  facet_wrap(vars(month(date, label = TRUE)),
             scales = "free_x") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Occupancy in Toronto shelters in 2017

Just to check, let’s plot the same for 2018 (Figure @ref(fig:sheltersin2018)).

toronto_shelters_by_day %>% 
  filter(year(date) == 2018) %>% 
  ggplot(aes(x = day(date), y = occupancy, color = sector)) + 
  geom_point(aes(group = sector), alpha = 0.3) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Day",
       y = "Occupancy (number)") +
  facet_wrap(vars(month(date, label = TRUE)),
             scales = "free_x") +
  theme_minimal() +
  scale_color_brewer(palette = "Dark2")

Daily occupancy in Toronto shelters in 2018

This gives us an idea of what we ought to expect in 2017 - why should they be significantly different? To start, focus on January 2017 and see if that makes it clearer what is going on (Figure @ref(fig:sheltersinjan2017)).

toronto_shelters_by_day %>% 
  filter(year(date) == 2017) %>%
  filter(month(date) == 1) %>% 
  ggplot(aes(x = day(date), y = occupancy, color = sector)) + 
  geom_point(aes(group = sector)) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Day",
       y = "Occupancy (number)") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Daily occupancy in Toronto shelters in January 2017

This perhaps gives us some idea of what is going on. Let’s just check February and see if it looks similar (Figure @ref(fig:sheltersinfeb2017)).

toronto_shelters_by_day %>% 
  filter(year(date) == 2017) %>%
  filter(month(date) == 2) %>% 
  ggplot(aes(x = day(date), y = occupancy, color = sector)) + 
  geom_point(aes(group = sector)) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Day",
       y = "Occupancy (number)") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Daily occupancy in Toronto shelters in February 2017

We’ve clearly got a problem with the first twelve days of the month. We noted at the start that when you look at the data it’s a bit odd in that it’s not in order. Let’s take another look at that by going back to the data as it was given to us (as opposed to the data by day that we’ve been using) (Figure @ref(fig:sheltersin2017byrow)).

toronto_shelters %>% 
  mutate(counter = 1:nrow(toronto_shelters)) %>% 
  filter(year(date) == 2017) %>% 
  ggplot(aes(x = counter, y = date)) +
  geom_point(alpha = 0.3) +
  labs(x = "Row in the dataset",
       y = "Date of that row") +
  theme_minimal()

Date of each row in order in 2017

Although there’s no rule that says the dataset has to be in order of the date, if it were, then all the points would lie on the diagonal line. We have a lot of deviation from that. To get a sense of what we’re expecting let’s look at all four years (Figure @ref(fig:sheltersdatebyrow)).

toronto_shelters %>% 
  mutate(counter = 1:nrow(toronto_shelters)) %>% 
  ggplot(aes(x = counter, y = date)) +
  geom_point(alpha = 0.3) +
  facet_wrap(vars(year(date)),
             scales = "free") +
  labs(x = "Row in the dataset",
       y = "Date of that row") +
  theme_minimal()

Date of each row in order (2017-2020)

It looks like 2020 is as we’d expect. 2019 has a few odd situations, but not too many. 2018 has a small cluster early in the dataset and then possibly something systematic toward the middle. But it’s clear that 2017 has a large number of systematic issues.

In general, I think that in 2017 the first 12 days are the wrong way around, i.e we think it’s year-month-day, but it’s actually year-day-month, but there are exceptions. As a first pass, let’s just try to flip those first 12 days of each month and see if that helps. It’ll be fairly blunt, but hopefully gets us somewhere.

toronto_shelters <- 
  toronto_shelters %>% 
  mutate(
    year = year(date),
    month = month(date),
    day = day(date),
    date = as.character(date),
    changed_date = if_else(
      date %in% c("2017-02-01", "2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01", 
                  "2017-07-01", "2017-08-01", "2017-09-01", "2017-10-01", "2017-11-01", 
                  "2017-12-01", "2017-01-02", "2017-03-02", "2017-04-02", "2017-05-02", 
                  "2017-06-02", "2017-07-02", "2017-08-02", "2017-09-02", "2017-10-02", 
                  "2017-11-02", "2017-12-02", "2017-01-03", "2017-02-03", "2017-04-03", 
                  "2017-05-03", "2017-06-03", "2017-07-03", "2017-08-03", "2017-09-03", 
                  "2017-10-03", "2017-11-03", "2017-12-03", "2017-01-04", "2017-02-04", 
                  "2017-03-04", "2017-05-04", "2017-06-04", "2017-07-04", "2017-08-04", 
                  "2017-09-04", "2017-10-04", "2017-11-04", "2017-12-04", "2017-01-05", 
                  "2017-02-05", "2017-03-05", "2017-04-05", "2017-06-05", "2017-07-05", 
                  "2017-08-05", "2017-09-05", "2017-10-05", "2017-11-05", "2017-12-05", 
                  "2017-01-06", "2017-02-06", "2017-03-06", "2017-04-06", "2017-05-06", 
                  "2017-07-06", "2017-08-06", "2017-09-06", "2017-10-06", "2017-11-06", 
                  "2017-12-06", "2017-01-07", "2017-02-07", "2017-03-07", "2017-04-07", 
                  "2017-05-07", "2017-06-07", "2017-08-07", "2017-09-07", "2017-10-07", 
                  "2017-11-07", "2017-12-07", "2017-01-08", "2017-02-08", "2017-03-08", 
                  "2017-04-08", "2017-05-08", "2017-06-08", "2017-07-08", "2017-09-08", 
                  "2017-10-08", "2017-11-08", "2017-12-08", "2017-01-09", "2017-02-09", 
                  "2017-03-09", "2017-04-09", "2017-05-09", "2017-06-09", "2017-07-09", 
                  "2017-08-09", "2017-10-09", "2017-11-09", "2017-12-09", "2017-01-10", 
                  "2017-02-10", "2017-03-10", "2017-04-10", "2017-05-10", "2017-06-10", 
                  "2017-07-10", "2017-08-10", "2017-09-10", "2017-11-10", "2017-12-10", 
                  "2017-01-11", "2017-02-11", "2017-03-11", "2017-04-11", "2017-05-11", 
                  "2017-06-11", "2017-07-11", "2017-08-11", "2017-09-11", "2017-10-11", 
                  "2017-12-11", "2017-01-12", "2017-02-12", "2017-03-12", "2017-04-12", 
                  "2017-05-12", "2017-06-12", "2017-07-12", "2017-08-12", "2017-09-12", 
                  "2017-10-12", "2017-11-12"),
      paste(year, day, month, sep = "-"),
      paste(year, month, day, sep = "-"),
    ),
    changed_date = ymd(changed_date)
    ) %>% 
  select(-year, -month, -day)

Now let’s take a look (Figure @ref(fig:sheltersdatebyrowadj)).

toronto_shelters %>% 
  mutate(counter = 1:nrow(toronto_shelters)) %>% 
  filter(year(date) == 2017) %>% 
  ggplot(aes(x = counter, y = changed_date)) +
  geom_point(alpha = 0.3) +
  labs(x = "Row in the dataset",
       y = "Date of that row") +
  theme_minimal()

Date of each row in order in 2017 after adjustment

We can see that’s almost entirely taken care of the systematic differences. However it’s probably a little blunt. For instance, notice there are now no entries below the diagonal (Figure @ref(fig:sheltersdatebyrowadj2017)).

toronto_shelters_adjusted <- 
  toronto_shelters %>% 
  # We only want rows with occupancy
  tidyr::drop_na(occupancy, capacity) %>% 
  # We want to know the occupancy by date and sector
  group_by(changed_date, sector) %>% 
  summarise(occupancy = sum(occupancy), .groups = 'drop') 

toronto_shelters_adjusted %>% 
  filter(year(changed_date) == 2017) %>% 
  ggplot(aes(x = day(changed_date), y = occupancy, color = sector)) + 
  geom_point(aes(group = sector), alpha = 0.3) +
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Changed day",
       y = "Occupancy (number)") +
  facet_wrap(vars(month(changed_date, label = TRUE)),
             scales = "free_x") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Toronto shelters daily occupancy in 2017 after adjustment

We could keep going here to try to get to the bottom of it, but the baby is going to wake up soon and I’ve got a history of wasting too much time on these types of things. One of the great things about the City of Toronto Data Portal is that each dataset has a publisher and a contact email. I’ll email them and will update this when they get back to me.

Model

…With a laptop, some free software, and a cup of coffee, I can examine what ought to seem like a staggering amount of information. …I sit here at home, surveying the scope of what’s being inflicted on people across the country and around the world as this disease spreads. … … People sometimes think (or complain) that working with quantitative data like this inures you to the reality of the human lives that lie behind the numbers. Numbers and measures are crude; they pick up the wrong things; they strip out the meaning of what’s happening to real people; they make it easy to ignore what can’t be counted. There’s something to those complaints. But it’s mostly a lazy critique. In practice, I find that far from distancing you from questions of meaning, quantitative data forces you to confront them. The numbers draw you in. Working with data like this is an unending exercise in humility, a constant compulsion to think through what you can and cannot see, and a standing invitation to understand what the measures really capture—what they mean, and for whom. …”

Kieran Healy, 2020, The Kitchen Counter Observatory, 21 May. https://kieranhealy.org/blog/archives/2020/05/21/the-kitchen-counter-observatory/

Let’s start by looking at the effect of COVID on occupancy. On 17 March Ontario declared a state of emergency, so let’s focus on the time around then (Figure @ref(fig:focuson2020) includes a dashed line at that point).

toronto_shelters_adjusted %>% 
  filter(year(changed_date) == 2020) %>%
  ggplot(aes(x = changed_date, y = occupancy, color = sector)) + 
  geom_point(aes(group = sector), alpha = 0.3) +
  geom_vline(xintercept = ymd("2020-03-17"), linetype = 'dotted')+
  scale_y_continuous(limits = c(0, NA)) +
  labs(color = "Type",
       x = "Changed date",
       y = "Occupancy (number)") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Shelter usage in 2020

It’s clear that soon after the state of emergency was declared the usage of shelters began to decrease. Understandably the homeless, like all of us, want to socially distance to the extent possible and this likely means avoiding shelters. If you live in Toronto one thing that you’ll notice is that there are a lot of homeless people living in parks since COVID started. While this was fine in summer, the issue is that in winter it is well below 0C overnight.

We might like to see if we can estimate how many additional people are sleeping outside in January. We’ll do this by comparing the number that slept in the shelters in January 2021, compared with the number in the shelters in January 2020, which is before COVID hit Toronto. As Kieran Healy says, an exercise like this is confronting.

While it’s a stretch to say that if they’re not sleeping in a shelter then they must be sleeping outside, it’s also not clear where else they could be if they’re not in a shelter. It’s possible that they have found permanent housing and so are no longer using shelters, however I don’t know of any substantial change in public policy or resource allocation such that the whole difference is due to people finding permanent homes.

We’ll use the data that the city has so far released for 2021. The data provider warns us against using capacity, so we’ll focus on occupancy.

current_data <- opendatatoronto::get_resource("29852011-c9c2-4b6d-a79a-b718f5ed1ae2")

write_csv(current_data, "inputs/raw_data-January.csv")
current_data <- read_csv("inputs/raw_data-January.csv", 
                         col_types = c("icccicccicccc")) 

head(current_data)
# A tibble: 6 × 13
    `_id` OCCUPANCY_DATE SECTOR SHELTER_POSTAL_CODE CAPACITY
    <int> <chr>          <chr>  <chr>                  <int>
1 1914671 2021-01-01T05… Women  M5S 2P1                    8
2 1914672 2021-01-01T05… Famil… M5S 2P1                  174
3 1914673 2021-01-01T05… Famil… M5S 2P1                  140
4 1914674 2021-01-01T05… Famil… M5S 2P1                  360
5 1914675 2021-01-01T05… Co-ed  M5S 2P1                    8
6 1914676 2021-01-01T05… Women  M5S 2P1                    8
# ℹ 8 more variables: SHELTER_PROVINCE <chr>,
#   FACILITY_NAME <chr>, SHELTER_NAME <chr>,
#   OCCUPANCY <int>, ORGANIZATION_NAME <chr>,
#   SHELTER_ADDRESS <chr>, SHELTER_CITY <chr>,
#   PROGRAM_NAME <chr>

Just want to do the same basic clean up of the dates as before.

current_data <- 
  current_data %>% 
  janitor::clean_names() %>% # Make the column names easier to type. Thanks Sharla!
  mutate(occupancy_date = str_remove(occupancy_date, "T[:digit:]{2}:[:digit:]{2}:[:digit:]{2}"),
  ) %>% 
  mutate(date = ymd(occupancy_date, quiet = TRUE))
head(current_data)
# A tibble: 6 × 14
       id occupancy_date sector shelter_postal_code capacity
    <int> <chr>          <chr>  <chr>                  <int>
1 1914671 2021-01-01     Women  M5S 2P1                    8
2 1914672 2021-01-01     Famil… M5S 2P1                  174
3 1914673 2021-01-01     Famil… M5S 2P1                  140
4 1914674 2021-01-01     Famil… M5S 2P1                  360
5 1914675 2021-01-01     Co-ed  M5S 2P1                    8
6 1914676 2021-01-01     Women  M5S 2P1                    8
# ℹ 9 more variables: shelter_province <chr>,
#   facility_name <chr>, shelter_name <chr>,
#   occupancy <int>, organization_name <chr>,
#   shelter_address <chr>, shelter_city <chr>,
#   program_name <chr>, date <date>

For each day we want to know the number in each category. We’re then going to compare this to this time last year.

toronto_shelters_by_day_current <- 
  current_data %>% 
  # We only want rows with occupancy
  tidyr::drop_na(occupancy) %>% 
  # We want to know the occupancy by date and sector
  group_by(date, sector) %>% 
  summarise(occupancy = sum(occupancy), .groups = 'drop') %>% 
  mutate(month_day = paste(month(date), day(date), sep = "-")) %>% 
  rename(occupancy_2021 = occupancy)

head(toronto_shelters_by_day_current)
# A tibble: 6 × 4
  date       sector   occupancy_2021 month_day
  <date>     <chr>             <int> <chr>    
1 2021-01-01 Co-ed               515 1-1      
2 2021-01-01 Families           1128 1-1      
3 2021-01-01 Men                 986 1-1      
4 2021-01-01 Women               377 1-1      
5 2021-01-01 Youth               285 1-1      
6 2021-01-02 Co-ed               511 1-2      
toronto_shelters_by_day_2020 <- 
  toronto_shelters %>% 
  # We only want rows with occupancy
  tidyr::drop_na(occupancy, capacity) %>% 
  # We want to know the occupancy by date and sector
  group_by(changed_date, sector) %>% 
  summarise(occupancy = sum(occupancy), .groups = 'drop') %>% 
  filter(year(changed_date) == 2020) %>% 
  mutate(month_day = paste(month(changed_date), day(changed_date), sep = "-")) %>% 
  # Don't have to rename but it makes the join easier
  rename(date = changed_date, 
         occupancy_2020 = occupancy)

head(toronto_shelters_by_day_2020)
# A tibble: 6 × 4
  date       sector   occupancy_2020 month_day
  <date>     <chr>             <int> <chr>    
1 2020-01-01 Co-ed               744 1-1      
2 2020-01-01 Families           2670 1-1      
3 2020-01-01 Men                1996 1-1      
4 2020-01-01 Women               845 1-1      
5 2020-01-01 Youth               534 1-1      
6 2020-01-02 Co-ed               744 1-2      

Now we’re going to combine the datasets so that for each day, say ‘4 January’, we know that in 2020 shelter usage was X and on that day in 2021 we know that shelter usage was Y (Table @ref(tab:jandaily)).

toronto_shelters_by_day_current <- 
  toronto_shelters_by_day_current %>% 
  left_join(toronto_shelters_by_day_2020, by = c("month_day" = "month_day", "sector" = "sector")) %>% 
  rename(date = date.x) %>% 
  select(date, sector, occupancy_2020, occupancy_2021) %>% 
  mutate(difference = occupancy_2021 - occupancy_2020)


toronto_shelters_by_day_current %>% 
  rename(Date = date,
         Sector = sector,
         `Occupancy in 2020` = occupancy_2020,
         `Occupancy in 2021` = occupancy_2021,  
         Difference = difference) %>% 
  kableExtra::kbl(caption = "Comparison of shelter usage in January 2021 with January 2020 by day") %>%
  kableExtra::kable_styling()
Comparison of shelter usage in January 2021 with January 2020 by day
Date Sector Occupancy in 2020 Occupancy in 2021 Difference
2021-01-01 Co-ed 744 515 -229
2021-01-01 Families 2670 1128 -1542
2021-01-01 Men 1996 986 -1010
2021-01-01 Women 845 377 -468
2021-01-01 Youth 534 285 -249
2021-01-02 Co-ed 744 511 -233
2021-01-02 Families 2660 1122 -1538
2021-01-02 Men 1990 985 -1005
2021-01-02 Women 838 382 -456
2021-01-02 Youth 532 282 -250
2021-01-03 Co-ed 743 505 -238
2021-01-03 Families 2645 1125 -1520
2021-01-03 Men 1985 976 -1009
2021-01-03 Women 834 384 -450
2021-01-03 Youth 534 276 -258
2021-01-04 Co-ed 745 508 -237
2021-01-04 Families 2636 1138 -1498
2021-01-04 Men 1985 990 -995
2021-01-04 Women 848 387 -461
2021-01-04 Youth 535 276 -259
2021-01-05 Co-ed 744 510 -234
2021-01-05 Families 2621 1137 -1484
2021-01-05 Men 1991 984 -1007
2021-01-05 Women 844 383 -461
2021-01-05 Youth 529 271 -258
2021-01-06 Co-ed 742 510 -232
2021-01-06 Families 2607 1139 -1468
2021-01-06 Men 1991 987 -1004
2021-01-06 Women 845 388 -457
2021-01-06 Youth 534 267 -267
2021-01-07 Co-ed 742 509 -233
2021-01-07 Families 2611 1138 -1473
2021-01-07 Men 1986 979 -1007
2021-01-07 Women 848 389 -459
2021-01-07 Youth 535 268 -267
2021-01-08 Co-ed 742 509 -233
2021-01-08 Families 2618 1133 -1485
2021-01-08 Men 1998 979 -1019
2021-01-08 Women 848 386 -462
2021-01-08 Youth 536 269 -267
2021-01-09 Co-ed 743 506 -237
2021-01-09 Families 2638 1133 -1505
2021-01-09 Men 1999 979 -1020
2021-01-09 Women 840 386 -454
2021-01-09 Youth 533 260 -273
2021-01-10 Co-ed 745 500 -245
2021-01-10 Families 2627 1128 -1499
2021-01-10 Men 1999 971 -1028
2021-01-10 Women 838 383 -455
2021-01-10 Youth 535 263 -272
2021-01-11 Co-ed 741 502 -239
2021-01-11 Families 2616 1121 -1495
2021-01-11 Men 2003 971 -1032
2021-01-11 Women 835 383 -452
2021-01-11 Youth 538 254 -284
2021-01-12 Co-ed 740 503 -237
2021-01-12 Families 2634 1136 -1498
2021-01-12 Men 1996 965 -1031
2021-01-12 Women 844 383 -461
2021-01-12 Youth 536 262 -274
2021-01-13 Co-ed 741 500 -241
2021-01-13 Families 2629 1146 -1483
2021-01-13 Men 1993 956 -1037
2021-01-13 Women 842 378 -464
2021-01-13 Youth 532 260 -272
2021-01-14 Co-ed 739 498 -241
2021-01-14 Families 2635 1145 -1490
2021-01-14 Men 1994 915 -1079
2021-01-14 Women 842 372 -470
2021-01-14 Youth 537 265 -272
2021-01-15 Co-ed 741 506 -235
2021-01-15 Families 2655 1131 -1524
2021-01-15 Men 1983 920 -1063
2021-01-15 Women 847 371 -476
2021-01-15 Youth 539 266 -273
2021-01-16 Co-ed 744 511 -233
2021-01-16 Families 2654 1133 -1521
2021-01-16 Men 1988 918 -1070
2021-01-16 Women 845 370 -475
2021-01-16 Youth 533 264 -269
2021-01-17 Co-ed 742 513 -229
2021-01-17 Families 2649 1130 -1519
2021-01-17 Men 1992 923 -1069
2021-01-17 Women 844 366 -478
2021-01-17 Youth 532 264 -268
2021-01-18 Co-ed 743 507 -236
2021-01-18 Families 2639 1125 -1514
2021-01-18 Men 1999 923 -1076
2021-01-18 Women 844 371 -473
2021-01-18 Youth 535 264 -271
2021-01-19 Co-ed 744 504 -240
2021-01-19 Families 2631 1106 -1525
2021-01-19 Men 1995 931 -1064
2021-01-19 Women 849 379 -470
2021-01-19 Youth 536 263 -273
2021-01-20 Co-ed 742 503 -239
2021-01-20 Families 2629 1081 -1548
2021-01-20 Men 2001 941 -1060
2021-01-20 Women 840 371 -469
2021-01-20 Youth 536 263 -273
2021-01-21 Co-ed 745 501 -244
2021-01-21 Families 2647 1075 -1572
2021-01-21 Men 2007 941 -1066
2021-01-21 Women 839 367 -472
2021-01-21 Youth 534 263 -271
2021-01-22 Co-ed 744 505 -239
2021-01-22 Families 2648 1058 -1590
2021-01-22 Men 2014 949 -1065
2021-01-22 Women 843 367 -476
2021-01-22 Youth 531 267 -264
2021-01-23 Co-ed 740 508 -232
2021-01-23 Families 2653 1058 -1595
2021-01-23 Men 2008 959 -1049
2021-01-23 Women 842 362 -480
2021-01-23 Youth 530 273 -257

Let’s get some average statistics for January by sector (Table @ref(tab:janaverage)).

january_average <- 
  toronto_shelters_by_day_current %>% 
  group_by(sector) %>% 
  summarise(Difference = mean(difference)) %>% 
  mutate(Difference = as.integer(Difference))

january_average %>% 
  rename(Sector = sector,
         `Difference this January compared with last` = Difference) %>% 
  kableExtra::kbl(caption = "Overall comparison of shelter usage in January 2021 with January 2020") %>%
  kableExtra::kable_styling()
Overall comparison of shelter usage in January 2021 with January 2020
Sector Difference this January compared with last
Co-ed -236
Families -1516
Men -1037
Women -465
Youth -267

Let’s look at a comparison graph (Figure @ref(fig:finalgraph)).

toronto_shelters_by_day_current %>% 
  select(-difference) %>% 
  pivot_longer(cols = c("occupancy_2020", "occupancy_2021"),
               names_to = "year",
               values_to = "number"
               ) %>% 
  mutate(year = if_else(year == 'occupancy_2020', '2020', '2021')) %>% 
  ggplot(aes(x = date, y = number, color = year)) +
  geom_point() +
  facet_wrap(vars(sector)) +
  theme_minimal() +
  labs(x = "Date",
       y = "Occupancy in shelters (#)",
       color = "Year")  +
  scale_color_brewer(palette = "Set1")

Comparison of shelter usage in January 2021 with January 2020

I estimate that in January, on average, there are roughly 3,517 additional people sleeping outside in below freezing conditions this year compared with last year. That is to say, shelter usage is that much below what it was, and I’m not sure where else they could go. In particular, on Friday night, when it was horrendously cold, I estimate there were 3,634 additional people likely sleeping outside, by comparing the usage of shelters on 22 January 2020 with 22 January 2021.

Even if I’m off by an order of magnitude, the city and province clearly should do more.

Acknowledgments

Thank you to Monica Alexander for helpful comments.

References

Gelfand, Sharla. 2020. Opendatatoronto: Access the City of Toronto Open Data Portal. https://CRAN.R-project.org/package=opendatatoronto.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.