Parsing your .pdfs in R

In my last blog post, we discussed how to read .pdf files into RStudio. Using pdftools, we were able to read in .pdfs whether or not they were machine-readable.

Getting your .pdfs into R

Doing quantitative text analysis often means working with documents in .pdf format, and these documents may or may not be in a machine-readable format. Assuming we are using RStudio, how do we read these files into our environment so that we can clean, process, and analyze them?

Interactive maps and tables in R

Code and tutorial prepared for the Toronto Data Workshop session on July 30, 2020. You can download the corresponding slide deck for this workshop here. Since launching the Policing the Pandemic Mapping Project with Alexander McClelland, a lot of people have asked us how we built the interactive map and database.

Neither confirm nor deny

I was recently listening to a Radiolab podcast on the history of the phrase "can neither confirm nor deny", formally known as the "Glomar Response". If you have not yet heard this episode, I highly recommend it.

A Gentle Introduction to Tesseract OCR

It is a perennial problem in Canada that municipal, provincial, and federal government agencies disclose records under Access to Information (ATI)/Freedom of Information (FOI) law in non-machine-readable (image) formats by default.