Some helpful functions for preprocessing text data

While analyzing text data can be a lot of fun, preprocessing text data is generally not. It can also be extremely difficult, especially when you’re just getting into computational text analysis or the R programming language. Enter: library(compositr). Although this library is very much a work in progress, I have chosen to release it in its current form.

Compositr is a one-stop shop for all (ok, most) of your text preprocessing needs. In a single, painless interactive session, compositr will clean up your data for analysis, and can even convert it into various formats, such as a DocumentFeatureMatrix. Compositr will also provide you with a summary printout of what you did to your text data (in the console, but also as a .txt file saved locally if you wish).

The library works by integrating a number of incredible R libraries behind the scenes, including tidytext, textclean, and tm, among others.

People who are new to computational text analysis or the R programming language may find compositr especially useful. Compositr may also serve as a useful pedagogical tool.

You can find the code, installation instructions, and a brief tutorial on my GitHub.

Alex Luscombe
PhD Candidate in Criminology