A new R library for preprocessing text data
While analyzing text data can be a lot of fun, preprocessing it generally is not. It can also be extremely difficult, especially when you're just getting into computational text analysis or the R programming language. Enter: library(compositr). Although this library is very much a work in progress, I have chosen to release it in its current form.
Compositr is a one-stop shop for all (ok, most) of your text preprocessing needs. In a single, painless interactive session, compositr will clean up your data for analysis, and can even convert it into various formats, such as a DocumentFeatureMatrix. Compositr will also give you a summary printout of what you did to your text data (in the console, and optionally as a .txt file saved locally).
The library works by integrating a number of incredible R libraries behind the scenes, including tidytext, textclean, and tm, among others.
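To give a sense of the manual work a tool like this is meant to save, here is a minimal sketch of a typical cleaning pipeline written directly against tm. This is not compositr's code or its API; the two example documents are made up, and the choice and order of steps are only an assumption about what a common preprocessing session looks like.

```r
# Illustrative only: a hand-written tm pipeline, NOT compositr's API.
library(tm)

# Hypothetical example documents
docs <- c(
  "Preprocessing text is NOT fun!!",
  "Analyzing text data, however, is a lot of fun."
)

# Build a corpus from the character vector
corpus <- VCorpus(VectorSource(docs))

# Common cleaning steps, applied one at a time
corpus <- tm_map(corpus, content_transformer(tolower))   # lowercase
corpus <- tm_map(corpus, removePunctuation)              # drop punctuation
corpus <- tm_map(corpus, removeNumbers)                  # drop digits
corpus <- tm_map(corpus, removeWords, stopwords("en"))   # drop English stop words
corpus <- tm_map(corpus, stripWhitespace)                # collapse extra spaces

# Convert the cleaned corpus into a document-term matrix
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

Each of these steps is a separate decision you have to remember and get in the right order, which is the kind of bookkeeping an interactive session can take off your hands.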
People who are new to computational text analysis or the R programming language may find compositr especially useful. Compositr may also serve as a useful pedagogical tool.
You can find the code, installation instructions, and a brief tutorial on my GitHub.