An R package for the Quantitative Analysis of Textual Data
knitr::opts_chunk$set(
warning = FALSE,
collapse = TRUE,
comment = "##",
fig.path = "man/images/"
)
quanteda is an R package for managing and analyzing text, created and maintained by Kenneth Benoit and Kohei Watanabe. Its creation was funded by the European Research Council grant ERC-2011-StG 283794-QUANTESS and its continued development is supported by the Quanteda Initiative CIC.
For more details, see https://quanteda.io.
quanteda 4.0 is a major release that improves functionality and performance and further improves function consistency by removing previously deprecated functions. It also includes significant new tokeniser rules that make the default tokeniser smarter than ever, with new Unicode and ICU-compliant rules enabling it to work more consistently with even more languages.
We describe these significant changes more fully in:
We completed the trend of splitting quanteda into modular packages with the release of v3. The quanteda family of packages includes the following:
quanteda.textmodels: the textmodel_*() functions. This was split from the main package with the v2 release.
quanteda.textstats: the textstat_*() functions, split with the v3 release.
quanteda.textplots: the textplot_*() functions, split with the v3 release.
We are working on additional package releases, available in the meantime from our GitHub pages:
and more to come.
The normal way from CRAN, using your R GUI or
install.packages("quanteda")
(New for quanteda v4.0) For Linux users: Because all installations on Linux are compiled from source, you must first install the Intel oneAPI Threading Building Blocks (TBB) library, which quanteda uses for parallel computing. Without it, installation will not work.
To install TBB on Linux:
# Fedora, CentOS, RHEL
sudo yum install tbb-devel
# Debian and Ubuntu
sudo apt install libtbb-dev
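The two commands above differ only by distribution family. As a minimal sketch, and not something shipped with quanteda, a small POSIX shell helper can print the matching command for whichever package manager is on the PATH. The function names `tbb_install_cmd` and `detect_pkg_manager` are hypothetical, chosen here for illustration; the script only prints the suggested command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print the TBB install command for a package manager.
# The two commands are the ones listed above; the function names are illustrative.
tbb_install_cmd() {
  case "$1" in
    yum) echo "sudo yum install tbb-devel" ;;   # Fedora, CentOS, RHEL
    apt) echo "sudo apt install libtbb-dev" ;;  # Debian and Ubuntu
    *)   echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Print the first supported package manager found on PATH, or fail.
detect_pkg_manager() {
  for pm in apt yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Print (rather than run) the suggestion, so nothing is installed by surprise.
if pm=$(detect_pkg_manager); then
  tbb_install_cmd "$pm"
else
  echo "Neither apt nor yum found; install TBB with your distribution's package manager."
fi
```

Printing the command instead of executing it keeps the `sudo` step under the user's control.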
Because installing the development version compiles some C++ and Fortran source code, you will need to have the appropriate compilers installed first.
You will also need to install TBB:
macOS:
First, you will need to install the Xcode command line tools.
xcode-select --install
Then, after installing Homebrew, install the TBB libraries and the pkg-config utility:
brew install tbb pkg-config
Finally, you will need to install gfortran.
Windows:
Install RTools, which includes the TBB libraries.
quanteda takes advantage of parallel computing through the TBB (Threading Building Blocks) library to speed up computations. This guide provides step-by-step instructions for setting up your system to use quanteda with parallel capabilities on Windows, macOS, and Linux.
Windows:
Download and install RTools from the RTools download page.
macOS:
Install Xcode Command Line Tools
xcode-select --install
Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install TBB and pkg-config
brew install tbb pkg-config
Install gfortran (included in Homebrew's gcc formula)
brew install gcc
Linux:
Install TBB:
# Fedora, CentOS, RHEL
sudo yum install tbb-devel
# Debian and Ubuntu
sudo apt install libtbb-dev
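Once the setup steps above are done, it can help to confirm that the tools are visible before building. The following is a minimal sketch for macOS or Linux, assuming a POSIX shell. The function name `report_tool` is hypothetical, and the `pkg-config --exists tbb` check assumes the installed TBB package registers a pkg-config module named `tbb`, which may not hold for every distribution or version.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Tools named in the steps above; "|| true" keeps the script going on a miss.
report_tool gfortran || true
report_tool pkg-config || true

# Assumption: TBB registers a pkg-config module named "tbb".
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists tbb; then
  echo "found: tbb (version $(pkg-config --modversion tbb))"
else
  echo "tbb not visible to pkg-config"
fi
```

A "missing" line does not always mean the build will fail, since a library can be installed without being registered with pkg-config, so treat the output as a hint rather than a verdict.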
More details are provided in the quanteda documentation.
See the quick start guide to learn how to use quanteda.
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software 3(30), 774. https://doi.org/10.21105/joss.00774.
For a BibTeX entry, use the output from citation(package = "quanteda").
If you like quanteda, please consider leaving feedback or a testimonial here.
Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute: