Platform based on Flask + Dash to perform statistical tests, preprocessing and visualization of text data.

- Token frequency and preprocessing (stopword filtering, n-grams, stemming)
- Word vector generation (PCA or t-SNE for dimensionality reduction)
- Unsupervised vowel classification
- Chi-squared test on corpora
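
As a rough illustration of the chi-squared test listed above, the sketch below compares how often one word occurs in two tokenized corpora. It is not taken from the platform: the function name `chi_squared` and the 2x2 setup (the word vs. every other token, corpus A vs. corpus B) are assumptions made for this example.

```python
from collections import Counter


def chi_squared(word, corpus_a, corpus_b):
    """Pearson chi-squared statistic for `word` in a 2x2 table.

    Hypothetical helper for illustration only. Rows are the two corpora,
    columns are "this word" vs. "any other token".
    """
    count_a, count_b = Counter(corpus_a), Counter(corpus_b)
    # Observed 2x2 contingency table.
    observed = [
        [count_a[word], len(corpus_a) - count_a[word]],
        [count_b[word], len(corpus_b) - count_b[word]],
    ]
    total = len(corpus_a) + len(corpus_b)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were equally likely in both corpora.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


corpus_a = "the cat sat on the mat".split()
corpus_b = "a dog sat on a log".split()
print(chi_squared("the", corpus_a, corpus_b))
```

A larger statistic suggests the word is distributed differently across the two corpora. With one degree of freedom, values above roughly 3.84 correspond to p < 0.05.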

Natural Language Decorators

Small Python module for building text preprocessing pipelines with decorators.
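
The general idea of a decorator-based pipeline can be sketched as follows. This is not the module's actual API: `lowercase`, `remove_stopwords`, and `tokenize` are hypothetical names chosen for illustration.

```python
import functools


def lowercase(func):
    """Hypothetical decorator: lowercase the text before calling `func`."""
    @functools.wraps(func)
    def wrapper(text):
        return func(text.lower())
    return wrapper


def remove_stopwords(stopwords):
    """Hypothetical decorator factory: drop `stopwords` before calling `func`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(text):
            kept = [word for word in text.split() if word not in stopwords]
            return func(" ".join(kept))
        return wrapper
    return decorator


# Decorators run from the outside in: lowercase first, then stopword removal.
@lowercase
@remove_stopwords({"the", "a", "is"})
def tokenize(text):
    return text.split()


print(tokenize("The cat is on a mat"))
```

Stacking decorators keeps each preprocessing step small and reusable, and the order of the steps is visible right above the function they wrap.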


Emacs package to perform basic tokenization and n-gram statistics
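
For readers unfamiliar with n-gram statistics, here is a minimal sketch of the idea in Python. It is not the package's code, and the regex tokenizer and the names `tokenize` and `ngram_counts` are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of `n` consecutive tokens."""
    # zip over n shifted copies of the token list yields the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


tokens = tokenize("To be, or not to be")
print(ngram_counts(tokens, 2).most_common(3))
```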

Stats your lyrics

See how your song's lyrics are classified by a scikit-learn machine learning model, how repetitive they are compared to 3k other songs, which songs are most similar to yours, and what their word vectors look like.


curl -X POST -H "Content-Type: text/plain" --data "this is a test"
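
The project description does not say how repetitiveness is measured. One common proxy, shown here purely as an assumption, is the compression ratio: text that repeats itself compresses better. The function name `repetitiveness` is hypothetical.

```python
import zlib


def repetitiveness(lyrics):
    """Fraction of bytes saved by zlib compression (higher = more repetitive).

    Assumed measure for illustration; very short texts can score below zero
    because the compression overhead exceeds the savings.
    """
    raw = lyrics.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    return 1 - len(compressed) / len(raw)


chorus = "la " * 200
verse = "the quick brown fox jumps over the lazy dog"
print(repetitiveness(chorus) > repetitiveness(verse))
```

To compare one song against a collection, the same score could be computed for every song and the new song's rank reported.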

Dante dashboard

A dashboard with basic corpus statistics for Dante's Divina Commedia.