Projects
NLPWorkspace
Platform based on Flask + Dash to perform statistical tests, preprocessing and visualization of text data.
- token frequency and preprocessing (stopword filtering, n-grams, stemming)
- word vector generation (PCA or t-SNE for dimensionality reduction)
- unsupervised vowel classification
- chi-squared (χ²) test on corpora
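As a rough illustration of the first and last items, here is a minimal standard-library sketch of token counting, stopword filtering, n-gram extraction, and a 2x2 chi-squared statistic comparing one word's frequency in two corpora. It is not NLPWorkspace's code: the function names, the tiny stopword list, and the choice of a Pearson statistic without continuity correction are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the platform's list).
STOPWORDS = {"the", "a", "of", "and", "is", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop every token that appears in STOPWORDS."""
    return [t for t in tokens if t not in STOPWORDS]


def token_frequencies(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def chi_squared(word, tokens_a, tokens_b):
    """Pearson chi-squared statistic for a 2x2 table:
    (word vs. every other token) x (corpus A vs. corpus B)."""
    a = tokens_a.count(word)      # word in corpus A
    b = len(tokens_a) - a         # other tokens in corpus A
    c = tokens_b.count(word)      # word in corpus B
    d = len(tokens_b) - c         # other tokens in corpus B
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so the statistic is undefined; report 0.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

A larger statistic means the word's share of tokens differs more between the two corpora; turning it into a p-value would need a chi-squared distribution with one degree of freedom, which is left out here.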
Natural Language Decorators
Small Python module to create text preprocessing pipelines using decorators.
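To give an idea of what a decorator-based pipeline can look like, here is a minimal sketch. It does not reproduce the module's actual API: the `preprocess` decorator factory and the `lowercase`, `strip_punctuation`, and `tokenize` functions are hypothetical names chosen for the example.

```python
import functools
import re


def preprocess(step):
    """Hypothetical decorator factory: apply `step` to the text
    before handing it to the decorated function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(text, *args, **kwargs):
            return func(step(text), *args, **kwargs)
        return wrapper
    return decorator


def lowercase(text):
    """Preprocessing step: lowercase the whole text."""
    return text.lower()


def strip_punctuation(text):
    """Preprocessing step: remove everything except word characters and whitespace."""
    return re.sub(r"[^\w\s]", "", text)


# Decorators run top to bottom at call time: lowercase first,
# then strip_punctuation, then the function body.
@preprocess(lowercase)
@preprocess(strip_punctuation)
def tokenize(text):
    """Split the already-cleaned text on whitespace."""
    return text.split()
```

With this sketch, `tokenize("Hello, World!")` receives the cleaned string and returns `['hello', 'world']`. Stacking decorators keeps each preprocessing step small and lets the pipeline be read directly above the function it feeds.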
Linguistic-mode
Emacs package to perform basic tokenization and n-gram statistics.
Stats your lyrics
See how your song’s lyrics are classified by a scikit-learn machine learning model, how repetitive they are compared to 3,000 other songs, which songs are most similar to yours, and what their word vectors look like.
Yodinator
curl -X POST -H "Content-Type: text/plain" --data "this is a test" https://yodinator-j7bugp5hfa-ew.a.run.app/yodathis
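The same request can be sketched in Python with the standard library. The URL, method, header, and body are taken from the curl command above; the function names, the 10-second timeout, and the assumption that the response body is UTF-8 text are illustrative choices, since the response format is not documented here.

```python
import urllib.request

# Endpoint copied from the curl command above.
YODA_URL = "https://yodinator-j7bugp5hfa-ew.a.run.app/yodathis"


def build_request(text):
    """Build a POST request carrying `text` as a plain-text body."""
    return urllib.request.Request(
        YODA_URL,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def yodathis(text, timeout=10):
    """Send the request and return the response body.
    Assumes the service is reachable and answers with UTF-8 text."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `yodathis("this is a test")` would send the same payload as the curl example.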
Dante dashboard
A dashboard with basic corpus stats from Dante’s Divina Commedia.