andreplima/pipeline

pipeline

An example of how to preprocess textual content using NLTK with Python or PySpark. This code was used in the following technical report, which compares different tools and resources for preprocessing textual content:

Diaz, A. K. R., de Lima, A. P., Silva, A. M., da Silva Costa, F. H., Pagnossim, J. L. M., & Peres, S. M. (2018). Relatório Técnico PPgSI-001/2018: Uma análise comparativa das ferramentas de pré-processamento de dados textuais: NLTK, PreTexT e R [A comparative analysis of textual data preprocessing tools: NLTK, PreTexT, and R]. Available at: https://tinyurl.com/y2tt7j2o
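
The repository's own code is not reproduced here. As a rough illustration of the kind of preprocessing described above, the following is a minimal sketch, assuming a lowercase, tokenize, stopword-removal, and stemming sequence. The function name `preprocess`, the regular expression, and the tiny `STOPWORDS` set are hypothetical choices made for this example, not taken from the repository or the report. Only NLTK components that need no downloaded corpora are used.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately tiny stopword list so the sketch runs without
# downloaded NLTK data; a real pipeline would more likely rely on
# nltk.corpus.stopwords (which requires nltk.download("stopwords")).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

# Assumption: tokens are runs of ASCII letters; punctuation and digits are dropped.
_tokenizer = RegexpTokenizer(r"[A-Za-z]+")
_stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem a single document."""
    tokens = _tokenizer.tokenize(text.lower())
    return [_stemmer.stem(token) for token in tokens if token not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("The cats are running in the gardens."))
    # → ['cat', 'run', 'garden']
```

Because `preprocess` is a plain function over one string, it could in principle be applied to each record of a Spark RDD of documents with `rdd.map(preprocess)`; whether the repository structures its PySpark variant this way is an assumption.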

About

Textual content preprocessing pipeline developed in Python and NLTK over Apache Spark.
