Deprecated syllable_tokenize #322 #550
syllable_tokenize is deprecated, use subword_tokenize instead
Todo
Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-04-22 17:32:08 UTC
Please remove "(default)" from "dict" in the docstring.
* *tcc* (default) - Thai Character Cluster (Theeramunkong et al. 2000)
* *etcc* - Enhanced Thai Character Cluster (Inrut et al. 2001)
* *wangchanberta* - SentencePiece from wangchanberta model.
* *dict* (default) - newmm word tokenizer with a syllable dictionary
* *ssg* - CRF syllable segmenter for Thai
"dict" is not the default subword tokenization engine.
The current default is "tcc", according to the DEFAULT_SUBWORD_TOKENIZE_ENGINE constant in
https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/tokenize/__init__.py
"* dict (default) - newmm..." should be just "* dict - newmm..."
syllable_tokenize is deprecated, use subword_tokenize instead #322
What does this change
Deprecated syllable_tokenize
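For readers unfamiliar with the pattern, here is a minimal sketch of how a deprecated function can forward to its replacement while emitting a warning. It is not the code from this PR: the `subword_tokenize` below is a stand-in that only splits text into characters, and the `engine="dict"` default on `syllable_tokenize` is an assumption for illustration. Only the names `syllable_tokenize`, `subword_tokenize`, and `DEFAULT_SUBWORD_TOKENIZE_ENGINE` and the "tcc" default come from the discussion above.

```python
import warnings
from typing import List

# Constant name and value as quoted in the review comment.
DEFAULT_SUBWORD_TOKENIZE_ENGINE = "tcc"


def subword_tokenize(
    text: str, engine: str = DEFAULT_SUBWORD_TOKENIZE_ENGINE
) -> List[str]:
    """Stand-in tokenizer (hypothetical): splits into characters.

    The real function dispatches on `engine`; this stub ignores it.
    """
    return list(text)


def syllable_tokenize(text: str, engine: str = "dict") -> List[str]:
    """Deprecated wrapper that forwards to subword_tokenize.

    The "dict" default here is an assumption for illustration.
    """
    warnings.warn(
        "syllable_tokenize is deprecated, use subword_tokenize instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return subword_tokenize(text, engine=engine)
```

Callers that still use `syllable_tokenize` keep working under this approach but see a `DeprecationWarning` pointing at their own call site, which gives them time to switch to `subword_tokenize`.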
Your checklist for this pull request
🚨Please review the guidelines for contributing to this repository.