Schedule
- First development release: 1 August 2021
- Beta release: 20 January 2022
- Production release: 29 January 2022
Docs: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
GitHub: https://github.com/PyThaiNLP/pythainlp
News
Starting with PyThaiNLP 3.0, we are ending support for Python 3.6. Python 3.6 users can continue to use PyThaiNLP 2.3.1.
We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend that you retrain your model.
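One way to act on the retraining advice is to re-tokenize a few stored samples and see whether the output has changed. The following is a minimal sketch, not part of PyThaiNLP: changed_samples and the sample data are hypothetical, and the tokenizer is passed in as a callable so the check does not depend on any particular engine.

```python
from typing import Callable, Dict, List


def changed_samples(
    saved: Dict[str, List[str]],
    tokenize: Callable[[str], List[str]],
) -> List[str]:
    """Return the texts whose current tokenization differs from the saved one."""
    return [
        text
        for text, old_tokens in saved.items()
        if list(tokenize(text)) != old_tokens
    ]


if __name__ == "__main__":
    # Stand-in tokenizer (whitespace split) so the sketch runs without PyThaiNLP.
    saved = {"a b c": ["a", "b", "c"], "d e": ["d e"]}
    print(changed_samples(saved, str.split))
```

With PyThaiNLP installed, the callable could be lambda text: word_tokenize(text, engine="newmm"), with word_tokenize imported from pythainlp.tokenize. A non-empty result suggests that the model saw different tokens at training time.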
What is new?
Deprecation and other API changes
- Deprecated syllable_tokenize #322 #550 - syllable_tokenize is deprecated; use subword_tokenize instead.
- 701fb3a pythainlp.tag.named_entity.ThaiNameTagger has changed to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 2.5.
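The two changes above can be absorbed with small compatibility shims. This is a minimal sketch under assumptions: the helper name split_subwords is hypothetical, subword_tokenize is called with its default engine, and the imports are guarded so the code still runs where PyThaiNLP or its optional dependencies are missing.

```python
from typing import List, Optional


def split_subwords(text: str) -> Optional[List[str]]:
    """Tokenize with subword_tokenize; return None if PyThaiNLP is unavailable."""
    try:
        from pythainlp.tokenize import subword_tokenize
    except ImportError:
        return None
    return list(subword_tokenize(text))


# Prefer the new location of ThaiNameTagger, then fall back to the old one.
try:
    from pythainlp.tag.thainer import ThaiNameTagger
except ImportError:
    try:
        from pythainlp.tag.named_entity import ThaiNameTagger
    except ImportError:
        ThaiNameTagger = None  # neither module could be imported
```

Code that previously called syllable_tokenize(text) would call split_subwords(text) instead and handle the None case explicitly.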
Augment
- Add pythainlp.augment #580 Add Thai Text Augmentation
Corpus
- Misspellings and errors in dictionary for word tokenization #557 Fix lots of misspellings in dictionary (words_th.txt)
- Add get_corpus_default_db and thainer 1.5 model #576 Add get_corpus_default_db and thainer 1.5 model. You can now add a corpus to default_db.json, and the latest thainer model no longer has to be downloaded from the Internet.
Tag
- Add tltk #599 Add tltk (pos_tag and ner) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize and more.
- Add NER class #600 Add NER class - NER class for named-entity recognition tasks.
Translate
- Add Translate class #589 Add pythainlp.translate.Translate class
- Add Chinese-Thai Machine Translation #588 Add Chinese-Thai Machine Translation
- Add Thai-French Machine Translation #635 Add Thai-French Machine Translation
Tokenization
- Tokenize repeating dots and commas from numbers (fix #461) #562 Tokenize repeating dots and commas from numbers
- Fix token_max_len bug that makes it always zero #585 Fix token_max_len bug that makes it always zero
- Update sentenceseg_crfcut.model #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- 3144110 Add SEFR CUT to pythainlp
- Add tltk #599 Add tltk (sentence_tokenize and word_tokenize) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize and more.
- Add nlpo3 #622 Add nlpo3
Transliterate
- Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations #566 Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
- Fix token_max_len bug that makes it always zero #585 Manually merge update-royin branch with dev branch to add O-ANG rule
- Add tltk #599 Add tltk (g2p and ipa) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize and more.
- Add pythainlp.transliterate.puan #620 #624 Add pythainlp.transliterate.puan
Word Vector
- Use get_vector() instead of deprecated word_vec() #573 Use get_vector() instead of deprecated word_vec()
- Add pythainlp.word_vector.WordVector #583 Add pythainlp.word_vector.WordVector
Spell
- Add correct engine #591 Add more spelling engines
- Add tltk #599 Add tltk (spell) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize and more.
Generate
- Add pythainlp.generate #579 Add pythainlp.generate
Tool
- Add misspell module #614 Add misspell module
Other
- Add tltk #599 Add tltk - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize and more.
- e357cf8 Update requirements from ssg 0.0.6 to ssg 0.0.8
- Spoonerism: Add support for words with more than 3 syllables #631
- Add maiyamok #623 Add maiyamok - this function preprocesses MaiYaMok in Thai sentences.