Deprecated syllable_tokenize #322 #550

wannaphong · 2021-04-14T09:53:09Z

syllable_tokenize is deprecated, use subword_tokenize instead #322

What does this change?

Deprecated syllable_tokenize

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

Passed code styles and structures
Passed code linting checks and unit tests

syllable_tokenize is deprecated, use subword_tokenize instead
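
For readers unfamiliar with the pattern, a deprecation like this is commonly implemented as a thin wrapper that emits a `DeprecationWarning` and forwards to the replacement. The sketch below is only an illustration, not the code in this PR: the `subword_tokenize` body is a whitespace-splitting stand-in, and the signatures and the `engine` defaults shown are assumptions made for the example.

```python
import warnings
from typing import List


def subword_tokenize(text: str, engine: str = "tcc") -> List[str]:
    """Stand-in for the real tokenizer (assumption): splits on whitespace."""
    return text.split()


def syllable_tokenize(text: str, engine: str = "dict") -> List[str]:
    """Deprecated wrapper: warn the caller, then delegate to subword_tokenize."""
    warnings.warn(
        "syllable_tokenize is deprecated, use subword_tokenize instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return subword_tokenize(text, engine=engine)
```

Using `stacklevel=2` makes the warning point at the calling code, which is usually what a user needs in order to find and update the deprecated call.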

wannaphong · 2021-04-14T09:54:35Z

Todo

move test set
edit docs

pep8speaks · 2021-04-14T09:56:08Z

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file tests/test_tokenize.py:

Line 304:80: E501 line too long (89 > 79 characters)
Line 306:80: E501 line too long (80 > 79 characters)

Comment last updated at 2021-04-22 17:32:08 UTC

coveralls · 2021-04-14T10:04:23Z

Coverage decreased (-0.02%) to 95.715% when pulling 9bf1842 on merge-syllable-subword into 449e9b0 on dev.

bact

Please remove "(default)" from "dict" in the docstring.

bact · 2021-04-19T16:41:54Z

pythainlp/tokenize/core.py

        * *tcc* (default) -  Thai Character Cluster (Theeramunkong et al. 2000)
        * *etcc* - Enhanced Thai Character Cluster (Inrut et al. 2001)
        * *wangchanberta* - SentencePiece from wangchanberta model.
+        * *dict* (default) - newmm word tokenizer with a syllable dictionary
+        * *ssg* - CRF syllable segmenter for Thai


"dict" is not a default subword tokenization engine.

Current default is "tcc",
according to DEFAULT_SUBWORD_TOKENIZE_ENGINE constant in
https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/tokenize/__init__.py

"dict (default) - newmm..." should be just "* dict - newmm..."
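
One way to catch this kind of docstring drift automatically is to compare the engine marked "(default)" in the docstring against the constant. The following is a rough sketch under stated assumptions: the function body and the abbreviated docstring are stand-ins, `documented_defaults` is a hypothetical helper, and the constant's value is taken from the comment above rather than from the library source.

```python
import re
from typing import Callable, List

# Value as stated in the review comment above (assumption for this sketch).
DEFAULT_SUBWORD_TOKENIZE_ENGINE = "tcc"


def subword_tokenize(text: str, engine: str = DEFAULT_SUBWORD_TOKENIZE_ENGINE) -> List[str]:
    """
    Stand-in tokenizer with an abbreviated engine list.

    :param str engine: subword tokenizer engine
        * *tcc* (default) -  Thai Character Cluster
        * *etcc* - Enhanced Thai Character Cluster
        * *dict* - newmm word tokenizer with a syllable dictionary
        * *ssg* - CRF syllable segmenter for Thai
    """
    return [text]  # placeholder behaviour, not real tokenization


def documented_defaults(func: Callable) -> List[str]:
    """Hypothetical helper: engine names tagged '(default)' in the docstring."""
    return re.findall(r"\*\s*\*(\w+)\*\s*\(default\)", func.__doc__ or "")


# Exactly one engine should be marked as default, and it should match the constant.
assert documented_defaults(subword_tokenize) == [DEFAULT_SUBWORD_TOKENIZE_ENGINE]
```

With the docstring as originally proposed in the diff, the helper would return both "tcc" and "dict", and the assertion would fail.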

Deprecated syllable_tokenize #322

c742ded

syllable_tokenize is deprecated, use subword_tokenize instead

Update core.py

9d0453d

Update core.py

92cefd3

wannaphong requested a review from bact April 19, 2021 15:38

bact requested changes Apr 19, 2021

wannaphong added 2 commits April 23, 2021 00:29

Update core.py

2f39603

Update test_tokenize.py

9bf1842

wannaphong requested a review from bact April 22, 2021 17:32

bact approved these changes Apr 23, 2021

wannaphong added this to the 2.4 milestone Apr 23, 2021

wannaphong merged commit 036e985 into dev Apr 23, 2021

wannaphong deleted the merge-syllable-subword branch April 24, 2021 07:01

wannaphong mentioned this pull request Jul 18, 2021

PyThaiNLP 3.0 change log #545

Closed

wannaphong mentioned this pull request Aug 14, 2023

Add syllable_tokenize #834

Merged
