Normalise Lakkhangyao #673

chameleonTK · 2022-05-17T07:33:00Z

What does this change do?

Add a normalising rule for Lakkhangyao ๅ (U+0E45).
According to this, Lakkhangyao should only follow ฤ (U+0E24) or ฦ (U+0E26). After any other character it should be normalised to Sara Aa า (U+0E32).

How this fixes it

Add the following regex rule to the `reorder_vowels` function:

("([^\u0e24\u0e26])\u0e45", "\\1\u0e32"),  # Lakkhangyao -> Sara Aa
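For reviewers who want to try the rule in isolation, here is a minimal sketch that applies the pattern/replacement pair above with `re.sub`. The names `LAKKHANGYAO_RULE` and `normalise_lakkhangyao` are hypothetical and used only for illustration; this is not the actual body of `reorder_vowels`.

```python
import re

# (pattern, replacement) pair quoted in this pull request.
# U+0E24 = ฤ, U+0E26 = ฦ, U+0E45 = ๅ (Lakkhangyao), U+0E32 = า (Sara Aa)
LAKKHANGYAO_RULE = ("([^\u0e24\u0e26])\u0e45", "\\1\u0e32")


def normalise_lakkhangyao(text: str) -> str:
    """Hypothetical helper: replace Lakkhangyao with Sara Aa
    unless the preceding character is ฤ or ฦ."""
    pattern, replacement = LAKKHANGYAO_RULE
    return re.sub(pattern, replacement, text)


# ก + ๅ is rewritten to ก + า
print(normalise_lakkhangyao("\u0e01\u0e45") == "\u0e01\u0e32")
# ฤ + ๅ and ฦ + ๅ are left untouched
print(normalise_lakkhangyao("\u0e24\u0e45") == "\u0e24\u0e45")
print(normalise_lakkhangyao("\u0e26\u0e45") == "\u0e26\u0e45")
```

Note that the pattern needs one preceding character to capture, so a Lakkhangyao at the very start of a string is not matched by this rule as written.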

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

[/] Passed code styles and structures
[/] Passed code linting checks and unit tests

coveralls · 2022-05-17T08:32:23Z

Coverage remained the same at 97.147% when pulling 4e22fda on chameleonTK:feat/add-norm-rules into 14e9a4d on PyThaiNLP:dev.

wannaphong · 2022-05-18T05:16:47Z

Thank you for contributing!

See [3.1 Milestone](https://github.com/PyThaiNLP/pythainlp/milestone/16).

## What is new?

### Deprecation and other API changes

#687 Remove deprecated functions:

- pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
- pythainlp.util.delete_tone. Use pythainlp.util.remove_tonemark instead.
- pythainlp.util.time_time. Use pythainlp.util.time_to_thaiword instead.
- pythainlp.tokenize.syllable_tokenize. Use pythainlp.tokenize.subword_tokenize instead.

### Named Entity Tagging

- #665 Add Thai-NNER `pythainlp.tag.NNER`
- #658 Add LST20NER ONNX model. This is the LST20NER model, fine-tuned from the [WangchanBERTa model](https://huggingface.co/airesearch/wangchanberta-base-att-spm-uncased) and converted to ONNX.

### Transliteration

- #659 Add ISO 11940 transliteration
- #660 Add Thai W2P v0.2
- #686 Add wunsen
- #694 Wunsen Mandarin and Japanese update

### PyThaiNLP Corpus downloader

- #656 Add support for zip/tar.gz when downloading a corpus

### Text normalization

- #673 Add a normalising rule for Lakkhangyao ๅ

### Translate

- #674 Add GPU option

### Text summarize

- #679 Add mt5 cpe kmutt thai sentence sum

### Util

- #682 Add live-dead syllable classification
- #684 Add live dead syllable classify
- #690 Add tone detector

### Other

- #689 Map NG tag to PART
- #691 Remove TinyDB as a dependency
- #692 Fix notifications that newer versions of corpora are available

chameleonTK added 3 commits May 17, 2022 08:13

normalise Lakkhangyao

4b50f61

add test

af95a77

update test case

4e22fda

wannaphong added this to the 3.1 milestone May 18, 2022

wannaphong merged commit 3d2cd79 into PyThaiNLP:dev May 18, 2022

wannaphong mentioned this pull request May 18, 2022

PyThaiNLP 3.1 change log #643

Closed

wannaphong mentioned this pull request Aug 31, 2022

Start PyThaiNLP v3.1.0-dev0 #693

Merged

wannaphong mentioned this pull request Sep 1, 2022

PyThaiNLP v3.1.0-dev1 #695

Merged

wannaphong mentioned this pull request Sep 24, 2022

PyThaiNLP v3.1.0 Released! #713

Merged
