@wannaphong wannaphong commented Oct 31, 2021

What does this change?

MaiYaMok (ๆ) is the Thai repetition mark: it indicates that the preceding word is repeated. This function preprocesses a Thai sentence by expanding each MaiYaMok into the word it repeats.

from pythainlp.util import maiyamok

maiyamok("เด็กๆชอบไปโรงเรียน")
# output: ['เด็ก', 'เด็ก', 'ชอบ', 'ไป', 'โรงเรียน']

maiyamok(["ทำไม","คน","ดี"," ","ๆ","ๆ"," ","ถึง","ทำ","ไม่ได้"])
# output: ['ทำไม', 'คน', 'ดี', 'ดี', 'ดี', ' ', 'ถึง', 'ทำ', 'ไม่ได้']

Original code from https://web.facebook.com/groups/thainlp/posts/663608874020606/
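
For readers who want to see the idea behind the token-list example, here is a minimal sketch of one way the expansion could work. It is not the code merged in this PR: the helper name `expand_maiyamok` is hypothetical, it only accepts an already-tokenized list (it does no word segmentation), and dropping the whitespace between a word and its ๆ is an assumption inferred from the second example output above.

```python
MAIYAMOK = "ๆ"  # THAI CHARACTER MAIYAMOK (U+0E46)


def expand_maiyamok(tokens):
    """Hypothetical helper: replace each ๆ token with a copy of the previous word.

    Assumption: whitespace tokens sitting between a word and its ๆ are dropped,
    which is what the second example output in this PR suggests.
    """
    out = []
    for token in tokens:
        if token != MAIYAMOK:
            out.append(token)
            continue
        # Find the most recent non-whitespace token to repeat.
        last = len(out) - 1
        while last >= 0 and out[last].isspace():
            last -= 1
        if last < 0:
            # Nothing to repeat: keep the mark unchanged.
            out.append(token)
            continue
        # Drop the whitespace between the word and the mark, then repeat the word.
        del out[last + 1:]
        out.append(out[last])
    return out


result = expand_maiyamok(["ทำไม", "คน", "ดี", " ", "ๆ", "ๆ", " ", "ถึง", "ทำ", "ไม่ได้"])
print(result)
```

Handling a plain string, as in the first example, would additionally require a word tokenizer in front of this step; that part is left out of the sketch.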

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

  • Passed code styles and structures
  • Passed code linting checks and unit test


pep8speaks commented Oct 31, 2021

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-31 11:00:52 UTC

@wannaphong wannaphong requested a review from bact October 31, 2021 05:20
coveralls commented Oct 31, 2021

Coverage Status

Coverage increased (+0.08%) to 97.756% when pulling ead5b8b on add-clean-maiyamok into 6135ba5 on dev.

@wannaphong wannaphong added this to the 3.0 milestone Nov 4, 2021
@wannaphong wannaphong merged commit 40af25d into dev Nov 5, 2021

bact commented Nov 6, 2021

So we stick with the name maiyamok()?

@wannaphong
Member Author

maiyamok

I think this name is clear. According to https://en.wiktionary.org/wiki/%E0%B9%86, the character's name is "THAI CHARACTER MAIYAMOK". Or should we change it to clean_maiyamok?

@bact bact deleted the add-clean-maiyamok branch November 8, 2021 21:29