
@wannaphong (Member) commented Sep 16, 2020

Clause tokenization model trained on the LST20 Corpus using a CRF model.
The LST20 Corpus is from the National Electronics and Computer Technology Center (NECTEC), Thailand. You can download the dataset from https://aiforthai.in.th/corpus.php.

Model trained by Mr. Wannaphong Phatthiyaphaibun

Model license: CC0

Code

TODO

  • document
  • test

@pep8speaks commented Sep 16, 2020

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-03 09:47:32 UTC

@coveralls commented Sep 16, 2020

Coverage Status

Coverage increased (+0.04%) to 95.47% when pulling b8ab03e on add-clause into 261f032 on dev.

@wannaphong (Member, Author) commented Sep 16, 2020

Model Card - LST20 CLS

Model Details

  • Developer: Wannaphong Phatthiyaphaibun
  • Model date: 2020-10-03
  • Model version: 0.2
  • Model type: CRF
  • License: CC0

Intended Use

  • Segmenting Thai text into clauses (units smaller than a sentence but larger than a word).
  • Not suitable for other languages or for domains other than news.

Factors

  • Based on known problems in Thai natural language processing.

Metrics

  • Evaluation metrics include precision, recall and f1-score.
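As a quick check on how these metrics relate: the f1-score is the harmonic mean of precision and recall, and the macro average is the unweighted mean over labels. The Python sketch below recomputes both from the per-label precision and recall reported for the LST20 test set under Quantitative Analyses. Because those inputs are already rounded to two decimals, agreement is only expected to two decimal places. The helper names `f1_score` and `macro_average` are illustrative and are not part of the PR.

```python
from typing import Dict, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(report: Dict[str, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Unweighted mean of precision, recall and f1 over all labels."""
    n = len(report)
    precision = sum(p for p, _ in report.values()) / n
    recall = sum(r for _, r in report.values()) / n
    f1 = sum(f1_score(p, r) for p, r in report.values()) / n
    return precision, recall, f1


# Per-label (precision, recall) copied from the LST20 test-set report.
report = {
    "B_CLS": (0.90, 0.94),
    "E_CLS": (0.90, 0.94),
    "I_CLS": (0.99, 0.97),
}

for label, (p, r) in report.items():
    print(f"{label}: f1 = {f1_score(p, r):.2f}")

p, r, f = macro_average(report)
print(f"macro avg: precision = {p:.2f}, recall = {r:.2f}, f1 = {f:.2f}")
```

Run as-is, this prints per-label f1 values of 0.92, 0.92 and 0.98 and a macro average of 0.93 / 0.95 / 0.94, which agrees with the reported figures. The micro and weighted averages are not recomputed here because they depend on per-label counts, not just the rounded scores.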

Training Data
LST20 Corpus Train set (news domain)

Evaluation Data
LST20 Corpus Test set (news domain)

Quantitative Analyses

              precision    recall  f1-score   support

       B_CLS       0.90      0.94      0.92     16111
       E_CLS       0.90      0.94      0.92     15947
       I_CLS       0.99      0.97      0.98    169565

   micro avg       0.97      0.97      0.97    201623
   macro avg       0.93      0.95      0.94    201623
weighted avg       0.97      0.97      0.97    201623
 samples avg       0.94      0.94      0.94    201623
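The labels in this report suggest a B/I/E tagging scheme in which every word receives one tag: B_CLS for the first word of a clause, I_CLS for a word inside it, and E_CLS for its last word. Assuming that reading, a minimal Python sketch for turning a word list plus predicted tags back into clauses might look like this. `tags_to_clauses` is a hypothetical helper, not code from the PR, and how the model labels one-word clauses is not stated in this model card.

```python
from typing import List


def tags_to_clauses(words: List[str], tags: List[str]) -> List[List[str]]:
    """Group word tokens into clauses from per-word B/I/E clause tags.

    Hypothetical helper. Assumes B_CLS marks the first word of a clause,
    E_CLS the last word, and I_CLS any word in between.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    clauses: List[List[str]] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        if tag == "B_CLS" and current:
            # A new clause starts before the previous one was closed.
            clauses.append(current)
            current = []
        current.append(word)
        if tag == "E_CLS":
            # The clause ends at this word.
            clauses.append(current)
            current = []
    if current:
        # Flush any words left over after the last E_CLS.
        clauses.append(current)
    return clauses


# Pre-segmented Thai words with made-up tags (not real model output).
words = ["ผม", "กิน", "ข้าว", "แล้ว", "ไป", "ทำงาน"]
tags = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "I_CLS", "E_CLS"]
print(tags_to_clauses(words, tags))
```

With these illustrative tags, the words are grouped into two clauses: `["ผม", "กิน", "ข้าว"]` and `["แล้ว", "ไป", "ทำงาน"]`. Starting a new clause on B_CLS even when no E_CLS was seen keeps the decoder tolerant of imperfect tag sequences.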

Ethical Considerations
None identified.

Caveats and Recommendations

  • Segment the text into words before use.
  • Thai text only
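A CRF tagger of this kind assigns one tag to each token in a sequence, so its input has to be a list of words rather than raw text. This is why word segmentation must come first. Under that assumption, the sketch below shows what per-word features might look like in the list-of-dicts format that sklearn-crfsuite accepts. The feature names and the `word_features` / `sentence_features` helpers are hypothetical and are not the feature set actually used in this PR.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def word_features(words: List[str], i: int) -> Dict[str, Feature]:
    """Hypothetical features for the i-th word of an already segmented text."""
    word = words[i]
    feats: Dict[str, Feature] = {
        "word": word,
        "word.isspace": word.isspace(),
        "word.isdigit": word.isdigit(),
    }
    # Context window of one word on each side, with boundary markers.
    if i > 0:
        feats["prev.word"] = words[i - 1]
    else:
        feats["BOS"] = True
    if i < len(words) - 1:
        feats["next.word"] = words[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(words: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per word; the tagger would predict one tag per dict."""
    return [word_features(words, i) for i in range(len(words))]


# The input must already be split into words, e.g. by a Thai word tokenizer.
print(sentence_features(["ผม", "กิน", "ข้าว"]))
```

For example, `sentence_features(["ผม", "กิน", "ข้าว"])` returns three dicts. The first is flagged with `BOS` and the last with `EOS`. Passing an unsegmented string instead would yield nothing meaningful, because features would be computed per character rather than per word.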

@wannaphong requested a review from @bact on September 17, 2020 10:06
@bact added the enhancement label on Sep 17, 2020
@bact added this to the 2.3 milestone on Sep 17, 2020
@bact (Member) left a comment


Trying to make the feature names more consistent.

@wannaphong requested a review from @bact on October 3, 2020 09:51
@wannaphong merged commit c932d3f into dev on Oct 7, 2020
@bact added the hacktoberfest-accepted label on Oct 17, 2020
@bact deleted the add-clause branch on October 17, 2020 19:34