enhance pos tagging with transformers function #866

@pavaris-pm

In PR #857, pos_tag_transformers was added with three models. However, to select an engine you must specify its full model name, and the output format still differs from that of the other taggers. For example:

pos_tag_transformers(words="แมวทำอะไรตอนห้าโมงเช้า", engine="bert-base-th-cased-blackboard")
# outputs
# [{'entity_group': 'NN', 'score': 0.910759, 'word': 'แมวมา', 'start': 0, 'end': 5},
#  {'entity_group': 'VV', 'score': 0.9462489, 'word': '##ทำ', 'start': 5, 'end': 7},
#  {'entity_group': 'NN', 'score': 0.8325567, 'word': '##อะไรตอนห้าโมงเช้า', 'start': 7, 'end': 24}]

The full name is very hard for a typical user to remember (I, at least, cannot remember "bert-base-th-cased-blackboard"), and the internal code may get messier as transformers models trained on new corpora are added: we would end up with a long chain of if-else conditions just to select a model.
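One way to keep model selection from growing into an if-else chain is a lookup table keyed by short names. The following is only a minimal sketch, not the code in any PR: the `_MODEL_REGISTRY` table, the `resolve_model_name` helper, and the short keys `"bert"` and `"blackboard"` are hypothetical names chosen for illustration. Only the full model name comes from the example above.

```python
from typing import Dict, Tuple

# Hypothetical registry mapping (engine, corpus) to a full model name.
# Only "bert-base-th-cased-blackboard" appears in this issue; the short
# keys are assumptions made for this sketch.
_MODEL_REGISTRY: Dict[Tuple[str, str], str] = {
    ("bert", "blackboard"): "bert-base-th-cased-blackboard",
}


def resolve_model_name(engine: str = "bert", corpus: str = "blackboard") -> str:
    """Return the full model name for a short (engine, corpus) pair."""
    key = (engine.lower(), corpus.lower())
    try:
        return _MODEL_REGISTRY[key]
    except KeyError:
        supported = ", ".join(f"{e}/{c}" for e, c in sorted(_MODEL_REGISTRY))
        raise ValueError(
            f"Unsupported engine/corpus pair {engine!r}/{corpus!r}; "
            f"supported: {supported}"
        ) from None


print(resolve_model_name("bert", "blackboard"))
```

With this layout, supporting a model trained on a new corpus would mean adding one dictionary entry rather than another branch.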

For that reason, I've cleaned up the code so that a user selects a model through the engine and corpus parameters, as in the existing pos_tag and pos_tag_sents functions, and I've fixed the output format in PR #865. Users no longer need to remember the full model name, which should make for a better experience. What do you think? @wannaphong
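To illustrate the output-format point, here is a rough sketch of converting the dictionary output shown earlier into (word, tag) pairs. It assumes that is the shape the other taggers return, and it is not necessarily how PR #865 does it. The helper name `to_word_tag_pairs` is hypothetical, and stripping a leading "##" assumes it is a subword continuation marker from the tokenizer.

```python
from typing import Any, Dict, List, Tuple


def to_word_tag_pairs(entities: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Convert pipeline-style dicts into (word, tag) tuples."""
    pairs: List[Tuple[str, str]] = []
    for ent in entities:
        word = ent["word"]
        # Assumption: a leading "##" marks a subword continuation; drop it.
        if word.startswith("##"):
            word = word[2:]
        pairs.append((word, ent["entity_group"]))
    return pairs


# Sample taken from the output quoted in this issue.
sample = [
    {"entity_group": "NN", "score": 0.910759, "word": "แมวมา", "start": 0, "end": 5},
    {"entity_group": "VV", "score": 0.9462489, "word": "##ทำ", "start": 5, "end": 7},
    {"entity_group": "NN", "score": 0.8325567, "word": "##อะไรตอนห้าโมงเช้า", "start": 7, "end": 24},
]
print(to_word_tag_pairs(sample))
# [('แมวมา', 'NN'), ('ทำ', 'VV'), ('อะไรตอนห้าโมงเช้า', 'NN')]
```

The scores and character offsets are dropped here; whether they should be kept is a design decision outside this sketch.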

Labels: refactoring (a technical improvement which does not add any new features or change existing features)

Status: Done