Description
In PR #857, pos_tag_transformers was added, which consists of 3 models. However, to call an engine, its full name must be specified, and the output is still not in the same format as the other taggers. For example:
pos_tag_transformers(words="แมวทำอะไรตอนห้าโมงเช้า", engine = "bert-base-th-cased-blackboard")
# outputs
# [{'entity_group': 'NN', 'score': 0.910759, 'word': 'แมวมา', 'start': 0, 'end': 5},
# {'entity_group': 'VV', 'score': 0.9462489, 'word': '##ทำ', 'start': 5, 'end': 7},
# {'entity_group': 'NN', 'score': 0.8325567, 'word': '##อะไรตอนห้าโมงเช้า', 'start': 7, 'end': 24}]
This makes the function hard to use: a typical user cannot be expected to remember the full model name (for me, at least, remembering "bert-base-th-cased-blackboard" is impossible). It may also make the internal code messier if more transformers models trained on new corpora are added, because we would end up with a lot of if-else conditions just to select a model.
To address this, I've cleaned up the code so that a user can select a model with the parameters engine and corpus, the same as in the existing functions pos_tag and pos_tag_sents, and I've also fixed the output format in PR #865. This should make the models easier to call without remembering their full names and give users a better experience. What do you think? @wannaphong
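To make the idea concrete, here is a minimal sketch of a lookup table in place of if-else chains, plus an output converter. It is not the code from PR #865. The names _MODEL_REGISTRY, resolve_model_name and to_word_tag_pairs are hypothetical, and so are the short names "bert" and "blackboard"; only the full model name comes from the example above. It also assumes the target format is a list of (word, tag) tuples.

```python
from typing import Dict, List, Tuple

# Hypothetical registry mapping (engine, corpus) to a full model name.
# Only the full name below appears in this issue; the short keys are
# assumptions made for illustration.
_MODEL_REGISTRY: Dict[Tuple[str, str], str] = {
    ("bert", "blackboard"): "bert-base-th-cased-blackboard",
}


def resolve_model_name(engine: str, corpus: str) -> str:
    """Return the full model name for a short engine/corpus pair."""
    try:
        return _MODEL_REGISTRY[(engine, corpus)]
    except KeyError:
        supported = ", ".join(f"{e}/{c}" for e, c in sorted(_MODEL_REGISTRY))
        raise ValueError(
            f"Unsupported engine/corpus pair {engine!r}/{corpus!r}; "
            f"supported pairs: {supported}"
        ) from None


def to_word_tag_pairs(entities: List[dict]) -> List[Tuple[str, str]]:
    """Convert dicts with 'word' and 'entity_group' keys to (word, tag) tuples.

    A leading '##' (the WordPiece continuation marker) is stripped.
    """
    pairs = []
    for ent in entities:
        word = ent["word"]
        if word.startswith("##"):
            word = word[2:]
        pairs.append((word, ent["entity_group"]))
    return pairs


print(resolve_model_name("bert", "blackboard"))
# bert-base-th-cased-blackboard
```

With a table like this, adding a model trained on a new corpus means adding one registry entry rather than another conditional branch.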