Should we merge syllable_tokenize() and subword_tokenize() together? #322

@bact

Both functions operate at the subword level.

Currently we have two levels of analysis and four implementations ("engines"):

  • TCC-based
    • Original TCC
    • Extended TCC
  • Syllable
    • Using syllable dictionary
    • Using CRF (ssg)

Should we call them all "subword"?

Note: currently "ssg" is available as an option in both syllable_tokenize() and subword_tokenize().
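
One way the merge could look, as a rough sketch only: a single `subword_tokenize()` that dispatches on an engine name, with `syllable_tokenize()` kept as a deprecated alias so existing callers keep working. Everything here is hypothetical and is not the current PyThaiNLP code: the registry, the `register_engine` helper, the warning text, and the `"chars"` stand-in engine are all made up for illustration. Real engines (TCC, extended TCC, syllable dictionary, ssg) would be registered the same way.

```python
import warnings
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]

# Hypothetical registry mapping an engine name to its implementation.
_ENGINES: Dict[str, Tokenizer] = {}


def register_engine(name: str) -> Callable[[Tokenizer], Tokenizer]:
    """Register a tokenizer function under an engine name (illustrative helper)."""
    def decorator(func: Tokenizer) -> Tokenizer:
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("chars")
def _chars(text: str) -> List[str]:
    # Stand-in only: one token per character. This is NOT a real TCC or
    # syllable engine; it exists so the sketch runs without dependencies.
    return list(text)


def subword_tokenize(text: str, engine: str = "chars") -> List[str]:
    """Single entry point for every subword-level engine."""
    if not text:
        return []
    if engine not in _ENGINES:
        raise ValueError(f"Unknown subword engine: {engine!r}")
    return _ENGINES[engine](text)


def syllable_tokenize(text: str, engine: str = "chars") -> List[str]:
    """Deprecated alias that forwards to subword_tokenize()."""
    warnings.warn(
        "syllable_tokenize() is deprecated; use subword_tokenize() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return subword_tokenize(text, engine=engine)
```

With this shape, "ssg" would be registered once instead of being exposed through two functions, and the TCC versus syllable distinction becomes a property of the engine rather than of the function name.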

Labels: refactoring (a technical improvement which does not add any new features or change existing features)
