Merged
46 commits
9f82f32
sort imports
bact Oct 31, 2024
a73d88b
move imports of thai_word_tone_detector inside the function
bact Oct 31, 2024
885a8d5
specify test suite
bact Oct 31, 2024
0e88cf6
fix import order
bact Oct 31, 2024
332257b
sort imports
bact Oct 31, 2024
4a8e0e7
fix circular import
bact Oct 31, 2024
695599c
Add tokenize to test
bact Oct 31, 2024
36222af
Update __init__.py
bact Oct 31, 2024
863cccd
Update __init__.py
bact Oct 31, 2024
0d3540c
move extra tests to testx
bact Oct 31, 2024
becced0
Move extras to testx
bact Oct 31, 2024
0513cea
Update test suites
bact Oct 31, 2024
d307861
Update __init__.py
bact Oct 31, 2024
32150cc
Update __init__.py
bact Oct 31, 2024
754c84f
Update __init__.py
bact Oct 31, 2024
4b05b7f
Update __init__.py
bact Oct 31, 2024
febb228
Split test_util and testx_util
bact Oct 31, 2024
8f9cc11
move more to testx_util
bact Oct 31, 2024
e944d9c
Update __init__.py
bact Oct 31, 2024
6c08df1
Zero test suite
bact Nov 1, 2024
c8385dc
Test*Package -> *TestCase
bact Nov 1, 2024
3381ce5
consolidate constants
bact Nov 1, 2024
d0c6f93
Split test_corpus, testx_corpus
bact Nov 1, 2024
4e3ac5a
Merge branch 'PyThaiNLP:dev' into add-test-suites
bact Nov 1, 2024
bef1942
Update __init__.py
bact Nov 1, 2024
fdb7665
Merge branch 'add-test-suites' of https://github.com/bact/pythainlp i…
bact Nov 1, 2024
d68a1cd
Update __init__.py
bact Nov 1, 2024
bfa2c02
Update __init__.py
bact Nov 1, 2024
8f2551b
Update __init__.py
bact Nov 1, 2024
e2d4b63
Update __init__.py
bact Nov 1, 2024
205f05f
Update __init__.py
bact Nov 1, 2024
4dca3a0
use unittest tests
bact Nov 1, 2024
8b9a091
Add more tests
bact Nov 1, 2024
08febf8
Split test_soundex, testx_soundex
bact Nov 1, 2024
12e535c
Add conditions in workflow
bact Nov 1, 2024
44c89b0
Update unittest.yml
bact Nov 1, 2024
da8bf6a
Update unittest.yml
bact Nov 1, 2024
52a853a
Use powershell
bact Nov 1, 2024
06766e5
Update unittest.yml
bact Nov 1, 2024
2eb0d4f
fix wheel url
bact Nov 1, 2024
961a18d
Update unittest.yml
bact Nov 1, 2024
331eb2b
Test on 3.9-3.13
bact Nov 1, 2024
441b817
Update unittest.yml
bact Nov 1, 2024
6e04060
Update unittest.yml
bact Nov 1, 2024
f5ece2d
Update unittest.yml
bact Nov 1, 2024
6d8fe2f
Update unittest.yml
bact Nov 1, 2024
60 changes: 40 additions & 20 deletions .github/workflows/unittest.yml
@@ -1,4 +1,4 @@
name: Unit test and code coverage
name: Unit test and coverage

on:
push:
@@ -18,9 +18,14 @@ jobs:
fail-fast: false
matrix:
os: ["macos-latest", "ubuntu-latest", "windows-latest"]
python-version: ["3.10"]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]

runs-on: ${{ matrix.os }}
env:
PYICU_WIN_VER: 2.14
INSTALL_PYICU_WIN: false
INSTALL_TORCH: false
INSTALL_FULL_DEPS: false

steps:
- name: Checkout
@@ -32,8 +37,11 @@ jobs:
cache: "pip"
- name: Install build tools
run: |
python -m pip install --upgrade "pip<24.1" "setuptools==73.0.1"
python -m pip install coverage coveralls
pip install --upgrade "pip<24.1" "setuptools>=65.0.2,<=73.0.1"
pip install coverage coveralls
# pip<24.1 because https://github.com/omry/omegaconf/pull/1195
# setuptools>=65.0.2 because https://github.com/pypa/setuptools/commit/d03da04e024ad4289342077eef6de40013630a44#diff-9ea6e1e3dde6d4a7e08c7c88eceed69ca745d0d2c779f8f85219b22266efff7fR1
# setuptools<=73.0.1 because https://github.com/pypa/setuptools/issues/4620
- name: Install ICU (macOS)
if: startsWith(matrix.os, 'macos-')
run: |
@@ -43,26 +51,38 @@ jobs:
ICU_VER=$(pkg-config --modversion icu-i18n)
echo "ICU_VER=${ICU_VER}"
echo "ICU_VER=${ICU_VER}" >> "${GITHUB_ENV}"
- name: Install ICU (Windows)
if: startsWith(matrix.os, 'windows-')
- name: Install PyICU (Windows)
if: startsWith(matrix.os, 'windows-') && env.INSTALL_PYICU_WIN == 'true'
shell: powershell
run: |
python -m pip install "https://github.com/cgohlke/pyicu-build/releases/download/v2.14/PyICU-2.14-cp310-cp310-win_amd64.whl"
# if needed, get pip wheel link from https://github.com/cgohlke/pyicu-build/releases
$PYTHON_WIN_VER = "${{ matrix.python-version }}"
$CP_VER = "cp" + $PYTHON_WIN_VER.Replace(".", "")
$WHEEL_URL = "https://github.com/cgohlke/pyicu-build/releases/download/v${{ env.PYICU_WIN_VER }}/PyICU-${{ env.PYICU_WIN_VER }}-${CP_VER}-${CP_VER}-win_amd64.whl"
pip install "$WHEEL_URL"
# Get wheel URL from https://github.com/cgohlke/pyicu-build/releases
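The PowerShell step above derives a CPython tag from the matrix Python version and splices it into the wheel URL. The same derivation is sketched in Python below, purely for illustration; `pyicu_wheel_url` is a hypothetical helper name and is not part of the PR.

```python
def pyicu_wheel_url(python_version: str, pyicu_version: str) -> str:
    """Build the PyICU Windows wheel URL the way the workflow step does."""
    # "3.10" -> "cp310": drop the dot and prefix with "cp".
    cp_tag = "cp" + python_version.replace(".", "")
    base = "https://github.com/cgohlke/pyicu-build/releases/download"
    return (
        f"{base}/v{pyicu_version}/"
        f"PyICU-{pyicu_version}-{cp_tag}-{cp_tag}-win_amd64.whl"
    )


print(pyicu_wheel_url("3.10", "2.14"))
```

For Python 3.10 and PyICU 2.14 this reproduces the URL that the workflow previously hard-coded. The diff does not confirm that a wheel is published for every Python version in the matrix, so treat the generated URLs as candidates to check against the release page.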
- name: Install PyTorch
if: env.INSTALL_TORCH == 'true'
run: pip install torch
# if needed, get pip wheel link from http://download.pytorch.org/whl/torch/
# - name: Install dependencies
# env:
# SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL: True
# run: |
# python -m pip install -r docker_requirements.txt
# If torch for the platform is not available in PyPI, use this command:
# pip install "<torch_wheel_url>"
# Get wheel URL from http://download.pytorch.org/whl/torch/
- name: Install dependencies
if: env.INSTALL_FULL_DEPS == 'true'
env:
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL: True
run: pip install -r docker_requirements.txt
- name: Install PyThaiNLP
run: |
python -m pip install .
- name: Test
run: pip install .
# Use the command below, if you want to install a small set of external
# packages, which includes numpy, pyicu, python-crfsuite, and requests:
# pip install .[compact]
- name: Unit test and code coverage
run: coverage run -m unittest tests
# Use 'unittest tests' instead of 'unittest discover' to avoid loading
# tests with external imports.
# The test cases to load are defined in __init__.py in the tests directory.
- name: Coverage report
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
COVERALLS_SERVICE_NAME: github
run: |
coveralls
# coverage run -m unittest discover
run: coveralls
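The comment on the test step relies on unittest's `load_tests` protocol: when a module or package is loaded by name, as in `python -m unittest tests`, the loader calls that module's `load_tests` function if one is defined and runs the suite it returns. The PR's actual `tests/__init__.py` is not shown in this part of the diff, so the following is only a minimal sketch of the mechanism; the two test case classes and `_CORE_TEST_CASES` are hypothetical stand-ins.

```python
import unittest


class TokenizeTestCase(unittest.TestCase):
    """Hypothetical test that needs no external packages."""

    def test_split(self):
        self.assertEqual("a b".split(), ["a", "b"])


class ExtraTestCase(unittest.TestCase):
    """Hypothetical test that would need optional dependencies."""

    def test_extra(self):
        self.assertTrue(True)


# Only the classes listed here end up in the suite.
_CORE_TEST_CASES = [TokenizeTestCase]


def load_tests(loader, standard_tests, pattern):
    """unittest's load_tests protocol: return the suite to run."""
    suite = unittest.TestSuite()
    for case in _CORE_TEST_CASES:
        suite.addTests(loader.loadTestsFromTestCase(case))
    return suite


suite = load_tests(unittest.TestLoader(), None, None)
print(suite.countTestCases())
```

In this sketch `ExtraTestCase` is defined but never loaded, which mirrors the idea of keeping tests with heavy external imports out of the default run.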
5 changes: 2 additions & 3 deletions pythainlp/ancient/aksonhan.py
@@ -1,11 +1,10 @@
# -*- coding: utf-8 -*-
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from pythainlp.util import Trie
from pythainlp import thai_consonants, thai_tonemarks
from pythainlp.tokenize import Tokenizer
from pythainlp.corpus import thai_orst_words

from pythainlp.tokenize import Tokenizer
from pythainlp.util import Trie

_dict_aksonhan = {}
for i in list(thai_consonants):
2 changes: 2 additions & 0 deletions pythainlp/augment/lm/fasttext.py
@@ -3,8 +3,10 @@
# SPDX-License-Identifier: Apache-2.0
import itertools
from typing import List, Tuple

from gensim.models.fasttext import FastText as FastText_gensim
from gensim.models.keyedvectors import KeyedVectors

from pythainlp.tokenize import word_tokenize


5 changes: 2 additions & 3 deletions pythainlp/augment/lm/phayathaibert.py
@@ -2,21 +2,20 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0

from typing import List
import random
import re
from typing import List

from pythainlp.phayathaibert.core import ThaiTextProcessor


_MODEL_NAME = "clicknext/phayathaibert"


class ThaiTextAugmenter:
def __init__(self) -> None:
from transformers import (
AutoTokenizer,
AutoModelForMaskedLM,
AutoTokenizer,
pipeline,
)

2 changes: 1 addition & 1 deletion pythainlp/augment/word2vec/__init__.py
@@ -8,5 +8,5 @@
__all__ = ["Word2VecAug", "Thai2fitAug", "LTW2VAug"]

from pythainlp.augment.word2vec.core import Word2VecAug
from pythainlp.augment.word2vec.thai2fit import Thai2fitAug
from pythainlp.augment.word2vec.ltw2v import LTW2VAug
from pythainlp.augment.word2vec.thai2fit import Thai2fitAug
1 change: 1 addition & 0 deletions pythainlp/augment/word2vec/bpemb_wv.py
@@ -2,6 +2,7 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from typing import List, Tuple

from pythainlp.augment.word2vec.core import Word2VecAug


2 changes: 1 addition & 1 deletion pythainlp/augment/word2vec/core.py
@@ -1,8 +1,8 @@
# -*- coding: utf-8 -*-
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from typing import List, Tuple
import itertools
from typing import List, Tuple


class Word2VecAug:
1 change: 1 addition & 0 deletions pythainlp/augment/word2vec/ltw2v.py
@@ -2,6 +2,7 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from typing import List, Tuple

from pythainlp.augment.word2vec.core import Word2VecAug
from pythainlp.corpus import get_corpus_path
from pythainlp.tokenize import word_tokenize
1 change: 1 addition & 0 deletions pythainlp/augment/word2vec/thai2fit.py
@@ -2,6 +2,7 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from typing import List, Tuple

from pythainlp.augment.word2vec.core import Word2VecAug
from pythainlp.corpus import get_corpus_path
from pythainlp.tokenize import THAI2FIT_TOKENIZER
6 changes: 3 additions & 3 deletions pythainlp/augment/wordnet.py
@@ -9,15 +9,15 @@
"postype2wordnet",
]

from collections import OrderedDict
import itertools
from collections import OrderedDict
from typing import List

from nltk.corpus import wordnet as wn

from pythainlp.corpus import wordnet
from pythainlp.tokenize import word_tokenize
from pythainlp.tag import pos_tag

from pythainlp.tokenize import word_tokenize

orchid = {
"": "",
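Most of the import changes in this PR follow one ordering: standard library first, then third-party packages, then `pythainlp` itself, with a blank line between groups, plain `import x` statements ahead of `from x import y` within a group, and modules in alphabetical order. This looks consistent with isort's default behavior, although the PR does not say which tool was used. The sketch below illustrates that ordering only; `sort_imports` and its helpers are hypothetical and handle single-line imports, not the full syntax a real formatter supports.

```python
import sys

# sys.stdlib_module_names exists on Python 3.10+; the fallback set is a
# small assumption that covers only the modules used in this example.
_STDLIB = getattr(
    sys,
    "stdlib_module_names",
    frozenset({"collections", "itertools", "typing"}),
)


def _module_of(line):
    # "import a.b" and "from a.b import c" both have the module second.
    return line.split()[1]


def _group_of(module, first_party):
    top = module.split(".")[0]
    if top == first_party:
        return 2  # the project's own package
    if top in _STDLIB:
        return 0  # standard library
    return 1  # third-party


def sort_imports(lines, first_party="pythainlp"):
    """Order single-line imports by group, style, then module name."""

    def key(line):
        module = _module_of(line)
        return (
            _group_of(module, first_party),
            line.startswith("from "),  # plain imports sort first
            module.lower(),
        )

    result = []
    previous_group = None
    for line in sorted(lines, key=key):
        group = _group_of(_module_of(line), first_party)
        if previous_group is not None and group != previous_group:
            result.append("")  # blank line between groups
        result.append(line)
        previous_group = group
    return result


before = [
    "from collections import OrderedDict",
    "import itertools",
    "from typing import List",
    "from nltk.corpus import wordnet as wn",
    "from pythainlp.corpus import wordnet",
    "from pythainlp.tokenize import word_tokenize",
    "from pythainlp.tag import pos_tag",
]
print("\n".join(sort_imports(before)))
```

Fed the old import block of `pythainlp/augment/wordnet.py`, the sketch yields the same order as the new block in the diff above.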
3 changes: 2 additions & 1 deletion pythainlp/classify/param_free.py
@@ -3,9 +3,10 @@
# SPDX-License-Identifier: Apache-2.0

import gzip
import json
from typing import List, Tuple

import numpy as np
import json


class GzipModel:
3 changes: 1 addition & 2 deletions pythainlp/cli/__init__.py
@@ -2,11 +2,10 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
"""Command line helpers."""

import sys
from argparse import ArgumentParser

from pythainlp.cli import data, soundex, tag, tokenize, benchmark

# a command should start with a verb when possible
COMMANDS = sorted(["data", "soundex", "tag", "tokenize", "benchmark"])

1 change: 1 addition & 0 deletions pythainlp/cli/benchmark.py
@@ -8,6 +8,7 @@
import os

import yaml

from pythainlp import cli
from pythainlp.benchmarks import word_tokenization

1 change: 1 addition & 0 deletions pythainlp/coref/__init__.py
@@ -5,4 +5,5 @@
PyThaiNLP Coreference Resolution
"""
__all__ = ["coreference_resolution"]

from pythainlp.coref.core import coreference_resolution
1 change: 1 addition & 0 deletions pythainlp/coref/_fastcoref.py
@@ -2,6 +2,7 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from typing import List

import spacy


12 changes: 6 additions & 6 deletions pythainlp/coref/core.py
@@ -3,7 +3,7 @@
# SPDX-License-Identifier: Apache-2.0
from typing import List

model = None
_MODEL = None


def coreference_resolution(
@@ -40,17 +40,17 @@ def coreference_resolution(
# 'clusters': [[(0, 10), (50, 52)]]}
# ]
"""
global model
global _MODEL
if isinstance(texts, str):
texts = [texts]

if model is None and model_name == "han-coref-v1.0":
if _MODEL is None and model_name == "han-coref-v1.0":
from pythainlp.coref.han_coref import HanCoref

model = HanCoref(device=device)
_MODEL = HanCoref(device=device)

if model:
return model.predict(texts)
if _MODEL:
return _MODEL.predict(texts)

return [
{"text": text, "clusters_string": [], "clusters": []} for text in texts
1 change: 1 addition & 0 deletions pythainlp/coref/han_coref.py
@@ -2,6 +2,7 @@
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
import spacy

from pythainlp.coref._fastcoref import FastCoref


2 changes: 1 addition & 1 deletion pythainlp/corpus/common.py
@@ -23,8 +23,8 @@
"thai_wsd_dict",
]

from typing import FrozenSet, List, Union
import warnings
from typing import FrozenSet, List, Union

from pythainlp.corpus import get_corpus, get_corpus_as_is, get_corpus_path

8 changes: 4 additions & 4 deletions pythainlp/corpus/core.py
@@ -4,14 +4,13 @@
"""
Corpus related functions.
"""
import json
import os
from typing import Union
import json

from pythainlp import __version__
from pythainlp.corpus import corpus_db_path, corpus_db_url, corpus_path
from pythainlp.tools import get_full_data_path
from pythainlp import __version__


_CHECK_MODE = os.getenv("PYTHAINLP_READ_MODE")

@@ -293,9 +292,10 @@ def _download(url: str, dst: str) -> int:
"""
_CHUNK_SIZE = 64 * 1024 # 64 KiB

import requests
from urllib.request import urlopen

import requests

file_size = int(urlopen(url).info().get("Content-Length", -1))
r = requests.get(url, stream=True)
with open(get_full_data_path(dst), "wb") as f:
1 change: 0 additions & 1 deletion pythainlp/corpus/icu.py
@@ -8,7 +8,6 @@

from pythainlp.corpus.common import get_corpus


_THAI_ICU_FILENAME = "icubrk_th.txt"


4 changes: 1 addition & 3 deletions pythainlp/corpus/tnc.py
@@ -15,9 +15,7 @@
from collections import defaultdict
from typing import List, Tuple

from pythainlp.corpus import get_corpus
from pythainlp.corpus import get_corpus_path

from pythainlp.corpus import get_corpus, get_corpus_path

_FILENAME = "tnc_freq.txt"
_BIGRAM = "tnc_bigram_word_freqs"
6 changes: 4 additions & 2 deletions pythainlp/el/_multiel.py
@@ -8,14 +8,16 @@ def __init__(self, model_name="bela", device="cuda"):
self.model_name = model_name
self.device = device
self.load_model()

def load_model(self):
try:
from multiel import BELA
except ImportError:
except ImportError as exc:
raise ImportError(
"Can't import multiel package, you can install by pip install multiel."
)
) from exc
self._bela_run = BELA(device=self.device)

def process_batch(self, list_text):
if isinstance(list_text, str):
list_text = [list_text]
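The change from `except ImportError:` to `except ImportError as exc:` with `raise ... from exc` uses explicit exception chaining, so the friendlier error still carries the original failure as its `__cause__`. The sketch below shows the mechanism in isolation; `load_backend` and the module name `a_package_that_is_not_installed` are hypothetical and chosen so that the import fails.

```python
def load_backend():
    """Import an optional dependency, chaining the original error."""
    try:
        # Hypothetical module name, chosen so that the import fails.
        import a_package_that_is_not_installed  # noqa: F401
    except ImportError as exc:
        raise ImportError(
            "Can't import the optional package; install it with pip."
        ) from exc


try:
    load_backend()
except ImportError as err:
    # `from exc` stores the original exception on __cause__.
    print(type(err.__cause__).__name__)
```

With chaining in place, a traceback shows both the original import failure and the replacement message, joined by "The above exception was the direct cause of the following exception".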
4 changes: 2 additions & 2 deletions pythainlp/generate/__init__.py
@@ -5,6 +5,6 @@
Thai Text Generation
"""

__all__ = ["Unigram", "Bigram", "Trigram"]
__all__ = ["Bigram", "Trigram", "Unigram"]

from pythainlp.generate.core import Unigram, Bigram, Trigram
from pythainlp.generate.core import Bigram, Trigram, Unigram
9 changes: 5 additions & 4 deletions pythainlp/generate/core.py
@@ -9,13 +9,14 @@
"""
import random
from typing import List, Union
from pythainlp.corpus.tnc import unigram_word_freqs as tnc_word_freqs_unigram
from pythainlp.corpus.tnc import bigram_word_freqs as tnc_word_freqs_bigram
from pythainlp.corpus.tnc import trigram_word_freqs as tnc_word_freqs_trigram
from pythainlp.corpus.ttc import unigram_word_freqs as ttc_word_freqs_unigram

from pythainlp.corpus.oscar import (
unigram_word_freqs as oscar_word_freqs_unigram,
)
from pythainlp.corpus.tnc import bigram_word_freqs as tnc_word_freqs_bigram
from pythainlp.corpus.tnc import trigram_word_freqs as tnc_word_freqs_trigram
from pythainlp.corpus.tnc import unigram_word_freqs as tnc_word_freqs_unigram
from pythainlp.corpus.ttc import unigram_word_freqs as ttc_word_freqs_unigram


class Unigram: