pythainlp.corpus.get_corpus_path() should not try to download the corpus automatically

I propose that `pythainlp.corpus.get_corpus_path()` should not automatically download a file when it cannot find it. It would be better to let the user decide.

Currently, `get_corpus_path()` tries to download the corpus file if it does not yet exist locally:

https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/corpus/core.py#L81

```python
    if db.search(query.name == name):
        path = get_full_data_path(db.search(query.name == name)[0]["file"])

        if not os.path.exists(path):
            download(name)
```

I propose that it should not do that.

If the file does not exist, the user/developer should be notified and decide whether to download it (using the API or the command line).

Currently, inside the pythainlp module, every caller of `get_corpus_path()` already handles the missing-file case itself. Each one checks whether the returned path is truthy and, if it is not, calls `pythainlp.corpus.download()`:
- https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/tag/named_entity.py#L79
- https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/transliterate/thai2rom.py#L24
- https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/transliterate/thaig2p.py#L25
- https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/ulmfit/__init__.py#L134
- https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/word_vector/__init__.py#L23

So removing the auto-download inside `pythainlp.corpus.get_corpus_path()` will not change the behavior of those functions in the submodules. (Whether to remove the auto-downloads from those submodules as well can be discussed separately.)

## Proposed return values
I propose these for discussion:
- **full path** - if the corpus name is valid and the file exists locally
- **""** (empty string) - if the corpus name is valid but the file does not exist locally
- **None** - if the corpus name is not valid (not in the corpus database)
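
A minimal sketch of these three return values, for discussion only. `get_corpus_path_sketch` is a hypothetical name, and a plain dict plus an explicit `data_dir` argument stand in for the corpus database and `get_full_data_path()`; the real function would keep its existing lookup.

```python
import os
import tempfile
from typing import Dict, Optional


def get_corpus_path_sketch(
    name: str, db: Dict[str, str], data_dir: str
) -> Optional[str]:
    """Hypothetical stand-in for the proposed behavior; never downloads."""
    if name not in db:
        # Corpus name is not in the database.
        return None
    path = os.path.join(data_dir, db[name])
    if os.path.exists(path):
        # Valid name and the file is available locally.
        return path
    # Valid name, but the file has not been downloaded yet.
    return ""


# Demonstration with a temporary directory as the data directory.
with tempfile.TemporaryDirectory() as data_dir:
    db = {"present": "present.txt", "absent": "absent.txt"}
    open(os.path.join(data_dir, "present.txt"), "w").close()
    for name in ("present", "absent", "unknown"):
        print(name, repr(get_corpus_path_sketch(name, db, data_dir)))
```

Both `""` and `None` are falsy, so existing `if not path:` checks would treat the two cases alike; a caller that wants to tell an unknown name from a missing file would need to test `path is None` explicitly.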


