bug: Duplicate key in dictionary #846

@BLKSerene

Description

Two dictionaries in the codebase contain duplicate keys:

_punctuation_and_digits = {
"ๆ": "«",
"ฯ": "ǂ",
"๏": "§",
"ฯ": "ǀ",
"๚": "ǁ",
"๛": "»",
"๐": "0",
"๑": "1",
"๒": "2",
"๓": "3",
"๔": "4",
"๕": "5",
"๖": "6",
"๗": "7",
"๘": "8",
"๙": "9",
}

The key "ฯ" has been assigned two different values: "ǂ" and "ǀ".

dict_ipa_rtgs = {
"b":"b",
"d":"d",
"f":"f",
"h":"h",
"j":"y",
"k":"k",
"kʰ":"kh",
"l":"l",
"m":"m",
"n":"n",
"ŋ":"ng",
"p":"p",
"pʰ":"ph",
"r":"r",
"s":"s",
"t":"t",
"tʰ":"th",
"tɕ":"ch",
"tɕʰ":"ch",
"w":"w",
"ʔ":"",
"j":"i",
"a":"a",
"e":"e",
"ɛ":"ae",
"i":"i",
"o":"o",
"ɔ":"o",
"u":"u",
"ɯ":"ue",
"ɤ":"oe",
"aː":"a",
"eː":"e",
"ɛː":"ae",
"iː":"i",
"oː":"o",
"ɔː":"o",
"uː":"u",
"ɯː":"ue",
"ɤː":"oe",
"ia":"ia",
"ua":"ua",
"ɯa":"uea",
"aj":"ai",
"aw":"ao",
"ew":"eo",
"ɛw":"aeo",
"iw":"io",
"ɔj":"io",
"uj":"ui",
"aːj":"ai",
"aːw":"ao",
"eːw":"eo",
"ɛːw":"aeo",
"oːj":"oi",
"ɔːj":"oi",
"ɤːj":"oei",
"iaw":"iao",
"uaj":"uai",
"ɯaj":"ueai",
".":".",
}

The key "j" has been assigned two different values: "y" and "i".

Since I do not speak Thai, I do not have the knowledge required to decide which of the two lines should be removed in each case.

Expected results

No duplicate keys in dictionaries.

Current results

Duplicate keys in dictionaries.

Steps to reproduce

Check in the codebase.
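
Duplicates like these could also be located automatically rather than by reading the code. The following is a rough sketch using the standard-library `ast` module; the function name `find_duplicate_keys` and the sample string are hypothetical and not part of PyThaiNLP. It only inspects constant keys in dict literals.

```python
import ast
from collections import Counter


def find_duplicate_keys(source: str) -> list[tuple[int, object]]:
    """Return (line number of the dict, key) for each constant key repeated in a dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        # Only constant keys (strings, numbers, ...) can be compared statically.
        # A None entry in node.keys stands for `**mapping` unpacking and is skipped.
        keys = [k.value for k in node.keys if isinstance(k, ast.Constant)]
        for key, count in Counter(keys).items():
            if count > 1:
                duplicates.append((node.lineno, key))
    return duplicates


# Hypothetical one-line sample mirroring the duplicate "j" in dict_ipa_rtgs.
sample = 'd = {"j": "y", "k": "k", "j": "i"}\n'
print(find_duplicate_keys(sample))  # [(1, 'j')]
```

To check a real file, the same function could be called on the file's text, for example `find_duplicate_keys(open(path, encoding="utf-8").read())`, where `path` is whichever module is being checked.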

PyThaiNLP version

4.0.2

Python version

3.10.13

Operating system and version

Windows 11 22H2 x64

More info

No response

Possible solution

No response

Files

No response

Labels

Hacktoberfest (for Hacktoberfest event), bug (bugs in the library)
