2024-09-04 changes (huge!) #19

Yukinoroh · 2024-09-04T06:36:42Z

See file "mod24.txt" for details.
Contents were too big to paste here.

Yukinoroh · 2024-09-04T10:43:37Z

Right now I am running more scripts to compare the CDP/CNS/Daikanwa data against the UCS data, looking for:
・CDP/CNS/Daikanwa characters referred to in the UCS files whose decomposition data is exactly equal to that of an existing UCS character;
・Partial decomposition data in the UCS files that could be replaced by a CDP/CNS/Daikanwa character.
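As a rough illustration of the two checks listed above, here is a minimal Python sketch. It is not the script actually used for this pull request. It assumes each file has already been loaded into a dictionary mapping a character (or its name) to its IDS string, with one code point per component. The function names and the entry name "X-0001" are hypothetical.

```python
def find_exact_matches(ucs, other):
    """Map each non-UCS name to the UCS characters sharing its exact decomposition."""
    by_ids = {}
    for char, ids in ucs.items():
        by_ids.setdefault(ids, []).append(char)
    return {name: by_ids[ids] for name, ids in other.items() if ids in by_ids}


def find_partial_matches(ucs, other):
    """List (ucs_char, name) pairs where a non-UCS decomposition occurs inside a longer UCS one."""
    hits = []
    for char, ids in ucs.items():
        for name, sub in other.items():
            # Skip single components and exact matches; only true sub-decompositions count.
            if len(sub) > 1 and sub != ids and sub in ids:
                hits.append((char, name))
    return hits


# Toy data: "X-0001" is a made-up non-UCS entry, not a real CDP/CNS/Daikanwa code.
ucs = {"明": "⿰日月", "盟": "⿱⿰日月皿"}
other = {"X-0001": "⿰日月"}

print(find_exact_matches(ucs, other))
print(find_partial_matches(ucs, other))
```

Plain substring search is enough here only because IDS uses prefix notation: a well-formed sub-sequence found inside a well-formed sequence is always a complete subtree. That no longer holds if components are written as multi-character entity references.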

Yukinoroh · 2024-09-08T05:11:37Z

I have just compared the UCS files with the Daikanwa files and found three issues in the Daikanwa files. I don't know how to fix them because I don't have access to the source material, but I am quite sure there is a problem:
・M-49556 is labelled as equivalent to U+2458B 𤖋, yet it has the decomposition data of U+2E34E 𮍎.
(Maybe this is because U+2458B 𤖋 was fixed later while M-49556 was left as is? I see a lot of data redundancy in the non-UCS files...)
・M-25016 has incomplete decomposition data "⿰禾"
・M-32388 has incomplete decomposition data "⿱⺿"
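Incomplete sequences like the last two can be detected mechanically by counting how many operands are still expected. The sketch below is one possible way to do that, not the check used for this report. It assumes only the twelve description characters U+2FF0 to U+2FFB (⿲ and ⿳ take three operands, the rest take two) and one code point per component. The function names are hypothetical.

```python
TERNARY = {"\u2ff2", "\u2ff3"}  # ⿲ and ⿳ take three operands


def arity(ch):
    """Number of operands a character expects (0 for an ordinary component)."""
    if ch in TERNARY:
        return 3
    if "\u2ff0" <= ch <= "\u2ffb":
        return 2
    return 0


def is_complete_ids(ids):
    """Return True if ids is exactly one well-formed prefix-notation sequence."""
    needed = 1  # operands still expected
    for ch in ids:
        if needed == 0:
            return False  # extra characters after a complete sequence
        needed += arity(ch) - 1
    return needed == 0


for ids in ["⿰禾", "⿱⺿", "⿰禾口"]:
    print(ids, is_complete_ids(ids))
```

With this counting rule, "⿰禾" and "⿱⺿" are rejected because one operand is still missing when the string ends, while a full sequence such as "⿰禾口" is accepted.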

Yukinoroh · 2024-09-11T00:08:32Z

Additionally, I found two errors in the CBETA file:
・CB08110 強 has the decomposition "U-00020F22 𠼢"
・Both CB02326 and CB11790 use the decomposition "⿱巳廾". This may not be a mistake, but I don't know which one to use to simplify U-000266AA 𦚪 and U-00029426 𩐦, so I will leave them untouched for now.

chise · 2024-09-11T00:44:01Z

Additionally, there is a problem with the "CB08110 強" character in IDS-CBETA.txt; it has decomposition of "U-00020F22 𠼢".

cf. https://www.chise.org/est/view/character/rep.cbeta=08110
https://www.chise.org/est/view/character/repi.cbeta=08110

CB08110 is 𠼢, not 強.

In general, the second column was generated automatically long ago. It reflects the mapping at that time, and sometimes it broke because of wrong settings, editing mistakes, or definitions in XEmacs CHISE. In addition, the second column is designed to display the isolated character of the encoding, not its Unicode mapping, so it should be stored as an entity reference. If a normal Unicode character is stored there, it may be a bug.
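A quick way to list rows whose second column is not an entity reference might look like the following Python sketch. It assumes tab-separated rows (code, character column, decomposition) and entity references of the form `&NAME;`. Both are assumptions about the file layout rather than something confirmed in this thread. The function name and sample rows are illustrative only.

```python
import re

# Assumed entity-reference shape: "&" + name + ";", e.g. "&CB00001;"
ENTITY_REF = re.compile(r"&[^&;\s]+;")


def suspicious_rows(lines):
    """Return the first-column codes of rows whose second column is not an entity reference."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # header, comment, or malformed line
        if not ENTITY_REF.fullmatch(fields[1]):
            hits.append(fields[0])
    return hits


# Illustrative rows only; the real file contents may differ.
sample = [
    "CB08110\t強\t𠼢",
    "CB00001\t&CB00001;\t⿰日月",
]
print(suspicious_rows(sample))
```

Rows flagged this way would still need manual review, since the check says nothing about whether the decomposition itself is correct.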

Anyway, IDS-CBETA.txt was generated automatically and has not been maintained enough. We need the Taishō Tripiṭaka (大正新脩大藏經) to check this file semantically, but I don't have these volumes now. Even if I could access them easily, I don't have enough time to do a semantic check. I can regenerate the file based on the current CHISE character ontology, but that might introduce new semantic bugs instead of fixing the syntactic problems. I think the value of this file is that it records the information as it was when the file was created, including the syntax issues.

chise · 2024-09-11T00:45:05Z

By the way, we should move this discussion to an issue.

Yukinoroh · 2024-09-11T00:56:32Z

So far, by comparing the UCS files to CDP, Daikanwa and CBETA, I have some 350 fixes, and half of them suggest putting a Unicode character back into UCS. I still need to process the files against CNS. (P.S.: I will not commit until you have checked the current fixes.)

Yukinoroh · 2024-09-15T09:25:25Z

I finished comparing UCS files against CDP, Daikanwa, CBETA and CNS (and in between each of these I re-compared UCS to itself). I have 388 fixes waiting. By the way I will not be home from the 16th to about the 23rd. I will not be able to commit before the 23rd.

Yukinoroh · 2024-09-29T07:27:59Z

Actually there are 589 fixes, not 388. I was missing a huge chunk of them in the summary file. Let me know when you're ready to take them.

Yukinoroh · 2025-08-04T14:36:50Z

Hello, any news? As I commented after sending the pull request, I have another set of changes pending. I would like to make them available to the community. How should I proceed?

Yukinoroh added 2 commits September 4, 2024 15:25

Add files via upload

5f9353e

Add files via upload

2b4f1bb
