2024-09-04 changes (huge!) #19
|
Right now I am running some more scripts to compare CDP/CNS/Daikanwa to UCS data to find: |
|
I have just compared the UCS files and Daikanwa and I found three issues in the Daikanwa files. I don't know how to fix them as I don't have access to the source material, but I am quite sure there is a problem: |
|
Additionally, I found two errors in the CBETA file: |
cf. https://www.chise.org/est/view/character/rep.cbeta=08110 CB08110 is 𠼢, not 強. In general, the second column was generated automatically long ago and reflects the mapping at that time. Sometimes it would break because of wrong settings, editing mistakes, or problems in the definitions of XEmacs CHISE. In addition, the second column is designed to display the isolated character of the encoding, not its Unicode mapping, so it should be stored as an entity reference. If a plain Unicode character is stored there, it may be a bug. Anyway, IDS-CBETA.txt was generated automatically and has not been maintained enough. We need the Taishō Tripiṭaka (大正新脩大藏經) to check this file semantically, but I don't have these volumes now. Even if I could access them easily, I don't have enough time to do a semantic check. I can regenerate the file from the current CHISE character ontology, but that might introduce new semantic bugs instead of fixing the syntactic problems. I think the value of this file is that it records the information as it was when the file was created, including its syntax issues. |
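To make the entity-reference point concrete, here is a minimal sketch of a check that lists rows whose second column holds something other than an entity reference. It is not part of CHISE. It assumes a tab-separated layout of code, character, IDS, that entity references look like `&CB08110;`, and that lines starting with `;` are comments. The function name `find_plain_characters` and the sample rows are hypothetical, and the IDS column in the sample is a placeholder.

```python
import re

# Assumption: an entity reference looks like "&CB08110;" (ampersand, name, semicolon).
ENTITY_REF = re.compile(r"^&[^;\s]+;$")


def find_plain_characters(lines):
    """Return (line_number, code, second_column) for rows whose second
    column is not an entity reference.

    Assumption: each data row is "code<TAB>character<TAB>IDS".
    """
    suspects = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and (assumed) comment lines starting with ";".
        if not line or line.startswith(";"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        code, char = fields[0], fields[1]
        if not ENTITY_REF.match(char):
            suspects.append((number, code, char))
    return suspects


# Hypothetical sample rows; the third column is a placeholder, not real IDS data.
sample = [
    "CB00001\t&CB00001;\t(IDS)\n",
    "CB08110\t強\t(IDS)\n",
]
suspects = find_plain_characters(sample)
print(suspects)
```

A check like this only flags candidates. Whether a flagged row is really wrong still needs a semantic check against the source material.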
|
By the way, we should move this discussion to an issue. |
|
So far, by comparing the UCS file to CDP, Daikanwa and CBETA, I have some 350 fixes, and half of them suggest putting a Unicode character back into UCS. I still need to process against CNS. (P.S.: I will not commit until you have checked the current fixes.) |
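The comparison scripts themselves are not shown in this thread. As a rough illustration only, one way such a cross-check could work is to index the UCS rows by their IDS and report non-UCS entries whose IDS is identical to that of a UCS character. Everything here is an assumption: the row layout `(code, character, IDS)`, the function name `suggest_ucs_replacements`, and the `X-0001`/`X-0002` sample entries are hypothetical.

```python
def suggest_ucs_replacements(ucs_rows, other_rows):
    """Suggest UCS characters for non-UCS entries that share the same IDS.

    Assumption: each row is a (code, character, ids) tuple.
    Returns a list of (other_code, ucs_code, ucs_character).
    """
    # Index UCS rows by IDS; keep the first character seen for each IDS.
    ids_to_ucs = {}
    for code, char, ids in ucs_rows:
        ids_to_ucs.setdefault(ids, (code, char))

    suggestions = []
    for code, _char, ids in other_rows:
        match = ids_to_ucs.get(ids)
        if match is not None:
            suggestions.append((code, match[0], match[1]))
    return suggestions


# Hypothetical sample data for illustration.
ucs_rows = [("U+660E", "明", "⿰日月")]
other_rows = [
    ("X-0001", "&X-0001;", "⿰日月"),  # same IDS as U+660E
    ("X-0002", "&X-0002;", "⿱山石"),  # no match in ucs_rows
]
suggestions = suggest_ucs_replacements(ucs_rows, other_rows)
print(suggestions)
```

An exact string match on the IDS is deliberately conservative. Each suggestion would still need a manual review before it is committed.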
|
I finished comparing UCS files against CDP, Daikanwa, CBETA and CNS (and in between each of these I re-compared UCS to itself). I have 388 fixes waiting. By the way I will not be home from the 16th to about the 23rd. I will not be able to commit before the 23rd. |
|
Actually there are 589 fixes, not 388. I was missing a huge chunk of them in the summary file. Let me know when you're ready to take them. |
|
Hello, any news? As I commented after sending the pull request, I have another set of changes pending. I would like to make them available to the community. How should I proceed? |
See file "mod24.txt" for details.
Contents were too big to paste here.