-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Improve parsing of short DOIs #6920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fix merge
tobiasdiez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot. It's a nice improvement, and I really like that you added many tests for it. The code looks good to me, I've only a few nitpick comments.
For the code styling, have a look at: https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace#using-jabrefs-code-style
| for (Field field : FIELDS) { | ||
| entry.getField(field).flatMap(DOI::parse) | ||
| .ifPresent(unused -> removeFieldValue(entry, field, changes)); | ||
| .ifPresent(unused -> removeFieldValue(entry, field, changes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove the added indent.
| this.doi = "10/"+shortcutDoiMatcher.group(1); | ||
| isShortDoi = true; | ||
| } else { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove empty line
| * checks whether the DOI might as well be something else, like: | ||
| * "10/2012" or " 10:as23 ". | ||
| * | ||
| * sets the property "isAmbiguous" accordingly. ( NOT USED YET! ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the convention that in general dead/unused code should be removed. Do you have any concrete plans/ideas where the isAmbiguous variable is needed?
If not, I would ask you to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Tobias,
Yes I wasn't sure whether to open a new issue or to discuss my idea here in the comments somewhere:
When dois are cleaned up, the fields Notes and url are scanned for dois and short dois. The way it is implemented now, an entry like eg 10/2021 is interpreted as a short doi, when this might very well be something else, like a date...
checkForAmbiguousShortDOI() detects these ambiguous cases and sets the property isAmbiguous accordingly.
I stopped there, because I am still trying to find out whether a short doi like 10/1234 is even possible i.e. whether it can be fully numeric. I wrote an email to [email protected], but haven't heard back yet...
Also, I wanted to open a discussion on what to do with such ambiguous cases. We could implement some DOIexistanceChecker that checks whether the supposed doi can be found. But a) I am not sure what would be the most efficient way of doing this and b) I am worried about scaling issues when there are thousands of ambiguous dois... Maybe the checking should be optional via a tick-box in respective gui?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point to be aware of. For the conversation from the Notes and Url fields, I would say we can be optimistic and put it in the doi field if it's look like an DOI. I guess it's very rare to have only a date in the notes field.
I like the idea of checking doi's for existence (as an integrity check). I'm not sure how to implement it effectively, without querying crossref's api for each doi separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. the way I understand you, the integrity checking would be applied to all DOIs right? should i remove the unused method and property regarding ambiguous short DOIs then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, integrity checks are always applied to all entries; and some users have 10k...
For now I would say remove the code for ambiguous dois. The code is not lost, as we can go back to this PR to copy it once it's really needed.
| assertEquals("10/gf4gqc", new DOI("doi:10/gf4gqc").getDOI()); | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove new empty lines
|
Please fix the checkstyle issues. You can install the extension in Intellij directly: |
Hi Siedlerchr, What am I doing wrong? |
|
No idea what's wrong with the plugin, but you can always execute ./gradlew checkstyle on the comandline |
Siedlerchr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now! Thanks for tackling this issue
| @Test | ||
| public void acceptURNPrefix() { | ||
| assertEquals("10.123/456", new DOI("urn:10.123/456").getDOI()); | ||
| assertEquals("10.123/456", new DOI("urn:doi:10.123/456").getDOI()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: urn:doi is not a valid DOI: See https://en.wikipedia.org/wiki/Digital_object_identifier#Standardization
koppor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now
…-issue-6707 * upstream/master: (35 commits) Fix a fetcher test for the ShortDOIService (#6945) Fixes Shared Database: Changes filtering in CoarseChangeFilter to attribute property (#6868) Changed default value of "search and store files relative to bibtex file" to true (#6928) Replace comment by just a failure (#6943) Fix: in entry types editor selected field is not removed after first click (#6941) Fix remove actions for entry types in the editor (#6933) Always use Java 15 (#6929) Update DevDocs: workaround for issues with local openjfx maven libraries (#6931) Fixes bugs in the `regex` cite key pattern modifier (#6893) Add missing author Readability for citation key patterns (#6706) Add new author Reset to master and add default case to switch (#6847) Bump mockito-core from 3.5.10 to 3.5.11 (#6924) Bump byte-buddy-parent from 1.10.14 to 1.10.15 (#6923) Bump org.beryx.jlink from 2.21.4 to 2.22.0 (#6925) Bump xmpbox from 2.0.20 to 2.0.21 (#6926) Bump pascalgn/automerge-action from v0.9.0 to v0.10.0 (#6927) Improve parsing of short DOIs (#6920) Bump junit-vintage-engine from 5.6.2 to 5.7.0 (#6910) ...
Fixes #6880
no more erroneous interpretation as short DOI of url's/notes like:
still correct detection of short DOIs hidden in url's as long as doi is part of subdomain, eg:
correct detection of short DOIs like:
added respective tests.
added entry to CHANGELOG