Skip to content

Conversation

@PremKolar
Copy link
Contributor

@PremKolar PremKolar commented Sep 17, 2020

Fixes #6880

no more erroneous interpretation as short DOI of url's/notes like:

still correct detection of short DOIs hidden in url's as long as doi is part of subdomain, eg:

  • "www.doi.xyz.com/10/abc123"

correct detection of short DOIs like:

added respective tests.

added entry to CHANGELOG

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

@Siedlerchr Siedlerchr changed the title Bugfix/issue6880 Improve parsing of short DOIs Sep 18, 2020
@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Sep 18, 2020
Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. It's a nice improvement, and I really like that you added many tests for it. The code looks good to me, I've only a few nitpick comments.

For the code styling, have a look at: https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace#using-jabrefs-code-style

for (Field field : FIELDS) {
entry.getField(field).flatMap(DOI::parse)
.ifPresent(unused -> removeFieldValue(entry, field, changes));
.ifPresent(unused -> removeFieldValue(entry, field, changes));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the added indent.

this.doi = "10/"+shortcutDoiMatcher.group(1);
isShortDoi = true;
} else {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove empty line

* checks whether the DOI might as well be something else, like:
* "10/2012" or " 10:as23 ".
*
* sets the property "isAmbiguous" accordingly. ( NOT USED YET! )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the convention that in general dead/unused code should be removed. Do you have any concrete plans/ideas where the isAmbiguous variable is needed?
If not, I would ask you to remove it.

Copy link
Contributor Author

@PremKolar PremKolar Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Tobias,
Yes I wasn't sure whether to open a new issue or to discuss my idea here in the comments somewhere:
When dois are cleaned up, the fields Notes and url are scanned for dois and short dois. The way it is implemented now, an entry like eg 10/2021 is interpreted as a short doi, when this might very well be something else, like a date...
checkForAmbiguousShortDOI() detects these ambiguous cases and sets the property isAmbiguous accordingly.
I stopped there, because I am still trying to find out whether a short doi like 10/1234 is even possible i.e. whether it can be fully numeric. I wrote an email to [email protected], but haven't heard back yet...
Also, I wanted to open a discussion on what to do with such ambiguous cases. We could implement some DOIexistanceChecker that checks whether the supposed doi can be found. But a) I am not sure what would be the most efficient way of doing this and b) I am worried about scaling issues when there are thousands of ambiguous dois... Maybe the checking should be optional via a tick-box in respective gui?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point to be aware of. For the conversation from the Notes and Url fields, I would say we can be optimistic and put it in the doi field if it's look like an DOI. I guess it's very rare to have only a date in the notes field.

I like the idea of checking doi's for existence (as an integrity check). I'm not sure how to implement it effectively, without querying crossref's api for each doi separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. the way I understand you, the integrity checking would be applied to all DOIs right? should i remove the unused method and property regarding ambiguous short DOIs then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, integrity checks are always applied to all entries; and some users have 10k...

For now I would say remove the code for ambiguous dois. The code is not lost, as we can go back to this PR to copy it once it's really needed.

assertEquals("10/gf4gqc", new DOI("doi:10/gf4gqc").getDOI());
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove new empty lines

@tobiasdiez tobiasdiez added the status: changes-required Pull requests that are not yet complete label Sep 18, 2020
@Siedlerchr
Copy link
Member

Please fix the checkstyle issues. You can install the extension in Intellij directly:
https://github.com/JabRef/jabref/tree/master/config

> Task :classes
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:53:13: '+' is not followed by whitespace. [WhitespaceAround]

> Task :checkstyleMain
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:92:96: ',' is preceded with whitespace. [NoWhitespaceBefore]
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:139:37: '+' is not followed by whitespace. [WhitespaceAround]
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:139:37: '+' is not preceded with whitespace. [WhitespaceAround]

@PremKolar
Copy link
Contributor Author

Please fix the checkstyle issues. You can install the extension in Intellij directly:
https://github.com/JabRef/jabref/tree/master/config

> Task :classes
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:53:13: '+' is not followed by whitespace. [WhitespaceAround]

> Task :checkstyleMain
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:92:96: ',' is preceded with whitespace. [NoWhitespaceBefore]
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:139:37: '+' is not followed by whitespace. [WhitespaceAround]
[ant:checkstyle] [ERROR] /home/runner/work/jabref/jabref/src/main/java/org/jabref/model/entry/identifier/DOI.java:139:37: '+' is not preceded with whitespace. [WhitespaceAround]

Hi Siedlerchr,
Whenever I try to add the checkstyle file to the checkstyle plugin I get the following error:


com.puppycrawl.tools.checkstyle.api.CheckstyleException: cannot initialize module TreeWalker - Unable to instantiate 'InvalidJavadocPosition' class, it is also not possible to instantiate it as .InvalidJavadocPosition, InvalidJavadocPositionCheck, .InvalidJavadocPositionCheck. Please recheck that class name is specified as canonical name or read how to configure short name usage https://checkstyle.org/config.html#Packages. Please also recheck that provided ClassLoader to Checker is configured correctly.
	at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:463)
	at com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:198)
	at org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61)
	at org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26)
	at org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:130)
	at org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60)
	at org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51)
	at org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:46)
Caused by: com.puppycrawl.tools.checkstyle.api.CheckstyleException: Unable to instantiate 'InvalidJavadocPosition' class, it is also not possible to instantiate it as .InvalidJavadocPosition, InvalidJavadocPositionCheck, .InvalidJavadocPositionCheck. Please recheck that class name is specified as canonical name or read how to configure short name usage https://checkstyle.org/config.html#Packages. Please also recheck that provided ClassLoader to Checker is configured correctly.
	at com.puppycrawl.tools.checkstyle.PackageObjectFactory.createModule(PackageObjectFactory.java:210)
	at com.puppycrawl.tools.checkstyle.TreeWalker.setupChild(TreeWalker.java:152)
	at com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:198)
	at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:458)
	... 7 more

What am I doing wrong?
Choosing the IntelliJ IDEA code style XML as style file works, but that alone doesn't have any discernible effect on anything... Sorry, I am quite new to Java and very new to IntelliJ IDEA...

@Siedlerchr
Copy link
Member

No idea what's wrong with the plugin, but you can always execute ./gradlew checkstyle on the comandline

Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now! Thanks for tackling this issue

@Test
public void acceptURNPrefix() {
assertEquals("10.123/456", new DOI("urn:10.123/456").getDOI());
assertEquals("10.123/456", new DOI("urn:doi:10.123/456").getDOI());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: urn:doi is not a valid DOI: See https://en.wikipedia.org/wiki/Digital_object_identifier#Standardization

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

@koppor koppor merged commit d944eae into JabRef:master Sep 20, 2020
@koppor koppor removed status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers status: changes-required Pull requests that are not yet complete labels Sep 20, 2020
Siedlerchr added a commit that referenced this pull request Sep 26, 2020
…-issue-6707

* upstream/master: (35 commits)
  Fix a fetcher test for the ShortDOIService (#6945)
  Fixes Shared Database: Changes filtering in CoarseChangeFilter to attribute property (#6868)
  Changed default value of "search and store files relative to bibtex file" to true (#6928)
  Replace comment by just a failure (#6943)
  Fix: in entry types editor selected field is not removed after first click  (#6941)
  Fix remove actions for entry types in the editor (#6933)
  Always use Java 15 (#6929)
  Update DevDocs: workaround for issues with local openjfx maven libraries (#6931)
  Fixes bugs in the `regex` cite key pattern modifier (#6893)
  Add missing author
  Readability for citation key patterns (#6706)
  Add new author
  Reset to master and add default case to switch (#6847)
  Bump mockito-core from 3.5.10 to 3.5.11 (#6924)
  Bump byte-buddy-parent from 1.10.14 to 1.10.15 (#6923)
  Bump org.beryx.jlink from 2.21.4 to 2.22.0 (#6925)
  Bump xmpbox from 2.0.20 to 2.0.21 (#6926)
  Bump pascalgn/automerge-action from v0.9.0 to v0.10.0 (#6927)
  Improve parsing of short DOIs (#6920)
  Bump junit-vintage-engine from 5.6.2 to 5.7.0 (#6910)
  ...
@PremKolar PremKolar mentioned this pull request Dec 14, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cleanup entry "Move DOIs from note and URL field to DOI field and remove http prefix" incorrectly recognizies urls ending with "2010/stuff" as DOIs

4 participants