Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists #6937

Toromtomtom · 2020-09-23T19:33:02Z

Change in CHANGELOG.md described (not applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not, created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

…a host for which a tailored fetcher exists.

tobiasdiez · 2020-09-24T09:32:48Z

As you noticed, the underlying problem is actually that the DOI fetcher has a higher trust value than the publisher fetchers. I think it would be a good idea to change it to "publishers > identifier-based resolution (doi, arXiv) > general search (google)". @JabRef/developers @Toromtomtom do you see any problem with this solution?
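A minimal sketch of the proposed ordering, assuming each fetcher is tagged with one of three tiers. The enum `ProposedTrust`, the record `Fetcher` and the `rank` helper are hypothetical illustrations, not JabRef's actual `TrustLevel` API; they only show how sorting by tier would put publisher fetchers ahead of DOI/arXiv resolution, which in turn come ahead of general search.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FetcherRanking {

    // Hypothetical tiers mirroring "publishers > identifier-based resolution > general search".
    // This is NOT JabRef's real TrustLevel enum.
    enum ProposedTrust {
        PUBLISHER(3),
        IDENTIFIER_RESOLUTION(2),
        GENERAL_SEARCH(1);

        final int score;

        ProposedTrust(int score) {
            this.score = score;
        }
    }

    // Hypothetical stand-in for a full-text fetcher: just a name and its tier
    record Fetcher(String name, ProposedTrust trust) { }

    // Orders fetchers from most to least trusted; List.sort is stable, so ties keep their input order
    static List<Fetcher> rank(List<Fetcher> fetchers) {
        List<Fetcher> sorted = new ArrayList<>(fetchers);
        sorted.sort(Comparator.comparingInt((Fetcher f) -> f.trust().score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Fetcher> ranked = rank(List.of(
                new Fetcher("Google Scholar", ProposedTrust.GENERAL_SEARCH),
                new Fetcher("DOI resolution", ProposedTrust.IDENTIFIER_RESOLUTION),
                new Fetcher("Springer", ProposedTrust.PUBLISHER),
                new Fetcher("arXiv", ProposedTrust.IDENTIFIER_RESOLUTION)));
        ranked.forEach(f -> System.out.println(f.name()));
    }
}
```

Because the sort is stable, fetchers in the same tier keep their configured order; any tie-breaker within a tier would have to be added explicitly.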

Toromtomtom · 2020-09-24T10:14:02Z

I think it would be a good idea to change it to "publishers > identifier-based resolution (doi, arXiv) > general search (google)".

I also think that this would be a better solution.

koppor · 2020-09-24T17:40:19Z

~~+1 from my side, too~~

Toromtomtom · 2020-09-25T17:09:18Z

I reverted my previous commits and decreased the trust level of the DOI resolution fetcher. This works for me, but maybe someone more involved in the project wants to weigh in on the ranking of the full text fetchers.

koppor · 2020-09-26T16:09:50Z

Food for thought:

A DOI uniquely identifies a paper. By definition, a DOI leads to the right paper. Everything else is educated guessing.
One paper title may match several publications, e.g., the conference version and the journal version, so the PDF that gets picked could effectively be random.
What about the consequences for other fetchers? Are we overlooking something?
Can't we contact Springer to fix their DOI-to-PDF mapping?

Proposal: Can we add special handling for Springer? If a DOI directs to Springer, we use the Springer fetcher. In all other cases, the functionality is untouched. This way, we explicitly accept that this is a hack.
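The Springer-only hack and the idea in the PR title boil down to the same guard: once the DOI has been resolved to a landing page, check its host and let the generic fetcher back off if a tailored fetcher covers that host. The sketch below assumes the redirect from doi.org has already been followed; `TAILORED_HOSTS`, `hasTailoredFetcher`, `findFullText` and the `guessPdfLink` placeholder are hypothetical names, and `link.springer.com` is assumed to be the host Springer DOIs resolve to.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

public class TailoredHostCheck {

    // Hypothetical set of landing-page hosts that have a dedicated full-text fetcher.
    // Only Springer is named in the discussion; the host name itself is an assumption.
    static final Set<String> TAILORED_HOSTS = Set.of("link.springer.com");

    // True if the resolved landing page is on a tailored host or one of its subdomains
    static boolean hasTailoredFetcher(URI resolvedLandingPage) {
        String host = resolvedLandingPage.getHost();
        if (host == null) {
            return false;
        }
        String lower = host.toLowerCase(Locale.ROOT);
        return TAILORED_HOSTS.stream()
                .anyMatch(h -> lower.equals(h) || lower.endsWith("." + h));
    }

    // Guard for the generic DOI fetcher: back off for tailored hosts, otherwise keep the old behaviour
    static Optional<URI> findFullText(URI resolvedLandingPage) {
        if (hasTailoredFetcher(resolvedLandingPage)) {
            return Optional.empty(); // leave this host to its dedicated fetcher
        }
        return guessPdfLink(resolvedLandingPage);
    }

    // Hypothetical placeholder for the existing generic PDF-guessing step
    static Optional<URI> guessPdfLink(URI landingPage) {
        return Optional.of(landingPage.resolve("fulltext.pdf"));
    }

    public static void main(String[] args) {
        System.out.println(findFullText(URI.create("https://link.springer.com/article/example")));
        System.out.println(findFullText(URI.create("https://www.example.org/paper/42")));
    }
}
```

Keeping the host list explicit makes the hack visible and easy to remove again once the upstream mapping is fixed.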

To really judge, we would need a test that retrieves 1000 papers and checks whether the retrieval rate is higher or lower with this check. Alternatively, can we add telemetry for that?
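The proposed 1000-paper experiment could be scripted roughly like this: run two complete fetcher chains (with and without the host check) over the same DOI list and compare the share of DOIs for which a full-text link comes back. Everything here is a hypothetical harness; the two lambdas are toy stand-ins for the real chains, and the sample DOIs are made up.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class RetrievalRateComparison {

    // Share of DOIs for which a fetcher chain returned some full-text link (0.0 for an empty list).
    // This measures coverage only, not whether the link points to the correct version of the paper.
    static double retrievalRate(List<String> dois, Function<String, Optional<String>> chain) {
        if (dois.isEmpty()) {
            return 0.0;
        }
        long hits = dois.stream().filter(doi -> chain.apply(doi).isPresent()).count();
        return (double) hits / dois.size();
    }

    public static void main(String[] args) {
        // Made-up sample; a real run would use ~1000 DOIs
        List<String> sample = List.of("10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d");

        // Toy stand-ins for the two complete fetcher chains being compared
        Function<String, Optional<String>> withoutHostCheck =
                doi -> doi.endsWith("d") ? Optional.empty() : Optional.of("https://example.org/" + doi + ".pdf");
        Function<String, Optional<String>> withHostCheck =
                doi -> Optional.of("https://example.org/" + doi + ".pdf");

        System.out.println("without host check: " + retrievalRate(sample, withoutHostCheck));
        System.out.println("with host check:    " + retrievalRate(sample, withHostCheck));
    }
}
```

A higher rate alone does not show that the returned PDF is the right one, so a sample of the hits would still need manual verification.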

koppor · 2020-09-26T16:11:38Z

src/main/java/org/jabref/logic/importer/fetcher/DoiResolution.java

    @Override
    public TrustLevel getTrustLevel() {
-        return TrustLevel.SOURCE;
+        return TrustLevel.META_SEARCH;


This lowers the DOI fetcher, which always leads to the "right" paper, from the highest trust level to the lowest, i.e., to the same level as Google Scholar.

Can we go with the solution from the title instead? 😇

Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists

Siedlerchr · 2020-09-26T16:16:37Z

The problem is that a DOI often does not lead to the full-text version directly, but to the page where the full text can be found. Our DoiResolution fetcher then does some magic guessing by picking the first PDF link in the source code of that page.
Every publisher's or journal's page looks different.
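To make the "magic guessing" concrete, here is a simplified, regex-based version of a first-PDF-link heuristic. It is an illustration only, not the actual DoiResolution code; `firstPdfLink` is a hypothetical helper, and a real implementation would rather use an HTML parser. It also shows why the approach is fragile: the result depends entirely on how each publisher lays out its landing page.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstPdfLink {

    // Matches href="..." or href='...' values; a rough heuristic, not a full HTML parser
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the first link whose path ends in ".pdf", resolved against the landing page URL
    static Optional<URI> firstPdfLink(String html, URI landingPage) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            try {
                URI candidate = landingPage.resolve(href);
                String path = candidate.getPath(); // null for opaque URIs such as mailto:
                if (path != null && path.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                    return Optional.of(candidate);
                }
            } catch (IllegalArgumentException e) {
                // skip hrefs that are not valid URIs (e.g. unescaped spaces)
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href='/content/pdf/paper.pdf'>Download PDF</a>";
        System.out.println(firstPdfLink(html, URI.create("https://publisher.example.org/article/1")));
    }
}
```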

tobiasdiez · 2020-09-26T16:50:22Z

The Springer fetcher also only looks at the DOI, but uses the Springer API to find the correct URL for the download.

jabref/src/main/java/org/jabref/logic/importer/fetcher/SpringerLink.java

Lines 39 to 49 in deb2f20

    Optional<DOI> doi = entry.getField(StandardField.DOI).flatMap(DOI::parse);
    if (!doi.isPresent()) {
        return Optional.empty();
    }

    // Available in catalog?
    try {
        HttpResponse<JsonNode> jsonResponse = Unirest.get(API_URL)
                                                     .queryString("api_key", API_KEY)
                                                     .queryString("q", String.format("doi:%s", doi.get().getDOI()))
                                                     .asJson();
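For readers unfamiliar with Unirest, the request in the excerpt amounts to a GET with two query parameters, `api_key` and `q=doi:<DOI>`. The stdlib-only sketch below builds the equivalent URL without sending it; `API_URL` and `API_KEY` are placeholders, not Springer's real endpoint or JabRef's key, and the exact percent-encoding Unirest applies may differ slightly.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpringerQuerySketch {

    // Placeholders: the real endpoint and key live in JabRef's SpringerLink class and are not reproduced here
    static final String API_URL = "https://api.example.org/metadata/json";
    static final String API_KEY = "YOUR_API_KEY";

    // Builds a URL carrying the same parameters as the excerpt: api_key=<key>&q=doi:<doi>
    static URI buildQuery(String doi) {
        String q = URLEncoder.encode("doi:" + doi, StandardCharsets.UTF_8);
        String key = URLEncoder.encode(API_KEY, StandardCharsets.UTF_8);
        return URI.create(API_URL + "?api_key=" + key + "&q=" + q);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("10.1000/xyz123"));
    }
}
```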

koppor · 2020-09-29T06:35:30Z

Devcall decision: Use first solution. -- @koppor will do git magic

Toromtomtom · 2020-09-29T07:17:07Z

All right, thanks for taking care of this!

tobiasdiez · 2020-09-29T11:02:27Z

@koppor In addition, SpringerLink should have a higher trust score than the DoiResolution fetcher, since it is also DOI-based but custom-tailored to Springer. I would also merge this class with the other Springer fetcher.

Toromtomtom · 2020-10-06T08:27:33Z

Is there anything I can do to move this forward? Reset the branch or something?

koppor · 2020-10-07T10:55:06Z

Steps:

I do as promised in my earlier comment on #6937.
I search all issues and PRs to document the idea behind the concept (priorities, information sources, ...).
I think about how to turn the earlier comment on #6937 into a follow-up issue, and which implications need to be considered.

In parallel, I discuss this with @stefan-kolb, because he designed the whole thing. My mistake was not enforcing that design decisions are documented (either as ADRs or as other text files).

stefan-kolb · 2020-10-07T10:57:08Z

Initial PR #3882
More info and discussion: #3881

koppor · 2020-10-07T21:10:08Z

I think I have collected all the documentation and put it in the appropriate place in #6990.

So, nothing to do for @Toromtomtom in this PR.

koppor · 2020-10-07T21:10:44Z

The first two commits are in master now. See ce9f714.

Toromtomtom added 2 commits September 23, 2020 21:24

Make the DOI Resolution Fetcher return nothing when the DOI leads to …

eae8a8a

…a host for which a tailored fetcher exists.

fixes a Checkstyle error

8c13808

Toromtomtom added 2 commits September 25, 2020 17:53

revert the last two commits

99f76d2

decrease the trust level of the DOI resolution fetcher

6783268

Siedlerchr approved these changes Sep 25, 2020


Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Sep 25, 2020

koppor reviewed Sep 26, 2020


koppor mentioned this pull request Oct 7, 2020

Fix DOI fetcher and add documentation on fetcher trust levels #6990

Merged


koppor closed this Oct 7, 2020

Toromtomtom deleted the fix-6922 branch October 8, 2020 06:07

5 participants

koppor commented Sep 24, 2020 •

edited

Loading

stefan-kolb commented Oct 7, 2020 •

edited

Loading