@Toromtomtom
Contributor

Fixes #6922

  • Change in CHANGELOG.md described (not applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

@tobiasdiez
Member

As you noticed, the underlying problem is actually that the DOI fetcher has a higher trust value than the publishers. I think it would be a good idea to change it to "publishers > identifier-based resolution (doi, arXiv) > general search (google)". @JabRef/developers @Toromtomtom do you see any problem with this solution?
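
A minimal sketch of the proposed ordering, for illustration only. The enum constants, scores, and fetcher names below are hypothetical and are not taken from JabRef's actual `TrustLevel` definition; the only thing carried over from the comment above is the ranking "publishers > identifier-based resolution > general search".

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: names and scores are assumptions, not JabRef's real values.
class TrustRankingSketch {

    // A higher score means the fetcher's result is preferred.
    enum Trust {
        PUBLISHER(3),       // publisher-specific fetchers
        IDENTIFIER(2),      // identifier-based resolution (DOI, arXiv)
        GENERAL_SEARCH(1);  // general search (e.g. Google Scholar)

        final int score;

        Trust(int score) {
            this.score = score;
        }
    }

    static final class Fetcher {
        final String name;
        final Trust trust;

        Fetcher(String name, Trust trust) {
            this.name = name;
            this.trust = trust;
        }
    }

    // Returns the fetcher names ordered from most to least trusted.
    static List<String> rank(List<Fetcher> fetchers) {
        return fetchers.stream()
                .sorted(Comparator.comparingInt((Fetcher f) -> f.trust.score).reversed())
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(rank(List.of(
                new Fetcher("GoogleScholar", Trust.GENERAL_SEARCH),
                new Fetcher("DoiResolution", Trust.IDENTIFIER),
                new Fetcher("SpringerLink", Trust.PUBLISHER))));
    }
}
```

With this ordering, a publisher-specific result wins whenever one is found, and the DOI resolution result is only used as a fallback.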

@Toromtomtom
Contributor Author

I think it would be a good idea to change it to "publishers > identifier-based resolution (doi, arXiv) > general search (google)".

I also think that this would be a better solution.

@koppor
Member

koppor commented Sep 24, 2020

+1 from my side, too

@Toromtomtom
Contributor Author

I reverted my previous commits and decreased the trust level of the DOI resolution fetcher. This works for me, but maybe someone more involved in the project wants to weigh in on the ranking of the full text fetchers.

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Sep 25, 2020
@koppor
Member

koppor commented Sep 26, 2020

Food for thought:

  • A DOI uniquely identifies a paper. By definition, a DOI leads to the right paper. Everything else is good guessing.
  • One paper title may lead to different publications of it: one the conference version, the other the journal version. --> the PDF could be chosen randomly
  • What about the consequences for other fetchers? Are we overlooking something?
  • Can't we contact Springer to fix their DOI-to-PDF mapping?

Proposal: Can we add a special handling for Springer? If a DOI directs to Springer, we use the Springer Fetcher. In all other cases, the functionality is untouched. In this way, we accept that this is a hack.
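
A minimal sketch of this special handling, under stated assumptions: the class, the method, and the host list are hypothetical, and `link.springer.com` is used only as an illustrative host. Resolving the DOI to a URL and scanning the page are left out; the sketch only shows the decision "return nothing if a tailored fetcher exists for this host".

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not JabRef code.
class TailoredHostSketch {

    // Assumed list of hosts for which a publisher-specific fetcher exists.
    static final Set<String> TAILORED_HOSTS = Set.of("link.springer.com");

    // resolvedUrl:    the URL the DOI redirects to (resolution itself is omitted here)
    // genericPdfLink: the link the generic page scan would have returned, or null
    static Optional<String> genericResult(String resolvedUrl, String genericPdfLink) {
        String host = URI.create(resolvedUrl).getHost();
        if (host != null && TAILORED_HOSTS.contains(host.toLowerCase(Locale.ROOT))) {
            // Defer to the tailored fetcher by returning nothing.
            return Optional.empty();
        }
        // All other hosts: behavior is unchanged.
        return Optional.ofNullable(genericPdfLink);
    }
}
```

Because only the listed hosts are affected, the generic fetcher keeps its current behavior everywhere else, which matches the "accept that this is a hack" framing.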

To really judge this, we would need a test that retrieves 1000 papers and checks whether the retrieval rate is higher or lower with this check. Alternatively, can we add telemetry for that?
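
One way such a comparison could be sketched, assuming a fetcher can be modeled as a function from an entry to an optional link. The class and method names are hypothetical, and the sketch measures only whether something was retrieved, not whether it was the right PDF.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not JabRef code.
class RetrievalRateSketch {

    // Fraction of entries for which the fetcher returned any result (0.0 for an empty list).
    static <E> double retrievalRate(List<E> entries, Function<E, Optional<String>> fetcher) {
        if (entries.isEmpty()) {
            return 0.0;
        }
        long hits = entries.stream()
                .filter(entry -> fetcher.apply(entry).isPresent())
                .count();
        return (double) hits / entries.size();
    }
}
```

Running this once with the special handling and once without, on the same set of entries, would give the two rates to compare.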

  @Override
  public TrustLevel getTrustLevel() {
-     return TrustLevel.SOURCE;
+     return TrustLevel.META_SEARCH;
Member

This decreases the result quality of the DOI fetcher (always leading to the "right" paper) to the quality of Google Scholar. (From the highest to the lowest)

Can we use the solution from the title instead? 😇

Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists

@Siedlerchr
Member

The problem is that a DOI often does not lead to the full-text version directly, but to the site where the full text can be found. Our DOIResolution fetcher then does some magic guessing by looking for the first PDF link in the source code of the website.
Every publisher's/journal page looks different.
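
To illustrate why "first PDF link on the page" is a guess, here is a minimal sketch using a regular expression. This is not the fetcher's actual implementation; the class name, the pattern, and the sample HTML are assumptions made for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not JabRef code.
class FirstPdfLinkSketch {

    // Matches an href attribute whose value ends in ".pdf" (case-insensitive).
    private static final Pattern PDF_HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+\\.pdf)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the first PDF link found in the page source, if any.
    static Optional<String> firstPdfLink(String html) {
        Matcher matcher = PDF_HREF.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a>"
                + "<a href=\"/book.pdf\">Whole book</a>"
                + "<a href=\"/chapter.pdf\">This chapter</a>";
        System.out.println(firstPdfLink(html));
    }
}
```

On a page that links the whole book before the individual chapter, this heuristic picks the book, which is the kind of mismatch a publisher-specific fetcher avoids.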

@tobiasdiez
Member

The springer fetcher also only looks at the DOI, but uses the springer API to find the correct URL for the download.

Optional<DOI> doi = entry.getField(StandardField.DOI).flatMap(DOI::parse);
if (!doi.isPresent()) {
    return Optional.empty();
}
// Available in catalog?
try {
    HttpResponse<JsonNode> jsonResponse = Unirest.get(API_URL)
                                                 .queryString("api_key", API_KEY)
                                                 .queryString("q", String.format("doi:%s", doi.get().getDOI()))
                                                 .asJson();

@koppor
Member

koppor commented Sep 29, 2020

Devcall decision: Use first solution. -- @koppor will do git magic

@Toromtomtom
Contributor Author

All right, thanks for taking care of this!

@tobiasdiez
Member

@koppor In addition, the SpringerLink fetcher should have a higher trust score than the DoiResolution fetcher, since it is also DOI-based but custom-tailored to Springer. I would also merge this class with the other Springer fetcher.

@Toromtomtom
Contributor Author

Is there anything I can do to move this forward? Reset the branch or something?

@koppor
Member

koppor commented Oct 7, 2020

Steps:

  1. I do as promised at "Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists" #6937 (comment)
  2. I search all issues and PRs to document the idea behind the concept (priorities, information sources, ...)
  3. I think about how to transform the comment at "Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists" #6937 (comment) into a follow-up issue, and which implications need to be considered.

In parallel, I discuss with @stefan-kolb, because he invented the whole thing. My mistake was not to enforce that design decisions are documented (either as ADR or as other text files)

@stefan-kolb
Member

stefan-kolb commented Oct 7, 2020

Initial PR #3882
More info and discussion: #3881

@koppor
Member

koppor commented Oct 7, 2020

I think I collected all the documentation and put it in the appropriate place at #6990.

So, nothing to do for @Toromtomtom in this PR.

@koppor koppor closed this Oct 7, 2020
@koppor
Member

koppor commented Oct 7, 2020

The first two commits are in master now. See ce9f714.

@Toromtomtom Toromtomtom deleted the fix-6922 branch October 8, 2020 06:07

Linked issue: DoiResolution Fetcher fetches whole book for some Springer conference papers