Cleanup entry "Move DOIs from note and URL field to DOI field and remove http prefix" incorrectly recognizes URLs ending with "2010/stuff" as DOIs #6880

JabRef version 5.2--2020-09-06--c0b139a on Windows 10 10.0 amd64, Java 14.0.2

- [x] **Mandatory**: I have tested the latest development version from http://builds.jabref.org/master/ and the problem persists

Steps to reproduce the behavior:
1. Save the file
```bibtex
@Misc{TrustedSlind,
  author   = {Konrad Slind},
  title    = {Trusted Extensions of Interactive Theorem Provers: Workshop Summary},
  date     = {2010-08},
  location = {Cambridge, England},
  url      = {http://www.cs.utexas.edu/users/kaufmann/itp-trusted-extensions-aug-2010/summary/summary.pdf},
}
```
  as a `.bib` file.

2. Open this file in JabRef
3. Click on the one entry to select it
4. Click Quality -> Cleanup entries / Alt+F8
5. Ensure that only the first item ("Move DOIs from note and URL field to DOI field and remove http prefix") is checked
6. Click OK
7. Double-click on the entry and click "BibTeX source"

Note that the new source is
```bibtex
@Misc{TrustedSlind,
  author   = {Konrad Slind},
  title    = {Trusted Extensions of Interactive Theorem Provers: Workshop Summary},
  date     = {2010-08},
  doi      = {10/summary},
  location = {Cambridge, England},
}
```

This URL is not a DOI link, though! Presumably this is because the matcher code at https://github.com/JabRef/jabref/blob/ba68c09765d1cf7a7b90fec4c78924243210b1dc/src/main/java/org/jabref/model/entry/identifier/DOI.java#L30-L77
treats any non-space text that starts with `http://` or `https://` and later contains `10/` followed by more non-space text as a DOI, no matter what comes directly before the `10`. This is absurd. The character immediately preceding the `10`, `doi:`, or `urn:` should at the very least be required to be a URL separator character such as `/`, `:`, `?`, `&`, or `=`.

	// Regex
	// (see http://www.doi.org/doi_handbook/2_Numbering.html)
	private static final String DOI_EXP = ""
	+ "(?:urn:)?" // optional urn
	+ "(?:doi:)?" // optional doi
	+ "(" // begin group \1
	+ "10" // directory indicator
	+ "(?:\\.[0-9]+)+" // registrant codes
	+ "[/:%]" // divider
	+ "(?:.+)" // suffix alphanumeric string
	+ ")"; // end group \1
	private static final String FIND_DOI_EXP = ""
	+ "(?:urn:)?" // optional urn
	+ "(?:doi:)?" // optional doi
	+ "(" // begin group \1
	+ "10" // directory indicator
	+ "(?:\\.[0-9]+)+" // registrant codes
	+ "[/:]" // divider
	+ "(?:[^\\s]+)" // suffix alphanumeric without space
	+ ")"; // end group \1

	// Regex (Short DOI)
	private static final String SHORT_DOI_EXP = ""
	+ "(?:urn:)?" // optional urn
	+ "(?:doi:)?" // optional doi
	+ "(" // begin group \1
	+ "10" // directory indicator
	+ "[/:%]" // divider
	+ "[a-zA-Z0-9]+"
	+ ")"; // end group \1
	private static final String FIND_SHORT_DOI_EXP = ""
	+ "(?:urn:)?" // optional urn
	+ "(?:doi:)?" // optional doi
	+ "(" // begin group \1
	+ "10" // directory indicator
	+ "[/:]" // divider
	+ "[a-zA-Z0-9]+"
	+ "(?:[^\\s]+)" // suffix alphanumeric without space
	+ ")"; // end group \1

	private static final String HTTP_EXP = "https?://[^\\s]+?" + DOI_EXP;
	private static final String SHORT_DOI_HTTP_EXP = "https?://[^\\s]+?" + SHORT_DOI_EXP;
	// Pattern
	private static final Pattern EXACT_DOI_PATT = Pattern.compile("^(?:https?://[^\\s]+?)?" + DOI_EXP + "$", Pattern.CASE_INSENSITIVE);
	private static final Pattern DOI_PATT = Pattern.compile("(?:https?://[^\\s]+?)?" + FIND_DOI_EXP, Pattern.CASE_INSENSITIVE);
	// Pattern (short DOI)
	private static final Pattern EXACT_SHORT_DOI_PATT = Pattern.compile("^(?:https?://[^\\s]+?)?" + SHORT_DOI_EXP, Pattern.CASE_INSENSITIVE);
	private static final Pattern SHORT_DOI_PATT = Pattern.compile("(?:https?://[^\\s]+?)?" + FIND_SHORT_DOI_EXP, Pattern.CASE_INSENSITIVE);
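
To make the suggestion concrete, here is a minimal sketch of the separator check, not a tested patch. `CURRENT` mirrors `SHORT_DOI_PATT` from the excerpt; `STRICT`, the class name `DoiBoundarySketch`, the helper `find`, and the short DOI `10/abcde` are all made up for illustration. Allowing the identifier at the very start of the input (for bare values such as `doi:10/abcde`) is an extra assumption on my side.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoiBoundarySketch {

    // FIND_SHORT_DOI_EXP from the excerpt, joined into a single literal.
    static final String FIND_SHORT_DOI_EXP =
            "(?:urn:)?(?:doi:)?(10[/:][a-zA-Z0-9]+(?:[^\\s]+))";

    // Same shape as SHORT_DOI_PATT: any non-space text may sit between "http(s)://" and "10".
    static final Pattern CURRENT =
            Pattern.compile("(?:https?://[^\\s]+?)?" + FIND_SHORT_DOI_EXP, Pattern.CASE_INSENSITIVE);

    // Hypothetical variant: the identifier must begin at the start of the input
    // or directly after one of the URL separators '/', ':', '?', '&', '='.
    static final Pattern STRICT =
            Pattern.compile("(?:^|(?<=[/:?&=]))" + FIND_SHORT_DOI_EXP, Pattern.CASE_INSENSITIVE);

    // Returns capture group 1 of the first match, if there is one.
    static Optional<String> find(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String workshopUrl =
                "http://www.cs.utexas.edu/users/kaufmann/itp-trusted-extensions-aug-2010/summary/summary.pdf";
        String shortDoiUrl = "https://doi.org/10/abcde"; // made-up short DOI

        // The current pattern finds a "DOI" inside "aug-2010/summary/...".
        System.out.println("current, workshop URL: " + find(CURRENT, workshopUrl));
        // The stricter pattern rejects it, because the "10" is preceded by "0".
        System.out.println("strict,  workshop URL: " + find(STRICT, workshopUrl));
        // A short DOI that directly follows a '/' is still found.
        System.out.println("strict,  short DOI URL: " + find(STRICT, shortDoiUrl));
    }
}
```

The same lookbehind could presumably be put in front of `FIND_DOI_EXP` as well. I have not checked how it interacts with the `EXACT_*` patterns or with the rest of the cleanup code.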
