N-triples parser not in line with N-triples specification #1276

@csae8092

While trying to parse an N-Triples statement with rdflib version 4.2.2

rdflib.Graph().parse(data='<https://arche-curation.acdh-dev.oeaw.ac.at/api/8458> <https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier> <make\\u0020me> .', format='nt')

an error is thrown:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 140, in parse
    self.parseline()
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 195, in parseline
    object = self.object()
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 228, in object
    objt = self.uriref() or self.nodeid() or self.literal()
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 235, in uriref
    uri = self.eat(r_uriref).group(1)
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 210, in eat
    raise ParseError("Failed to eat %s at %s" % (pattern.pattern, self.line))
rdflib.plugins.parsers.ntriples.ParseError: Failed to eat <([^:]+:[^\s"<>]+)> at <make\u0020me> .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 1043, in parse
    parser.parse(source, self, **args)
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/nt.py", line 26, in parse
    parser.parse(f)
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 142, in parse
    raise ParseError("Invalid line: %r" % self.line)
rdflib.plugins.parsers.ntriples.ParseError: Invalid line: '<make\\u0020me> .'

The traceback suggests that the object IRI has to match the regex <([^:]+:[^\s"<>]+)>. This is not in line with the N-Triples specification (8th production, IRIREF, of https://www.w3.org/TR/n-triples/#n-triples-grammar), which does not require an IRIREF to contain a colon and allows it to contain Unicode escape sequences.
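
To illustrate the mismatch, here is a minimal sketch that compares the regex from the traceback with a pattern transcribed by hand from the IRIREF and UCHAR productions of the grammar. The names `r_uriref_old`, `r_iriref_spec` and `UCHAR` are made up for this example, and the second pattern is only an assumption about how the grammar could be written as a Python regex, not rdflib code or a proposed patch.

```python
import re

# Pattern copied from the traceback above (rdflib 4.2.2).
r_uriref_old = re.compile(r'<([^:]+:[^\s"<>]+)>')

# Hypothetical pattern, transcribed from the N-Triples grammar:
#   IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
#   UCHAR  ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
UCHAR = r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}'
r_iriref_spec = re.compile(r'<((?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*)>')

# The object term from the failing statement: a literal backslash
# followed by u0020, exactly as it appears in the N-Triples data.
term = r'<make\u0020me>'

old_match = r_uriref_old.match(term)
spec_match = r_iriref_spec.match(term)

print("rdflib 4.2.2 regex matches:", old_match is not None)
print("grammar-based regex matches:", spec_match is not None)
```

With this input the old pattern finds no match because the term contains no colon, while the grammar-based pattern accepts the term, escape sequence included.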
