diff --git a/docs/developers.rst b/docs/developers.rst index 0df461513..73161c34e 100644 --- a/docs/developers.rst +++ b/docs/developers.rst @@ -69,7 +69,7 @@ here, we will not reject a PR just because it does not. Tests ----- -Any new functionality being added to RDFLib _must_ have unit tests and +Any new functionality being added to RDFLib *must* have unit tests and should have doc tests supplied. Typically, you should add your functionality and new tests to a branch of diff --git a/docs/docs.rst b/docs/docs.rst index 4643e10df..b42da9d2e 100644 --- a/docs/docs.rst +++ b/docs/docs.rst @@ -10,14 +10,15 @@ These docs are generated with Sphinx. Sphinx makes it very easy to pull in doc-strings from modules, classes, methods, etc. When writing doc-strings, special reST fields can be used to annotate parameters, return-types, etc. This makes for -pretty API docs. More information about sphinx can be found `here `_. +pretty API docs. See `here `_ +for the Sphinx documentation about these fields. Building -------- -To build you must have the ``sphinx`` and some additional package installed. - -The documentation's full set of requirements is listed in the ``sphinx-requirements.txt`` file within the :file:`docs/` directory. +To build you must have ``sphinx`` and some additional packages installed. +The full set of requirements is listed in the ``sphinx-requirements.txt`` file +within the :file:`docs/` directory. To install the requirements for building documentation run: @@ -26,15 +27,18 @@ To install the requirements for building documentation run: pip install -r docs/sphinx-requirements.txt -Once you have all the requirements installed you can run this command in the rdflib root directory: +Once you have all the requirements installed you can run this command in the +rdflib root directory: .. 
code-block:: bash python setup.py build_sphinx -Docs will be generated in :file:`build/sphinx/html/` and API documentation, generated from doc-strings, will be placed in :file:`docs/apidocs/`. +Docs will be generated in :file:`build/sphinx/html/` and API documentation, +generated from doc-strings, will be placed in :file:`docs/apidocs/`. -There is also a `tox `_ environment for building documentation: +There is also a `tox `_ environment for building +documentation: .. code-block:: bash diff --git a/docs/gettingstarted.rst b/docs/gettingstarted.rst index 48e13d0ff..b8541ef35 100644 --- a/docs/gettingstarted.rst +++ b/docs/gettingstarted.rst @@ -45,13 +45,13 @@ who hasn't worked with RDF before.* The primary interface that RDFLib exposes for working with RDF is a :class:`~rdflib.graph.Graph`. -RDFLib graphs are un-sorted containers; they have ordinary ``set`` +RDFLib graphs are un-sorted containers; they have ordinary Python ``set`` operations (e.g. :meth:`~rdflib.Graph.add` to add a triple) plus methods that search triples and return them in arbitrary order. RDFLib graphs also redefine certain built-in Python methods in order -to behave in a predictable way: they `emulate container types -`_ and +to behave in a predictable way. They do this by `emulating container types +`_ and are best thought of as a set of 3-item tuples ("triples", in RDF-speak): .. 
code-block:: text diff --git a/docs/index.rst b/docs/index.rst index b8f60db2a..9a96447df 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -8,18 +8,19 @@ RDFLib is a pure Python package for working with `RDF `_ * **Parsers & Serializers** - * for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, JSON-LD, RDFa and Microdata + * for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, JSON-LD, HexTuples, RDFa and Microdata * **Store implementations** - * for in-memory and persistent RDF storage, including remote SPARQL endpoints + * memory stores + * persistent, on-disk stores, using databases such as BerkeleyDB + * remote SPARQL endpoints * **Graph interface** * to a single graph - * or a conjunctive graph (multiple Named Graphs) - * or a dataset of graphs + * or to multiple Named Graphs within a dataset * **SPARQL 1.1 implementation** @@ -108,29 +109,3 @@ the tag ``[rdflib]``. A list of existing ``[rdflib]`` tagged questions is kept t You might also like to join rdflib's dev mailing list: ``__ The chat is available at `gitter `_ or via matrix `#RDFLib_rdflib:gitter.im `_. - - - -Glossary --------- - -Here are a few RDF and Python terms referred to in this documentation. They are linked to wherever they occur. - -.. glossary:: - - functional property - Properties than can only occur once for a resource, i.e. for any relation (triple, in RDF) ``x p y``, - if ``p`` is functional, for any individual ``x``, there can be at most one individual ``y``. - - OWL - The OWL 2 Web Ontology Language, informally OWL 2 or just OWL, is an ontology language for the Semantic Web - with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and - are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and - OWL 2 ontologies themselves are primarily exchanged as RDF documents. See the `RDF 1.1 Concepts and Abstract - Syntax `_ for more info. 
- - RDF - The Resource Description Framework (RDF) is a framework for representing information in the Web. RDF data is - stored in graphs that are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, - or datatyped literals. See the `OWL 2 Web Ontology Language - Document Overview `_ for more info. diff --git a/docs/namespaces_and_bindings.rst b/docs/namespaces_and_bindings.rst index c89b90b0f..c8a76fec7 100644 --- a/docs/namespaces_and_bindings.rst +++ b/docs/namespaces_and_bindings.rst @@ -10,37 +10,103 @@ The :mod:`rdflib.namespace` defines the :class:`rdflib.namespace.Namespace` clas from rdflib import Namespace - n = Namespace("http://example.org/") - n.Person # as attribute - # = rdflib.term.URIRef("http://example.org/Person") + EX = Namespace("http://example.org/") + EX.Person # a Python attribute for EX. This example is equivalent to rdflib.term.URIRef("http://example.org/Person") - n['first%20name'] # as item - for things that are not valid python identifiers - # = rdflib.term.URIRef("http://example.org/first%20name") + # use dict notation for things that are not valid Python identifiers, e.g.: + EX['first%20name'] # as rdflib.term.URIRef("http://example.org/first%20name") -Note that if a name string is valid for use in an RDF namespace but not valid as a Python identifier, such as '1234', it must be addressed with the "item" syntax (using the "attribute" syntax will raise a Syntax Error). +These two styles of namespace creation - object attribute and dict - are equivalent and are made available just to allow for valid +RDF namespaces and URIs that are not valid Python identifiers. This isn't just for syntactic things like spaces, as per +the example of ``first%20name`` above, but also for Python reserved words like ``class`` or ``while``, so for the URI +``http://example.org/class``, create it with ``EX['class']``, not ``EX.class``. 
-The ``namespace`` module also defines many common namespaces such as RDF, RDFS, OWL, FOAF, SKOS, PROF, etc. +Common Namespaces +----------------- -Namespaces can also be associated with prefixes, in a :class:`rdflib.namespace.NamespaceManager`, i.e. using ``foaf`` for ``http://xmlns.com/foaf/0.1/``. Each RDFLib graph has a :attr:`~rdflib.graph.Graph.namespace_manager` that keeps a list of namespace to prefix mappings. The namespace manager is populated when reading in RDF, and these prefixes are used when serialising RDF, or when parsing SPARQL queries. Additional prefixes can be bound with the :meth:`rdflib.graph.bind` method. +The ``namespace`` module defines many common namespaces such as RDF, RDFS, OWL, FOAF, SKOS, PROF, etc. The list of the +namespaces provided grows with user contributions to RDFLib. -NamespaceManager ----------------- +These Namespaces, and any others that users define, can also be associated with prefixes using the :class:`rdflib.namespace.NamespaceManager`, e.g. using ``foaf`` for ``http://xmlns.com/foaf/0.1/``. +Each RDFLib graph has a :attr:`~rdflib.graph.Graph.namespace_manager` that keeps a list of namespace to prefix mappings. The namespace manager is populated when reading in RDF, and these prefixes are used when serialising RDF, or when parsing SPARQL queries. 
Prefixes can be bound with the :meth:`rdflib.graph.bind` method:: -Each graph comes with a `NamespaceManager`__ instance in the `namespace_manager` field; you can use the `bind` method of this instance to bind a prefix to a namespace URI:: + from rdflib import Graph, Namespace + from rdflib.namespace import FOAF + + EX = Namespace("http://example.org/") + + g = Graph() + g.bind("foaf", FOAF) # bind an RDFLib-provided namespace to a prefix + g.bind("ex", EX) # bind a user-declared namespace to a prefix + - myGraph.namespace_manager.bind('prefix', URIRef('scheme:my-namespace-uri:')) - myGraph.namespace_manager.bind('owl', OWL_NS, override=False) +The :meth:`rdflib.graph.bind` method is actually supplied by the :class:`rdflib.namespace.NamespaceManager` class - see next. -It has a method to normalize a given url : +NamespaceManager +---------------- - myGraph.namespace_manager.normalizeUri(t) +Each RDFLib graph comes with a :class:`rdflib.namespace.NamespaceManager` instance in the `namespace_manager` field; you can use the `bind` method of this instance to bind a prefix to a namespace URI, +as above, however note that the `NamespaceManager` automatically performs some bindings according to a selected strategy. 
+ +Namespace binding strategies are indicated with the `bind_namespaces` input parameter to `NamespaceManager` instances +and may be set via ``Graph`` also:: + + from rdflib import Graph + from rdflib.namespace import NamespaceManager + + g = Graph(bind_namespaces="rdflib") # bind via Graph + + g2 = Graph() + nm = NamespaceManager(g2, bind_namespaces="rdflib") # bind via NamespaceManager + + +Valid strategies are: + +* core: + * binds several core RDF prefixes only + * owl, rdf, rdfs, xsd, xml from the NAMESPACE_PREFIXES_CORE object + * this is the default +* rdflib: + * binds all the namespaces shipped with RDFLib as DefinedNamespace instances + * all the core namespaces and all the following: brick, csvw, dc, dcat + * dcmitype, cdterms, dcam, doap, foaf, geo, odrl, org, prof, prov, qb, sdo + * sh, skos, sosa, ssn, time, vann, void + * see the NAMESPACE_PREFIXES_RDFLIB object in :mod:`rdflib.namespace` for the up-to-date list +* none: + * binds no namespaces to prefixes + * note this is NOT default behaviour +* cc: + * using prefix bindings from prefix.cc which is an online prefixes database + * not implemented yet - this is aspirational + +Re-binding +^^^^^^^^^^ + +Note that regardless of the strategy employed, prefixes for namespaces can be overwritten with users' preferred prefixes, +for example:: + + from rdflib import Graph + from rdflib.namespace import GEO # imports GeoSPARQL's namespace + + g = Graph(bind_namespaces="rdflib") # binds GeoSPARQL's namespace to prefix 'geo' + + g.bind('geosp', GEO, override=True) + + + +`NamespaceManager` also has a method to normalize a given URL:: + + from rdflib.namespace import NamespaceManager + + nm = NamespaceManager(Graph()) + nm.normalizeUri(t) For simple output, or simple serialisation, you often want a nice -readable representation of a term. All terms have a -``.n3(namespace_manager = None)`` method, which will return a suitable -N3 format:: +readable representation of a term. 
All RDFLib terms have a +``.n3()`` method, which will return a suitable N3 format and into which you can supply a NamespaceManager instance +to provide prefixes, i.e. ``.n3(namespace_manager=some_nm)``:: >>> from rdflib import Graph, URIRef, Literal, BNode >>> from rdflib.namespace import FOAF, NamespaceManager @@ -59,16 +125,15 @@ N3 format:: >>> l.n3() '"2"^^<http://www.w3.org/2001/XMLSchema#integer>' - >>> l.n3(g.namespace_manager) + >>> l.n3(NamespaceManager(Graph(), bind_namespaces="core")) '"2"^^xsd:integer' -The namespace manage also has a useful method compute_qname -g.namespace_manager.compute_qname(x) which takes an url and decomposes it into the parts:: +The namespace manager also has a useful method ``compute_qname``: +``g.namespace_manager.compute_qname(x)`` (or just ``g.compute_qname(x)``), which takes a URI and decomposes it into its parts:: self.assertEqual(g.compute_qname(URIRef("http://foo/bar#baz")), ("ns2", URIRef("http://foo/bar#"), "baz")) -__ http://rdflib.net/rdflib-2.4.0/html/public/rdflib.syntax.NamespaceManager.NamespaceManager-class.html Namespaces in SPARQL Queries @@ -78,7 +143,7 @@ The ``initNs`` argument supplied to :meth:`~rdflib.graph.Graph.query` is a dicti If you pass no ``initNs`` argument, the namespaces registered with the graphs namespace_manager are used:: from rdflib.namespace import FOAF - graph.query('SELECT * WHERE { ?p a foaf:Person }', initNs={ 'foaf': FOAF }) + graph.query('SELECT * WHERE { ?p a foaf:Person }', initNs={'foaf': FOAF}) In order to use an empty prefix (e.g. ``?a :knows ?b``), use a ``PREFIX`` directive with no prefix in the SPARQL query to set a default namespace: diff --git a/docs/rdf_terms.rst b/docs/rdf_terms.rst index cc880a02f..2185182ca 100644 --- a/docs/rdf_terms.rst +++ b/docs/rdf_terms.rst @@ -4,9 +4,17 @@ RDF terms in rdflib =================== -Terms are the kinds of objects that can appear in a quoted/asserted triples. 
Those that are part of core RDF concepts are: ``Blank Node``, ``URI Reference`` and ``Literal``, the latter consisting of a literal value and either a `datatype `_ or an :rfc:`3066` language tag. +Terms are the kinds of objects that can appear in an RDFLib graph's triples. +Those that are part of core RDF concepts are: ``IRI``, ``Blank Node`` +and ``Literal``, the latter consisting of a literal value and either a `datatype `_ +or an :rfc:`3066` language tag. -All terms in RDFLib are sub-classes of the :class:`rdflib.term.Identifier` class. A class diagram of the various terms can be seen in the :ref:`term_class_hierarchy` diagram. +.. note:: RDFLib's class for representing IRIs/URIs is called "URIRef" because, at the time it was implemented, that was what the then current RDF specification called URIs/IRIs. We preserve that class name but refer to the RDF object as "IRI". + +Class hierarchy +=============== + +All terms in RDFLib are sub-classes of the :class:`rdflib.term.Identifier` class. A class diagram of the various terms is: .. _term_class_hierarchy: .. kroki:: @@ -72,22 +80,25 @@ All terms in RDFLib are sub-classes of the :class:`rdflib.term.Identifier` class @enduml -Nodes are a subset of the Terms that the underlying store actually persists. +Nodes are a subset of the Terms that underlying stores actually persist. + The set of such Terms depends on whether or not the store is formula-aware. Stores that aren't formula-aware only persist those terms core to the RDF Model but those that are formula-aware also persist the N3 extensions. However, utility terms that only serve the purpose of matching nodes by term-patterns will probably only be terms and not nodes. 
+Python Classes +============== -URIRefs -======= +The three main RDF objects - *IRI*, *Blank Node* and *Literal* are represented in RDFLib by these three main Python classes: -A *URI reference* within an RDF graph is a Unicode string that does not contain any control characters ( #x00 - #x1F, #x7F-#x9F) -and would produce a valid URI character sequence representing an absolute URI with optional fragment -identifier -- `W3 RDF Concepts`__ +URIRef +------ -.. __: http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref +An IRI (Internationalized Resource Identifier) is represented within RDFLib using the URIRef class. From `the RDF 1.1 specification's IRI section `_: + +Here is the *URIRef* class' auto-built documentation: .. autoclass:: rdflib.term.URIRef :noindex: @@ -109,56 +120,97 @@ identifier -- `W3 RDF Concepts`__ '' -.. _rdflibliterals: +BNodes +------ -Literals -======== +In RDF, a blank node (also called BNode) is a node in an RDF graph representing a resource for which an IRI or literal is not given. The resource represented by a blank node is also called an anonymous resource. According to the RDF standard, a blank node can only be used as subject or object in a triple, although in some syntaxes like Notation 3 it is acceptable to use a blank node as a predicate. If a blank node has a node ID (not all blank nodes are labelled in all RDF serializations), it is limited in scope to a particular serialization of the RDF graph, i.e. the node p1 in one graph does not represent the same node as a node named p1 in any other graph -- `wikipedia`__ -Literals are attribute values in RDF, for instance, a person's name, the date of birth, height, etc. Literals can have a datatype (i.e. this is a *double*) or a language tag (this label is in *English*). -.. autoclass:: rdflib.term.Literal +.. __: http://en.wikipedia.org/wiki/Blank_node + +Here is the *BNode* class' auto-built documentation: + +.. 
autoclass:: rdflib.term.BNode :noindex: - A literal in an RDF graph contains one or two named components. - - All literals have a lexical form being a Unicode string, which SHOULD be in Normal Form C. +.. code-block:: python + + >>> from rdflib import BNode + >>> bn = BNode() + >>> bn + rdflib.term.BNode('AFwALAKU0') + >>> bn.n3() + '_:AFwALAKU0' - Plain literals have a lexical form and optionally a language tag as defined by :rfc:`3066`, normalized to lowercase. An exception will be raised if illegal language-tags are passed to :meth:`rdflib.term.Literal.__init__`. - Typed literals have a lexical form and a datatype URI being an RDF URI reference. +.. _rdflibliterals: + +Literals +-------- + +Literals are attribute values in RDF, for instance, a person's name, the date of birth, height, etc. +and are stored using simple data types, e.g. *string*, *double*, *dateTime* etc. This usually looks +something like this: + +.. code-block:: python -.. note:: When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications. + name = Literal("Nicholas") # the name 'Nicholas', as a string -.. note:: The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input. -- `RDF Concepts and Abstract Syntax`__ + age = Literal(39, datatype=XSD.integer) # the number 39, as an integer -.. __: http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref +A slightly special case is a *langString* which is a *string* with a language tag, e.g.: -BNodes -====== +.. 
code-block:: python + + name = Literal("Nicholas", lang="en") # the name 'Nicholas', as an English string + imie = Literal("Mikołaj", lang="pl") # the Polish version of the name 'Nicholas' -In RDF, a blank node (also called BNode) is a node in an RDF graph representing a resource for which a URI or literal is not given. The resource represented by a blank node is also called an anonymous resource. According to the RDF standard, a blank node can only be used as subject or object in a triple, although in some syntaxes like Notation 3 it is acceptable to use a blank node as a predicate. If a blank node has a node ID (not all blank nodes are labelled in all RDF serializations), it is limited in scope to a particular serialization of the RDF graph, i.e. the node p1 in the subsequent example does not represent the same node as a node named p1 in any other graph --`wikipedia`__ +Special literal types are indicated by use of a custom IRI for a literal's ``datatype`` value, +for example the `GeoSPARQL RDF standard `_ +invents a custom datatype, ``geoJSONLiteral`` to indicate `GeoJSON geometry serializations `_ +like this: -.. __: http://en.wikipedia.org/wiki/Blank_node +.. code-block:: python + + GEO = Namespace("http://www.opengis.net/ont/geosparql#") + + geojson_geometry = Literal( + '''{"type": "Point", "coordinates": [-83.38,33.95]}''', + datatype=GEO.geoJSONLiteral) -.. autoclass:: rdflib.term.BNode + +Here is the ``Literal`` class' auto-built documentation, followed by notes on Literal from the `RDF 1.1 specification 'Literals' section `_. + +.. autoclass:: rdflib.term.Literal :noindex: -.. code-block:: python +A literal in an RDF graph contains one or two named components. + +All literals have a lexical form being a Unicode string, which SHOULD be in Normal Form C. + +Plain literals have a lexical form and optionally a language tag as defined by :rfc:`3066`, normalized to lowercase. 
An exception will be raised if illegal language-tags are passed to :meth:`rdflib.term.Literal.__init__`. + +Typed literals have a lexical form and a datatype URI being an RDF URI reference. + +.. note:: When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications. + +.. note:: The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input. -- `RDF Concepts and Abstract Syntax`__ - >>> from rdflib import BNode - >>> bn = BNode() - >>> bn - rdflib.term.BNode('AFwALAKU0') - >>> bn.n3() - '_:AFwALAKU0' -Python support --------------- -RDFLib Literals essentially behave like unicode characters with an XML Schema datatype or language attribute. +.. __: http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref + +Common XSD datatypes +^^^^^^^^^^^^^^^^^^^^ + +Most simple literals such as *string* or *integer* have XML Schema (XSD) datatypes defined for them, see the figure +below. Additionally, these XSD datatypes are listed in the :class:`XSD Namespace class ` that +ships with RDFLib, so many Python code editors will prompt you with autocomplete for them when using it. + +Remember, you don't *have* to use XSD datatypes and can always make up your own, as GeoSPARQL does, as described above. .. image:: /_static/datatype_hierarchy.png :alt: datatype hierarchy @@ -166,8 +218,14 @@ RDFLib Literals essentially behave like unicode characters with an XML Schema da :width: 629 :height: 717 +Python conversions +^^^^^^^^^^^^^^^^^^ + +RDFLib Literals essentially behave like unicode characters with an XML Schema datatype or language attribute. 
-The class provides a mechanism to both convert Python literals (and their built-ins such as time/date/datetime) into equivalent RDF Literals and (conversely) convert Literals to their Python equivalent. This mapping to and from Python literals is done as follows: +The class provides a mechanism to both convert Python literals (and their built-ins such as time/date/datetime) +into equivalent RDF Literals and (conversely) convert Literals to their Python equivalent. This mapping to and +from Python literals is done as follows: ====================== =========== XML Datatype Python type @@ -221,7 +279,8 @@ and the other direction with .. autofunction:: rdflib.term._castLexicalToPython -All this happens automatically when creating ``Literal`` objects by passing Python objects to the constructor, and you never have to do this manually. +All this happens automatically when creating ``Literal`` objects by passing Python objects to the constructor, +and you never have to do this manually. You can add custom data-types with :func:`rdflib.term.bind`, see also :mod:`examples.custom_datatype` diff --git a/rdflib/graph.py b/rdflib/graph.py index 522bf46ef..88099f161 100644 --- a/rdflib/graph.py +++ b/rdflib/graph.py @@ -330,6 +330,7 @@ def __init__( identifier: Optional[Union[IdentifiedNode, str]] = None, namespace_manager: Optional[NamespaceManager] = None, base: Optional[str] = None, + bind_namespaces: str = "core", ): super(Graph, self).__init__() self.base = base @@ -344,6 +345,7 @@ def __init__( else: self.__store = store self.__namespace_manager = namespace_manager + self.bind_namespaces = bind_namespaces self.context_aware = False self.formula_aware = False self.default_union = False @@ -362,7 +364,7 @@ def namespace_manager(self): this graph's namespace-manager """ if self.__namespace_manager is None: - self.__namespace_manager = NamespaceManager(self) + self.__namespace_manager = NamespaceManager(self, self.bind_namespaces) return self.__namespace_manager 
@namespace_manager.setter @@ -1787,6 +1789,10 @@ def contexts(self, triple=None): else: yield self.get_context(context) + def get_graph(self, identifier: Union[URIRef, BNode]) -> Union[Graph, None]: + """Returns the graph identified by the given identifier, or None if no such graph exists""" + return next((x for x in self.contexts() if x.identifier == identifier), None) + def get_context( self, identifier: Optional[Union[Node, str]], diff --git a/rdflib/namespace/__init__.py b/rdflib/namespace/__init__.py index 10c426d85..9426ff53d 100644 --- a/rdflib/namespace/__init__.py +++ b/rdflib/namespace/__init__.py @@ -85,7 +85,14 @@ rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#seeAlso') """ -__all__ = ["is_ncname", "split_uri", "Namespace", "ClosedNamespace", "NamespaceManager"] +__all__ = [ + "is_ncname", + "split_uri", + "Namespace", + "ClosedNamespace", + "DefinedNamespace", + "NamespaceManager", +] logger = logging.getLogger(__name__) @@ -313,29 +320,40 @@ def _ipython_key_completions_(self) -> List[str]: class NamespaceManager(object): - """ - - Class for managing prefix => namespace mappings - - Sample usage from FuXi ... - - .. code-block:: python - - ruleStore = N3RuleStore(additionalBuiltins=additionalBuiltins) - nsMgr = NamespaceManager(Graph(ruleStore)) - ruleGraph = Graph(ruleStore,namespace_manager=nsMgr) - - - and ... + """Class for managing prefix => namespace mappings + + This class requires an RDFLib Graph as an input parameter and may optionally have + the parameter bind_namespaces set. 
This second parameter selects a strategy which + is one of the following: + + * core: + * binds several core RDF prefixes only + * owl, rdf, rdfs, xsd, xml from the NAMESPACE_PREFIXES_CORE object + * this is the default + * rdflib: + * binds all the namespaces shipped with RDFLib as DefinedNamespace instances + * all the core namespaces and all the following: brick, csvw, dc, dcat + * dcmitype, cdterms, dcam, doap, foaf, geo, odrl, org, prof, prov, qb, sdo + * sh, skos, sosa, ssn, time, vann, void + * see the NAMESPACE_PREFIXES_RDFLIB object for the up-to-date list + * none: + * binds no namespaces to prefixes + * note this is NOT default behaviour + * cc: + * using prefix bindings from prefix.cc which is an online prefixes database + * not implemented yet - this is aspirational + + See the + Sample usage .. code-block:: pycon >>> import rdflib >>> from rdflib import Graph >>> from rdflib.namespace import Namespace, NamespaceManager - >>> exNs = Namespace('http://example.com/') + >>> EX = Namespace('http://example.com/') >>> namespace_manager = NamespaceManager(Graph()) - >>> namespace_manager.bind('ex', exNs, override=False) + >>> namespace_manager.bind('ex', EX, override=False) >>> g = Graph() >>> g.namespace_manager = namespace_manager >>> all_ns = [n for n in g.namespace_manager.namespaces()] @@ -344,48 +362,40 @@ class NamespaceManager(object): """ - def __init__(self, graph: "Graph"): + def __init__(self, graph: "Graph", bind_namespaces: str = "core"): self.graph = graph self.__cache: Dict[str, Tuple[str, URIRef, str]] = {} self.__cache_strict: Dict[str, Tuple[str, URIRef, str]] = {} self.__log = None self.__strie: Dict[str, Any] = {} self.__trie: Dict[str, Any] = {} - for p, n in self.namespaces(): # self.bind is not always called - insert_trie(self.__trie, str(n)) - - # DefinedNamespace bindings. 
- self.bind("brick", BRICK) - self.bind("csvw", CSVW) - self.bind("dc", DC) - self.bind("dcat", DCAT) - self.bind("dcmitype", DCMITYPE) - self.bind("dcterms", DCTERMS) - self.bind("dcam", DCAM) - self.bind("doap", DOAP) - self.bind("foaf", FOAF) - self.bind("odrl", ODRL2) - self.bind("geo", GEO) - self.bind("org", ORG) - self.bind("owl", OWL) - self.bind("prof", PROF) - self.bind("prov", PROV) - self.bind("qb", QB) - self.bind("rdf", RDF) - self.bind("rdfs", RDFS) - self.bind("schema", SDO) - self.bind("sh", SH) - self.bind("skos", SKOS) - self.bind("sosa", SOSA) - self.bind("ssn", SSN) - self.bind("time", TIME) - self.bind("vann", VANN) - self.bind("void", VOID) - self.bind("wgs", WGS) - self.bind("xsd", XSD) - - # Namespace bindings. - self.bind("xml", XMLNS) + # This type declaration is here because there is no common base class + # for all namespaces and without it the inferred type of ns is not + # compatible with all prefixes. + ns: Any + # bind Namespaces as per options. + # default is core + if bind_namespaces == "none": + # binds no namespaces to prefixes + # note this is NOT default + pass + elif bind_namespaces == "rdflib": + # bind all the Namespaces shipped with RDFLib + for prefix, ns in NAMESPACE_PREFIXES_RDFLIB.items(): + self.bind(prefix, ns) + # ... 
don't forget the core ones too + for prefix, ns in NAMESPACE_PREFIXES_CORE.items(): + self.bind(prefix, ns) + elif bind_namespaces == "cc": + # bind any prefix that can be found with lookups to prefix.cc + # first bind core and rdflib ones + # work out remainder - namespaces without prefixes + # only look those ones up + raise NotImplementedError("Haven't got to this option yet") + else: # bind_namespaces == "core": + # bind a few core RDF namespaces - default + for prefix, ns in NAMESPACE_PREFIXES_CORE.items(): + self.bind(prefix, ns) def __contains__(self, ref: str) -> bool: # checks if a reference is in any of the managed namespaces with syntax @@ -446,6 +456,7 @@ def normalizeUri(self, rdfTerm: str) -> str: def compute_qname(self, uri: str, generate: bool = True) -> Tuple[str, URIRef, str]: + prefix: Optional[str] if uri not in self.__cache: if not _is_valid_uri(uri): @@ -556,12 +567,12 @@ def bind( override: bool = True, replace: bool = False, ) -> None: - """bind a given namespace to the prefix + """Bind a given namespace to the prefix - if override, rebind, even if the given namespace is already + If override, rebind, even if the given namespace is already bound to another prefix. - if replace, replace any existing prefix with the new namespace + If replace, replace any existing prefix with the new namespace """ @@ -573,6 +584,7 @@ def bind( raise KeyError("Prefixes may not contain spaces.") bound_namespace = self.store.namespace(prefix) + # Check if the bound_namespace contains a URI # and if so convert it into a URIRef for comparison # This is to prevent duplicate namespaces with the @@ -582,16 +594,16 @@ def bind( if bound_namespace and bound_namespace != namespace: if replace: - self.store.bind(prefix, namespace) + self.store.bind(prefix, namespace, override=override) insert_trie(self.__trie, str(namespace)) return - # prefix already in use for different namespace # # append number to end of prefix until we find one # that's not in use. 
if not prefix: prefix = "default" + num = 1 while 1: new_prefix = "%s%s" % (prefix, num) @@ -603,16 +615,17 @@ def bind( if not self.store.namespace(new_prefix): break num += 1 - self.store.bind(new_prefix, namespace) + self.store.bind(new_prefix, namespace, override=override) else: bound_prefix = self.store.prefix(namespace) if bound_prefix is None: - self.store.bind(prefix, namespace) + self.store.bind(prefix, namespace, override=override) elif bound_prefix == prefix: pass # already bound else: if override or bound_prefix.startswith("_"): # or a generated prefix - self.store.bind(prefix, namespace) + self.store.bind(prefix, namespace, override=override) + insert_trie(self.__trie, str(namespace)) def namespaces(self) -> Iterable[Tuple[str, URIRef]]: @@ -789,3 +802,41 @@ def get_longest_namespace(trie: Dict[str, Any], value: str) -> Optional[str]: from rdflib.namespace._VOID import VOID from rdflib.namespace._WGS import WGS from rdflib.namespace._XSD import XSD + +# prefixes for the core Namespaces shipped with RDFLib +NAMESPACE_PREFIXES_CORE = { + "owl": OWL, + "rdf": RDF, + "rdfs": RDFS, + "xsd": XSD, + # Namespace binding for XML - needed for RDF/XML + "xml": XMLNS, +} + + +# prefixes for all the non-core Namespaces shipped with RDFLib +NAMESPACE_PREFIXES_RDFLIB = { + "brick": BRICK, + "csvw": CSVW, + "dc": DC, + "dcat": DCAT, + "dcmitype": DCMITYPE, + "cdterms": DCTERMS, + "dcam": DCAM, + "doap": DOAP, + "foaf": FOAF, + "geo": GEO, + "odrl": ODRL2, + "org": ORG, + "prof": PROF, + "prov": PROV, + "qb": QB, + "sdo": SDO, + "sh": SH, + "skos": SKOS, + "sosa": SOSA, + "ssn": SSN, + "time": TIME, + "vann": VANN, + "void": VOID, +} diff --git a/rdflib/plugins/stores/auditable.py b/rdflib/plugins/stores/auditable.py index 6de79ccd3..8fc048e47 100644 --- a/rdflib/plugins/stores/auditable.py +++ b/rdflib/plugins/stores/auditable.py @@ -130,8 +130,8 @@ def contexts(self, triple=None): for ctx in self.store.contexts(triple): yield ctx - def bind(self, prefix, namespace): 
- self.store.bind(prefix, namespace) + def bind(self, prefix, namespace, override=True): + self.store.bind(prefix, namespace, override=override) def prefix(self, namespace): return self.store.prefix(namespace) diff --git a/rdflib/plugins/stores/berkeleydb.py b/rdflib/plugins/stores/berkeleydb.py index a19a21f09..331157adc 100644 --- a/rdflib/plugins/stores/berkeleydb.py +++ b/rdflib/plugins/stores/berkeleydb.py @@ -467,11 +467,11 @@ def __len__(self, context=None): cursor.close() return count - def bind(self, prefix, namespace): + def bind(self, prefix, namespace, override=True): prefix = prefix.encode("utf-8") namespace = namespace.encode("utf-8") bound_prefix = self.__prefix.get(namespace) - if bound_prefix: + if override and bound_prefix: self.__namespace.delete(bound_prefix) self.__prefix[namespace] = prefix self.__namespace[prefix] = namespace diff --git a/rdflib/plugins/stores/memory.py b/rdflib/plugins/stores/memory.py index 5a137b5ac..edb255778 100644 --- a/rdflib/plugins/stores/memory.py +++ b/rdflib/plugins/stores/memory.py @@ -148,7 +148,10 @@ def __len__(self, context=None): i += 1 return i - def bind(self, prefix, namespace): + def bind(self, prefix, namespace, override=True): + bound_prefix = self.__prefix.get(namespace) + if override and bound_prefix: + del self.__namespace[bound_prefix] self.__prefix[namespace] = prefix self.__namespace[prefix] = namespace @@ -399,7 +402,10 @@ def triples(self, triple_pattern, context=None): if self.__triple_has_context(triple, req_ctx): yield triple, self.__contexts(triple) - def bind(self, prefix, namespace): + def bind(self, prefix, namespace, override=True): + bound_prefix = self.__prefix.get(namespace) + if override and bound_prefix: + del self.__namespace[bound_prefix] self.__prefix[namespace] = prefix self.__namespace[prefix] = namespace diff --git a/rdflib/plugins/stores/regexmatching.py b/rdflib/plugins/stores/regexmatching.py index f831d2476..d1920620d 100644 --- a/rdflib/plugins/stores/regexmatching.py 
+++ b/rdflib/plugins/stores/regexmatching.py @@ -155,8 +155,8 @@ def contexts(self, triple=None): def remove_context(self, identifier): self.storage.remove((None, None, None), identifier) - def bind(self, prefix, namespace): - self.storage.bind(prefix, namespace) + def bind(self, prefix, namespace, override=True): + self.storage.bind(prefix, namespace, override=override) def prefix(self, namespace): return self.storage.prefix(namespace) diff --git a/rdflib/plugins/stores/sparqlstore.py b/rdflib/plugins/stores/sparqlstore.py index 241511cfa..d209cd3dc 100644 --- a/rdflib/plugins/stores/sparqlstore.py +++ b/rdflib/plugins/stores/sparqlstore.py @@ -365,7 +365,10 @@ def contexts(self, triple=None): return (row.name for row in result) # Namespace persistence interface implementation - def bind(self, prefix, namespace): + def bind(self, prefix, namespace, override=True): + bound_prefix = self.prefix(namespace) + if override and bound_prefix: + del self.nsBindings[bound_prefix] self.nsBindings[prefix] = namespace def prefix(self, namespace): diff --git a/rdflib/store.py b/rdflib/store.py index 9a9c17f66..f890bf5e8 100644 --- a/rdflib/store.py +++ b/rdflib/store.py @@ -6,7 +6,7 @@ if TYPE_CHECKING: from rdflib.graph import Graph - from rdflib.term import IdentifiedNode, Node + from rdflib.term import IdentifiedNode, Node, URIRef """ ============ @@ -364,13 +364,15 @@ def update(self, update, initNs, initBindings, queryGraph, **kwargs): # Optional Namespace methods - def bind(self, prefix, namespace): - """ """ + def bind(self, prefix: str, namespace: "URIRef", override: bool = True) -> None: + """ + :param override: rebind, even if the given namespace is already bound to another prefix. 
+ """ - def prefix(self, namespace): - """ """ + def prefix(self, namespace: "URIRef") -> Optional["str"]: + """""" - def namespace(self, prefix): + def namespace(self, prefix: str) -> Optional["URIRef"]: """ """ def namespaces(self): diff --git a/rdflib/term.py b/rdflib/term.py index 7b57585d5..e6a7d4168 100644 --- a/rdflib/term.py +++ b/rdflib/term.py @@ -244,7 +244,15 @@ def toPython(self) -> str: class URIRef(IdentifiedNode): """ - RDF URI Reference: http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref + RDF 1.1's IRI Section https://www.w3.org/TR/rdf11-concepts/#section-IRIs + + .. note:: Documentation on RDF outside of RDFLib uses the term IRI or URI whereas this class is called URIRef. This is because it was made when the first version of the RDF specification was current, and it used the term *URIRef*, see `RDF 1.0 URIRef `_ + + An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string that conforms to the syntax defined in RFC 3987. + + IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier. + + IRIs are a generalization of URIs [RFC3986] that permits a wider range of Unicode characters. """ __slots__ = () @@ -408,8 +416,21 @@ def _generator(): class BNode(IdentifiedNode): """ - Blank Node: http://www.w3.org/TR/rdf-concepts/#section-blank-nodes + RDF 1.1's Blank Nodes Section: https://www.w3.org/TR/rdf11-concepts/#section-blank-nodes + + Blank Nodes are local identifiers for unnamed nodes in RDF graphs that are used in + some concrete RDF syntaxes or RDF store implementations. They are always locally + scoped to the file or RDF store, and are not persistent or portable identifiers for + blank nodes. The identifiers for Blank Nodes are not part of the RDF abstract + syntax, but are entirely dependent on particular concrete syntax or implementation + (such as Turtle, JSON-LD). 
+ --- + + RDFLib's ``BNode`` class makes unique IDs for all the Blank Nodes in a Graph but you + should *never* expect, or rely on, BNodes' IDs to match across graphs, or even for + multiple copies of the same graph, if they are regenerated from some non-RDFLib + source, such as loading from RDF data. """ __slots__ = () @@ -470,12 +491,20 @@ def skolemize( class Literal(Identifier): __doc__ = """ - RDF Literal: http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal - The lexical value of the literal is the unicode object. - The interpreted, datatyped value is available from .value + RDF 1.1's Literals Section: https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal + + Literals are used for values such as strings, numbers, and dates. + + A literal in an RDF graph consists of two or three elements: + + * a lexical form, being a Unicode string, which SHOULD be in Normal Form C + * a datatype IRI, being an IRI identifying a datatype that determines how the lexical form maps to a literal value, and + * if and only if the datatype IRI is ``http://www.w3.org/1999/02/22-rdf-syntax-ns#langString``, a non-empty language tag. The language tag MUST be well-formed according to section 2.2.9 of `Tags for identifying languages `_. + + A literal is a language-tagged string if the third element is present. Lexical representations of language tags MAY be converted to lower case. The value space of language tags is always in lower case. - Language tags must be valid according to :rfc:5646 + --- For valid XSD datatypes, the lexical form is optionally normalized at construction time. 
Default behaviour is set by rdflib.NORMALIZE_LITERALS @@ -1928,7 +1957,7 @@ def bind( class Variable(Identifier): """ A Variable - this is used for querying, or in Formula aware - graphs, where Variables can stored in the graph + graphs, where Variables can be stored """ __slots__ = () diff --git a/test/test_namespacemanager.py b/test/test_namespacemanager.py new file mode 100644 index 000000000..d5dc78e33 --- /dev/null +++ b/test/test_namespacemanager.py @@ -0,0 +1,80 @@ +import sys +from pathlib import Path + +from rdflib.term import URIRef + +sys.path.append(str(Path(__file__).parent.parent.absolute())) +from rdflib import Graph +from rdflib.namespace import ( + NAMESPACE_PREFIXES_CORE, + NAMESPACE_PREFIXES_RDFLIB, + OWL, + RDFS, +) + + +def test_core_prefixes_bound(): + # we should have RDF, RDFS, OWL, XSD & XML bound + g = Graph() + + # prefixes in Graph + assert len(list(g.namespaces())) == len(NAMESPACE_PREFIXES_CORE) + pre = sorted([x[0] for x in list(g.namespaces())]) + assert pre == ["owl", "rdf", "rdfs", "xml", "xsd"] + + +def test_rdflib_prefixes_bound(): + g = Graph(bind_namespaces="rdflib") + + # the core 5 + the extra 23 namespaces with prefixes + assert len(list(g.namespaces())) == len(NAMESPACE_PREFIXES_CORE) + len( + list(NAMESPACE_PREFIXES_RDFLIB) + ) + + +def test_cc_prefixes_bound(): + pass + + +def test_rebinding(): + g = Graph() # 'core' bind_namespaces (default) + print() + # 'owl' should be bound + assert "owl" in [x for x, y in list(g.namespaces())] + assert "rdfs" in [x for x, y in list(g.namespaces())] + + # replace 'owl' with 'sowa' + # 'sowa' should be bound + # 'owl' should not be bound + g.bind("sowa", OWL, override=True) + + assert "sowa" in [x for x, y in list(g.namespaces())] + assert "owl" not in [x for x, y in list(g.namespaces())] + + # try bind srda with override set to False + g.bind("srda", RDFS, override=False) + + # binding should fail because RDFS is already bound to rdfs prefix + assert "srda" not in [x for x, y in 
list(g.namespaces())] + assert "rdfs" in [x for x, y in list(g.namespaces())] + + +def test_replace(): + g = Graph() # 'core' bind_namespaces (default) + + assert ("rdfs", URIRef(RDFS)) in list(g.namespaces()) + + g.bind("rdfs", "http://example.com", replace=False) + + assert ("rdfs", URIRef("http://example.com")) not in list( + g.namespace_manager.namespaces() + ) + assert ("rdfs1", URIRef("http://example.com")) in list( + g.namespace_manager.namespaces() + ) + + g.bind("rdfs", "http://example.com", replace=True) + + assert ("rdfs", URIRef("http://example.com")) in list( + g.namespace_manager.namespaces() + )
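
The `override` handling added to the in-memory stores above can be hard to follow in diff form, so here is a minimal, self-contained sketch of the same two-dictionary logic. `TinyPrefixStore`, `OWL_NS` and `RDFS_NS` are hypothetical names used only for illustration; they are not part of RDFLib, and the sketch assumes plain strings where RDFLib uses `URIRef`.

```python
# Illustrative only: a hypothetical stand-in for the prefix/namespace maps
# kept by the memory stores in this diff. Not an RDFLib class.
OWL_NS = "http://www.w3.org/2002/07/owl#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


class TinyPrefixStore:
    def __init__(self):
        self._prefix = {}     # namespace -> prefix
        self._namespace = {}  # prefix -> namespace

    def bind(self, prefix, namespace, override=True):
        # Mirrors the store-level logic in the diff: when overriding,
        # drop the prefix that currently points at this namespace.
        bound_prefix = self._prefix.get(namespace)
        if override and bound_prefix:
            del self._namespace[bound_prefix]
        self._prefix[namespace] = prefix
        self._namespace[prefix] = namespace

    def namespaces(self):
        return sorted(self._namespace.items())


# Rebinding with override=True replaces the old prefix.
store = TinyPrefixStore()
store.bind("owl", OWL_NS)
store.bind("sowa", OWL_NS, override=True)

# With override=False the store itself keeps the old prefix entry as well.
other = TinyPrefixStore()
other.bind("rdfs", RDFS_NS)
other.bind("srda", RDFS_NS, override=False)
```

Note that at the store level `override=False` does not reject the new binding; it only skips removing the old one. In the `NamespaceManager.bind` hunk above, an already-bound namespace only reaches `store.bind` when `override` is true or the existing prefix starts with `_`, which is why `test_rebinding` expects `srda` not to be bound.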