Grizzly is a Pandas-based package for working with RDF graphs.
Please check the release notes for detailed changes by version.
Grizzly is a ready-to-install Python package. You need to have Python 3 installed (it has not been tested with Python 2) and the packages listed in the requirements file:
pip install -r requirements.txt
You can install Grizzly directly from this repository. On the command line, type:
pip install git+https://github.com/elsevierlabs-os/grizzly.git
To upgrade Grizzly to a new version, run:
pip install --upgrade grizzly
Clone the repository to a location convenient for you:
git clone https://github.com/elsevierlabs-os/grizzly.git
Then install Grizzly without transferring it to the Python site-packages directory. This way any change to the source files is immediately reflected when you use the package.
cd grizzly
pip install -e .
Load triples from a CSV file:
import grizzly as gz
graph = gz.read_csv('filename.csv')
Load a graph from a remote graph store (replace the text between asterisks):
creds = {'base_url': 'http://localhost:8000',
'name': 'example',
'username': **username**,
'password': **password**}
store = gz.Repository(**creds)
graph = store.query('select * {?s ?p ?o .} limit 100')
Run a SPARQL query:
graph.query('''
select ?sub
where {
?sub ?pre ?obj .
}
''')
Run a query using the GraphFrames DSL:
graph.find('(a)-[b]->(c);(c)-[d]->(e)').filter('b = skos:broader')
You can run update queries directly. This returns a new graph with the updates:
updated_graph = graph.query('''
delete {
?parent skos:narrower ?child.
}
insert {
?parent skos:broader ?child .
}
where {
?parent skos:narrower ?child.
}
''')
It is also possible to delete or insert triples that had previously been generated through a CONSTRUCT query:
new_triples = graph.query('''
construct {
?parent skos:broader ?child .
}
where {
?parent skos:narrower ?child.
}
''')
updated_graph = graph.insert(new_triples)
Serialize the graph to plain triples that can be uploaded to a graph store:
triples = graph.to_triples()
- read_turtle (string, custom_prefixes={}):
  Reads a file or string in Turtle (ttl) format and returns a Graph. You can supply custom_prefixes that should be taken into account. (A combined example of the reader functions follows this list.)
- read_csv (string, *args, custom_prefixes={}, keep_default_na=False, **kwargs):
  Reads a file or string in CSV format and returns a Graph. You can supply custom_prefixes that should be taken into account.
- read_json (string, custom_prefixes={}):
  Reads a file or graph store response in JSON format. If it is RDF/JSON it returns a typed Graph; if it has the MIME type 'application/sparql-results+json' it returns a typed DataFrame. You can supply custom_prefixes that should be taken into account.
- read_ntriples (string, custom_prefixes={}):
  Reads a file or string of N-Triples and returns a Graph. This might be slow, as it currently uses the query parser. You can supply custom_prefixes that should be taken into account.
- read_remote (query, endpointURL, repository, custom_prefixes={}, username='', password=''):
  Queries a SPARQL endpoint at endpointURL with query. The preferred way at the moment is to instead call query on a Repository instance.
- parse (query, rule_name='query', parser='SPARQL'):
  Parses a string query with the parser indicated in parser. By default it starts with the start rule query, but other rules can be passed as rule_name.
- term (query, *args, ast=False, **kwargs):
  No description yet
- variable (query, *args, ast=False, **kwargs):
  No description yet
- resource (query, *args, ast=False, **kwargs):
  No description yet
- literal (query, *args, ast=False, **kwargs):
  No description yet
- expression (query, *args, ast=False, **kwargs):
  No description yet
- call (query, *args, ast=False, **kwargs):
  No description yet
- constrain (constraints, parts):
  No description yet
- filter (expression):
  Filter values according to expression following the GraphFrames pattern.
- copy_metadata (source):
  No description yet
- decode (value=None):
  No description yet
- encode (value=None):
  No description yet
- encode_value (value):
  No description yet
- to_pandas ():
  No description yet
- to_koalas ():
  No description yet
- to_csv (*args, index=False, decode=True, **kwargs):
  No description yet
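For illustration, here is a rough sketch of how the reader functions above might be used together. The file names and the shape of the prefix mapping are assumptions for the example, not part of the package:
import grizzly as gz

# Hypothetical prefix mapping, passed as custom_prefixes so these prefixes
# are taken into account when reading the data.
prefixes = {'ex': 'http://example.org/'}

graph_from_ttl = gz.read_turtle('thesaurus.ttl', custom_prefixes=prefixes)  # Turtle file
graph_from_csv = gz.read_csv('triples.csv', custom_prefixes=prefixes)       # CSV file
graph_from_nt = gz.read_ntriples('triples.nt', custom_prefixes=prefixes)    # N-Triples file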
The Graph class represents an RDF graph in memory.
- find (query_string):
  Query the graph with the motif patterns used in GraphFrames, e.g. graph.find('(sub)-[pre]->(obj)'). Not fully implemented at the moment.
- query (query_string):
  Query the graph with a SPARQL query. Currently supported are CONSTRUCT, SELECT, DELETE and INSERT queries. Returns a Graph instance or a Table instance depending on the type of the query. (A usage sketch follows this list.)
- where (query, *args, ast=False, **kwargs):
  No description yet
- triple (query, *args, ast=False, **kwargs):
  No description yet
- delete (triples, name=''):
  Deletes triples from the graph. triples must be a DataFrame or an instance of a subclass, with three columns.
- insert (triples, name=''):
  Adds triples to the graph. triples must be a DataFrame or an instance of a subclass, with three columns.
- get_graph (name=''):
  No description yet
- get_encoded_triples (triples):
  No description yet
- to_table ():
  No description yet
- to_triples ():
  No description yet
- to_turtle (filename):
  No description yet
- to_json (filename=None):
  Serializes the graph to RDF/JSON. Returns a string if no filename is given, otherwise saves the result to a file.
- to_property_graph (filename=None, property_names={}, directed=False, multigraph=False, graph={}):
  No description yet
- to_remote (endpointURL, repository, graph, username=None, password=None):
  No description yet
- select (query, *args, ast=False, **kwargs):
  No description yet
- construct (query, *args, ast=False, **kwargs):
  No description yet
- modify_solution (query):
  No description yet
- groupby (*args, **kwargs):
  No description yet
- sort_columns (col_a, col_b):
  No description yet
- to_graph ():
  No description yet
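As a non-authoritative sketch, the Graph methods above can be combined roughly like this. The input file and the queries are made up for the example, and delete is assumed to return a new graph in the same way insert does:
import grizzly as gz

graph = gz.read_csv('triples.csv')  # hypothetical input file

# A SELECT query returns a Table, a CONSTRUCT query returns a Graph.
table = graph.query('select ?s ?p ?o where { ?s ?p ?o . } limit 10')
inverse = graph.query('''
    construct { ?child skos:broader ?parent . }
    where { ?parent skos:narrower ?child . }
''')

# insert takes a three-column DataFrame-like object and returns a new graph;
# delete is assumed to behave the same way.
larger = graph.insert(inverse)
smaller = larger.delete(inverse)

# Serialization.
larger.to_turtle('output.ttl')  # write Turtle to a file
rdf_json = larger.to_json()     # RDF/JSON string when no filename is given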
The Repository class represents a graph store repository as an abstraction over a REST API. The API is designed to mirror that of the local Graph class. All methods take instances of Graph as arguments and return instances of Graph or Table. Initialize it with the base_url (e.g. http://localhost:7200). In the case of an RDF4J endpoint, either select the repository by providing it as repository_name or include it in the base_url (e.g. http://localhost:7200/repositories/example). Additionally, custom_prefixes can be supplied, which will be used to generate prefix definitions in queries. For authentication you can provide username and password. (A usage sketch follows the method list below.)
- query (query):
  Query the repository with a SPARQL query. Automatically adds known prefix definitions to the query. Currently supported are CONSTRUCT, SELECT, DELETE and INSERT queries. Returns a Graph instance or a Table instance depending on the type of the query.
- insert (graph, name):
  Inserts graph into the repository. graph should be a Graph instance. The graph will be inserted into name, which can either be a named graph or 'default'. Returns True if successful.
- get_graph (name):
  Gets an entire graph from the repository and returns it as a Graph instance. The graph name can either be a named graph or 'default'.
- list_graphs ():
  Returns a Table containing all named graphs in the repository.
- size ():
  Returns the number of triples in the repository.
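A minimal sketch of working with a repository, following the quick-start example above. The URL, repository name, credentials and file name are placeholders:
import grizzly as gz

store = gz.Repository(base_url='http://localhost:8000',
                      name='example',
                      username='user',
                      password='secret')

result = store.query('select * {?s ?p ?o .} limit 100')  # Graph or Table, depending on the query
local = gz.read_turtle('thesaurus.ttl')                  # hypothetical local file
store.insert(local, name='default')                      # returns True if successful
fetched = store.get_graph('default')                     # entire default graph as a Graph instance
named_graphs = store.list_graphs()                       # Table of all named graphs
print(store.size())                                      # number of triples in the repository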
