
Parses RDF vocabularies and processes them as a new media type

This plugin offers a flexible mechanism to write lightweight documentation
about RDF vocabularies, based on SPARQL queries.
Nicolas Seydoux 6 years ago
parent
commit
00d52f692b

+ 2 - 0
Readme.rst

@@ -192,6 +192,8 @@ Pelican Page Order        Adds a ``page_order`` attribute to all pages if one is
 
 Pelican Themes Generator  Generates theme screenshots from the Pelican Themes repository
 
+pelican-rdf                Allows the processing of .rdf vocabularies and the generation of lightweight documentation.
+
 pelican-toc               Generates a Table of Contents and make it available to the theme via article.toc
 
 Pelican Vimeo             Enables you to embed Vimeo videos in your pages and articles

+ 107 - 0
pelican-rdf/ReadMe.md

@@ -0,0 +1,107 @@
+pelican-rdf plugin
+==================
+
+A plugin for RDF vocabulary providers
+--------------------------------------
+
+# Overview
+ 
+This plugin is intended to ease the lightweight description of vocabularies online, in the fashion of http://vocab.linkeddata.es/. It offers a new media type, the Vocabulary, and a flexible mechanism to gather metadata about vocabularies through SPARQL queries.
+
+# How it works
+
+## Required configuration
+
+### Description of the variables
+Your pelicanconf.py should include these new options:
+- **VOC_PATHS**: A list of paths to local folders containing vocabularies. If all your vocabularies are remote, set this to an empty list.
+- **VOC_EXCLUDES**: A list of paths to folders where you don't want vocabularies to be processed.
+- **VOC_URIS**: A list of URLs pointing to dereferenceable vocabularies. Content is negotiated to retrieve RDF/XML.
+- **VOC_QUERIES_PATH**: Path to the folder containing the SPARQL queries used to collect metadata about each vocabulary.
+- **VOCABULARY_URL**: The pattern for the generated document's URL.
+- **VOCABULARY_SAVE_AS**: The pattern for the generated document's file name.
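For remote vocabularies, the content negotiation mentioned above amounts to sending an `Accept` header asking for RDF/XML. A minimal sketch using only the standard library (the plugin itself relies on the `requests` package, so this is an illustration, not the plugin's code):

```python
from urllib.request import Request, urlopen

# Ask the server for an RDF/XML serialization of the vocabulary via the
# Accept header (HTTP content negotiation), as is done for each VOC_URIS entry.
def fetch_rdf_xml(uri):
    request = Request(uri, headers={"Accept": "application/rdf+xml"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

If the server honors the negotiation, the returned text can be written to a local file and parsed like any other `.rdf` vocabulary.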
+
+### Default configuration
+```
+VOC_PATHS=['ontologies']
+VOC_EXCLUDES=[]
+VOC_URIS = ["https://www.irit.fr/recherches/MELODI/ontologies/IoT-O",]
+VOC_QUERIES_PATH = "plugins/pelican-rdf/sparql-queries"
+VOCABULARY_URL= '{slug}.html'
+VOCABULARY_SAVE_AS= '{slug}.html'
+```
+
+## Accessing the vocabulary metadata
+
+### First, a simple example...
+The following snippet of code outputs a description of the vocabularies that have been processed:
+```
+<h1 class="page-header">
+    Ontology repository
+</h1>
+{% if vocabularies %}
+    <table class="table table-striped">
+        <thead>
+          <tr>
+            <th>Title</th>
+            <th>Description</th>
+            <th>License (if any)</th>
+          </tr>
+        </thead>
+        <tbody>
+            {% for voc in vocabularies %}
+                <tr>
+                    <td><a href="{{ voc.iri }}">{{ voc.title }}</a></td>
+                    <td>
+                        <button class="btn btn-primary" type="button" data-toggle="collapse" data-target="#{{ voc.title }}description" aria-expanded="false" aria-controls="{{ voc.title }}description">
+                            {{ voc.title }} description
+                        </button>
+                        <div class="collapse" id="{{ voc.title }}description">
+                            <div class="card card-block">
+                                {{ voc.description }}
+                          </div>
+                        </div>
+                    </td>
+                    <td>{{ voc.lov_metadata.license }}</td>
+                </tr>
+            {% endfor %}
+        </tbody>
+      </table>
+{% endif %}
+```
+
+### ... but how does it work?
+The required properties (iri, title, version and description) are directly available in the vocabulary metadata. Notice that the license is accessed through a compound notation, explained below.
+
+### Another example...
+The following snippet outputs a list of the classes defined by the ontology, along with each class's superclass (limited to one for the time being) and its potential description (the rdfs:comment).
+```
+{% for class in voc.classes %}
+    <div>
+        <h2>{{ class.class }}</h2>
+        <h3>{{ class.superclass }}</h3>
+        <p>{{ class.comment }}</p>
+    </div>
+{% endfor %}
+```
+
+### ... with custom metadata
+To understand this example, one must look at the classes.sparql query (the file name is important):
+```
+PREFIX owl: <http://www.w3.org/2002/07/owl#>
+PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+PREFIX dc:  <http://purl.org/dc/elements/1.1/>
+PREFIX cc:  <http://creativecommons.org/ns#>
+PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
+
+SELECT ?class ?comment ?label ?superclass
+WHERE {
+    ?class rdf:type owl:Class.
+    OPTIONAL { ?class rdfs:comment ?comment.}
+    OPTIONAL { ?class rdfs:label ?label.}
+    OPTIONAL { ?class rdfs:subClassOf ?superclass.}
+} GROUP BY ?class
+```
+
+Each graph binding matching this SPARQL query is returned as a dictionary in the vocabulary context, with the SPARQL projection variables (here class, comment, label and superclass) as keys. This list of results is then stored in the vocabulary metadata under the name of the query, here "classes".
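As an illustration, the folding of bindings into the metadata dictionary can be sketched like this (simplified Python, not the plugin's actual code; `results` stands for the binding dictionaries produced by `graph.query`):

```python
# Simplified sketch of how query results are folded into the metadata
# dictionary: a single binding stays a plain dict, and further bindings
# for the same query promote the entry to a list of dicts.
def store_results(meta, query_key, results):
    for result in results:
        if query_key not in meta:
            meta[query_key] = result                      # first hit: plain dict
        elif isinstance(meta[query_key], list):
            meta[query_key].append(result)                # already a list: append
        else:
            meta[query_key] = [meta[query_key], result]   # second hit: promote to list
    return meta

meta = store_results({}, "classes", [{"class": "A"}, {"class": "B"}])
# meta["classes"] is now a list of two binding dicts
```

With a single result, as for lov_metadata.sparql and its `LIMIT 1`, the entry stays a plain dictionary, which is why `voc.lov_metadata.license` works in the template.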
+
+**NOTE**: Multiple values, such as multiple superclasses for a class, are not yet handled correctly; this is a work in progress.

+ 1 - 0
pelican-rdf/__init__.py

@@ -0,0 +1 @@
+from .pelican_rdf import *

+ 175 - 0
pelican-rdf/pelican_rdf.py

@@ -0,0 +1,175 @@
+from pelican.readers import BaseReader
+from pelican.generators import CachingGenerator
+from pelican.contents import Page, is_valid_content
+from pelican import signals
+import logging
+from blinker import signal
+import requests
+from os import listdir
+from os.path import isfile, join
+
+"""
+pelican-rdf
+===============
+
+This plugin integrates to pelican a new type of media, the vocabulary.
+Vocabularies are .rdf or .owl files, and metadata about them is collected
+through sparql queries.
+"""
+
+
+try:
+    import rdflib
+    from rdflib.query import Processor
+    rdflib_loaded = True
+except ImportError:
+    rdflib_loaded = False
+
+logger = logging.getLogger(__name__)
+
+voc_generator_init = signal('voc_generator_init')
+voc_generator_finalized = signal('voc_generator_finalized')
+voc_writer_finalized = signal('voc_writer_finalized')
+voc_generator_preread = signal('voc_generator_preread')
+voc_generator_context = signal('voc_generator_context')
+
+class VocabularyGenerator(CachingGenerator):
+    """Generate vocabulary descriptions"""
+
+    # temporary file where the vocabulary is dereferenced to
+    # when collected online
+    _local_vocabulary_path = "/tmp/"
+    
+    def __init__(self, *args, **kwargs):
+        logger.debug("Vocabulary generator called")
+        self.vocabularies =[]
+        super(VocabularyGenerator, self).__init__(*args, **kwargs)
+    
+    # Called both for local and remote vocabulary context creation.
+    # Performs the actual Vocabulary generation.
+    def generate_vocabulary_context(
+            self, vocabulary_file_name, path_to_vocabulary):
+        logger.debug("Generating vocabulary context for "+
+            path_to_vocabulary+"/"+vocabulary_file_name)
+        voc = self.get_cached_data(vocabulary_file_name, None)
+        if voc is None:
+            try:
+                voc = self.readers.read_file(
+                    base_path=path_to_vocabulary,
+                    path=vocabulary_file_name,
+                    content_class=Vocabulary,
+                    context=self.context,
+                    preread_signal=voc_generator_preread,
+                    preread_sender=self,
+                    context_signal=voc_generator_context,
+                    context_sender=self)
+            except Exception as e:
+                logger.error(
+                    'Could not process %s\n%s', vocabulary_file_name, e,
+                    exc_info=self.settings.get('DEBUG', False))
+                self._add_failed_source_path(vocabulary_file_name)
+                return
+
+            if not is_valid_content(voc, vocabulary_file_name):
+                self._add_failed_source_path(vocabulary_file_name)
+                return
+    
+            self.cache_data(vocabulary_file_name, voc)
+        self.vocabularies.append(voc)
+        self.add_source_path(voc)
+    
+    
+    def generate_local_context(self):
+        for f in self.get_files(
+                self.settings['VOC_PATHS'],
+                exclude=self.settings['VOC_EXCLUDES']):
+            self.generate_vocabulary_context(f, self.path)
+    
+    def dereference(self, uri, local_file):
+        logger.debug("Dereferencing "+uri+" into "+local_file)
+        headers={"Accept":"application/rdf+xml"}
+        r = requests.get(uri, headers=headers)
+        with open(self._local_vocabulary_path+local_file, 'w') as f:
+            f.write(r.text)
+    
+    def generate_remote_context(self):
+        for uri in self.settings["VOC_URIS"]:
+            logger.debug("Generating context for remote "+uri)
+            local_name = uri.split("/")[-1]+".rdf"
+            self.dereference(uri, local_name)
+            self.generate_vocabulary_context(
+                local_name,
+                self._local_vocabulary_path)
+    
+    def generate_context(self):
+        self.generate_local_context()
+        self.generate_remote_context()
+        self._update_context(('vocabularies',))
+        self.save_cache()
+        self.readers.save_cache()
+    
+    def generate_output(self, writer):
+        for voc in self.vocabularies:
+            writer.write_file(
+                voc.save_as, self.get_template(voc.template),
+                self.context, voc=voc,
+                relative_urls=self.settings['RELATIVE_URLS'],
+                override_output=hasattr(voc, 'override_save_as'))
+        voc_writer_finalized.send(self, writer=writer)
+
+class RdfReader(BaseReader):
+    
+    file_extensions = ['rdf', 'owl']
+    enabled = bool(rdflib_loaded)
+    
+    def __init__(self, *args, **kwargs):
+        super(RdfReader, self).__init__(*args, **kwargs)
+
+    def read(self, source_path):
+        """Parse content and metadata of an rdf file"""
+        logger.debug("Loading graph described in "+source_path)
+        graph = rdflib.Graph()
+        graph.load(source_path)
+        meta = {}
+        queries = [
+            f for f in listdir(self.settings["VOC_QUERIES_PATH"])
+            if (isfile(join(self.settings["VOC_QUERIES_PATH"], f)) 
+                and f.endswith(".sparql"))]
+        for query_path in queries:
+            query_file_path = self.settings["VOC_QUERIES_PATH"]+"/"+query_path
+            with open(query_file_path, "r") as query_file:
+                query = query_file.read()
+
+                # The name of the query identifies the elements in the context
+                query_key=query_path.split(".")[0]
+                result_set = graph.query(query)
+                # Each query result will be stored as a dictionnary in the
+                # vocabulary context, referenced by the query name as its key.
+                # Multiple results are stored in a list.
+                for result in result_set:
+                    if query_key not in meta:
+                        meta[query_key] = result.asdict()
+                    elif isinstance(meta[query_key], list):
+                        meta[query_key].append(result.asdict())
+                    else:
+                        meta[query_key] = [meta[query_key], result.asdict()]
+        meta["iri"] = meta["lov_metadata"]["iri"]
+        meta["description"] = meta["lov_metadata"]["description"]
+        meta["version"] = meta["lov_metadata"]["version"]
+        meta["title"] = meta["lov_metadata"]["title"]
+        return "", meta
+
+class Vocabulary(Page):
+    mandatory_properties = ('iri','description','version', 'title')
+    default_template = 'vocabulary'
+
+def add_reader(readers):
+    for ext in RdfReader.file_extensions:
+        readers.reader_classes[ext] = RdfReader
+
+def add_generator(pelican_object):
+    logger.debug("Adding the vocabulary generator")
+    return VocabularyGenerator
+
+
+def register():
+    signals.get_generators.connect(add_generator)
+    signals.readers_init.connect(add_reader)

+ 12 - 0
pelican-rdf/sparql-queries/classes.sparql

@@ -0,0 +1,12 @@
+PREFIX owl:	<http://www.w3.org/2002/07/owl#>
+PREFIX rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+PREFIX dc:	<http://purl.org/dc/elements/1.1/>
+PREFIX cc:	<http://creativecommons.org/ns#>
+PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
+
+SELECT ?class ?comment ?label ?superclass
+WHERE {
+	?class rdf:type owl:Class.
+	OPTIONAL { ?class rdfs:comment ?comment.}
+	OPTIONAL { ?class rdfs:label ?label.}
+	OPTIONAL { ?class rdfs:subClassOf ?superclass.}
+} GROUP BY ?class

+ 11 - 0
pelican-rdf/sparql-queries/lov_metadata.sparql

@@ -0,0 +1,11 @@
+PREFIX owl:	<http://www.w3.org/2002/07/owl#>
+PREFIX rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+PREFIX dc:	<http://purl.org/dc/elements/1.1/>
+PREFIX cc:	<http://creativecommons.org/ns#>
+SELECT ?iri ?license ?description ?version ?title
+WHERE {
+	?iri rdf:type owl:Ontology;
+		dc:description ?description;
+		dc:title ?title;
+		owl:versionInfo ?version.
+	OPTIONAL { ?iri cc:license ?license. }
+} LIMIT 1