Query XML content over CMIS
I am evaluating Nuxeo as a CMIS-compliant repository for a large and important project in the Netherlands. One of our project goals is to have a repository with medical content. We have selected DITA (an XML standard) as our basis for structuring the content, and we are tagging these documents with semantically linked keywords from a domain-specific ontology. This is all done inside the XML content via a custom editor, so it is independent of the repository, which is vital for our architecture. The editor maintains the repository over CMIS. One of the requirements is to be able to list all documents tagged with a certain ontology keyword. So in a nutshell, when I have a repository document with a DITA XML file like this:
<?xml version="1.0"?>
<topic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a="http://dita.oasis-open.org/architecture/2005/" id="dita_topic" xml:lang="en-us" xsi:noNamespaceSchemaLocation="urn:oasis:names:tc:dita:xsd:topic.xsd" xml:base="http://localhost:3000/documents/new_topic.xml">
<title>My Title</title>
<shortdesc>My description</shortdesc>
<prolog>
<metadata>
<keywords>
<keyword rel="http://dbpedia.org/resource/Paris">
Paris
</keyword>
<keyword rel="http://dbpedia.org/resource/Rome">
Rome
</keyword>
</keywords>
</metadata>
</prolog>
<body>
...
</body>
</topic>
I would like to do this:
curl -u un:pw "http://localhost:8080/nuxeo/atom/cmis/default/query?q=SELECT+cmis:objectId,+dc:title+FROM+cmis:folder+WHERE+my:keyworduri+=+'http://dbpedia.org/resource/Paris'&searchAllVersions=true"
and find my document.
My best guess is that I need to extract the XML fields I want to query when creating or updating documents and set them as custom metadata. I thought this was a fairly common use case, but the information I have been able to find on metadata extraction is either outdated or scarce. So can this be done in a fairly straightforward way (I am not a Java programmer) with Nuxeo? If so, how? Are there any other ways of satisfying my requirements?
TIA.
This is fairly straightforward to do. The idea is to write a Java EventListener that reacts to the documentCreated and documentModified events, does the metadata extraction according to your logic (using an XPath processor, for instance), and stores the result in the document as Nuxeo metadata so that it can be queried easily.
That's just a one- or two-page method and a few supporting XML files to register the listener as a new plugin.
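To give an idea of the extraction step on its own, here is a minimal sketch that uses only the JDK's built-in XPath support. The class name DitaKeywordExtractor and the method extractKeywordUris are hypothetical, and the XPath expression assumes the keywords always sit under /topic/prolog/metadata/keywords as in the sample from the question. The Nuxeo side (the listener class and the XML contribution that registers it) is deliberately not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the "rel" URIs out of a DITA topic's keywords.
class DitaKeywordExtractor {

    // Assumption: keywords live at this path, as in the sample topic above.
    private static final String KEYWORD_URIS =
            "/topic/prolog/metadata/keywords/keyword/@rel";

    static List<String> extractKeywordUris(String ditaXml) {
        try {
            // Parse the XML text into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ditaXml)));

            // Select every rel attribute of every keyword element.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attrs = (NodeList) xpath.evaluate(
                    KEYWORD_URIS, doc, XPathConstants.NODESET);

            List<String> uris = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                uris.add(attrs.item(i).getNodeValue().trim());
            }
            return uris;
        } catch (Exception e) {
            // Malformed XML or an invalid XPath ends up here.
            throw new IllegalArgumentException("Could not extract keywords", e);
        }
    }
}
```

Inside the listener, the returned list would then be written to a custom property such as my:keyworduri from the question (probably a multi-valued string field, since a topic can carry several keywords). If the property is multi-valued, the CMIS query would likely need the form 'http://dbpedia.org/resource/Paris' = ANY my:keyworduri rather than a plain equality test.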