Exploring Virtualization in Knowledge Graphs: Key Technologies and Criteria | The Knowledge Graph Conference

François S.
·
Ora L., Charles I.: can AWS Glue be used to virtualize data in Neptune?
Wolfgang S.
·
We (metaphacts) also provide an abstraction layer for federation and virtualization in our metaphactory product: Ephedra can be used to expose REST-based APIs and SQL databases wrapped as SPARQL SERVICEs. FedX (now part of RDF4J) provides federation using SPARQL and can use Ephedra to combine data from multiple data sources in a single query via joins. See https://metaphacts.com/ephedra and https://help.metaphacts.com/resource/Help:Federation for details. We also have a blog post at https://blog.metaphacts.com/federation-in-metaphactory.
François S.
·
Interesting. How would you characterize Ephedra vs. Ontop or Stardog virtualization?
François S.
·
Is there a Federation vs Integration distinction here?
Wolfgang S.
·
They are independent, but you can combine them. Federation lets you combine multiple sources in one query using standard SPARQL federation (the SERVICE keyword), assuming the sources are available via SPARQL. For non-RDF data sources (or those not available via SPARQL), you need a wrapper (i.e. the integration) that exposes them via SPARQL.
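To make the SERVICE keyword concrete, here is a minimal sketch that assembles a federated SPARQL query as a string. The endpoint URL, the `ex:` namespace, the property names, and the helper `build_federated_query` are all hypothetical; only the `SERVICE <endpoint> { ... }` form comes from SPARQL 1.1 Federated Query.

```python
def build_federated_query(select_vars, local_pattern, remote_patterns):
    """Assemble a SPARQL SELECT query that joins a local graph pattern
    with patterns evaluated at remote endpoints via SERVICE.

    remote_patterns maps an endpoint URL to the graph pattern sent to it.
    """
    lines = [
        "PREFIX ex: <http://example.org/>",  # hypothetical namespace
        "SELECT " + " ".join(select_vars) + " WHERE {",
        "  " + local_pattern,
    ]
    for endpoint, pattern in remote_patterns.items():
        # Each remote source gets its own SERVICE block.
        lines.append("  SERVICE <" + endpoint + "> { " + pattern + " }")
    lines.append("}")
    return "\n".join(lines)


query = build_federated_query(
    ["?customer", "?legalName"],
    "?customer ex:hasLei ?lei .",
    {
        # Hypothetical endpoint, standing in for any source reachable via SPARQL.
        "https://example.org/gleif/sparql": "?entity ex:lei ?lei ; ex:legalName ?legalName .",
    },
)
print(query)
```

The shared variable `?lei` is what joins the local pattern with the remote one. A wrapped non-RDF source would appear in such a query as just another SERVICE endpoint.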
Wolfgang S.
·
How would you characterize Ephedra vs. Ontop or Stardog virtualization?
In general, federation is ideally handled as close to the database as possible, since the database can then do joins directly with your data. So if you use Ontotext GraphDB or Stardog with their respective virtualization, that leads to better performance. If you do not have those options, you can externalize federation, e.g. using FedX (which was originally built by people now working at metaphacts; this has a long history…), which is also used e.g. in GraphDB. I cannot speak for Stardog virtualization, but besides the virtualization approach and the point about joining with local data described above, another factor is whether you do on-the-fly configuration of SPARQL-to-SQL (Ontop) or rather use a pre-defined mapping, e.g. R2RML (Ontop or rml-mapper) or RML (rml-mapper). Besides interfacing with SQL databases, we also allow wrapping REST operations, which (AFAIK) is not possible with Ontop but might be with Stardog. Another factor is how this is integrated into your data application, i.e. operational aspects, security, access control, deployment, querying, exposure via a SPARQL endpoint, etc. This is one aspect where metaphactory gives you a lot of convenience, control and capabilities.
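As a rough illustration of what a pre-defined mapping expresses, the sketch below applies a hand-written mapping to rows of an in-memory SQLite table and produces triples. The mapping dictionary only loosely imitates the shape of an R2RML triples map (a subject template plus one predicate per column); it is not R2RML syntax, and the table, URIs, and function name are hypothetical. It produces triples eagerly purely to show what the mapping says, whereas a virtualization engine would leave the data in the SQL source.

```python
import sqlite3

# Hypothetical mapping: one subject template plus a predicate for each mapped column.
CUSTOMER_MAPPING = {
    "table": "customer",
    "subject_template": "http://example.org/customer/{id}",
    "predicates": {
        "name": "http://example.org/name",
        "country": "http://example.org/country",
    },
}


def rows_to_triples(conn, mapping):
    """Yield (subject, predicate, object) triples for every row of the mapped table."""
    columns = ["id"] + list(mapping["predicates"])
    sql = "SELECT " + ", ".join(columns) + " FROM " + mapping["table"] + " ORDER BY id"
    for row in conn.execute(sql):
        record = dict(zip(columns, row))
        subject = mapping["subject_template"].format(id=record["id"])
        for column, predicate in mapping["predicates"].items():
            if record[column] is not None:  # a NULL column produces no triple
                yield (subject, predicate, record[column])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", "CH"), (2, "Bob", None)],  # made-up sample rows
)
triples = list(rows_to_triples(conn, CUSTOMER_MAPPING))
for triple in triples:
    print(triple)
```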
François S.
·
That's great, thank you Wolfgang
naren m.
·
How we do the virtualization component at QUIPU: we define a top ontology covering the whole data fabric (as far as it is of interest to the user), touching various heterogeneous data sources. Our requirement is to build multiple perspectives (logical subgraph views) on the fabric, creating an underlying abstraction of a connected data fabric. We use Ontop R2RML mappings to connect multiple SQL data sources, which are exposed through a SPARQL endpoint. To combine these with other RDF sources, we use the SPARQL SERVICE clause. Because our use case is to build multiple perspectives on the fabric, our abstraction is done at two stages: one at the data fabric layer and the other at the perspective layer, where we limit data access to the sub-ontology globally, so that the user's universe is restricted.
Example of Customer 360*:
Source A: SQL DB with KYC data
Source B: SQL DB with Securities Transaction Data
Source C: RDF Store with GLEIF Data
TOP Ontology: Master Ontology linking all 3 sources
Perspective #1: Customer_CRM
Perspective #2: Corporation
Perspective #3: Customer_Corporation
So an entity 360* for the Customer_CRM perspective would be limited to the sub-ontology covering part of Source A & Source B. Similarly, the Customer_Corporation perspective would be a combination of all three.
We are also looking for a direct data-catalog-to-ontology mapping solution to streamline the process; please suggest any you know of.
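The perspective layer described above can be pictured with a small sketch: each perspective is a sub-ontology, modelled here simply as the set of predicates a user may see, and an entity 360 view is the fabric filtered by that set. The triples, predicate names, and the function `entity_360` are hypothetical and are not QUIPU's implementation. The Corporation perspective is left out because the message does not say which sources it covers.

```python
# Hypothetical triples from the connected fabric; comments note the originating source.
FABRIC = [
    ("cust:1", "kyc:riskRating", "low"),            # Source A (KYC data)
    ("cust:1", "sec:holdsPosition", "position:7"),  # Source B (securities transactions)
    ("cust:1", "gleif:hasLegalEntity", "lei:X"),    # Source C (GLEIF data)
]

# Sub-ontology per perspective, expressed as the predicates visible in it.
PERSPECTIVES = {
    "Customer_CRM": {"kyc:riskRating", "sec:holdsPosition"},
    "Customer_Corporation": {"kyc:riskRating", "sec:holdsPosition", "gleif:hasLegalEntity"},
}


def entity_360(entity, perspective):
    """Return the triples about `entity` that the given perspective is allowed to see."""
    allowed = PERSPECTIVES[perspective]
    return [(s, p, o) for (s, p, o) in FABRIC if s == entity and p in allowed]


crm_view = entity_360("cust:1", "Customer_CRM")
full_view = entity_360("cust:1", "Customer_Corporation")
print(crm_view)
print(full_view)
```

Here `crm_view` contains only the Source A and Source B triples, while `full_view` also includes the GLEIF triple.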
Wolfgang S.
·
naren m.
We are also looking for a direct data-catalog-to-ontology mapping solution to streamline the process; please suggest any you know of.
Not sure what exactly you need, but metaphactory provides ontology modeling, vocabulary maintenance and a data catalog, all integrated as described in these blog posts:
https://blog.metaphacts.com/data-in-context-with-metaphactory-s-flexible-data-cataloging-capabilities
https://blog.metaphacts.com/connecting-the-best-of-both-worlds-ontologies-and-vocabularies-in-metaphactory
https://blog.metaphacts.com/visual-ontology-modeling-for-domain-experts-and-business-users-with-metaphactory
If you need this integrated with maintaining/authoring R2RML mappings, we do not currently have that available, but you could possibly build it yourself on top of the metaphactory graph application platform, using the functionality described above as a base.
naren m.
·
Thanks Wolfgang S. for the reply. To be clearer on the ask: we are looking for an integrated approach that uses the data catalog and vocabularies for ontology modeling and the respective mapping (R2RML or otherwise), rather than achieving the same with different tools and frameworks. However, your reply has given us the details to pursue and explore this further, and it looks exciting.
Benjamin C.
·
Hi François S., Ontop translates each SPARQL query into one SQL query. When you have only one source, and this source speaks SQL, then Ontop can speak to it directly (as long as it knows its dialect). Otherwise, if you have multiple sources or if your source doesn't speak SQL (e.g. MongoDB, data lake files or a Web API), you need a database federator/data virtualization platform that makes all the data look relational and appear to come from "one single source". Denodo is particularly good at that (especially when you have Web APIs), and there are also open-source alternatives like Dremio and Teiid. Unfortunately, to the best of my knowledge, these database federators do not support graph databases. So if you need to federate with a SPARQL endpoint, you currently need to do it at the SPARQL level.
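To give a feel for "one SPARQL query becomes one SQL query", here is a deliberately tiny toy translator. It is not Ontop's algorithm: it takes triple patterns as Python tuples rather than parsing SPARQL, and it only handles patterns that share one subject variable and map to a single table. The mapping, prefixes, table, and function name are hypothetical.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject key column, value column).
PREDICATE_TO_COLUMN = {
    "ex:name": ("customer", "id", "name"),
    "ex:country": ("customer", "id", "country"),
}


def translate(patterns):
    """Translate (subject_var, predicate, object_var) patterns into one SQL query."""
    subjects = {s for s, _, _ in patterns}
    targets = {PREDICATE_TO_COLUMN[p][:2] for _, p, _ in patterns}
    if len(subjects) != 1 or len(targets) != 1:
        raise ValueError("this toy translator handles one subject and one table only")
    (subject,) = subjects
    ((table, key),) = targets
    select = [f"{key} AS {subject.lstrip('?')}"]
    where = []
    for _, predicate, obj in patterns:
        column = PREDICATE_TO_COLUMN[predicate][2]
        select.append(f"{column} AS {obj.lstrip('?')}")
        # A triple pattern only matches if the value exists, so NULLs are excluded.
        where.append(f"{column} IS NOT NULL")
    return f"SELECT {', '.join(select)} FROM {table} WHERE {' AND '.join(where)}"


# Corresponds roughly to: SELECT ?c ?n ?k WHERE { ?c ex:name ?n ; ex:country ?k }
sql = translate([("?c", "ex:name", "?n"), ("?c", "ex:country", "?k")])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", "CH"), (2, "Bob", None)],  # made-up sample rows
)
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The single-source restriction in this toy mirrors the point made above: once the data sits in several sources, something else has to make it look like one relational source before a translation like this can target it.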
François S.
·
That is helpful, thank you Benjamin
Bob DuCharme
·
Also check out the open-source D2RQ: http://d2rq.org/.
François S.
·
Ora L., Charles I.: I see AWS advertises Denodo as a virtualization partner. Are there use cases involving Denodo and Neptune?
Ora L.
·
Not that I know of, but I can look into this.