The Knowledge Graph Conference
Ask

Exploring Virtualization in Knowledge Graphs: Key Technologies and Criteria

François S.
·Jan 31, 2023 10:42 AM

Hi everyone, I am looking for resources on virtualization for knowledge graphs. Several graph databases include a virtualization component that exposes a relational DB as a graph. Stardog is known for this, the Ontop technology is included in GraphDB, and Virtuoso has virtualization as well. I am surely missing other technologies. I wonder how all these technologies differ and what the relevant criteria are when evaluating a vendor. I also wonder what it is like to consume the virtual graph. One question in particular is about entity data integration: can I have a resource that combines data from several sources? E.g. a person takes her name from data source 1 and her current employment from data source 2, and behind the scenes either a graph DB or a relational DB is queried. As a bonus question, are there similar technologies to virtualize an API? cc Atanas Kiryakov Deni J. Michael G. Vassil M.

👀1

21 comments

      • François S.
        ·

        Can AWS Glue be used to virtualize data in Neptune? cc Ora L. Charles I.

      • Wolfgang S.
        ·

        We (metaphacts) also provide an abstraction layer for federation and virtualization with our metaphactory product: Ephedra can be used to expose REST-based APIs and SQL databases wrapped as SPARQL SERVICEs. FedX (now part of RDF4J) provides federation using SPARQL and can make use of Ephedra to combine data from multiple data sources in a single query via joins. See https://metaphacts.com/ephedra and https://help.metaphacts.com/resource/Help:Federation for details. We also have a blog post at https://blog.metaphacts.com/federation-in-metaphactory.

      • François S.
        ·

        Interesting. How would you characterize Ephedra vs Ontop or Stardog virtualization?

      • François S.
        ·

        Is there a Federation vs Integration distinction here?

      • Wolfgang S.
        ·

        They are independent, but you can combine them. Federation lets you combine multiple sources in one query, assuming they are available via SPARQL, using standard SPARQL federation with the SERVICE keyword. For non-RDF data sources (or those not available via SPARQL), you need a wrapper (i.e. the integration) that exposes them via SPARQL.
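
Editor's sketch: the SERVICE-based federation described above, applied to the person example from the original question (name from one source, employer from another). This is only an illustration of the standard SPARQL 1.1 syntax; the endpoint URLs, the `ex:` vocabulary, and the helper name `federated_person_query` are all hypothetical and not taken from any product mentioned in this thread.

```python
def federated_person_query(names_endpoint: str, jobs_endpoint: str) -> str:
    """Build a SPARQL 1.1 query that joins two remote endpoints via SERVICE."""
    return f"""
PREFIX ex: <http://example.org/>
SELECT ?person ?name ?employer WHERE {{
  SERVICE <{names_endpoint}> {{ ?person ex:name ?name . }}
  SERVICE <{jobs_endpoint}> {{ ?person ex:employer ?employer . }}
}}
""".strip()


# Both endpoint URLs are placeholders (assumptions for illustration only).
query = federated_person_query(
    "http://localhost:7200/repositories/hr",
    "http://localhost:8080/sparql",
)
print(query)
```

The join happens on the shared `?person` variable, so this only works if both sources use the same IRI for the same person.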

      • Wolfgang S.
        ·

        How would you characterize Ephedra vs Ontop or Stardog virtualization?

        In general, federation is ideally handled as close to the database as possible, because the database can then do joins directly with your local data. So if you use Ontotext GraphDB or Stardog with their respective virtualization, that leads to better performance. If you do not have those options, you can externalize this, e.g. using FedX (which was originally built by people now working at metaphacts, this has a long history…), which is also used e.g. in GraphDB. I cannot speak for Stardog virtualization, but besides the virtualization and the observations about joining with local data described above, another factor is whether you do on-the-fly configuration of SPARQL-to-SQL (Ontop) or rather use a pre-defined mapping, e.g. R2RML (also Ontop, or rml-mapper) or RML (rml-mapper). Besides interfacing with SQL databases, we also allow wrapping REST operations, which (AFAIK) is not possible with Ontop but might be with Stardog. Another factor is how this is integrated into your data application, i.e. operational aspects, security, access control, deployment, querying, exposure via a SPARQL endpoint, etc. This is one aspect where metaphactory gives you a lot of convenience, control and capabilities.
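
Editor's sketch: a rough idea of what a pre-defined mapping does, in the spirit of R2RML but not using R2RML syntax or any real mapping engine. A subject IRI template and a column-to-predicate table turn relational rows into triples. The table, the IRIs, and the `MAPPING` structure are assumptions made up for this example.

```python
import sqlite3

# Hypothetical mapping: one table, a key column, a subject IRI template,
# and column -> predicate pairs.
MAPPING = {
    "table": "person",
    "key": "id",
    "subject_template": "http://example.org/person/{key}",
    "predicates": {"name": "http://example.org/name"},
}


def rows_to_triples(conn, mapping):
    """Yield (subject, predicate, object) triples for every row of the mapped table."""
    columns = [mapping["key"], *mapping["predicates"]]
    cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {mapping['table']}")
    for key, *values in cursor:
        subject = mapping["subject_template"].format(key=key)
        for predicate, value in zip(mapping["predicates"].values(), values):
            yield (subject, predicate, value)


# Toy relational source standing in for a real SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada')")

triples = list(rows_to_triples(conn, MAPPING))
print(triples)
```

A virtualization engine would apply such a mapping at query time instead of materializing every row as done here.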

      • François S.
        ·

        That's great, thank you Wolfgang

      • naren m.
        ·

        How we do the virtualization component at QUIPU: We define a top ontology covering the whole of the data fabric (of interest to the user), touching various heterogeneous data sources. Our requirement is to build multiple perspectives (logical sub-graph views) on the fabric, creating an underlying abstraction of a connected data fabric. We use Ontop R2RML mappings to connect multiple SQL data sources, which are exposed through a SPARQL endpoint. To combine these with other RDF sources, we use the SPARQL SERVICE clause. As our use case is to build multiple perspectives on the fabric, our abstraction is done at two stages: one at the data fabric layer and the other at the perspective layer, where we limit data access to the sub-ontology globally, so that the user universe is restricted.

        Example of Customer 360*:

        • Source A: SQL DB with KYC data
        • Source B: SQL DB with Securities Transaction Data
        • Source C: RDF Store with GLEIF Data
        • TOP Ontology: Master Ontology linking all the 3 sources
        • Perspective #1: Customer_CRM
        • Perspective #2: Corporation
        • Perspective #3: Customer_Corporation

        So an entity 360* for the Customer_CRM perspective would be limited to the sub-ontology covering part of Source A & Source B. Similarly, the Customer_Corporation perspective would be a combination of all three.

        We are also looking for a direct data catalog to ontology mapping solution to streamline the process; please suggest any you know of.
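
Editor's sketch: one possible reading of the perspective layer described above, where each perspective whitelists the predicates of its sub-ontology and an entity view only returns matching triples. This is not QUIPU's implementation; the predicate names, the sample triples, and the `entity_view` helper are hypothetical.

```python
# Hypothetical sub-ontologies: each perspective lists the predicates it may see.
PERSPECTIVES = {
    "Customer_CRM": {"ex:kycStatus", "ex:tradedSecurity"},
    "Customer_Corporation": {"ex:kycStatus", "ex:tradedSecurity", "ex:hasLEI"},
}

# Sample triples; comments note which source each would come from.
TRIPLES = [
    ("ex:cust1", "ex:kycStatus", "verified"),       # Source A (KYC)
    ("ex:cust1", "ex:tradedSecurity", "ex:sec42"),  # Source B (transactions)
    ("ex:cust1", "ex:hasLEI", "ex:lei1"),           # Source C (GLEIF)
]


def entity_view(entity, perspective):
    """Return the triples about `entity` that the perspective's sub-ontology allows."""
    allowed = PERSPECTIVES[perspective]
    return [(s, p, o) for s, p, o in TRIPLES if s == entity and p in allowed]


crm_view = entity_view("ex:cust1", "Customer_CRM")
full_view = entity_view("ex:cust1", "Customer_Corporation")
print(len(crm_view), len(full_view))
```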

      • Wolfgang S.
        ·

        naren m.

        We are also looking for a direct data catalog to ontology mapping solutions to streamline the process, please suggest if any.

        Not sure what exactly you need, but metaphactory provides ontology modeling, vocabulary maintenance and data catalog, all integrated as described in these blog posts:

        • https://blog.metaphacts.com/data-in-context-with-metaphactory-s-flexible-data-cataloging-capabilities

        • https://blog.metaphacts.com/connecting-the-best-of-both-worlds-ontologies-and-vocabularies-in-metaphactory

        • https://blog.metaphacts.com/visual-ontology-modeling-for-domain-experts-and-business-users-with-metaphactory

        If you need this integrated with maintaining/authoring R2RML mappings, we do not currently have that available, but you could possibly build it yourself on top of the metaphactory graph application platform, using the functionality described above as a base.

      • naren m.
        ·

        Thanks Wolfgang S. for the reply. To be clearer on the ask: we are looking for an integrated approach that uses the data catalog and vocabulary for ontology modeling and the respective mapping (R2RML or otherwise), rather than achieving the same with different tools & frameworks. However, your reply has given us the details to pursue & explore this further, and it looks exciting.

      • Benjamin C.
        ·

        Hi François S., Ontop translates each SPARQL query into one SQL query. When you have only one source, and this source speaks SQL, then Ontop can speak to it directly (as long as it knows its dialect). Otherwise, if you have multiple sources or if your source doesn't speak SQL (e.g. MongoDB, data lake files or a Web API), you need to use a database federator/data virtualization platform that makes all the data look relational and appear to come from "one single source". Denodo is particularly good at that (especially when you have Web APIs), and there are also open-source alternatives like Dremio and Teiid. Unfortunately, to the best of my knowledge, these database federators do not support graph databases. So if you need to federate with a SPARQL endpoint, you currently need to do it at the SPARQL level.

        👍1
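
Editor's sketch: a toy illustration of the "one SPARQL query becomes one SQL query" idea from the comment above. It is not Ontop's algorithm or API; it only handles triple patterns that share a single subject variable, and the mapping, table names, and `bgp_to_sql` helper are assumptions for this example.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
MAPPING = {
    "ex:name": ("person", "id", "name"),
    "ex:employer": ("employment", "person_id", "employer"),
}


def bgp_to_sql(patterns):
    """Translate triple patterns sharing one subject variable into a single SQL query.

    Each pattern is (subject_var, predicate, object_var).
    """
    subject = patterns[0][0]
    selects, clauses = [], []
    first_alias = first_key = None
    for i, (_subj, pred, obj) in enumerate(patterns):
        table, key, value = MAPPING[pred]
        alias = f"t{i}"
        selects.append(f"{alias}.{value} AS {obj}")
        if first_alias is None:
            first_alias, first_key = alias, key
            clauses.append(f"FROM {table} {alias}")
        else:
            # Join every further pattern on the shared subject key.
            clauses.append(
                f"JOIN {table} {alias} ON {alias}.{key} = {first_alias}.{first_key}"
            )
    head = f"SELECT {first_alias}.{first_key} AS {subject}, " + ", ".join(selects)
    return head + " " + " ".join(clauses)


# Toy relational source standing in for a real SQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employment (person_id INTEGER, employer TEXT);
INSERT INTO person VALUES (1, 'Ada');
INSERT INTO employment VALUES (1, 'Acme');
""")

# Roughly: SELECT ?person ?name ?employer WHERE { ?person ex:name ?name ; ex:employer ?employer }
sql = bgp_to_sql([("person", "ex:name", "name"), ("person", "ex:employer", "employer")])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```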
      • François S.
        ·

        That is helpful, thank you Benjamin

      • Bob DuCharme
        ·

        Also check out the open source http://d2rq.org/.

        👍1
      • François S.
        ·

        Ora L. Charles I. I see AWS advertises Denodo as a virtualization partner. Are there use-cases involving Denodo and Neptune?

      • Ora L.
        ·

        Not that I know of, but I can look into this.

      👍1