The Knowledge Graph Conference

Exploring Data Integration and OLAP Queries with Spark

Douglas Moore · Dec 04, 2020 04:11 PM

Yong Tang, great set of new references for me to look at, thank you! I’m on a similar quest: data integration and OLAP-type queries, powered by Spark over data in a data lake.

  • https://github.com/SANSA-Stack/SANSA-Stack is powered by Spark RDDs. It seems active, but it is built on the older Spark 1.x RDD structure.

  • Ontop VKG promises to map SPARQL to SQL (and thus to query Big Data SQL engines like Spark SQL).

  • S2RDF and some other papers on optimizing SPARQL queries over Spark.
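
To make the SPARQL-to-SQL idea in the second bullet a bit more concrete, here is a minimal sketch. It is not Ontop's actual translation or mapping format. It assumes a hypothetical single `triples(s, p, o)` table, made-up `ex:` terms, and Python's built-in `sqlite3` as a stand-in for a SQL engine such as Spark SQL. The point is only the general shape: each triple pattern becomes a table alias, and a shared variable becomes a join condition.

```python
import sqlite3

# Hypothetical triple-table layout: one row per RDF triple (subject, predicate, object).
# All ex: identifiers are invented for illustration.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:carol", "ex:worksFor", "ex:globex"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    ("ex:globex", "ex:locatedIn", "ex:paris"),
]

# SPARQL query being translated by hand (illustrative only):
#   SELECT ?city (COUNT(?person) AS ?headcount)
#   WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ?city }
#   GROUP BY ?city
#
# Each triple pattern maps to one alias of the triples table (t1, t2);
# the shared variable ?org maps to the join condition t1.o = t2.s.
SQL = """
SELECT t2.o AS city, COUNT(t1.s) AS headcount
FROM triples AS t1
JOIN triples AS t2 ON t1.o = t2.s
WHERE t1.p = 'ex:worksFor' AND t2.p = 'ex:locatedIn'
GROUP BY t2.o
ORDER BY t2.o
"""


def headcount_by_city(triples):
    """Load the triples into an in-memory table and run the translated query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
        return conn.execute(SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for city, headcount in headcount_by_city(TRIPLES):
        print(city, headcount)
```

The aggregate over a join is the kind of OLAP-style roll-up I have in mind. A real virtual knowledge graph setup would map SPARQL onto existing relational tables rather than a single triples table, so treat the table layout here as an assumption.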


7 comments

  • Daniel C.

    Douglas Moore, give the Grakn database a look. Graql (Grakn’s query language) natively handles both deductive reasoning via backward chaining (OLTP) and distributed analytics (OLAP) at the database level, providing a strong abstraction over low-level constructs and complex relationships. https://github.com/graknlabs/grakn

  • Yong Tang

    "for data integration and OLAP type queries, powered by Spark over data in a data lake." Amazing! A semantic data lake for connecting the RDB world with the semantic world!

  • Yong Tang

    The semantic data lake idea is very interesting. I am also building an internal BI project and hope to use a semantic data warehouse or data lake. A recent paper on this topic: https://upcommons.upc.edu/bitstream/handle/2117/188695/SETLBI.pdf

  • Yong Tang

    The overall semantic data integration process (SETL), from the same authors: https://arxiv.org/pdf/2006.07180.pdf

  • Yong Tang

    Thanks for mentioning SANSA, Douglas Moore. The project is promising and I will see how it works.

  • Yong Tang

    As for the recent develop version and 0.8-snapshot: Scala binary version 2.12 (corresponding to spark-core_2.12) has been added to pom.xml, and the build has been changed to Spark 2.

  • Douglas Moore

    OK, I should have been more precise: it’s built on RDDs and not DataFrames.