The Knowledge Graph Conference
Ask

Publishing Knowledge Graph Datasets in HDT Format: Insights?

Andrew P.
· Jul 30, 2024 02:04 PM

Hi, has anyone published KG datasets using the HDT format (https://www.rdfhdt.org/)? Thoughts?

👀 1
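For context, HDT (Header-Dictionary-Triples) packs an RDF graph into a shared term dictionary plus compact integer-encoded triples. Below is a toy Python sketch of that dictionary-encoding idea only; it is hypothetical illustration code, not the real HDT binary format or its API:

```python
# Toy sketch of HDT's dictionary + triples idea (hypothetical, not the
# actual binary format from rdfhdt.org): each RDF term is stored once in
# a sorted dictionary, and triples become small sorted integer tuples.

def encode(triples):
    """Map RDF terms to integer IDs; return (term list, ID triples)."""
    terms = sorted({t for triple in triples for t in triple})
    term_to_id = {term: i for i, term in enumerate(terms)}
    id_triples = sorted(tuple(term_to_id[t] for t in triple) for triple in triples)
    return terms, id_triples

def search(terms, id_triples, s=None, p=None, o=None):
    """Triple-pattern match over the ID encoding; None is a wildcard."""
    term_to_id = {term: i for i, term in enumerate(terms)}
    # Unknown terms map to -1, which matches nothing.
    pattern = tuple(None if t is None else term_to_id.get(t, -1) for t in (s, p, o))
    for triple in id_triples:
        if all(q is None or q == v for q, v in zip(pattern, triple)):
            yield tuple(terms[v] for v in triple)

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]
terms, ids = encode(triples)
print(list(search(terms, ids, p="foaf:name")))
```

The sorted ID tuples are what make the format so compact and cheap to scan from plain file storage, which is where the comparison to Parquet-style columnar files in the comments below comes from.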

5 comments

· Sorted by Oldest
  • Ryan B.
    ·

    Looks really interesting. It reminds me a lot of modern ARCO (analysis-ready, cloud-optimized) approaches like kerchunk. There is quite a bit of interest in tests and studies of federated query over file serializations (e.g. Parquet, Zarr) versus managed database solutions, with contextual headers storing mapping files (R2RML/RML, kerchunk). I could see a lot of performance potential here versus, say, virtualized queries via Athena over JSON-LD files, while still maintaining the benefits of file-store data over a managed service. Thanks for the share.

  • Andrew P.
    ·

    Yep, reminds me of Parquet.

  • Andrew P.
    ·

    Good start towards separating storage and compute for graphs.

    🙌 1
  • Ryan B.
    ·

    It's definitely a piece of the puzzle.

  • Denny V.
    ·

    One issue we had was that HDT wasn't great at supporting dynamic KGs that change often, because any update needed a complete recompression, IIRC. For reasonably static datasets this sounds like a good, efficient approach.

    👍 1
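Denny's recompression point can be seen directly in any sorted-dictionary encoding: inserting a new term shifts the IDs of every term that sorts after it, which invalidates the compact triple IDs and forces a full re-encode. A minimal illustration (hypothetical toy code, not the actual HDT implementation):

```python
# Why HDT-style formats dislike frequent updates: the term dictionary is
# sorted, so a single new term can renumber existing terms, and every
# ID-encoded triple referencing them must then be rebuilt.

def build_dictionary(triples):
    """Sorted list of all distinct terms; a term's ID is its position."""
    return sorted({t for triple in triples for t in triple})

before = build_dictionary([("ex:b", "ex:p", "ex:d")])
# Adding one triple introduces "ex:a" and "ex:c", which sort into the
# middle of the dictionary and shift the IDs of existing terms.
after = build_dictionary([("ex:b", "ex:p", "ex:d"), ("ex:a", "ex:p", "ex:c")])

print(before.index("ex:b"), after.index("ex:b"))  # the ID of "ex:b" shifts
```

This is why HDT fits write-once, read-many publishing well, while frequently updated graphs are usually better served by a store with incremental indexes.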