Optimizing Knowledge Graph Storage: Best Practices and Tools
Some background: my original data lives in a property graph database, from which I export it as CSV or JSON. I then use https://github.com/SDM-TIB/SDM-RDFizer to convert the CSV/JSON into an RDF knowledge graph. I recommend SDM-RDFizer because it implements optimized data structures and relational algebra operators that enable efficient execution of RML triple maps, even in the presence of big data. Finally, I use https://github.com/RDFLib/rdflib to run SPARQL 1.1 queries and other operations.

However, this flow is missing a critical step: how should the KG be stored? I have some ideas:

1) Store the KG directly in files. However, this is not an ideal way to land it in industry.
2) Use a distributed KG store based on Python (noting that the whole flow is Python-based).
3) Extend rdflib with a store plugin backed by the property graph. (rdflib ships `Store` implementations for in-memory storage and for persistent storage on top of Berkeley DB.)