The Knowledge Graph Conference

Efficiently Parsing Large YAGO Triple Files: Best Methods Explained

Gautam K. · Oct 23, 2023 09:10 AM

Hello, I would like to parse a large YAGO file (around 20 GB). How do I parse such a large triple file efficiently? I tried RDFLib, but it didn't work well. Is there another option?

4 comments

  • Wolfgang S.

    When using RDF4J (Java-based, see https://rdf4j.org/), you can use the Rio parsers and write your own RDFHandler, which is called once for each triple. In your implementation you can e.g. compute statistics over predicates or classes, or collect N triples at a time and write them to a database, or do whatever you want. There is no need to buffer the whole file in memory. See https://rdf4j.org/documentation/programming/rio/ for some pointers.
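    The handler pattern described above can be sketched in Python as well, since N-Triples is a line-based format: read one line at a time and invoke a callback per triple, so the whole 20 GB file never sits in memory. This is an illustrative sketch, not RDF4J's actual API; the regex, function names, and the file name `yago-facts.nt` are assumptions.

```python
import re
from collections import Counter

# Minimal sketch of a streaming "handler" parser for N-Triples:
# one line is read at a time and a callback is invoked per triple,
# so memory use stays constant regardless of file size.
# The regex is simplified and does not cover every N-Triples edge case.
TRIPLE_RE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')

def parse_ntriples(path, handle_statement):
    """Stream an N-Triples file, calling handle_statement(s, p, o) per triple."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            m = TRIPLE_RE.match(line)
            if m:
                handle_statement(*m.groups())

# Example analytics in the spirit of the comment above: predicate counts.
predicate_counts = Counter()
# parse_ntriples('yago-facts.nt',  # hypothetical file name
#                lambda s, p, o: predicate_counts.update([p]))
```

    The same callback could instead batch triples and flush them to a database every N statements, exactly as suggested for the RDF4J RDFHandler.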

  • Wolfgang S.

    Note: the same approach might of course be possible with RDFLib, but I'm not familiar with that.

  • Gautam K.

    Wolfgang S. thanks for your kind suggestion. Is there something for Python?

  • Wolfgang S.

    There is PyRDF: https://pyrdf.readthedocs.io/en/latest/ You would need to check whether it can also read RDF data triple by triple, but I would assume such functionality is available as well.