The Knowledge Graph Conference

Introducing bbw: Open Source Tool for CSV to Wikidata Matching

Renat S.
·Nov 18, 2020 12:38 PM

I would like to share bbw, our open source semantic annotator for matching CSV files without metadata to the Wikidata knowledge graph (https://github.com/UB-Mannheim/bbw). Using bbw, we recently won third place in the “Semantic Web Challenge on Tabular Data to Knowledge Graph Matching” (www.cs.ox.ac.uk/isg/challenges/sem-tab/2020), co-located with the 19th International Semantic Web Conference and the 15th International Workshop on Ontology Matching. bbw annotates tabular data with entities, types and properties from Wikidata. Annotating a raw table with bbw is straightforward, as illustrated in this Jupyter Notebook: https://github.com/UB-Mannheim/bbw/blob/main/bbw.ipynb. 🤗


2 comments

  • Gregory Saumier-Finch

    Hi, thanks for sharing. Could you comment on how it compares to OpenRefine with the Wikidata reconciliation service, which lets you add Wikidata entities to a plain CSV?

  • Renat S.

    In the initial version of bbw we used the OpenRefine Reconciliation API (https://wikidata.reconci.link/en/api). However, the SemTab2020 dataset was very noisy: the API could not cope with the heavily misspelled words and often returned no results at all. We therefore disabled the lookup via the OpenRefine Reconciliation API and replaced it with a meta-lookup over many search and metasearch engines, which allowed us to resolve even tricky spelling mistakes. OpenRefine is great software, but if I had noisy data, I would use bbw. By the way, I do not know how the two would compare on a clean dataset; that still has to be tested. I can only add that bbw achieved F1-scores above 0.99 for properties and roughly 0.98 for types and entities, even on a very noisy dataset.
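For readers unfamiliar with the metric, here is a minimal sketch of how an F1-score for table annotations is typically computed. It assumes the usual definitions (precision = correct annotations / submitted annotations, recall = correct annotations / target annotations). The function name and the counts in the usage note are hypothetical, chosen only for illustration; they are not taken from bbw or from the challenge results.

```python
def f1_score(correct: int, submitted: int, targets: int) -> float:
    """Harmonic mean of precision and recall for a set of annotations.

    correct   -- number of submitted annotations that match the ground truth
    submitted -- number of annotations the system produced
    targets   -- number of annotations expected by the ground truth
    """
    # Guard against division by zero when nothing was submitted or expected.
    precision = correct / submitted if submitted else 0.0
    recall = correct / targets if targets else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With made-up counts, `f1_score(98, 100, 100)` gives a precision and recall of 0.98 each, so the F1-score is also 0.98. A system that skips hard cells keeps precision high but loses recall, which is why returning no results on noisy input hurts the score.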
