So I heard today that inverse properties should be avoided in ontologies with lots of instance data. Anyone want to share their thoughts on this?
I would not think that “lots of data” makes a difference. What could possibly make a difference is lots of classes and lots of object-type properties (“relations,” the kind that can have inverses). I would think that if there is a likely use case for the inverse (users would go from the other direction), then create it. The inverse might be used significantly less than the initial direction, but still often enough that users would notice its absence.
Where did you hear that? From whom? What was the context? What were the reasons stated?
Inverse properties can signal sloppy modeling. With triples, linking in “both directions” is unnecessary because you can match on triple patterns directly; you don’t need to navigate “from” a known entity “to” an unknown entity. Yes, reasoning with inverse properties can strain query performance, but I think the zeroth-order issue is that it may signal sloppy modeling, a “let’s toss in anything anyone could possibly want” approach rather than a model with clear conceptual integrity and thus in good service as data documentation. So, there may be a good reason for given inverse properties, but I’d consider it a “model smell” that signals a need for closer examination (and again, the result of examination may be that the inverses are justified).
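To make the pattern-matching point concrete, here is a minimal sketch in plain Python, with no real triple store involved. The `match` helper, the `hasManager` property and the people's names are all hypothetical, made up for illustration; `None` stands in for a query variable.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All names here (hasManager, alice, ...) are made up for illustration.
triples = {
    ("bob", "hasManager", "alice"),
    ("carol", "hasManager", "alice"),
    ("alice", "hasManager", "dave"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Forward" question: who is Bob's manager?
managers_of_bob = [o for _, _, o in match(triples, s="bob", p="hasManager")]

# "Inverse" question: whom does Alice manage?
# Same property, bound on the object side; no isManagerOf property is needed.
managed_by_alice = [s for s, _, _ in match(triples, p="hasManager", o="alice")]

print(managers_of_bob)
print(managed_by_alice)
```

In SPARQL terms, the second question is roughly the pattern `?who :hasManager :alice`, or equivalently the inverse path `:alice ^:hasManager ?who`; neither needs a declared inverse property.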
Thank you, Donny. Helpful points to think about. I appreciate it!
There are several reasons to avoid inverses - both practical and conceptual. I'm a bit slammed for time right now, but I know both TopQuadrant and the open source gist project (managed by Semantic Arts) have dropped them: LMK if you want references. Off the top of my head, here are some of the issues:
depending on the triple/quad store used there can be a space or a time penalty. If all inferred triples are stored then it at least (depending on other inferences) doubles the storage requirement; if not then there will be a runtime query penalty (likely to be less)
in either case, it requires OWL reasoning, i.e. something beyond the lowest level of RDFS reasoning, which is not always available or which incurs a penalty (for more than just the inverses)
inverses can become very problematic if wanting to use RDF-star to annotate triples with e.g. provenance information: because there are potentially 2 triples representing the same business fact
(minor) when visualizing an ontology it makes things look more messy with the extra lines
for preparing, loading, or interchanging data it results in either unnecessary inconsistency or decision making - do you allow a mixture of triples in either direction? This in turn puts a burden on the receiver
often the reason for an inverse is for a more business user friendly name to query, as opposed to using ^ in SPARQL. Personally I'm not convinced it's a good idea to expect business users to use SPARQL and there are other ways to document names using annotation properties, which can be picked up by user interfaces. Further, the inverses IMO add a cognitive burden to users by providing 2 ways to query the same relationship
in many cases the inverse names aren't actually that business friendly, often with names ending in prepositions e.g. isManagerOf as inverse of hasManager
and you need to ensure that the textual definitions of the 2 mutual inverses are exactly consistent, and remain so after any changes; since they are supposed to represent the same business relationship
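A rough sketch of the storage and mixed-direction points above, again in plain Python rather than a real triple store or reasoner. The `INVERSES` table and the `materialize` and `normalize` helpers are assumptions made up for illustration; a real store would apply `owl:inverseOf` through its own reasoning machinery.

```python
# Hypothetical inverse declaration: isManagerOf is the inverse of hasManager.
INVERSES = {"hasManager": "isManagerOf", "isManagerOf": "hasManager"}

# Asserted data, deliberately mixing both directions as a sender might.
asserted = {
    ("bob", "hasManager", "alice"),
    ("carol", "hasManager", "alice"),
    ("alice", "isManagerOf", "erin"),
}


def materialize(store):
    """Add the inverse triple for every triple whose property has an inverse."""
    inferred = {(o, INVERSES[p], s) for s, p, o in store if p in INVERSES}
    return store | inferred


def normalize(store, canonical="hasManager"):
    """Rewrite every triple into the single canonical direction."""
    out = set()
    for s, p, o in store:
        if p in INVERSES and p != canonical:
            # Flip subject and object, and swap in the canonical property.
            s, p, o = o, INVERSES[p], s
        out.add((s, p, o))
    return out


materialized = materialize(asserted)
normalized = normalize(asserted)

# Triple counts: asserted, with inverses stored, with one direction only.
print(len(asserted), len(materialized), len(normalized))
```

With inverses stored, each business fact appears as two triples, which is also why a per-triple annotation such as provenance has two candidate targets. Keeping one canonical direction leaves one triple per fact, at the cost of the receiver having to flip anything sent the other way round.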
Hope that helps, happy to discuss further/provide links in a day or so after I've caught up from my vacation.