Sometimes some additional tooling helps.
We at metaphacts provide comprehensive tooling for Knowledge Graph Management and Modeling and have good support for Ontotext's GraphDB.
There is a free trial available that you can run on your own system using Docker, which may help you with loading, exploring, visualizing, searching, and modeling RDF data.
See this blog post for an example:
https://blog.metaphacts.com/generating-value-from-your-knowledge-graph-in-days
The two different occurrences should not return different values. Essentially, it shouldn't matter what you replace it with, i.e. whether you use 1, 2, or x as the replacement value (as long as it is a single character), if the only purpose is counting the length of the resulting string.
No idea. It could be a way to "normalize" each identifier to a single-digit string in order to determine how many there are in a source value?
Do you have an example value?
My guess would be:
Regular expression
(see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#ucc)
\\p{L} any Unicode letter
0-9 a digit in the range 0-9
+ at least one
* zero or more
[....] any of the characters mentioned in the brackets
[^...] anything but the characters mentioned in the brackets
so [\\p{L}0-9]+[^\\p{L}0-9]* should match a run of one or more Unicode letters or digits, followed by zero or more characters that are neither letters nor digits
replace
(see https://www.w3.org/TR/sparql11-query/#func-replace)
every part of the provided ?text matching the expression above will be replaced by the digit 1 (or 2 in the second variant)
The overall result is the string length of the result of that replace operation.
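To try this out locally, here is a minimal Java sketch of the same replace-then-measure step. The class and method names (TokenCount, collapse, countTokens) and the sample strings are made up for illustration. It also assumes that Java's regex engine treats this particular pattern the same way as the XPath-style engine behind SPARQL's REPLACE, which should hold for these simple character classes.

```java
class TokenCount {
    // Same pattern as in the query. In Java source (as in a SPARQL string literal)
    // the backslash is doubled, so the regex engine sees \p{L}.
    static final String PATTERN = "[\\p{L}0-9]+[^\\p{L}0-9]*";

    // Replaces every letter/digit run plus its trailing separators with the replacement.
    static String collapse(String text, String replacement) {
        return text.replaceAll(PATTERN, replacement);
    }

    // Mirrors STRLEN(REPLACE(?text, pattern, "1")).
    static int countTokens(String text) {
        return collapse(text, "1").length();
    }

    public static void main(String[] args) {
        String text = "Hello, World 42";
        System.out.println(collapse(text, "1"));          // prints 111
        System.out.println(countTokens(text));            // prints 3
        System.out.println(collapse(text, "2").length()); // prints 3
        // Leading separators are not matched by the pattern, so they stay in the result.
        System.out.println(collapse("--abc def", "1"));   // prints --11
    }
}
```

Note that characters before the first letter or digit are kept as they are, so the length is the number of letter/digit runs plus any leading separator characters.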
You might have reached that same conclusion already, so I'm not sure whether this interpretation adds much.
There is PyRDF: https://pyrdf.readthedocs.io/en/latest/
You need to check whether you can also read RDF data triple by triple, but I would assume that there is such functionality available as well.
When using RDF4J (Java-based, see https://rdf4j.org/), you can use the Rio parsers and write your own RDFHandler, which is called for each triple. In your implementation you can, for example, compute statistics on predicates or classes, or collect batches of N triples and write them to a DB, or do whatever else you want. There is no need to buffer the whole file in memory.
See https://rdf4j.org/documentation/programming/rio/ for some pointers.
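As a rough sketch of that approach, the handler below counts how often each predicate occurs while the file is streamed. It assumes the RDF4J Rio modules are on the classpath. The names PredicateStats, PredicateCounter and countPredicates are made up for illustration, and falling back to Turtle when the file extension is not recognized is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.AbstractRDFHandler;

class PredicateStats {

    // Hypothetical handler: invoked by the parser once per triple, keeps only the counters.
    static class PredicateCounter extends AbstractRDFHandler {
        final Map<String, Long> counts = new HashMap<>();

        @Override
        public void handleStatement(Statement st) {
            counts.merge(st.getPredicate().stringValue(), 1L, Long::sum);
        }
    }

    // Streams the input through a Rio parser; triples are never collected in memory.
    static Map<String, Long> countPredicates(InputStream in, RDFFormat format) throws IOException {
        PredicateCounter counter = new PredicateCounter();
        RDFParser parser = Rio.createParser(format);
        parser.setRDFHandler(counter);
        parser.parse(in, ""); // empty base URI: assumes the data uses absolute IRIs
        return counter.counts;
    }

    public static void main(String[] args) throws IOException {
        String file = args[0];
        // Guess the format from the file extension; Turtle is an assumed fallback.
        RDFFormat format = Rio.getParserFormatForFileName(file).orElse(RDFFormat.TURTLE);
        try (InputStream in = Files.newInputStream(Paths.get(file))) {
            countPredicates(in, format)
                .forEach((predicate, n) -> System.out.println(n + "\t" + predicate));
        }
    }
}
```

The same pattern should work for batching: instead of incrementing counters, handleStatement could append to a list and flush it to the database every N statements.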