The Knowledge Graph Conference

Comparing String Lengths with Different Replacements in SPARQL

Vladimir A. · Mar 01, 2024 05:32 PM

Who can tell what this does:

strlen(replace(?text,"[\\p{L}0-9]+[^\\p{L}0-9]*","1"))

And compare it against

strlen(replace(?text,"[\\p{L}0-9]+[^\\p{L}0-9]*","2"))
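To make the behaviour of the two expressions concrete, here is a minimal Python sketch that emulates them. Python's built-in re module does not support \p{L}, so this sketch reproduces the regex by hand using unicodedata categories. The helper names is_word_char, replace_word_runs and word_count are hypothetical, and the code approximates what a SPARQL engine does rather than reproducing any particular engine's implementation.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    # The character class [\p{L}0-9]: any Unicode letter (category L*) or an ASCII digit.
    return unicodedata.category(ch).startswith("L") or "0" <= ch <= "9"


def replace_word_runs(text: str, replacement: str) -> str:
    # Emulates REPLACE(text, "[\\p{L}0-9]+[^\\p{L}0-9]*", replacement) by hand,
    # because Python's built-in re module has no \p{L} support.
    out = []
    i, n = 0, len(text)
    while i < n:
        if is_word_char(text[i]):
            while i < n and is_word_char(text[i]):  # [\p{L}0-9]+
                i += 1
            while i < n and not is_word_char(text[i]):  # [^\p{L}0-9]*
                i += 1
            out.append(replacement)  # the whole match becomes one replacement string
        else:
            # No match can start on a non-word character, so it is copied unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def word_count(text: str, replacement: str = "1") -> int:
    # strlen(replace(...)): one character per matched word (for a 1-char replacement).
    return len(replace_word_runs(text, replacement))


if __name__ == "__main__":
    print(replace_word_runs("Is foo+bar = baz?", "1"))  # 1111
    print(word_count("Is foo+bar = baz?", "1"), word_count("Is foo+bar = baz?", "2"))  # 4 4
    print(replace_word_runs("Thé quick br0wn føx leaped 9 times.", "1"))  # 1111111
```

Both replacement strings give the same length as long as each is a single character. A two-character replacement such as "10" would double the result. Also note that non-letter, non-digit characters at the very start of the text (for example a leading quote or space) are not part of any match. They are copied through unchanged, so each one adds 1 to the length: '"Hello" world' becomes '"11'.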

9 comments

    • Wolfgang S.

      My guess would be:
      Regular expression (see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#ucc):
        - \\p{L}: any Unicode letter
        - 0-9: a digit in the range 0-9
        - +: at least one
        - *: zero or more
        - [...]: any of the characters listed in the brackets
        - [^...]: any character except those listed in the brackets
      So [\\p{L}0-9]+[^\\p{L}0-9]* should match text that starts with at least one Unicode letter or digit, followed by zero or more characters that are neither letters nor digits.
      replace (see https://www.w3.org/TR/sparql11-query/#func-replace): every part of the provided ?text that matches the expression above is replaced by the digit 1 (or 2 in the second variant).
      The overall result is the string length of the result of that replace operation. You might have reached the same conclusion already, so I'm not sure whether I can help much with the interpretation.

    • Vladimir A.

      So, what is the purpose of it?

    • Wolfgang S.

      No idea. Could it be meant to "normalize" some identifier to a single-digit string, to determine how many there are in a source value? Do you have an example value?

    • Vladimir A.

      "Is foo+bar = baz?" -> 4

    • Wolfgang S.

      Maybe some kind of "word count"?

      πŸ‘1
    • Wolfgang S.

      The two variants should not return different values. If the only purpose is counting the length of the resulting string, it doesn't matter whether you use 1, 2, or x as the replacement value, as long as it is a single character.

    • Vladimir A.

      This is for an Ontotext KG, to populate s:wordCount and s:timeRequired for articles (assuming a reading rate of 284 words per minute).

    • Phil T.

      Wolfgang S.'s assumption appears correct. [\\p{L}0-9]+ means at least one letter from any natural-language code-set (i.e. alphabet) or digit, in any order, so foo, fo9o, foo9, and 9foo all match. [^\\p{L}0-9]* means anything else (whitespace, symbols) and is optional. So the pairing produces a single match for ‘foo $/*. ’, which replace turns into ‘1’. Thus, ‘Thé quick br0wn føx leaped 9 times.’ would produce ‘1111111’, whose string length is 7. So yes, it appears to be a word count that supports world languages.

    • Marcelo Barbieri

      You can try it here with some test data, too 😉 https://regex101.com

      πŸ‘1