The Knowledge Graph Conference

Comparing Elasticsearch and Solr for Search Engine Development

Marios T. · Nov 06, 2020 02:49 PM

Hello everyone. I was wondering if anyone here has experience building a search engine from scratch. I have some questions about choosing between Elasticsearch and Solr, and about performance evaluation (document ranking, query expansion, metadata analytics, and more). Thank you!

👍3

17 comments

    • Ellie Y.

      We had a lively discussion about this once with Huda K. Ashleigh F. and others chimed in on LinkedIn: https://www.linkedin.com/posts/sellieyoung_knowledgegraphs-search-kgs-activity-6704812840198443008-QAD_

      👍4
    • Ellie Y.

      Matthias S. is currently working to build one as well.

      🔥2
    • Matthias S.

      Hi Marios T., I am not experienced with building search engines, but I am indeed working to enhance one that is based on Elasticsearch. I would be glad to discuss the subject with you 🙂 For now, the only thing I can do is suggest that you look into the differences between Solr and Lucene, since Elasticsearch is built on top of Lucene 😉

      👍1
    • Michael G.

      Go with Elastic.

      👍2
    • Craig T.

      I think Solr is built on top of Lucene too.

    • Craig T.

      I haven't used Solr since 2015, but back then the choice was Solr if you were using Java and Elasticsearch if you were using Python.

      👍1
    • Ashleigh F.

      Agree with Michael G.: Elastic is doing much better with multilingual content and with KG data, among other things.

      👍2
    • Vikram

      I agree with the others that Elastic is preferred. On a side note, the latest version supports SQL queries; the native query language, EQL, is not easy to use.

      👍1
      🙌1
    • Scott H

      Our company (LexisNexis - something like 3 petabytes of data) went with Solr, and it took us a while to decide. I think our main driver was flexibility, but we do have some services in Elastic. This is a great resource and community: https://opensourceconnections.com/.

      👍3
    • Paco N.

      Another approach, which I've seen used at scale: analyze the incoming searches for the use case and pre-compute/cache the most common ones. Our team used Redis for this, in a use case (200+ publishers' content) where 80% of all queries were among a few dozen keywords. We could precompute results for those and increase performance dramatically. In that project we also used a KG-based graph widget in React, where users typically spent 90% of their time after the initial search. For the remaining (very small) percentage that required full-text search, we used RediSearch, which turned out to be faster and more reliable than the Lucene-based alternatives mentioned above.

      👍1
    • Phil G.

      Hi Marios and everyone on this thread, just wanted to shamelessly self-promote www.entityze.com, which can be used in conjunction with Solr/Lucene and Elasticsearch to automatically add a layer of normalized semantic metadata for each document. The granularity of the semantic metadata can be fine-tuned.

      👍1
    • Abhay Ratnaparkhi

      We moved from a legacy proprietary search engine to Solr for our intranet search at IBM. Other factors, such as internal expertise, played a key role in choosing Solr over Elastic.

      👍2
    • Yong Tang

      If search is combined with a KG, I would not use ES or Solr. Instead, I would choose vector search, which covers a larger space of use cases across different types of nodes. I would first build the KG, then compute KG embeddings, which moves the problem into the vector world, and finally integrate the embedding results with a vector search engine, e.g. https://milvus.io/

      👍1
    • Yong Tang

      There is an article about graph recommendation with Milvus: https://medium.com/unstructured-data-service/graph-based-recommendation-system-with-milvus-c40b3aafd295

      🔥1
    • Matthias S.

      Paco N., I have some further questions on the approach you mentioned. Could you (or anyone) expand on it? I have experience with Elasticsearch for my company's search engine and I am now exploring other possible solutions. Besides improving the relevance of results, they are also interested in system performance and hardware costs (especially the latter). Below are some specific questions:

      • "(200+ publishers' content)" --> Could you give an idea of the DB size? (Are we speaking about terabytes or gigabytes of data?)

      • "For the remaining (very small) percentage that required full-text search" --> Do you have an idea of your data size? My concern here is hardware cost since, from what I understand, Redis requires either a lot of RAM or at least disks with high I/O (like SSDs), which are also quite expensive.

      • "80% of all queries were among a few dozen keywords" --> Does that mean you used Redis to cache the sets of relevant documents associated with each keyword?

    👍🏼5
    👍2
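
For readers following the caching discussion above, here is a minimal sketch of the "precompute the most common queries" idea, under stated assumptions. It is not Paco N.'s actual implementation: a plain Python dict stands in for Redis, a naive scan stands in for the full-text engine, and the corpus, query log, and all function names (`full_text_search`, `hot_queries`, `precompute`, `search`) are hypothetical.

```python
from collections import Counter

# Toy corpus standing in for the indexed content (illustrative only).
DOCS = {
    1: "knowledge graph search with elasticsearch",
    2: "solr and lucene indexing basics",
    3: "graph embeddings for recommendation",
    4: "caching search results in redis",
}


def full_text_search(query):
    """Slow path: naive scan returning ids of docs that contain every query term."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if all(term in text.split() for term in terms)
    )


def hot_queries(query_log, top_n):
    """Pick the most frequent normalized queries from a log of past searches."""
    counts = Counter(q.strip().lower() for q in query_log)
    return [q for q, _ in counts.most_common(top_n)]


def precompute(queries):
    """Build the cache: normalized query string -> precomputed result list."""
    return {q: full_text_search(q) for q in queries}


def search(query, cache):
    """Serve from the cache when possible, otherwise fall back to full-text search."""
    key = query.strip().lower()
    if key in cache:
        return cache[key], "cache"
    return full_text_search(key), "full-text"


# Hypothetical log of past searches; "graph" and "search" are the hot queries.
query_log = ["graph", "Graph", "search", "graph", "search", "redis", "lucene"]
cache = precompute(hot_queries(query_log, top_n=2))

print(search(" Graph", cache))  # served from the precomputed cache
print(search("redis", cache))   # falls through to the full-text path
```

In a real deployment the dict would presumably be replaced by a key-value store such as Redis, with result lists serialized per keyword, and the cache would need to be refreshed whenever the underlying index changes. In this design, memory use grows with the number of cached queries and the size of their result lists, not with the size of the corpus.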
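
Yong Tang's suggestion above (build the KG, compute embeddings, then query a vector search engine) can be illustrated with a small sketch of the final step. This is only an assumption-laden toy: the three-dimensional vectors are hand-written rather than learned, a brute-force cosine-similarity scan stands in for an engine such as Milvus, and the names `NODE_VECTORS`, `cosine`, and `nearest` are hypothetical.

```python
import math

# Hypothetical, hand-written "embeddings" for a few KG nodes.
# Real KG embeddings would come from a trained model and have many more dimensions.
NODE_VECTORS = {
    "Elasticsearch": [0.9, 0.1, 0.0],
    "Solr": [0.8, 0.2, 0.1],
    "Milvus": [0.1, 0.9, 0.2],
    "Redis": [0.2, 0.3, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, k, exclude=()):
    """Brute-force k-nearest-neighbor search by cosine similarity."""
    scored = [
        (cosine(query_vec, vec), name)
        for name, vec in NODE_VECTORS.items()
        if name not in exclude
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Find the nodes most similar to "Elasticsearch", excluding the node itself.
neighbors = nearest(NODE_VECTORS["Elasticsearch"], k=2, exclude={"Elasticsearch"})
print(neighbors)
```

A dedicated vector engine replaces the linear scan with an approximate nearest-neighbor index, which is what makes this approach practical beyond toy sizes.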