Hello everyone. I was wondering if anyone here is experienced with building a search engine from scratch. I have some questions about choosing Elasticsearch vs Solr, and also about performance evaluation (document ranking, query expansion, metadata analytics, and more). Thank you!
We had a lively discussion about this once with Huda K.; Ashleigh F. and others chimed in on LinkedIn: https://www.linkedin.com/posts/sellieyoung_knowledgegraphs-search-kgs-activity-6704812840198443008-QAD_
Hi Marios T., I am not experienced with building search engines, but I am indeed working to enhance one that is based on Elasticsearch. However, I would be glad to discuss the subject with you 🙂 For now the only thing I can do is tell you to look into the differences between Solr and Lucene, since Elasticsearch is built on top of Lucene 😉
I think Solr is built on top of Lucene too
Agree with Michael G., Elastic does much better with multilingual content and with KG data, among other things
Our company (LexisNexis - something like 3 petabytes of data) went with solr - it took a while for us to decide. I think our main driver was flexibility but we do have some services in elastic. This is a great resource and community: https://opensourceconnections.com/.
Another approach, which I've seen used at scale: analyze the incoming searches for the use case and pre-compute/cache the most common ones. Our team used Redis for this, in a use case (200+ publishers' content) where 80% of all queries were among a few dozen keywords. We could precompute results for those and increase performance dramatically. In that project we also used a KG-based graph widget in React, where users typically spent 90% of their time after the initial search. For the remaining (very small) percentage that required full-text search, we used RediSearch, which turned out to be faster and more reliable than the Lucene-based alternative mentioned above.
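To make the precompute/cache idea concrete, here's a minimal sketch of the pattern. All names are illustrative, and a plain Python dict stands in for Redis here (in a real deployment you'd use redis-py SET/GET, likely with a TTL), with a stub for the expensive full-text backend:

```python
import json

cache = {}  # stand-in for Redis: query -> serialized result list

def full_text_search(query):
    """Stub for the expensive full-text backend (e.g. RediSearch)."""
    return [f"doc-for-{query}"]  # placeholder results

def precompute(common_queries):
    # Run the expensive search once per common query, cache serialized results.
    for q in common_queries:
        cache[q.lower()] = json.dumps(full_text_search(q))

def search(query):
    hit = cache.get(query.lower())
    if hit is not None:                 # the ~80% of traffic hitting common keywords
        return json.loads(hit)
    return full_text_search(query)      # long-tail queries fall through

precompute(["climate", "elections"])
search("Climate")        # served from the cache
search("obscure query")  # falls through to full-text search
```

The win comes from the skew in the query distribution: if a few dozen keywords cover most traffic, the full-text engine only ever sees the long tail.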
Hi Marios and everyone on this thread, just wanted to shamelessly self-promote www.entityze.com, which can be used in conjunction with Solr/Lucene and Elasticsearch to automatically add a layer of normalized semantic metadata to each document... the granularity of the semantic metadata can be fine-tuned.
We moved from a legacy proprietary search engine to Solr for our intranet search at IBM. Other factors, like internal expertise, also played a key role in choosing Solr over Elastic.
If combined with a KG, I would not use ES or Solr; instead, I would select vector search, which handles larger use spaces with different types of nodes. I would first build the KG, then compute KG embeddings to move into the vector world, and finally integrate the embedding results with a vector search engine, e.g. https://milvus.io/
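A minimal sketch of the last step of that pipeline, with toy vectors standing in for real KG embeddings (which would come from a model like TransE or RotatE) and brute-force cosine similarity standing in for Milvus; all names are illustrative:

```python
import math

# Toy node embeddings; in practice these come from a KG embedding model
# and are indexed in a vector engine like Milvus.
node_vectors = {
    "alice":  [1.0, 0.0, 0.2],
    "bob":    [0.9, 0.1, 0.3],
    "report": [0.0, 1.0, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(query_vec, k=2):
    # Brute-force top-k; Milvus replaces this with an ANN index at scale.
    scored = sorted(node_vectors.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

nearest([1.0, 0.05, 0.25])  # nodes most similar to the query vector
```

The point is that once every node type lives in the same vector space, "search" becomes nearest-neighbor lookup, regardless of whether the node is a person, a document, or any other entity in the KG.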
There is an article about graph-based recommendation with Milvus: https://medium.com/unstructured-data-service/graph-based-recommendation-system-with-milvus-c40b3aafd295
Paco N. I have some further questions on the approach you mentioned. Could you (or anyone) expand on it? I have experience with Elasticsearch for my company's search engine and I am now exploring other possible solutions. Besides improving result relevance, the company is also interested in system performance and hardware costs (especially the latter). Below are some specific questions:
"(200+ publishers' content)" --> could you give an idea of the DB size? (are we speaking about Terabytes or Gigabytes of data?
"For the remaining (very small) percentage that required full-text search" --> Do you have an idea of your data size? My concern is on the hardware cost side there since, from what I understand, REDIS requires either a lot of RAM or at least disks with high I/O (like SSDs) which are also quite expensive.
"80% of all queries were among a few dozen keywords" --> does it means that you used REDIS to cache the sets of relevant documents associated with each keyword?