
Michael Strelan, Senior Developer
Keyword search can miss the mark. So instead, we turned to semantic and hybrid search, built with Drupal, OpenSearch and Skpr. The results were impressive.
Search engines work by keeping an index of all the words and phrases in your content. When you perform a keyword search, the engine looks up that index to find content that matches, assigning each result a relevance score and ordering the results from most to least relevant.
The quality of your search experience comes down to two things: how the query is interpreted, and how relevance is calculated. The query might be interpreted as an exact phrase or as a set of separate terms. Relevance might be influenced by how often a keyword appears, or whether it’s in the title or body content. OpenSearch, for example, uses the BM25 algorithm by default, which scores relevance based on how often a keyword appears in a document, how rare that keyword is across the index, and how long the document is.
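To make that scoring idea concrete, here is a minimal sketch of the textbook BM25 formula in Python. It is an illustration only, not how OpenSearch is implemented internally: the whitespace tokenizer, the three toy documents and the function name `bm25_scores` are all made up for this example, and `k1=1.2`, `b=0.75` are simply commonly used defaults.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer for illustration: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the textbook BM25 formula."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue  # a term that is absent contributes nothing
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalised.
            length_norm = 1 - b + b * len(tokens) / avg_len
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy documents, invented for this sketch.
docs = [
    "office space",
    "a long voyage through deep space to a distant planet",
    "a chef opens a pizza restaurant",
]
scores = bm25_scores("space travel", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

In this toy example the short document that merely contains the word "space" scores higher than the longer one that is actually about a space voyage, because the score only sees matching terms and document length.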
But even a perfectly tuned keyword search doesn’t understand the context of the query or the data. That’s where semantic (or neural) search comes in.
Semantic search uses machine learning (ML) models to interpret the meaning behind a query. These models understand relationships between words - for example, that "guitar", "piano" and "music" are conceptually related, whereas "pizza" and "rock band" are not.
The relationships are stored in the search index as vector representations of the content. This allows the search engine to quickly retrieve relevant results by creating a vector representation of the search query and returning results based on vector similarity, that is, content that is conceptually relevant even if the exact words don’t match.
For example, a query like “pop songs” or “rock band” would surface content about guitars, pianos and music, without needing those exact words to appear.
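As a rough illustration of vector similarity, the sketch below ranks a few terms against a query using cosine similarity, one common similarity measure. The three-dimensional vectors are hand-written stand-ins, not the output of any real model (real embeddings typically have hundreds of dimensions), and the names `index` and `query_vector` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for illustration only.
index = {
    "guitar": [0.9, 0.1, 0.0],
    "piano": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of a query such as "rock band".
query_vector = [0.85, 0.15, 0.05]

# Rank indexed items from most to least similar to the query.
ranked = sorted(
    index,
    key=lambda name: cosine_similarity(query_vector, index[name]),
    reverse=True,
)
for name in ranked:
    print(f"{cosine_similarity(query_vector, index[name]):.3f}  {name}")
```

With these made-up vectors the music-related items rank well above "pizza", even though none of them share a word with the query. A real search engine would also avoid comparing the query against every document, typically by using an approximate nearest-neighbour index.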
To demonstrate the difference, we built a prototype using the IMDb Top 1000 Movies dataset. The index includes only the movie title and a one-sentence overview, with no metadata, plot summary or genre tags.
Searching for “space travel” in this index of movies, we expect results about travelling on rocket ships or exploring other planets and galaxies. Let’s take a look at what we get for a standard keyword search.
Keyword search results:
Some of the results, like 2001: A Space Odyssey, Interstellar, and Gattaca, are solid matches. But Office Space ranked first? That’s a clear miss. It only matches the word “Space” in the title, not the intended meaning of the query.
The top results when using semantic search are all directly related to space travel, clearly showing the advantage of meaning-based search over keyword matching.
Semantic search results:
While semantic search often gives better results, hybrid search can offer the best of both worlds. It combines keyword and semantic search, recalculating the final relevance score based on a configured weighting.
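One way to picture that weighting is sketched below. It assumes each set of scores is first rescaled to the 0 to 1 range (min-max normalization) and then combined as a weighted sum; that is one common approach and may not match how any particular engine or this prototype is configured. The scores and document names are invented, and the default weights mirror the 40% keyword, 60% semantic split used for the hybrid results in this post.

```python
def min_max(scores):
    """Rescale raw scores to the 0..1 range so both score types are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - lo) / (hi - lo) for doc, score in scores.items()}


def hybrid_rank(keyword, semantic, keyword_weight=0.4, semantic_weight=0.6):
    """Combine normalized keyword and semantic scores as a weighted sum."""
    kw, sem = min_max(keyword), min_max(semantic)
    combined = {
        doc: keyword_weight * kw.get(doc, 0.0) + semantic_weight * sem.get(doc, 0.0)
        for doc in sorted(set(kw) | set(sem))
    }
    return sorted(combined, key=combined.get, reverse=True)


# Invented scores for three hypothetical documents.
keyword_scores = {"a": 8.0, "b": 5.0, "c": 0.0}      # e.g. BM25 scores
semantic_scores = {"a": 0.30, "b": 0.80, "c": 0.85}  # e.g. vector similarity

print(hybrid_rank(keyword_scores, semantic_scores))
```

In this made-up example, document "a" wins on keywords alone and "c" wins on semantics alone, but "b" does well on both and comes out on top of the hybrid ranking.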
Let’s see how that compares when searching for “children’s magical adventure”.
The keyword search finds Children of Men and The Adventures of Robin Hood, neither of which is what we expect, before the first relevant result, Mary Poppins.
Keyword search results:
The semantic search results are much more relevant overall. But interestingly, Mary Poppins doesn’t top the list, even though it contains both “magical” and “adventure”.
Semantic search results:
Then we incorporated hybrid ranking, and in this scenario, Mary Poppins rises to the top. The hybrid model recognises both the contextual and keyword relevance, delivering a better balance.
Hybrid search results (40% keyword, 60% semantic):
What we’ve shown so far is just the tip of the iceberg. There are many ways to tune and enhance semantic search, including:
For this prototype, our stack included Drupal, OpenSearch and Skpr.
One of the challenges with semantic search is managing the infrastructure it requires: running vector databases, hosting machine learning models, and securing it all within a unified platform.
We’re working to make this easier with Skpr.
Soon, you’ll be able to:
Keep an eye on the Skpr blog for updates as we continue exploring how semantic search and modern AI tools can be integrated directly into your Drupal applications, and hosted, managed, and optimised on Skpr.