Semantic Search: More than just keywords

by michael.strelan

Keyword search can miss the mark. So instead, we turned to semantic and hybrid search, built with Drupal, OpenSearch and Skpr. The results were impressive.

How search typically works

Search engines work by keeping an index of all the words and phrases in your content. When you perform a keyword search, the engine looks up that index to find content that matches, assigning each result a relevance score and ordering the results from most to least relevant.

The quality of your search experience comes down to two things: how the query is interpreted, and how relevance is calculated. For example, the query might be interpreted as an exact phrase or as a set of separate terms. Relevance might be influenced by how often a keyword appears, or whether it’s in the title or body content. OpenSearch, for example, uses the BM25 algorithm by default, which scores relevance based on how often a keyword appears in a document, how rare that keyword is across the whole index, and how long the document is.
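To make the scoring idea concrete, here is a minimal sketch of the textbook BM25 formula in plain Python. It is not how OpenSearch implements it internally; the whitespace tokeniser, the three sample documents and the function name `bm25_scores` are all assumptions made for illustration. The `k1` and `b` values are the commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query using a basic BM25 formula."""
    # Naive tokenisation: lowercase and split on whitespace (an assumption).
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised) / n_docs

    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenised if term in other)
            if df == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            # Term frequency, dampened by k1 and normalised by document length.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up sample documents, not taken from the real dataset.
docs = [
    "office space",
    "a crew travel through space to find a new home",
    "a boy befriends a dragon",
]
scores = bm25_scores("space travel", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

Note that the very short first document still earns a respectable score from matching a single word, because BM25 favours matches in shorter documents. Nothing in the formula knows what the words mean.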

But even a perfectly tuned keyword search doesn’t understand the context of the query or the data. That’s where semantic (or neural) search comes in.

What makes Semantic Search different?

Semantic search uses machine learning (ML) models to interpret the meaning behind a query. These models understand relationships between words - for example, that "guitar", "piano" and "music" are conceptually related, whereas "pizza" and "rock band" are not.

These relationships are stored in the search index as vector representations (embeddings) of the content. At query time, the search engine creates a vector representation of the search query and returns results based on vector similarity. In other words, it returns content that is conceptually relevant, even if the exact words don’t match.

For example, a query like “pop songs” or “rock band” would surface content about guitars, pianos and music, without needing those exact words to appear.
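As a rough illustration of vector similarity, the sketch below ranks a few words against a query vector using cosine similarity, one common similarity measure. The three-dimensional vectors are hand-made for the example and are not output from any real model; real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (assumption): music-related words point one way,
# food-related words point another.
embeddings = {
    "guitar": [0.9, 0.1, 0.0],
    "piano": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of a query such as "rock band".
query_vector = [0.85, 0.15, 0.05]

ranked = sorted(
    embeddings,
    key=lambda word: cosine_similarity(query_vector, embeddings[word]),
    reverse=True,
)
print(ranked)
```

The music-related words rank ahead of "pizza" even though none of them shares a single character sequence with the query text. The comparison happens entirely between vectors.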

To demonstrate the difference, we built a prototype using the IMDb Top 1000 Movies dataset. The index includes only the movie title and a one-sentence overview, no metadata, plot summary or genre tags.

Searching for “space travel” in this index of movies, we expect results about travelling on rocket ships, exploring other planets or galaxies. Let’s take a look at what we get for a standard keyword search.

Search: “space travel”

Keyword search results:

Semantic Search - Space Travel Incorrect

Some of the results, like 2001: A Space Odyssey, Interstellar, and Gattaca, are solid matches. But Office Space ranked first? That’s a clear miss. It only matches the word “Space” in the title, not the intended meaning of the query.

The top results when using semantic search are all directly related to space travel, clearly showing the advantage of meaning-based search over keyword matching.

Semantic search results:

Semantic Search - Space Travel Better

While semantic search often gives better results, hybrid search can offer the best of both worlds. It combines keyword and semantic search, recalculating the final relevance score based on a configured weighting.

Let’s see how that compares when searching for “children’s magical adventure”.

The keyword search finds Children of Men and The Adventures of Robin Hood, neither of which is what we expect, before the first relevant result, Mary Poppins.

Search: “children’s magical adventure”

Keyword search results:

Semantic Search - Magical - Incorrect

The semantic search results are much more relevant overall. But interestingly, Mary Poppins doesn’t top the list, even though it contains both “magical” and “adventure”.

Semantic search results:

Semantic Search - Magical - Better

Then we incorporated hybrid ranking, and in this scenario, Mary Poppins rises to the top. The hybrid model recognises both the contextual and keyword relevance, delivering a better balance.

Hybrid search results (40% keyword, 60% semantic):

Semantic Search - Magical - Best
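The weighting idea can be sketched in a few lines of Python. This is an illustration only, assuming min-max normalisation followed by a weighted sum; the document IDs and raw scores are invented, and the function names are hypothetical rather than part of any library.

```python
def min_max(scores):
    """Rescale a dict of scores to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - lo) / (hi - lo) for doc, score in scores.items()}


def hybrid_rank(keyword_scores, semantic_scores,
                keyword_weight=0.4, semantic_weight=0.6):
    """Combine two score sets with a weighted sum and sort best-first."""
    # Keyword and vector scores live on different scales, so normalise first.
    kw = min_max(keyword_scores)
    sem = min_max(semantic_scores)
    combined = {
        doc: keyword_weight * kw.get(doc, 0.0) + semantic_weight * sem.get(doc, 0.0)
        for doc in set(kw) | set(sem)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Invented scores: doc_a wins on keywords, doc_c wins on meaning,
# doc_b is second in both lists.
keyword_scores = {"doc_a": 8.0, "doc_b": 6.0, "doc_c": 1.0}
semantic_scores = {"doc_a": 0.30, "doc_b": 0.70, "doc_c": 0.75}

ranking = hybrid_rank(keyword_scores, semantic_scores)
for doc, score in ranking:
    print(f"{score:.3f}  {doc}")
```

In this toy data, doc_b tops neither individual list but comes first in the combined ranking, because it scores reasonably well on both signals. Changing the two weights shifts the balance between exact wording and meaning.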

Fine-tuning your search experience

What we’ve shown so far is just the tip of the iceberg. There are many ways to tune and enhance semantic search, including:

  • Use of domain-specific models – The Amazon Titan Text Embeddings model works well for general content. But for more specific domains (e.g. medical or legal), targeted models like MedEmbed can deliver better results. You can even train your own model using your dataset.
  • Improving content structure – Indexing isolated paragraphs can lead to a loss of context. Including additional hierarchical or surrounding information can improve semantic understanding and retrieval accuracy.
  • Customising search behaviour – You can adjust the balance between keyword and semantic relevance, enable features like stemming or fuzzy matching, and apply reranking algorithms to suit your use case better.
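To illustrate the content structure point, one simple approach is to prepend the page title and section heading to each paragraph before it is sent to the embedding model. The sketch below assumes that approach; the function name `build_embedding_text` and the sample strings are hypothetical.

```python
def build_embedding_text(page_title, section_heading, paragraph):
    """Join a paragraph with its surrounding context into one string to embed."""
    parts = [page_title, section_heading, paragraph]
    # Skip missing or blank parts so we do not embed empty lines.
    return "\n".join(part.strip() for part in parts if part and part.strip())


# Sample content invented for the example.
text = build_embedding_text(
    "Opening hours",
    "Public holidays",
    "We are closed on these days.",
)
print(text)
```

On its own, "We are closed on these days." says little about what is closed or when. With the title and heading attached, the embedding has more context to work with.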

How it works

For this prototype, our stack included:

  • OpenSearch – As the primary backend, handling indexing and result retrieval.
  • Drupal – Our content source, connected to OpenSearch via the Search API OpenSearch module.
  • Search UI – Simple searches use Views to query the index through Search API. For more complex searches, we prefer to query OpenSearch directly from the browser.
  • Amazon Bedrock – OpenSearch integrates with a remote ML model (Titan) through Bedrock, enabling us to run inference without managing model infrastructure.

Indexing content

Semantic Search - Indexing
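For readers who want to see what the indexing side might look like at the OpenSearch level, the sketch below builds two request bodies as Python dictionaries: an ingest pipeline using the `text_embedding` processor, and an index mapping with a `knn_vector` field. This is not the configuration used in the prototype. The pipeline name, field names, model ID placeholder and vector dimension are all assumptions, and in a Drupal setup much of this would normally be managed through the Search API OpenSearch module rather than by hand.

```python
import json

# Placeholder (assumption): the ID OpenSearch assigns when the remote
# embedding model is registered.
MODEL_ID = "<model-id>"

# Assumption: must match the output size of whichever embedding model is used.
EMBEDDING_DIMENSION = 1024

# Body for: PUT /_ingest/pipeline/movies-embedding-pipeline (hypothetical name)
ingest_pipeline = {
    "description": "Generate an embedding for the overview field at index time",
    "processors": [
        {
            "text_embedding": {
                "model_id": MODEL_ID,
                # Source text field -> destination vector field.
                "field_map": {"overview": "overview_embedding"},
            }
        }
    ],
}

# Body for: PUT /movies (hypothetical index name)
index_body = {
    "settings": {
        "index.knn": True,
        "default_pipeline": "movies-embedding-pipeline",
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "overview": {"type": "text"},
            "overview_embedding": {
                "type": "knn_vector",
                "dimension": EMBEDDING_DIMENSION,
            },
        }
    },
}

print(json.dumps(ingest_pipeline, indent=2))
print(json.dumps(index_body, indent=2))
```

With a default pipeline set on the index, each document passes through the embedding processor when it is indexed, so the text and its vector are stored side by side.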

Searching content

Semantic Search - Searching workflow
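On the search side, the three modes compared earlier map onto three OpenSearch query shapes: `match` for keywords, `neural` for semantic and `hybrid` for the combination. The sketch below builds those request bodies as Python dictionaries, along with a search pipeline that normalises and weights the two result sets. Field names, the model ID placeholder and the helper function names are assumptions for illustration, not the prototype's actual code.

```python
import json

MODEL_ID = "<model-id>"  # placeholder (assumption), as registered in OpenSearch


def keyword_query(text):
    """Standard keyword search against the overview text field."""
    return {"query": {"match": {"overview": {"query": text}}}}


def neural_query(text, model_id=MODEL_ID, k=10):
    """Semantic search: OpenSearch embeds the query text and runs a k-NN search."""
    return {
        "query": {
            "neural": {
                "overview_embedding": {
                    "query_text": text,
                    "model_id": model_id,
                    "k": k,
                }
            }
        }
    }


def hybrid_query(text, model_id=MODEL_ID, k=10):
    """Hybrid search: run both sub-queries and let a search pipeline combine them."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    keyword_query(text)["query"],
                    neural_query(text, model_id, k)["query"],
                ]
            }
        }
    }


# Body for a search pipeline. Weights follow the order of the sub-queries
# above: 0.4 for keyword, 0.6 for semantic. The pipeline is applied to a
# request through the search_pipeline request parameter.
search_pipeline = {
    "description": "Normalise and combine keyword and semantic scores",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.4, 0.6]},
                },
            }
        }
    ],
}

print(json.dumps(hybrid_query("space travel"), indent=2))
```

Keeping the weights in the pipeline rather than in the query means the keyword and semantic balance can be tuned without touching the code that builds search requests.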

Using Semantic Search on Skpr

One of the challenges with Semantic Search is managing the infrastructure it requires: vector databases, machine learning models, and the security needed to tie it all together within a unified platform.

We’re working to make this easier with Skpr.

Soon, you’ll be able to:

  • Run OpenSearch 3.1 with enhanced semantic search features and native vector database support alongside your applications.
  • Connect via a private network to Amazon Bedrock for semantic search using Amazon Titan or other foundation models.
  • Deploy and test search prototypes quickly, without the overhead of managing complex infrastructure.

Keep an eye on the Skpr blog for updates as we continue exploring how semantic search and modern AI tools can be integrated directly into your Drupal applications, and hosted, managed, and optimised on Skpr.