I'm a creator, product manager and programmer based in London

Writing

Writings and musings about history, technology, development and anything else that seems interesting.

Indexing and Searching an Azure SQL database with Azure Search

The project I'm currently working on, SailSys, is a site for sailing clubs to manage race entries, competitors and results. Sailors using the site to enter a race can search for their boat by name, sail number or manufacturer in our extensive database of boats. We noticed a lot of users adding duplicate boats, clearly unable to find their own due to the slow queries and inaccurate results. Our database is hosted on Azure SQL so utilising Azure Search was an easy choice to improve our search result quality.

In general the process of creating the resources you need in Azure is simple and easy to follow but there were a few things I didn't realise along the way which meant I had to start over several times, I hope this post will prevent you or, more likely, a future me having to go through the same thing again.

The Azure Search Building Blocks

Azure Search consists of 3 main parts:

  • Data Source - where your application gets its data from. It can be an Azure SQL Database, SQL Database, Cosmos DB, Azure Blob Storage container or Azure Table Storage. Whilst creating the data source you’ll be asked to select the table in your database you want to be searchable.

  • Index - Azure Search will create an index with the data from your chosen table. An index is "a persistent store of documents and other constructs used for filtered and full text search". Using an Azure SQL data source you can customise each property in your table and set them as retrievable (they'll be returned with your query), filterable (can be used in filter queries), sortable (search results will be sorted using this field), facetable (available for use in faceted navigation structure) and searchable.

    You can also set an analyzer for each searchable field - essentially the search library you want to use. The default is Apache Lucene.

  • Indexer - An automated crawler that extracts the data from your data source and inserts it into your index at predefined intervals (the minimum is 5 minutes). Here you can also define how your index will detect changes: either by SQL Integrated Change Tracking or using a High Water Mark Change Detection Policy (which involves adding a time column to indicate when each entry in a table was last edited). More on this below.

After creating each of these parts you can edit them or add others to be used later. As you'll see next though, there's an important decision to be made before setting up any of these.

Choosing Your Pricing Tier

The good news is that Azure Search has a free tier for up to three indexes. The bad news is that unlike every other Azure Service I've used so far you can't currently move between tiers. Once you've used up your free indexes and want to move to four you'll need to create a new search service and build all your indexes again which is not too tricky but quite an inconvenience.

Choose carefully, at the moment you can’t change this later.

Choose carefully, at the moment you can’t change this later.

Prices here are in Australian Dollars but the price adds up quickly. I chose to start with the free tier as I only needed it for the boat search initially.

Creating An Index

After you have connected your database, you’ll need to select the fields from your table you'd like to be returned, searchable, sortable, etc. If you'd like to create a new index after creating your Azure Search resource then I found it was best to use the "import data” menu option rather than "add index" as the former prepopulates a form with all the columns in your chosen table whereas the latter makes you add them all manually.

Using “import data” rather than “add index” is a lot quicker.

Using “import data” rather than “add index” is a lot quicker.

I simply made the Id and Name of my boats retrievable and the Name searchable before moving on to creating an indexer.

Select which fields from your database table you want to include in the index

Select which fields from your database table you want to include in the index

Creating an Indexer and Database Change Tracking

There are two ways the SQL Server Indexer can track which rows in your table have changed and if your index needs to be updated. You can either enable change tracking for the database and let it keep track of all changes over a certain period of time or create a high water mark column (a version number or a time changed field). The easiest way is to use the automated change tracking as it does not require any code or schema changes but I found, painfully, this should be configured before trying to create the index. After you have connected your data source and set up the index Azure has already established you have not set up change tracking and asks you to provide a high water mark column instead.

To set up SQL Integrated Change Tracking you can configure the database via SQL or, as I did, use SQL Server Management Studio.

  1. Right click on your database and open up Properties.

  2. Select Change Tracking from the left-hand side menu and set Change Tracking to be enabled. You can configure how many days you want the history to be retained here too.

  3. Now find the table you wish to index, follow the same process as above and inside Change Tracking enable tracking for this table as well.

Now when you create your index it will notice you have change tracking set up automatically.

Enabling SQL Azure change tracking on the database (left) and the table (right) using SQL Server Management Studio.

Enabling SQL Azure change tracking on the database (left) and the table (right) using SQL Server Management Studio.

As I wrongly configured mine, I had to manually change my index to use automated change tracking using the Azure Search API as I could not find a way to change it after setting up the index on the Azure site. Here is an example of the request I used:

POST https://[your_search].search.windows.net/datasources/[datasource_name]?api-version=2017-11-11
Header: api-key [your_key]
{
    "name": "name",
    "description": null,
    "type": "azuresql",
    "subtype": null,
    "credentials": {
        "connectionString": "..."
    },
    "container": {
        "name": "Boats",
        "query": null
    },
    "dataChangeDetectionPolicy" : {
       "@odata.type" : "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
    },
    "dataDeletionDetectionPolicy": null
}

DataChangeDetectionPolicy is the section that needs changing.

After this step you’ll be walked through setting up your indexer schedule and naming it. You may want to open the advance options section here as I found I wanted to disable Base64 encoding on my index keys. This is done if you use strings with illegal characters as keys but since I was using integer Ids from my boat table I disabled this preventing an unnecessary step of decoding the keys before getting the required entities from my database.

Turn off key encoding if you are using ids as your index keys to avoid having to write unnecessary code

Turn off key encoding if you are using ids as your index keys to avoid having to write unnecessary code

Testing the Search

Thankfully you can test your search index and indexer straight away (once the data has synced for the first time) using the Azure Search Explorer. Enter your query and see the JSON response including your retrievable fields and a score indicating the row's relevancy.

Using Search Explorer to test my index

Using Search Explorer to test my index

Using the Search Results

After set up and testing I configured my application to use the index. Here's some code showing how I was previously searching my database for boats (badly):

My new service for querying Azure Search:

Here I have one method which allows paging through results given a search term. I added "*" to the end of the term so it would look for any boat names containing the search term in any position, without it the boat "Hell Razer" was only returned when you typed in "Hell" but not "Hel". This method just returns a list of ids as that is all the information I need.

I then call this code from my rewritten boat search method to get the full boat information and associated details from my database:

I retrieve a page of results from the search API and then use the list of ids to get the relevant boats from the database. When retrieved these are not guaranteed to be in the order of relevancy for the search term so just before returning the response to the user I need to order the entities by the order they are returned by Azure Search (the order of their score). I feel like there must be a better way of doing this than ordering the entities in memory but I haven’t yet thought of it!

So, after several false starts I finally got the the hang of how to create an index and was reminded just how powerful Azure Search is in enabling significant query improvements in well under a day's work. I do think the Search UI on the Azure website could really do with some love to make it easier for developers. It also really needs the ability to change tiers easier. Automatic real-time indexing with the database would also be very useful to prevent old or incorrect data being contained within the search index.

In the future I'll be writing other queries for other parts of SailSys, I also hope to look into scoring profiles which allow you to dictate the importance of certain fields containing the search term.