Non-breaking Elasticsearch Mappings Updates

Savan Nahar
Feb 17, 2024

A recent challenge I tackled was updating the mapping for a field in the _source of Elasticsearch documents on an active index, without affecting the read and write paths. In other words, reindexing an index while reads and writes were still hitting it.

The task was not just about adding a mapping; it was about doing so without any service disruption. The mapping of an existing field generally cannot be changed in place, so most mapping changes mean creating a new index and migrating the data from the old one. This becomes a complex process, especially if the index is in the read and write path.

Popular approach

The first solution that comes to mind for most people is creating a parallel index, NEW_INDEX, with the updated mapping, alongside the existing active CURRENT_INDEX.

After NEW_INDEX is created, data is copied from CURRENT_INDEX to NEW_INDEX with _reindex, while writes continue to go to both indexes. Once NEW_INDEX has caught up with CURRENT_INDEX, reads are gracefully switched from the latter to the former.

POST _reindex
{
  "source": {
    "index": "order-2024-02"
  },
  "dest": {
    "index": "order-2024-02-re"
  }
}

However, this approach can result in transient data loss and requires multiple code changes 😫.
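
The read switch is one of the places where those code changes tend to come in. A common way to keep it small, assumed here rather than taken from the setup described above, is to have readers query an index alias and repoint it with the _aliases API, which applies all of its actions in one request. The sketch below only builds the request body; the alias name orders-read and the helper name are hypothetical.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints `alias`.

    Both actions travel in one request, so readers never see the
    alias pointing at neither index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # "orders-read" is a made-up alias name for illustration.
    body = build_alias_swap("orders-read", "order-2024-02", "order-2024-02-re")
    print("POST /_aliases")
    print(json.dumps(body, indent=2))
```

Even with an alias, the dual writes and the catch-up check still have to be handled in application code, which is where the extra deployments come from.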

Unexplored approach

There is another, less well-known approach that offers a simpler solution for non-breaking changes: the _update_by_query operation.

This Elasticsearch operation updates the documents that match a specified query. If no query is provided, it updates every document in the data stream or index, effectively reindexing them within the same index. A script is optional: with no script, each matching document is rewritten with its _source unchanged, which is enough to pick up a mapping change. To be clear, _update_by_query does not change the mappings itself; it helps existing documents adopt mappings you have already updated. The example below limits the update to documents whose item_type is TEST and increments a version field on each one.

POST order-2024-02/_update_by_query?wait_for_completion=true
{
  "query": {
    "term": {
      "item_type": "TEST"
    }
  },
  "script": {
    "source": "ctx._source.version += 1",
    "lang": "painless"
  }
}

Let’s break it down:

  1. You update the _mapping for your index directly on Elasticsearch.
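  2. You then execute _update_by_query on your index. This operation goes through every document that matches the query (or every document, if no query is given) and reindexes it from its _source, so it is indexed against the updated mappings. The operation is not atomic across the index: it works from a snapshot taken when it starts, and a document changed by a concurrent write in the meantime causes a version conflict, which fails the run unless you pass conflicts=proceed.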
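
The two steps above can be sketched as a small Python helper. This is a minimal sketch using only the standard library, not the exact commands I ran: the names adopt_mapping and http_send are hypothetical, the sender is passed in so the sequence can be exercised without a cluster, and conflicts=proceed is an addition of mine so that documents touched by concurrent writes are counted instead of failing the run.

```python
import json
from typing import Callable, Optional
from urllib import request as urlrequest

# A sender takes (method, path, body) and returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def http_send(base_url: str) -> Send:
    """Return a sender that talks to Elasticsearch at `base_url` (assumed, unauthenticated)."""
    def send(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        headers = {"Content-Type": "application/json"} if data is not None else {}
        req = urlrequest.Request(base_url + path, data=data, method=method, headers=headers)
        with urlrequest.urlopen(req) as resp:
            return json.loads(resp.read())
    return send


def adopt_mapping(send: Send, index: str, properties: dict) -> dict:
    """Apply a mapping change, then rewrite existing documents in place."""
    # Step 1: add the new field definitions to the existing mapping.
    send("PUT", f"/{index}/_mapping", {"properties": properties})
    # Step 2: no query and no script, so every document is rewritten
    # with its _source unchanged and indexed against the new mapping.
    path = f"/{index}/_update_by_query?conflicts=proceed&wait_for_completion=true"
    return send("POST", path, None)
```

A call would look like adopt_mapping(http_send("http://localhost:9200"), "order-2024-02", {"item_type": {"type": "keyword"}}), where the URL and the keyword type are placeholders. The response reports counts such as updated and version_conflicts. Because the snapshot is taken after the mapping change, a document that hits a version conflict was written under the new mapping already, so skipping it should be harmless.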

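Please note that this method works for additive changes, such as mapping a field that so far existed only in _source or adding a multi-field (for example, one with a different analyzer) to an existing field. It won't work for changing the type of an existing field, and most other parameters of an already-mapped field, including its analyzer, generally cannot be changed in place either.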
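
As an illustration of a change that fits this restriction, the sketch below builds a PUT _mapping body that adds a sub-field to an existing field. It assumes item_type is currently mapped as keyword (the earlier example queries it with term, but its real type isn't shown), and the sub-field name text, the built-in english analyzer, and the helper name are all picked for illustration.

```python
import json


def multi_field_patch(field, existing_type, subfield, subfield_mapping):
    """Build a PUT /<index>/_mapping body that adds `subfield` under `field`.

    The existing field is restated with its current type; only the
    entry under "fields" is new.
    """
    return {
        "properties": {
            field: {
                "type": existing_type,
                "fields": {subfield: subfield_mapping},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical: item_type is a keyword field gaining an analyzed sub-field.
    patch = multi_field_patch(
        "item_type", "keyword", "text", {"type": "text", "analyzer": "english"}
    )
    print("PUT order-2024-02/_mapping")
    print(json.dumps(patch, indent=2))
```

Documents written before a patch like this only become searchable on item_type.text once they have been rewritten, which is what the _update_by_query step is for.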

In this manner, you can update Elasticsearch mappings without disrupting your service and add a mapping to an existing field more smoothly. This method needs no code changes, eliminates the need for multiple deployments, and avoids the potential data loss that can occur when migrating to a new index.
