Non-breaking Elasticsearch Mapping Updates
A recent challenge I tackled was updating the mapping of a field in the _source of an Elasticsearch document on an active index, without affecting the read and write paths. In other words, reindexing an index while reads and writes were still happening on it.
The task at hand was not just about changing a mapping; it was about doing so without any service disruption. Elasticsearch does not allow most changes to the mapping of an existing field, so such a change usually requires creating a new index and migrating the data from the old one. This becomes a complex process, especially if the index is in the read and write path.
Popular approach
The first solution that comes to most minds is creating a parallel index, NEW_INDEX, with the updated mapping, alongside the existing active CURRENT_INDEX.
After NEW_INDEX is created, data is copied with _reindex from CURRENT_INDEX to NEW_INDEX, while writes continue to happen on both indexes. Once NEW_INDEX catches up with CURRENT_INDEX in terms of data, reads are gracefully switched from the latter to the former.
POST _reindex
{
  "source": {
    "index": "order-2024-02"
  },
  "dest": {
    "index": "order-2024-02-re"
  }
}
However, this approach can result in transient data loss and requires multiple code changes 😫.
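To make the moving parts of this approach concrete, here is a minimal Python sketch that only builds the request bodies and prints them; it does not talk to a cluster. The alias name `orders-read` is hypothetical, and using an alias swap for the read cutover is my assumption about one way the "graceful switch" could be done, not something the steps above prescribe.

```python
import json

CURRENT_INDEX = "order-2024-02"
NEW_INDEX = "order-2024-02-re"
READ_ALIAS = "orders-read"  # hypothetical alias name, not from the article


def reindex_body(source: str, dest: str) -> dict:
    """Body for POST _reindex: copy every document from source to dest."""
    return {"source": {"index": source}, "dest": {"index": dest}}


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST _aliases: move the read alias in a single request.

    Both actions travel in one request, so readers should never see the
    alias pointing at no index at all.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    print("POST _reindex")
    print(json.dumps(reindex_body(CURRENT_INDEX, NEW_INDEX), indent=2))
    print("POST _aliases")
    print(json.dumps(alias_swap_body(READ_ALIAS, CURRENT_INDEX, NEW_INDEX), indent=2))
```

Even in this small form, the approach needs the application to write to two indexes and then change where it reads from, which is where the extra code changes come from.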
Unexplored approach
There is another, lesser-known approach that offers a simpler solution for non-breaking changes: the _update_by_query operation.
This Elasticsearch operation updates the documents that match a specified query. If no query is provided, it updates every document in the data stream or index, effectively reindexing within the same index. To be clear, _update_by_query does not change the mappings itself; it re-indexes existing documents so that they pick up the mappings you have already updated.
POST order-2024-02/_update_by_query?wait_for_completion=true
{
  "query": {
    "term": {
      "item_type": "TEST"
    }
  },
  "script": {
    "source": "ctx._source.version += 1",
    "lang": "painless"
  }
}
Let’s break it down:
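- You update the mapping of your index directly on Elasticsearch, using the _mapping API.
- You then execute _update_by_query on your index. This operation goes through every matching document, re-indexes it from its _source so that the updated mappings are applied, and writes it back to the same index. Note that the operation is not atomic: it works from a snapshot of the index, and a document that changes in the meantime causes a version conflict, which aborts the request unless you pass conflicts=proceed.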
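Please note that this method works for adding new fields, or for adding a sub-field (for example, one with a different analyzer) to an existing field. It won’t work for changing the type or the index-time analyzer of an existing field, because Elasticsearch rejects those mapping updates.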
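The two steps above can be sketched as a pair of requests. This is a minimal Python sketch that only builds the method, path and body for each call; sending them is left to whatever HTTP client you use. The field name `item_type` is borrowed from the earlier query, while its `keyword` type, the sub-field name `text` and the `standard` analyzer are illustrative assumptions. I have also added `conflicts=proceed` so that version conflicts from concurrent writes are counted rather than aborting the run.

```python
import json
from urllib.parse import urlencode

INDEX = "order-2024-02"


def put_mapping_request(index, field, sub_field, analyzer):
    """Step 1: add a sub-field to an existing field via PUT <index>/_mapping."""
    body = {
        "properties": {
            field: {
                # Assumed existing type; it must match the current mapping.
                "type": "keyword",
                "fields": {sub_field: {"type": "text", "analyzer": analyzer}},
            }
        }
    }
    return "PUT", f"/{index}/_mapping", body


def update_by_query_request(index, wait=False):
    """Step 2: re-index documents in place so they pick up the new mapping.

    No body is sent, so every document in the index is processed.
    """
    params = {"conflicts": "proceed", "wait_for_completion": str(wait).lower()}
    return "POST", f"/{index}/_update_by_query?{urlencode(params)}", None


if __name__ == "__main__":
    requests = [
        put_mapping_request(INDEX, "item_type", "text", "standard"),
        update_by_query_request(INDEX),
    ]
    for method, path, body in requests:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

With `wait_for_completion=false`, Elasticsearch returns a task ID instead of blocking, which is usually the safer choice on a large index; progress can then be checked through the tasks API.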
In this manner, you can update Elasticsearch mappings without disrupting your service, and add a mapping to an existing index more smoothly and efficiently. This method needs no code changes, eliminates the need for multiple deployments, and avoids the potential data loss that can occur when migrating to a new index.