Understanding Elasticsearch Sharding and Routing: A Deep Dive into Reindexing

Savan Nahar
2 min read · Feb 20, 2024

Recently, I’ve worked a lot with Elasticsearch, particularly on optimising it. A key contributor to Elasticsearch’s performance is its sharding and routing mechanism. This article delves into the nitty-gritty of how Elasticsearch handles routing, specifically during reindexing.

Uncovering Elasticsearch Routing

After spending a lot of time with Elasticsearch, I gained several insights into how it handles routing during indexing and reindexing. Here they are:

Insight 1:

The first discovery is that if you don’t pass a routing value when indexing a document, Elasticsearch takes care of it for you by routing the document based on its `_id`. Surprisingly, in my experiments, routing set as part of an ingest pipeline was not given priority during indexing; Elasticsearch still followed this default `_id`-based routing.
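To make the default concrete, here is a minimal Python model of the shard-selection formula from the Elasticsearch `_routing` documentation: `routing_factor = num_routing_shards / num_primary_shards` and `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, where `_routing` falls back to `_id`. The helpers `shard_for` and `murmur3_x86_32` are hypothetical, not Elasticsearch APIs. Modelling the hash as 32-bit Murmur3 over the UTF-16LE bytes of the routing string reflects my reading of the implementation and should be treated as an assumption. `routing_partition_size` is not modelled.

```python
# Sketch of Elasticsearch's documented shard-selection formula:
#   routing_factor = num_routing_shards // num_primary_shards
#   shard_num = (hash(_routing) % num_routing_shards) // routing_factor
# _routing defaults to the document's _id when no routing is passed.
# Assumption: hash = Murmur3 x86_32 (seed 0) over UTF-16LE bytes of the string.
from typing import Optional

MASK = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Return the signed 32-bit Murmur3 (x86 variant) hash of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n = len(data)
    rounded = n & ~3
    for i in range(0, rounded, 4):  # full 4-byte blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & MASK, 15)
        h ^= (k * c2) & MASK
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK
    k = 0
    tail = n & 3  # remaining 1-3 bytes
    if tail == 3:
        k ^= data[rounded + 2] << 16
    if tail >= 2:
        k ^= data[rounded + 1] << 8
    if tail >= 1:
        k ^= data[rounded]
        k = _rotl32((k * c1) & MASK, 15)
        h ^= (k * c2) & MASK
    h ^= n  # finalisation mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def shard_for(doc_id: str, num_primary_shards: int,
              routing: Optional[str] = None,
              num_routing_shards: Optional[int] = None) -> int:
    """Pick the primary shard for a document (simplified model)."""
    # No routing passed: fall back to the document's _id.
    effective = routing if routing is not None else doc_id
    # Assumption: routing shards equal primary shards unless given explicitly.
    nrs = num_routing_shards or num_primary_shards
    routing_factor = nrs // num_primary_shards
    h = murmur3_x86_32(effective.encode("utf-16-le"))
    # Python's % is a floor-mod, so the result is non-negative for nrs > 0.
    return (h % nrs) // routing_factor


print(shard_for("order-42", num_primary_shards=5))                    # routed by _id
print(shard_for("order-42", num_primary_shards=5, routing="cust-7"))  # routed by "cust-7"
```

The useful property is visible in the last two lines: without a routing value the `_id` decides the shard, while an explicit routing value makes every document sharing it land on the same shard, whatever its `_id`.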

Insight 2:

The second finding concerns reindexing. If a document was indexed with a custom routing value, that value is stored in its `_routing` metadata field. During reindexing, Elasticsearch reads this value and applies it to the new copy, so the document is routed with the same key as before and, provided the destination index has the same shard layout, lands in the same shard it originally resided in.
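The `_reindex` API exposes this behaviour through its `dest.routing` option: `keep` (the default) reuses the source document’s routing, `discard` drops it, and `=<value>` forces a fixed value. The sketch below models that decision in Python; `dest_routing` is a hypothetical helper and the index names `orders-v1`/`orders-v2` are illustrative placeholders.

```python
# Simplified model of how _reindex chooses the routing for each copied document.
# The mode strings mirror the real `dest.routing` option; the helper is hypothetical.
from typing import Optional


def dest_routing(source_routing: Optional[str], mode: str = "keep") -> Optional[str]:
    """Return the routing the reindexed copy will be indexed with."""
    if mode == "keep":
        # Default: reuse the _routing stored on the source document, if any.
        return source_routing
    if mode == "discard":
        # No routing: the copy is routed by its _id in the destination index.
        return None
    if mode.startswith("="):
        # Route every copy with the literal value after "=".
        return mode[1:]
    raise ValueError(f"unsupported routing mode: {mode!r}")


# Matching request body for POST _reindex, written as a Python dict
# (index names are placeholders).
reindex_body = {
    "source": {"index": "orders-v1"},
    "dest": {"index": "orders-v2", "routing": "keep"},
}

print(dest_routing("cust-7"))            # keep: routing carried over
print(dest_routing("cust-7", "=eu"))     # forced routing value
```

Because `keep` is the default, a plain `_reindex` preserves routing without any extra configuration; `discard` is the switch to reach for when the destination should fall back to `_id`-based routing.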

Insight 3:

The final revelation came while using `_update_by_query`. This operation re-indexes, in place, every document that matches your query. If a document was originally indexed with a custom `_routing` value, that value is preserved when `_update_by_query` rewrites it, so the document stays on the same shard and your read and write operations are not affected.
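As an illustration, the sketch below models what `_update_by_query` writes back for each matching hit: an index action on the same index and `_id`, carrying the hit’s stored routing, in the metadata shape used by the bulk API. `Hit` and `update_by_query_actions` are hypothetical names, and the model deliberately omits the sequence-number and primary-term checks the real operation uses to detect version conflicts.

```python
# Simplified model of the write-back step of _update_by_query.
# Hit and update_by_query_actions are hypothetical, for illustration only.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Source = Dict[str, Any]


@dataclass
class Hit:
    index: str
    id: str
    routing: Optional[str]
    source: Source = field(default_factory=dict)


def update_by_query_actions(hits: List[Hit],
                            script: Callable[[Source], Source]) -> List[Tuple[dict, Source]]:
    """Build bulk-style (metadata, source) pairs for each matching hit."""
    actions = []
    for hit in hits:
        # Same index and _id: the document is rewritten in place.
        meta = {"_index": hit.index, "_id": hit.id}
        if hit.routing is not None:
            # Copy the stored routing so the write goes to the same shard.
            meta["routing"] = hit.routing
        actions.append(({"index": meta}, script(dict(hit.source))))
    return actions


hits = [
    Hit("orders", "1", "cust-7", {"status": "new"}),
    Hit("orders", "2", None, {"status": "new"}),
]
actions = update_by_query_actions(hits, lambda src: {**src, "status": "done"})
for meta, body in actions:
    print(meta, body)
```

Documents indexed without custom routing get no `routing` entry and are therefore routed by `_id` again, which is also where they already live.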

The Code Behind the Curtain

Exploring the source code of `_update_by_query` clarifies some of these nuances. The method responsible, `copyMetadata`, copies the routing of each document returned by the search onto its corresponding new `IndexRequest`.

`copyMetadata` is declared as abstract and overridden in `TransportUpdateByQueryAction`.
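Without reproducing the Java source, here is a simplified Python model of the pattern this section describes: a base action builds the new request and delegates to an abstract `copy_metadata` hook, which the update-by-query action overrides to carry the hit’s routing across. All class and method names here are hypothetical stand-ins, not Elasticsearch’s actual classes or signatures.

```python
# Simplified model of the abstract-hook pattern described above.
# Names are hypothetical stand-ins for the Java classes, not their real API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchHit:
    index: str
    id: str
    routing: Optional[str]


@dataclass
class IndexRequest:
    index: str
    id: str
    routing: Optional[str] = None


class BulkByScrollAction(ABC):
    def build_request(self, hit: SearchHit) -> IndexRequest:
        # Build a request for the same document, then let the subclass
        # decide which metadata to copy from the search hit.
        request = IndexRequest(index=hit.index, id=hit.id)
        return self.copy_metadata(request, hit)

    @abstractmethod
    def copy_metadata(self, request: IndexRequest, hit: SearchHit) -> IndexRequest:
        ...


class UpdateByQueryAction(BulkByScrollAction):
    def copy_metadata(self, request: IndexRequest, hit: SearchHit) -> IndexRequest:
        # Carry the original routing over so the rewrite hits the same shard.
        request.routing = hit.routing
        return request


req = UpdateByQueryAction().build_request(SearchHit("orders", "1", "cust-7"))
print(req)
```

Keeping the routing decision in an overridable hook lets each by-scroll operation choose its own policy while sharing the rest of the request-building logic.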

Conclusion

A solid understanding of Elasticsearch’s routing mechanism is essential for using it efficiently and getting the best performance out of it. As this exploration shows, indexing and reindexing operations reveal a lot about how routing is handled: `_id`-based routing is the default, and stored `_routing` values survive both reindexing and `_update_by_query`. Such insights can prove beneficial for developers and data engineers who work with Elasticsearch regularly.