What is online sharding?

Sharding refers to an architecture in which data is distributed across multiple commodity (inexpensive) computers. For very large data sets, sharding, also called scale-out, compares favorably with scale-up, which requires replacing existing servers with more powerful but very expensive ones. Scale-out can usually be achieved at a small fraction of the cost of scale-up.

When more processing capacity is needed, shard nodes are added; when less is needed, nodes are removed. Whenever nodes are added or removed, the existing data must be redistributed across the new set of nodes. This is called resharding.
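The redistribution step can be illustrated with a minimal sketch. It assumes a simple hash-modulo placement scheme, which is only one of several possible schemes and is not taken from any particular product; the function names `shard_for` and `plan_resharding` are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def plan_resharding(keys, old_count: int, new_count: int) -> dict:
    """Return {key: (old_shard, new_shard)} for every key that must move."""
    moves = {}
    for key in keys:
        old = shard_for(key, old_count)
        new = shard_for(key, new_count)
        if old != new:
            moves[key] = (old, new)
    return moves


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    moves = plan_resharding(keys, old_count=3, new_count=4)
    print(f"{len(moves)} of {len(keys)} keys move when growing from 3 to 4 shards")
```

With plain modulo placement a large share of the keys typically changes shards when the node count changes, which is one reason production systems often prefer schemes such as consistent hashing or range-based placement that move less data.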

When resharding redistributes data continuously, without pausing or interfering with running applications, it is called online resharding.

Most online resharding implementations come from NoSQL database vendors; relational database vendors rarely provide it. Relational databases typically offer either client-side sharding or server-side sharding, but not both.

Providing both kinds of sharding requires very sophisticated transaction sharing technology. Transaction sharing technology combines the transactions requested by client-side sessions and by coordinator (server-side) sessions so that they can be processed as one logically shared transaction.

Altibase, a relational database, is unique in that it provides both client-side and server-side sharding. This combination is called hybrid sharding.

Altibase analyzes the SQL in the application program and automatically chooses the optimal execution path, running each statement through either client-side sharding or server-side sharding.
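The idea of choosing an execution path per statement can be sketched roughly as follows. This is an illustrative toy and not Altibase's actual analysis: the shard key column `customer_id`, the regex-based inspection, the modulo placement, and the `route` function are all assumptions made for the example.

```python
import re
from typing import Optional

SHARD_KEY = "customer_id"  # hypothetical shard key column

# Matches a literal equality predicate on the shard key, e.g. "customer_id = 42".
_KEY_EQ = re.compile(rf"\b{SHARD_KEY}\s*=\s*(\d+)\b", re.IGNORECASE)
_JOIN = re.compile(r"\bjoin\b", re.IGNORECASE)


def extract_shard_key(sql: str) -> Optional[int]:
    """Return the shard key value if the statement pins it to one literal."""
    match = _KEY_EQ.search(sql)
    return int(match.group(1)) if match else None


def route(sql: str, num_shards: int) -> str:
    """Pick an execution path for one SQL statement (toy heuristic)."""
    key = extract_shard_key(sql)
    if key is not None and not _JOIN.search(sql):
        # The statement touches one shard: the client can send it directly.
        return f"client-side -> shard {key % num_shards}"
    # Anything else (joins, scans, aggregates) goes through a coordinator.
    return "server-side -> coordinator"


if __name__ == "__main__":
    print(route("SELECT * FROM orders WHERE customer_id = 42", 4))
    print(route("SELECT COUNT(*) FROM orders", 4))
```

A real implementation would parse the SQL properly and consult shard metadata rather than rely on a regular expression; the sketch only shows the shape of the decision.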

Altibase’s hybrid sharding combines the strengths of both approaches: server-side sharding, which can process complex data, and client-side sharding, which is advantageous in terms of expansion and performance. Because fewer requests have to pass through the coordinators, coordinator-related bottlenecks are minimized and the overall performance of the scale-out system increases.