Posts

Altibase announces that Seoul Transportation adopts Altibase for real-time subway traffic control management

24/7 uninterrupted operation and 99.999% high availability were the key reasons behind the choice of database.

Use Case: Traffic Signal Control System

Current Customers: Seoul Transportation Corporation

Industry: Public

Who

Seoul Transportation Corporation is a public entity responsible for the city’s subway transit. The subway system it operates carries a daily average of 6.8 million passengers, compared with 5.6 million in New York and 4.2 million in Paris. At a total length of 300 km, it is the fourth longest subway network in the world.

Problem

The corporation was in the process of overhauling its subway traffic system with a view to improving safety and its ability to respond to emergencies. The corporation looked for a database that could meet the following requirements:

  • Since subway traffic facilities cannot tolerate any system disruption or failure, they must be supported by a database with the highest degree of product stability.
  • 24/7 uninterrupted operation and 99.999% high availability
  • Even in the event of the system going down abnormally, all data should be automatically saved and protected.
  • Data should be collected and stored in real time so that prompt actions can be taken in case of system abnormalities/failures.
  • The database should provide and adhere to strict technical SLAs.
  • The database should have a track record of having been deployed for other mission critical applications.
  • The database should be relational as all of the corporation’s systems are built on relational DBMSs.
  • The database costs should be within its IT budget.

Solution

In 2018, the corporation chose Altibase after a series of benchmark tests (BMTs) of various databases.

  • Altibase’s previous deployments in other subway traffic systems were a major appeal.
  • Altibase’s in-memory capabilities allow for real-time data collection and analysis.
  • Altibase’s move technology stores data on disk once it becomes stale and historical.
  • Database-induced system downtime has been eliminated with Altibase’s HA and disaster recovery solutions, which provide in-memory real-time replication for synchronous/asynchronous data duplication as well as in-memory real-time backup and recovery.
  • Altibase’s pricing met the corporation’s budget requirement.
  • Altibase provides hybrid partitioned tables that can manage a large set of log data while simultaneously ensuring high performance.

Results

  • Altibase has a demonstrated history of glitch-free operation over the past 10 years in all of the subway traffic signal control systems it has served.
  • Altibase’s active-active and active-standby replication allows for stable, interruption-free operation through data duplication in the event of any system failure or outage.
  • The corporation is now able to back up its data through Altibase’s checkpoint image files and redo log files.
  • Data loss is prevented via Altibase’s durability assurance, even in the event of an abnormal DBMS shutdown.

After nearly 20 years as a closed source database, Altibase is now open source, and that includes its state-of-the-art sharding.

Learn more about Altibase at https://youtu.be/pooexk0glK8, and download its open source database including sharding at http://13.124.221.141.

Wikiwand explains what sharding is and enumerates examples of key sharding providers, including Altibase

Shard (database architecture)

A database shard is a horizontal partition of data in a database or search engine. Each individual partition is referred to as a shard or database shard. Each shard is held on a separate database server instance, to spread the load.

Some data within a database remains present in all shards,[notes 1] but some appears only in a single shard. Each shard (or server) acts as the single source for this subset of data.[1]

Database architecture

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

There are numerous advantages to the horizontal partitioning approach. Since the tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables a distribution of the database over a large number of machines, greatly improving performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers v. American customers) then it may be possible to infer the appropriate shard membership easily and automatically, and query only the relevant shard.[2]
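For example, a minimal routing sketch in Python (the regions, host names and function here are hypothetical, purely to illustrate inferring shard membership from a real-world segmentation):

    # Route customers to shards by region (hypothetical hosts and names).
    SHARD_BY_REGION = {
        "EU": "shard-eu.example.com",  # European customers
        "US": "shard-us.example.com",  # American customers
    }

    def shard_for_customer(region: str) -> str:
        """Return the server holding this customer's shard, so the
        application queries only the relevant shard."""
        return SHARD_BY_REGION[region]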

Disadvantages include:

  • A heavier reliance on the interconnect between servers
  • Increased latency when querying, especially where more than one shard must be searched
  • Data or indexes are often sharded only one way, so that some searches are optimal and others are slow or impossible
  • Issues of consistency and durability due to the more complex failure modes of a set of servers, which often result in systems making no guarantees about cross-shard consistency or durability

In practice, sharding is complex. Although it has been done for a long time by hand-coding (especially where rows have an obvious grouping, as per the example above), this is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it, and for identifying candidates to be sharded separately. Consistent hashing is a technique used in sharding to spread large loads across multiple smaller services and servers.[3]
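As an illustration, a minimal consistent-hash ring in Python (a sketch only; production implementations typically add virtual nodes per server and replication):

    import bisect
    import hashlib

    class ConsistentHashRing:
        """Maps keys to servers on a hash ring; adding or removing a
        server only remaps the keys that fall on its arc of the ring."""

        def __init__(self, servers):
            self._ring = sorted((self._hash(s), s) for s in servers)
            self._points = [point for point, _ in self._ring]

        @staticmethod
        def _hash(key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def server_for(self, key: str) -> str:
            # Walk clockwise to the first server at or after the key's hash.
            i = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
            return self._ring[i][1]

    ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
    print(ring.server_for("customer:42"))  # routes this key to one shard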

Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a shard approach may also be useful.

Shards compared to horizontal partitioning

Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort) provided that there is some obvious, robust, implicit way to identify in which partition a particular row will be found without first needing to search the index, e.g., the classic example of the ‘CustomersEast’ and ‘CustomersWest’ tables, where the zip code already indicates where a customer will be found.

Sharding goes beyond this: it partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The hoped-for gains in efficiency would be lost if querying the database required both instances to be queried just to retrieve a simple dimension table. Beyond partitioning, sharding thus splits large partitionable tables across the servers, while smaller tables are replicated as complete units.

This is also why sharding is related to a shared nothing architecture—once sharded, each shard can live in a totally separate logical schema instance / physical database server / data center / continent. There is no ongoing need to retain shared access (from between shards) to the other unpartitioned tables in other shards.

This makes replication across multiple servers easy (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers would otherwise be a bottleneck.

There is also a requirement for some notification and replication mechanism between schema instances, so that the unpartitioned tables remain as closely synchronized as the application demands. This is a complex choice in the architecture of sharded systems: approaches range from making these effectively read-only (updates are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding) and many options in between.

Notable implementations

  • Altibase: provides a combined (client-side and server-side) sharding architecture that is transparent to client applications.
  • Apache HBase: HBase provides automatic sharding.[4]
  • Azure SQL Database Elastic Database tools: enables the data-tier of an application to scale out and in via industry-standard sharding practices[5]
  • Couchbase: provides automatic transparent sharding as well as extreme performance.
  • CUBRID: allows sharding from version 9.0
  • Elasticsearch: enterprise search server provides sharding capabilities.[6]
  • eXtreme Scale: a cross-process in-memory key/value datastore (a variety of NoSQL datastore). It uses sharding to achieve scalability across processes for both data and MapReduce-style parallel processing.[7]
  • Hibernate Shards: provides for shards, although there has been little activity since 2007.[8][9]
  • IBM Informix: IBM has allowed sharding in Informix since version 12.1 xC1 as part of the MACH11 technology. Informix 12.10 xC2 added full compatibility with MongoDB drivers, allowing the mix of regular relational tables with NoSQL collections, while still allowing sharding, failover and ACID properties.[10][11]
  • Kdb+: allows sharding from version 2.0
  • MonetDB: the open-source column-store MonetDB allows read-only sharding as its July 2015 release.[12]
  • MongoDB: allows sharding from version 1.6
  • MySQL Cluster: Auto-Sharding: Database is automatically and transparently partitioned across low-cost commodity nodes, allowing scale-out of read and write queries, without requiring changes to the application.[13]
  • MySQL Fabric (part of MySQL utilities) includes sharding capability.[14]
  • Oracle Database Sharding: introduced as a new feature in Oracle RDBMS 12c Release 2; in one line, sharding is a data-tier architecture in which data is horizontally partitioned across independent databases.[15]
  • Oracle NoSQL Database: has automatic sharding and elastic, online expansion of the cluster (adding more shards).
  • OrientDB: allows sharding from version 1.7
  • Solr enterprise search server: provides sharding capabilities.[16]
  • Spanner: Google’s global-scale distributed database, shards data across multiple Paxos state machines to scale to “millions of machines across hundreds of datacenters and trillions of database rows”.[17]
  • SQLAlchemy ORM: a data-mapper for the Python programming language that provides sharding capabilities.[18]
  • The DWH of Teradata: a massively parallel database

Disadvantages

Sharding a database table before it has been optimized locally causes premature complexity. Sharding should be used only when all other options for optimization are inadequate. The introduced complexity of database sharding causes the following potential problems:

  • Increased complexity of SQL – Increased bugs because the developers have to write more complicated SQL to handle sharding logic.
  • Sharding introduces complexity – The sharding software that partitions, balances, coordinates, and ensures integrity can fail.
  • Single point of failure – Corruption of one shard due to network/hardware/systems problems causes failure of the entire table.
  • Failover servers more complex – Failover servers must themselves have copies of the fleets of database shards.
  • Backups more complex – Database backups of the individual shards must be coordinated with the backups of the other shards.
  • Operational complexity added – Adding/removing indexes, adding/deleting columns, modifying the schema becomes much more difficult.

These historical complications of do-it-yourself sharding were addressed by independent software vendors who provided automatic sharding.

Etymology

The word “shard” in a database context may have been introduced by the CCA’s “System for Highly Available Replicated Data”.[19] There has been speculation[20] that the term might be derived from the 1997 MMORPG Ultima Online, and Richard Garriott, creator of Ultima Online, claims as much,[21] but the SHARD database system predates this by at least nine years.

However, the SHARD system appears[22] to have used its redundant hardware only for replication and not for horizontal partitioning. Present-day use of the term “shard” may or may not derive from the CCA system, but in any case it refers to a different use of redundant hardware in database systems.

Altibase supports data partitioning for easier management of large database objects

Does Altibase support data partitioning?

Yes. Data partitioning is the division of a large database object into several small pieces for easier management. Objects can be partitioned in three ways: list, range and hash.

Sep. 5. 2018

Altibase’s Partitioning Technology

These three ways of partitioning can be selectively used according to data size, data type, data usage, data density and other database requirements.

3 Types of Altibase’s Partitioning

[Figure: the three types of data partitioning]

Description:

  • List partitioning: Each partition is defined and selected based on whether the column value belongs to one of several lists of discrete values.
  • Range partitioning: Each partition contains the rows whose partitioning expression value lies within a given range.
  • Hash partitioning: Rows are distributed evenly across partitions by applying a hash function to the partitioning key (all three selection rules are sketched below).
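For illustration only, the three selection rules could be sketched in Python as follows (the table, column and partition names are hypothetical, and Altibase itself expresses these rules declaratively in DDL rather than in application code):

    # Hypothetical illustration of the three partition-selection rules.
    LIST_PARTITIONS = {                 # list: membership in discrete values
        "part_west": {"CA", "OR", "WA"},
        "part_east": {"NY", "NJ", "MA"},
    }
    RANGE_PARTITIONS = [                # range: ordered upper bounds
        ("part_2023", "2024-01-01"),
        ("part_2024", "2025-01-01"),
    ]
    N_HASH_PARTITIONS = 4               # hash: even spread across N partitions

    def list_partition(state: str) -> str:
        return next(name for name, values in LIST_PARTITIONS.items()
                    if state in values)

    def range_partition(date: str) -> str:
        return next(name for name, bound in RANGE_PARTITIONS if date < bound)

    def hash_partition(key: int) -> str:
        return f"part_h{key % N_HASH_PARTITIONS}"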


Altibase scales vertically and horizontally. Altibase scales up via auto-extend in-memory tables and scales out via sharding

Does Altibase support vertical and horizontal scaling?

Sep. 4. 2018

Altibase scales both vertically and horizontally without any issues. It automatically takes advantage of newly added RAM and faster processors, yielding an immediate performance increase. Altibase also scales out horizontally via sharding; please refer to the manual “Altibase Sharding Guide” for further information.

Vertical Scaling

Altibase’s products are designed for optimal vertical scalability. A key feature is Altibase’s ability to auto-extend in-memory tables.

[Figure: how Altibase handles scale-up and scale-out]


Horizontal Scaling via Sharding 

  • Altibase sharding utilizes both server-side and client-side sharding architectures.
    • Applications do not need to change any code.
    • Applications only need to know the IP address and port of the Altibase sharding meta DB (see the client sketch below).
  • Altibase sharding is fast.
    • A shard connection sends a distributed query directly to the shard nodes without any intervening middleware.
  • Altibase sharding supports standard SQL:
    • distributed queries
    • non-distributed queries
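For illustration, a minimal client sketch in Python (assumptions: connecting over ODBC with the pyodbc package; the driver name, host, port, credentials and table below are placeholders):

    import pyodbc  # assumes an Altibase ODBC driver is configured

    # The application only needs the address of the sharding meta DB;
    # distribution of the query to the shard nodes happens transparently.
    conn = pyodbc.connect(
        "DRIVER={Altibase};SERVER=meta-db.example.com;PORT=20300;"
        "UID=app;PWD=secret"
    )
    cur = conn.cursor()
    cur.execute("SELECT order_id, amount FROM orders WHERE customer_id = ?", 42)
    for row in cur.fetchall():
        print(row)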

[Figure: Altibase sharding]

Altibase – Downloading is Believing.

Altibase prevents data loss with 4 different durability levels so that users can control a balance between performance and durability

How does Altibase prevent data loss?

Aug. 30. 2018

Altibase provides multiple durability[1] levels so that a user can select the level best suited to its business. For example, the Enhanced Durability level guarantees no data loss in the event of an abnormal DBMS shutdown, while the Strict Durability level recovers 100% of data without any loss even after an abnormal hardware shutdown, including power failure.

Durability depends on how transaction redo logs are written to disk; the four durability levels are distinguished by their redo log journaling mechanisms. The four levels are ‘No Durability’, ‘Relaxed Durability’, ‘Enhanced Durability’ and ‘Strict Durability’. With multiple durability levels, a user can control the balance between performance and durability (the property settings for each level are summarized in the sketch below).
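The property combinations described in the sections below can be summarized as follows (a sketch: the property names and values come from this FAQ, while the Python mapping is only for compact presentation):

    # Durability levels and their redo-log property settings (from this FAQ).
    # 'No Durability' is selected by creating the table in a volatile
    # tablespace rather than by these properties.
    DURABILITY_LEVELS = {
        "Relaxed":  {"COMMIT_WRITE_WAIT_MODE": 0, "LOG_BUFFER_TYPE": 1},
        "Enhanced": {"COMMIT_WRITE_WAIT_MODE": 0, "LOG_BUFFER_TYPE": 0},  # default
        "Strict":   {"COMMIT_WRITE_WAIT_MODE": 1, "LOG_BUFFER_TYPE": 0},
    }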

No Durability

When a table is created in a volatile tablespace, it operates with No Durability. A volatile tablespace is a type of tablespace supported by Altibase in which all data is volatile, so the data is not recoverable when Altibase restarts. No transaction redo logs are written under No Durability, which gives it the highest performance of the four durability levels.

Relaxed Durability

A transaction does not wait until its logs have been written to disk, and a memory log buffer is used.

Set COMMIT_WRITE_WAIT_MODE and LOG_BUFFER_TYPE to 0 and 1, respectively. With this method, transactions store their update logs in a memory log buffer, and the log flush thread flushes the logs in the buffer to the log file. Although there is no data loss on a normal shutdown, there could be bounded data loss on an abnormal DBMS shutdown at this durability level.

[Figure: Relaxed Durability logging]

Enhanced Durability

A transaction does not wait until its logs have been written to disk, and a kernel log buffer is used.

Set COMMIT_WRITE_WAIT_MODE and LOG_BUFFER_TYPE to 0 and 0, respectively. With the default Altibase durability property settings, update logs are stored in the log buffer of the OS kernel area, and transactions do not wait until their update logs have been written to the log file. The Enhanced Durability level guarantees no data loss on an abnormal DBMS shutdown, but there could be bounded data loss on an abnormal hardware shutdown such as a power failure.

[Figure: Enhanced Durability logging]

Strict Durability

A transaction waits until its logs have been written to disk, and a kernel log buffer is used.

Set COMMIT_WRITE_WAIT_MODE and LOG_BUFFER_TYPE to 1 and 0, respectively. With this method, transaction update logs are stored in the log buffer of the OS kernel area, and logs for committed transactions are written directly to the log file. The Strict Durability level recovers all data without any loss, even after an abnormal hardware shutdown including power failure. It has the lowest performance of the four durability levels.

[Figure: Strict Durability logging]

[1] Durability, one of the four ACID properties of a transaction, means that once a transaction has been committed, its effects must be preserved even if a database failure occurs before the changed data are physically written to disk.

Altibase – Downloading is Believing.