What’s all the fuss about in-memory databases for IoT?


Is sharding not as popular or more difficult with Relational/SQL databases?

Janko Jerinic, MSc Electrical Engineering & Computer Science, University of Belgrade, School of Electrical Engineering (2009)

Well, yes and no.

Most traditional RDBMSs, such as Oracle, SQL Server, MySQL, and Postgres, are designed to run as standalone, single servers and, as such, they do not have internal mechanisms that provide sharding functionality by default.

That doesn’t mean that application-level sharding is impossible with them. In fact, many large distributed systems have done exactly that, at companies like Quora and Facebook. So you can indeed horizontally scale storage and load across multiple RDBMS instances. It just doesn’t come out of the box, and the performance is fine. You can’t do joins across different “shards”, or servers, but neither can you with a naturally sharded NoSQL database.

However, there actually are RDBMSs that support sharding natively, to some extent. Think about it: SQL is basically just an interface on top of a query optimizer, a query executor, and a storage engine. There is nothing to prevent a SQL database from implementing internal sharding.

Two examples that come to mind are Amazon Redshift and Microsoft APS Parallel Data Warehouse. The first one I use quite a bit and the second one I actually worked on. Both of these systems let you define partition keys, which they use to distribute data across so-called compute nodes. SQL queries are executed by co-locating that distributed data as necessary and aggregating the results of execution from multiple nodes. Now, these systems aren’t infinitely scalable, since they come in predefined cluster sizes, but the data is indeed partitioned, and there is nothing to prevent a relational system from being fully distributed. In fact, we do have one: Google’s Spanner.
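To make the "distribute by key, then aggregate across nodes" idea concrete, here is a toy sketch. It is not how Redshift or Parallel Data Warehouse work internally. In-memory SQLite databases stand in for compute nodes, and the table name, the modulo placement rule, and the node count are all assumptions made for illustration.

```python
import sqlite3

NUM_NODES = 3  # assumed cluster size for this sketch


def make_nodes(num_nodes: int = NUM_NODES):
    """Create one in-memory database per hypothetical compute node."""
    nodes = [sqlite3.connect(":memory:") for _ in range(num_nodes)]
    for node in nodes:
        node.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    return nodes


def insert_sale(nodes, customer_id: int, amount: float) -> None:
    # Place each row on a node chosen from the partition key (customer_id).
    node = nodes[customer_id % len(nodes)]
    node.execute("INSERT INTO sales VALUES (?, ?)", (customer_id, amount))


def total_per_customer(nodes) -> dict:
    # Scatter: run the same aggregate query on every node.
    # Gather: merge the partial results on the coordinator.
    query = "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    totals = {}
    for node in nodes:
        for customer_id, subtotal in node.execute(query):
            totals[customer_id] = totals.get(customer_id, 0.0) + subtotal
    return totals


nodes = make_nodes()
for customer_id, amount in [(1, 10.0), (2, 5.0), (1, 2.5), (4, 1.0)]:
    insert_sale(nodes, customer_id, amount)
```

Because rows are placed by `customer_id`, every row for a given customer lives on one node, so each node can compute its part of the `GROUP BY` locally and the coordinator only has to merge the small partial results.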

Greg Kemnitz, Postgres internals, embedded device db internals, MySQL user-level

In order to support relational joins, sharding in distributed relational data-worlds is usually implemented at the application or “policy” level, as cross-instance joins are generally very expensive and slow.

A simple example would be that users whose names start with ‘A’ go to Instance 1, those with ‘B’ go to Instance 2, etc. All the data associated with each group of users would live on separate instances, with the idea that during normal production operations, there’s no need to execute cross-instance queries.
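The first-letter policy described above could be sketched roughly as follows. The instance names and the catch-all instance for names that do not start with an ASCII letter are assumptions added for illustration.

```python
import string

# Hypothetical layout: one instance per leading letter ('A' -> instance-1,
# 'B' -> instance-2, ...), plus a catch-all for everything else.
INSTANCES = {
    letter: f"instance-{i}"
    for i, letter in enumerate(string.ascii_uppercase, start=1)
}
FALLBACK_INSTANCE = "instance-other"


def instance_for_user(username: str) -> str:
    """Return the name of the instance that holds this user's data."""
    first = username.strip()[:1].upper()
    return INSTANCES.get(first, FALLBACK_INSTANCE)
```

A policy like this is easy to reason about, but it can load instances unevenly, since far more names start with some letters than with others.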

Most NoSQL databases essentially skip the whole concept of joins, and store all the data associated with a particular thing in a Document. Since all the data you need is in a particular document, you can shard based on one of the main attributes of your documents, and using this Shard Key, you can work out which instance in your data-world holds the document you care about.
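As a rough illustration of locating a document by its shard key, the sketch below hashes one attribute of the document to pick a shard. The shard count, the `user_id` field, and the choice of SHA-256 are assumptions. Real document stores each have their own placement schemes.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of instances for this sketch


def shard_for_key(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash.

    hashlib is used rather than the built-in hash(), because hash() on
    strings is randomised per process and would not route consistently.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# A hypothetical document: everything about the user lives in one place,
# so knowing the shard for "user_id" is enough to find all of it.
document = {"user_id": "u-1001", "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}
shard = shard_for_key(document["user_id"])
```

The same key always maps to the same shard, so a reader needs only the key, not a lookup table, to find the document.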

This data model works very well for certain types of applications, but less well for applications where the data is basically tabular and joining is routine. So, in very large data-worlds, you often end up with a hybrid of NoSQL and SQL databases.

Either way, it is very important to choose shard keys carefully, in both distributed relational and multi-instance NoSQL deployments.

Daniel Kuck-Alvarez, I hold a BS degree in Computer Science

Relational databases do not usually have a mechanism for sharding.

Sharding refers to a data storage practice in which groups of data are kept on separate systems. For example, you may choose to keep data on half of your users on one database and keep the data about the other half on a separate database stored on a separate machine.

The objective of sharding is to reduce the work each machine has to do. Many NoSQL databases have sharding built in and it is transparent to the software accessing the data. The software has one endpoint to reach the data and doesn’t know which database it is actually accessing.

Most relational databases do not have this capability built in (maybe none of them do). If software wants to shard across two relational databases, it has to keep track of the connections to each database separately.
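A minimal sketch of what "keeping track of the connections" can look like in application code is shown below. In-memory SQLite databases stand in for two separate database servers. The `ShardedUserStore` class, its table layout, and the modulo routing rule are all hypothetical.

```python
import sqlite3


class ShardedUserStore:
    """Application-level sharding over independent databases."""

    def __init__(self, num_shards: int = 2) -> None:
        # The application owns one connection per shard and must pick the
        # right one itself; the databases know nothing about each other.
        self.connections = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.connections:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _conn_for(self, user_id: int) -> sqlite3.Connection:
        # Routing rule (an assumption): user id modulo the shard count.
        return self.connections[user_id % len(self.connections)]

    def add_user(self, user_id: int, name: str) -> None:
        conn = self._conn_for(user_id)
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        conn.commit()

    def get_user(self, user_id: int):
        row = self._conn_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


store = ShardedUserStore()
store.add_user(1, "Ada")
store.add_user(2, "Grace")
```

Callers see a single `store` object, but all the routing logic lives in the application rather than in the database, which is the burden the paragraph above describes.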

The major relational databases have master-slave configuration options so that many separate databases may be kept in sync. In these configurations, each database has a full copy of the data, not just a portion of it. The software still has the task of deciding which database to access, but whichever it chooses, the database will have the complete data set. 

David Brower, Husband, dad, programmer/architect, occasional blogger, onetime sound engineer.

Sharding is one form of partitioning, and most DBMSs have partitioning. What’s interesting about sharding is its use across multiple machines for horizontal scaling. Horizontal scaling in relational DBMSs is difficult. Only a few do it well, and they are commercial and people don’t like to pay for it, so they’ll settle for the more relaxed semantics of sharded, eventually consistent distributed databases that don’t handle transactions as well or do joins nicely.

There’s a rich development field going right now trying to put decent enough SQL on top of weaker data storage engines.

David Kittrell, Sr Consultant, E-discovery & Data Visualization

Read up on Teradata. While I agree with other responses regarding popularity and generic DBMSs, Teradata technology from the 80s implemented physical sharding methods to support large-scale DBs (the “terabyte” size implied by the name) at a time when a few hundred megabytes counted as a big commercial data store. After bouncing around between AT&T, NCR, etc., they are still found in high-end analytics domains. Nice tech and a true DB innovator.


Intel officially launches its beta program in partnership with Altibase and other IMDBs

Intel Optane DC Persistent Memory, what it is and how it works. Go to the beta program

Intel officially launches the beta program: hardware manufacturers and partner companies will be able to preview the new Intel Optane DC memory modules. Here is what they are and why they are expected to change the storage market by speeding up processing in the cloud.

We have often talked about Intel Optane: it is a technology in which the Santa Clara company has invested heavily and which is increasingly being offered in our country too.
As explained in the article “Intel Optane Memory, or how to accelerate the performance of the system”, the products currently available on the market are memory modules that drastically accelerate the performance of systems based on traditional hard drives and SATA SSDs.

Although Intel has recently decided to take a step back and let Micron develop the 3D XPoint technology on which Optane is based (see “3D XPoint technology will pass into Micron’s hands”), the company will continue to develop new products.

Intel Optane DC Persistent Memory can be considered an evolution of Intel Optane Memory because it also adds capabilities aimed at persistent data storage: Intel presents Optane DC as memory that can be used both as RAM modules and as an SSD. Today Intel announced that it has started the “beta program” focused on Intel Optane DC Persistent Memory: this means that every cloud provider and every partner company (OEM) will be able to use a preview of the new memory, which will be officially brought to market during the first half of 2019.

Thanks to the new Intel Optane DC Persistent Memory, combined with the next generation of Xeon processors, companies can change the ways in which heavy workloads, cloud processing, databases and high-performance computing are managed: it becomes possible to bring performance to a much higher level, thanks to the ability to store and move data in memory quickly.

In App Direct mode, applications can count on performance never seen before, thanks to the ability to keep a large amount of information in memory persistently. Memory mode, by contrast, lets you use Intel Optane DC as volatile memory, adding capacity beyond that offered by the RAM modules installed on the motherboard.
It will thus be possible to have additional, extremely fast, large storage capacity of up to 512 GB, all without the need to change a line of software code.

Intel partner companies that have embraced Intel Optane DC Persistent Memory right from the start are Alibaba, Cisco, Dell EMC, Fujitsu, Google Cloud, Hewlett Packard Enterprise, Huawei, Lenovo, Oracle and Tencent.
Intel is also working with the most important software developers to optimize their solutions so they can take full advantage of Intel Optane DC. The companies that have decided to join are, for the moment, Aerospike, Altibase, Apache Spark, AsiaInfo, Cassandra, Databricks, GigaSpaces, IBM, Microsoft, Red Hat, Redis Labs, RocksDB, SAS, SAP, Sunjesoft, SUSE, Ubuntu, Virtuozzo and VMware.