An overview of the strengths and weaknesses of today’s cloud database management systems.
The fundamental principle of cloud systems is a focus on multiple, disposable, and replaceable machines. This has direct consequences for the implementation techniques, and therefore the capabilities, of database systems implemented in the cloud.
Traditional databases can be roughly classified as parallel-first (for example, MongoDB or Teradata) or single-system-first (for example, PostgreSQL or MySQL), with scale-out sometimes added later (for example, Redshift or Greenplum). Each category has limitations inherent to its core design. The extent of these limitations is partially a function of maturity. However, given certain core architectural decisions, particular features may not be efficiently supportable.
For example, Greenplum has sequences, but Redshift does not, despite both being PostgreSQL derivatives. BigQuery has no sequences, but Teradata does (although Teradata's aren't truly sequential in the traditional sense).
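A rough sketch of why (illustrative Python, not any vendor's actual implementation): a true sequence is a single global counter that every writer must coordinate on, while a parallel-first system typically hands out values that are unique but gappy and only loosely ordered, for example by interleaving a worker ID with a local counter.

```python
# Illustrative sketch only: why parallel-first systems avoid
# true sequences. Not any vendor's actual implementation.

import itertools

class TrueSequence:
    """A traditional sequence: one global counter. Every nextval()
    must coordinate on this single point, which serializes all
    writers in a parallel system."""
    def __init__(self):
        self._counter = itertools.count(1)

    def nextval(self):
        return next(self._counter)

class ParallelIds:
    """Unique-but-not-sequential IDs: each worker interleaves its
    own ID with a local counter, so no cross-worker coordination
    is needed, at the cost of gaps and no strict global order."""
    def __init__(self, worker_id, num_workers):
        self.worker_id = worker_id
        self.num_workers = num_workers
        self.local = 0

    def nextval(self):
        self.local += 1
        # Unique across workers, roughly increasing, but gappy.
        return self.local * self.num_workers + self.worker_id

# Two workers generate disjoint ID streams with no shared state:
w0, w1 = ParallelIds(0, 2), ParallelIds(1, 2)
print([w0.nextval() for _ in range(3)])  # [2, 4, 6]
print([w1.nextval() for _ in range(3)])  # [3, 5, 7]
```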
Cloud databases fall into the same categories, with a distinct bias towards parallel-first for new systems, reflecting the fundamental properties of cloud systems: parallelism for scale and replaceability of machines.
Within the single-system-first category, cloud instantiations tend to focus on managed cost, upgrades, and reliability (recovery point and recovery time objectives, or RPO/RTO) for the traditional single-machine product. Examples include Heroku PostgreSQL, Amazon Aurora (PostgreSQL/MySQL), Google Cloud SQL (PostgreSQL/MySQL), and Azure SQL (SQL Server).
Within the parallel category, there are effectively two subcategories: the SQL/relational category (BigQuery, Snowflake, Redshift, Spark, Azure Synapse) and the DHT/NoSQL category (BigTable, Dynamo, Cassandra, Redis). This distinction has less to do with the presence or absence of a SQL-like language and more to do with whether the physical layout of the data within the system is tuned for single-row access (hashing on a key for fast lookups) or for bulk access (sort-merge and filter operations).
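A toy sketch of that layout distinction (illustrative only, not any vendor's storage engine): a hashed layout answers point lookups on a key in constant time, while a sorted layout answers bulk range filters with a sequential scan.

```python
# Illustrative sketch of the two physical layouts.

from bisect import bisect_left

rows = [(42, "a"), (7, "b"), (19, "c"), (88, "d")]

# DHT/NoSQL-style layout: rows placed by hash of key.
# Great for single-row access, poor for ordered bulk scans.
hashed = {key: value for key, value in rows}
print(hashed[19])  # O(1) point lookup: "c"

# Relational/analytical layout: rows kept sorted on a column.
# Great for range filters and sort-merge, slower per point lookup.
sorted_rows = sorted(rows)
lo = bisect_left(sorted_rows, (10,))
hi = bisect_left(sorted_rows, (50,))
print(sorted_rows[lo:hi])  # bulk range scan: [(19, 'c'), (42, 'a')]
```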
Parallel-first relational databases will often rely on one or more native cloud storage systems. These storage systems are always built parallel-first and expose a very limited get-object/put-object API, which typically allows for partitioning of data but does not allow high-performance random access. This limits the database's ability to implement advanced persistent data structures such as indexes or, in many cases, mutable data.
As a result, cloud implementations using native storage tend to rely on sequential reading and writing of micropartitions instead of indexes. There tends to be exactly one physical access path to a storage-level object, based on the object name. Indexes would have to be implemented externally to the underlying storage, and even when this is done, the underlying cloud storage API may make it difficult to put an address or byte offset within a storage-level object to practical use.
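A rough sketch of the consequence, using a hypothetical get-object/put-object API (the names and layout here are assumptions for illustration, not a real client library): with only whole-object reads and writes, the engine prunes micropartitions by externally held metadata, and changing a single row means rewriting an entire object.

```python
# Hypothetical object store with the minimal get/put API the
# article describes; names are illustrative, not a real client.

store = {}  # object name -> immutable payload

def put_object(name, rows):
    store[name] = tuple(rows)   # whole-object write only

def get_object(name):
    return store[name]          # whole-object read only

# The database layers "micropartitions" on top: each object holds
# a row range, with min/max metadata kept outside the store.
put_object("part-0001", [(1, "a"), (2, "b")])
put_object("part-0002", [(3, "c"), (4, "d")])
metadata = {"part-0001": (1, 2), "part-0002": (3, 4)}

def scan(pred_min, pred_max):
    # No indexes: prune partitions by metadata, then read each
    # surviving object sequentially and filter row by row.
    for name, (lo, hi) in metadata.items():
        if hi < pred_min or lo > pred_max:
            continue  # pruned without touching storage
        yield from (r for r in get_object(name)
                    if pred_min <= r[0] <= pred_max)

print(list(scan(2, 3)))  # [(2, 'b'), (3, 'c')]

# Mutating one row means rewriting the whole micropartition,
# which is why single-row UPDATE/DELETE is comparatively costly.
rows = [r for r in get_object("part-0001") if r[0] != 2]  # DELETE id=2
put_object("part-0001", rows)
```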
Strengths of the cloud
The infrastructure is managed for you. In the cloud, deployment, reliability, and administration are somebody else's problem. All layers of the stack, from power, hardware, and software installation to operating system management and security (from hardening to intrusion detection), are managed by the cloud vendor.
The convenience of cloud vendors' free trial offerings, which let you get initial experiments up and running and then grow gracefully to massive scale if required, is difficult at best to match in traditional on-premises systems.
Another benefit is that cloud vendors offer many standardized processes for integrating with third-party SaaS products. The result is that the cloud vendor makes infrastructure somebody else's problem, so you can focus on your core business.
Efficiency. The cloud lives by maximizing resource utilization. It is far more common for a cloud system to expose resource utilization controls to the database application than for a non-cloud system. Load can be smoothed or moved to low-demand time slots, and interactive and business-critical jobs can be prioritized.
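A toy sketch of what such controls enable (illustrative; real systems expose this through workload-management settings rather than code like this): jobs carry a priority, and interactive work runs ahead of batch work.

```python
import heapq

# Toy workload scheduler: interactive jobs run first; batch work
# waits for low-demand slots. Priority values are illustrative.
INTERACTIVE, BATCH = 0, 1

queue = []
heapq.heappush(queue, (BATCH, "nightly-aggregation"))
heapq.heappush(queue, (INTERACTIVE, "dashboard-query"))
heapq.heappush(queue, (BATCH, "weekly-report"))

while queue:
    priority, job = heapq.heappop(queue)
    print(f"running {job} (priority {priority})")
# dashboard-query runs before either batch job.
```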
Of course, cloud vendors can exploit the efficiencies of purchasing at scale, load sharing, and very high utilization ratios. These scale arguments alone can make the case for moving to the cloud, to say nothing of the benefit of drawing on the vendor's expertise for hardening and intrusion detection.
Closely related to scale is the ability of cloud vendors to cheaply provision passive storage. This makes it easier to keep longer historical windows of data, whether for experimental or analytical reasons or for backup and audit, and more cost-effective to implement features like time travel, where data may be inspected as it existed at a historical point.
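A minimal sketch of the idea behind time travel (illustrative only; real systems expose this through SQL syntax): cheap passive storage makes it feasible to append every version of a row with a timestamp rather than overwrite in place, so reads can be answered as of any retained moment.

```python
from datetime import datetime, timedelta

# Append-only versioned store: cheap passive storage makes it
# feasible to keep old versions rather than overwrite in place.
history = []  # (timestamp, key, value); never mutated

def write(key, value, ts):
    history.append((ts, key, value))

def read_as_of(key, ts):
    """Return the latest value for key at or before ts."""
    versions = [(t, v) for t, k, v in history if k == key and t <= ts]
    return max(versions)[1] if versions else None

t0 = datetime(2023, 1, 1)
write("balance", 100, t0)
write("balance", 250, t0 + timedelta(days=30))

print(read_as_of("balance", t0 + timedelta(days=1)))   # 100
print(read_as_of("balance", t0 + timedelta(days=31)))  # 250
```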
And heavy data processing loads can be absorbed by temporarily scaling out on the cloud vendor's capacity (at a cost to the user, of course).
Economics. Aside from the economics of scale and efficiency, the accounting mechanisms of cloud vendors tend to expose the cost data of storage and processing down to the individual query level. This allows the user to make a rational business decision about the cost-benefit of any given piece of analysis, and to make optimization decisions accordingly. Indeed, sometimes the business might decide it's cheaper to use the scale of the cloud to be bigger and “simplistic” in how an analysis is structured rather than spend the time and mental energy to sculpt a more “robust” analysis (one that is cheaper to run and maybe more accurate).
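For example, in a scan-priced system, the cost of a single query can be computed directly from the bytes it reads. A back-of-envelope sketch (the per-terabyte rate below is an assumed placeholder, not a quote of any vendor's current pricing):

```python
# Back-of-envelope per-query cost, in the style of scan-priced
# systems. The rate is an assumed placeholder; check your
# vendor's current price list.
PRICE_PER_TB_SCANNED = 5.00  # USD, illustrative assumption

def query_cost(bytes_scanned):
    tb = bytes_scanned / 1e12
    return tb * PRICE_PER_TB_SCANNED

# A query scanning 300 GB of a table:
print(f"${query_cost(300e9):.2f}")  # $1.50
```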
Weaknesses of the cloud
The infrastructure is managed for you. The cloud has a very different set of failure domains from, for example, a Z-series mainframe. Distributed computation on the cloud, which is a shared substrate (compute, storage, networking), is subject to many more perturbations, and any one of these may cause a failure of interactivity or a transient job failure. Even automated management by a cloud vendor can, on rare occasions, negatively impact a customer experience by changing the properties or behavior of a system.
Efficiency. Most cloud databases are still immature compared with traditional on-premises systems, and they lack features of more mature products. Some features may never be introduced, because a fully distributed, failure-prone platform makes them impractical.
Many cloud-based parallel relational systems have greatly reduced efficiency for specific database mutation operations (INSERT, UPDATE, DELETE), which can cause problems in certain use cases.
Of course, the additional latency between cloud and on-premises systems, or between systems hosted in different clouds, will tend to force consolidation of cloud infrastructure. Users tend to be forced to choose a geographical location and provider first, and are then effectively limited to services within that provider.
Economics. The cost of the cloud follows a very different curve from on-premises deployment: It is very easy to expand capacity, so easy that controlling cost becomes more difficult. On the other hand, if cost is capped, then interactive jobs submitted after the cap is reached may be rejected. This adds a layer of complexity that traditional database administrators will need to learn in order to create a successful deployment.
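A sketch of that extra layer (hypothetical guardrail logic, not a vendor API): jobs are admitted only while their estimated cost fits under the remaining budget, and later jobs are rejected outright.

```python
# Hypothetical cost-cap guardrail: once the budget is exhausted,
# further jobs are rejected rather than run.
class Budget:
    def __init__(self, cap_usd):
        self.cap = cap_usd
        self.spent = 0.0

    def admit(self, job, estimated_cost):
        if self.spent + estimated_cost > self.cap:
            raise RuntimeError(f"rejected {job}: cap reached")
        self.spent += estimated_cost
        print(f"running {job} (${estimated_cost:.2f})")

budget = Budget(cap_usd=10.00)
budget.admit("morning-dashboard", 4.00)
budget.admit("ad-hoc-analysis", 5.50)
try:
    budget.admit("afternoon-dashboard", 2.00)
except RuntimeError as e:
    print(e)  # rejected afternoon-dashboard: cap reached
```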
And, of course, vendor lock-in is just as prevalent in the cloud as elsewhere. Migration between clouds is no easier than migration between on-premises systems.
There are many offerings to choose from, and no single offering has all the features. The most important first steps are to identify the fundamental properties or behaviors of all the required workflows, and to ensure that the chosen cloud vendor can provide them all, potentially with each behavior coming from a different, but at least weakly integrated, product in the vendor's suite. Do not expect to find a single product like Oracle or Teradata that does “everything” for the price.
This article originally appeared on InfoWorld.