Several companies are developing new database technologies to solve what they see as the shortcomings of traditional, relational database management systems in a cloud environment. Four of them described the approaches they're taking during a panel at the GigaOm Structure conference on Thursday.
The basic problem they're trying to solve is the difficulty of scaling today's relational databases across potentially massive clusters of commodity x86 servers, and doing so in a way that's "elastic," so that an organization can scale its infrastructure up and down as demand requires.
"The essential problem, as I see it, is that existing relational database management systems just flat-out don't scale," said Jim Starkey, a former senior architect at MySQL and one of the original developers of relational databases.
Starkey is founder and CTO of NimbusDB, which is trying to address those problems with a "radical restart" of relational database technology. Its software has "nothing in common with pre-existing systems," according to Starkey, except that developers can still use the standard SQL query language.
NimbusDB aims to provide database software that can scale simply by "plugging in" new hardware, and that allows a large number of databases to be managed "automatically" in a distributed environment, he said. Developers should be able to start small, developing an application on a local machine, and then transfer their database to a public cloud without having to take it offline, he said.
"One of the big advantages of cloud computing is you don't have to make all the decisions up front. You start with what's easy and transition into another environment without having to go offline," he said.
NimbusDB's software is still at an "early alpha" stage, and Starkey didn't provide a delivery date Thursday. The company expects to give the software away free "for the first couple of nodes," and customers can pay for additional capacity, he said. The product is delivered as software rather than as a service, but it is not open source, Starkey said.
Xeround aims to solve problems similar to those NimbusDB is tackling, but with a hosted MySQL service that's been in beta with about 2,000 customers and went into general availability last week, said CEO Razi Sharir. It, too, wants to offer the elasticity of the cloud with the familiarity of SQL coding.
"We're a distributed database that runs in-memory, that splits across multiple virtual nodes and multiple data centres and serves many customers at the same time," he said. "The scaling and the elasticity are handled by our service automatically."
Xeround is designed for transactional workloads, and the "sweet spot" for its database is between 2GB and 50GB, Sharir said.
Its service is available in Europe and the U.S., hosted by cloud providers including Amazon and Rackspace. While Xeround is "cloud agnostic," cloud database customers in general need to run their applications and database in the same data centre, or close to each other, for performance reasons.
"If your app is running on Amazon East or Amazon Europe, you'd better be close to where we're at. The payload [the data] needs to be in the same place" as the application, he said.
Unlike Xeround, ParAccel's software is designed to run analytics workloads, and the sweet spot for its distributed database system is "around the 25TB range," said CTO Barry Zane.
"We're the epitome of big data," he said. ParAccel's customers are businesses that rely on analyzing large amounts of data, including financial services, retail and online advertising companies.
One customer, interclick, uses ParAccel to analyze demographic and click-through data to let online advertising firms know which ads to display to end users, he said. It has to work in near real-time, so interclick runs an in-memory database of about 2TB on a 32-node cluster, Zane said. Other customers with larger data sets use a disk-based architecture.
ParAccel also lets developers write SQL queries, but with extensions so they can use the MapReduce framework for big-data analytics.
"SQL is a really powerful language, it's very easy to use for amazingly sophisticated stuff, but there's a class of things SQL can't do," he said. "So what you've seen occurring at ParAccel, and frankly at our competitors, is the extensibility to do MapReduce-type functions directly in the database, rather than try to move terabytes of data in and out to server clusters."
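For readers unfamiliar with the term, the "MapReduce-type functions" Zane mentions can be pictured with a minimal sketch in plain Python. It is not ParAccel's extension syntax, which the panel did not describe; the record layout and every function name here are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical click-log records: (ad_id, clicked) pairs.
CLICKS = [("ad1", 1), ("ad2", 0), ("ad1", 1), ("ad2", 1), ("ad1", 0)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for ad_id, clicked in records:
        yield ad_id, clicked


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fold each key's values into one result (here, a click count)."""
    return {
        key: reduce(lambda total, value: total + value, values, 0)
        for key, values in groups.items()
    }


def clicks_per_ad(records):
    """Run the three phases in sequence over an in-memory list."""
    return reduce_phase(shuffle(map_phase(records)))
```

The point of running such functions inside the database, as Zane describes it, is that the map and reduce steps execute where the data already lives instead of after an export to a separate cluster.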
Cloudant, which makes software for use on-premises or in a public cloud, was the only company on the panel that has developed a "NoSQL" database. It was designed to manage both structured and unstructured data, and to shorten the "application lifecycle," said co-founder and chief scientist Mike Miller.
"Applications don't have to go through a complex data modelling phase," he said. The programming interface is HTTP, Miller said. "That means you can sign up and just start talking to the database from a browser if you wanted to, and build apps that way. So, we're really trying to lower the bar and make it easier to deploy."
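What an HTTP programming interface means in practice can be sketched with Python's standard library. The example below only builds a request and does not send it. The host name, the `/database/document-id` URL layout and the JSON body are assumptions made for illustration; the panel did not describe Cloudant's actual endpoints.

```python
import json
from urllib.request import Request


def build_put_document(base_url, database, doc_id, document):
    """Build (but do not send) an HTTP PUT that would store a JSON document.

    The URL layout is a hypothetical convention, not a documented API.
    """
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{base_url}/{database}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host and names; nothing is contacted when this runs.
req = build_put_document(
    "https://db.example.invalid",
    "ads",
    "click-001",
    {"ad": "ad1", "clicked": True},
)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Because the document is plain JSON with no table definition behind it, no schema has to be declared first, which is the kind of shortcut Miller is pointing to.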
"We also have integrated search and real-time analytics, so we're trying to bring concepts from the warehouse into the database itself," he said.
The company's software is hosting "tens of thousands of applications" on public clouds run by Amazon EC2 and SoftLayer Technologies, according to Miller.
Cloudant databases vary from a gigabyte all the way to 100TB, he said. Customers are running applications for advertising analytics, "datamart-type applications," and "understanding the connections in a social graph, not in an [extract, transform and load] workflow kind of way using Hadoop, but in real time," he said.
While cloud databases can solve scaling problems, they also present new challenges, the panelists acknowledged. The quality of server hardware in the public cloud is "often a notch down," said Zane, so companies for whom high-speed analytics are critical may still want to buy and manage their own hardware, he said.
And while many service providers claim to be "cloud agnostic," the reality is often different, Miller said. Cloud software vendors need to do "a lot of reverse engineering" to figure out what the architectures at services like Amazon EC2 look like "behind the curtain," in order to get maximum performance from their database software.
Still, Sharir and Zane were both optimistic that "big data analytics" would be the "killer application" for their products. For Starkey it is simply "the Web."
"Everyone on the Web has the same problem, this very thin pipe trying to get into database systems," he said. "Databases don't scale, and it shows up in a thousand places."