Four companies rethink databases for the cloud

Several companies are developing new database technologies to solve what they see as the shortcomings of traditional relational database management systems in a cloud environment. Four of them described the approaches they’re taking during a panel at the GigaOm Structure conference on Thursday.

The basic problem they’re trying to solve is the difficulty of scaling today’s relational database management systems across potentially massive clusters of commodity x86 servers, and of doing so in a way that’s “elastic,” so that an organization can scale its infrastructure up and down as demand requires.

“The essential problem, as I see it, is that existing relational database management systems just flat-out don’t scale,” said Jim Starkey, a former senior architect at MySQL and one of the original developers of relational databases.

Starkey is founder and CTO of NimbusDB, which is trying to address those problems with a “radical restart” of relational database technology. Its software has “nothing in common with pre-existing systems,” according to Starkey, except that developers can still use the standard SQL query language.

NimbusDB aims to provide database software that can scale simply by “plugging in” new hardware, and that allows a large number of databases to be managed “automatically” in a distributed environment, he said. Developers should be able to start small, building an application on a local machine, and then move their database to a public cloud without taking it offline, he added.

“One of the big advantages of cloud computing is you don’t have to make all the decisions up front. You start with what’s easy and transition into another environment without having to go offline,” he said.

NimbusDB’s software is still at an “early alpha” stage, and Starkey didn’t provide a delivery date Thursday. The company expects to give the software away free “for the first couple of nodes,” with customers paying for additional capacity, he said. NimbusDB delivers its product as software rather than as a service, though not as open-source software, Starkey said.

Xeround aims to solve problems similar to NimbusDB’s, but with a hosted MySQL service that had been in beta with about 2,000 customers before becoming generally available last week, said CEO Razi Sharir. It, too, wants to offer the elasticity of the cloud with the familiarity of SQL coding.

“We’re a distributed database that runs in-memory, that splits across multiple virtual nodes and multiple data centres and serves many customers at the same time,” he said. “The scaling and the elasticity are handled by our service automatically.”

Xeround is designed for transactional workloads, and the “sweet spot” for its database is between 2GB and 50GB, Sharir said.

Its service is available in Europe and the U.S., hosted by cloud providers including Amazon and Rackspace. Although Xeround is “cloud agnostic,” cloud database customers in general need to run their applications and database in the same data centre, or close to each other, for performance reasons.

“If your app is running on Amazon East or Amazon Europe, you’d better be close to where we’re at. The payload [the data] needs to be in the same place” as the application, he said.

Unlike Xeround’s service, ParAccel’s software is designed to run analytics workloads, and the sweet spot for its distributed database system is “around the 25TB range,” said CTO Barry Zane.

“We’re the epitome of big data,” he said. ParAccel’s customers are businesses that rely on analyzing large amounts of data, including financial services, retail and online advertising companies.

One customer, interclick, uses ParAccel to analyze demographic and click-through data to let online advertising firms know which ads to display to end users, he said. Because the analysis has to happen in near real time, interclick runs an in-memory database of about 2TB on a 32-node cluster, Zane said. Other customers with larger data sets use a disk-based architecture.

ParAccel also lets developers write SQL queries, but with extensions so they can use the MapReduce framework for big-data analytics.

“SQL is a really powerful language, it’s very easy to use for amazingly sophisticated stuff, but there’s a class of things SQL can’t do,” he said. “So what you’ve seen occurring at ParAccel, and frankly at our competitors, is the extensibility to do MapReduce-type functions directly in the database, rather than try to move terabytes of data in and out to server clusters.”

Cloudant, which makes software for use on-premises or in a public cloud, was the only company on the panel that had developed a “NoSQL” database. It was designed to manage both structured and unstructured data, and to shorten the “application lifecycle,” said co-founder and chief scientist Mike Miller.

“Applications don’t have to go through a complex data modelling phase,” he said. The programming interface is HTTP, Miller said. “That means you can sign up and just start talking to the database from a browser if you wanted to, and build apps that way. So, we’re really trying to lower the bar and make it easier to deploy.”

“We also have integrated search and real-time analytics, so we’re trying to bring concepts from the warehouse into the database itself,” he said.

The company’s software hosts “tens of thousands of applications” on public clouds including Amazon EC2 and SoftLayer Technologies’ cloud, according to Miller.

Cloudant databases range in size from a gigabyte all the way to 100TB, he said. Customers are running applications for advertising analytics, “datamart-type applications,” and “understanding the connections in a social graph, not in an [extract, transform and load] workflow kind of way using Hadoop, but in real time,” he said.

While cloud databases can solve scaling problems, they also present new challenges, the panelists acknowledged. The quality of server hardware in the public cloud is “often a notch down,” said Zane, so companies for whom high-speed analytics are critical may still want to buy and manage their own hardware, he said.

And while many service providers claim to be “cloud agnostic,” the reality is often different, Miller said. Cloud software vendors need to do “a lot of reverse engineering” to figure out what the architectures at services like Amazon EC2 look like “behind the curtain,” in order to get maximum performance from their database software.

Still, Sharir and Zane were both optimistic that “big data analytics” would be the “killer application” for their products. For Starkey, it is simply “the Web.”

“Everyone on the Web has the same problem, this very thin pipe trying to get into database systems,” he said. “Databases don’t scale, and it shows up in a thousand places.”
