Microsoft will soon offer three additional ways for enterprises to store data on Azure, making the cloud computing platform more supportive of big data analysis.
Azure will gain a data warehouse service, a “data lake” service for storing large amounts of data, and an option for running “elastic” databases that can hold data sets that vary greatly in size. Scott Guthrie, executive vice president of Microsoft’s cloud and enterprise group, unveiled the new services at the company’s Build 2015 developer conference, held this week in San Francisco.
The Azure SQL Data Warehouse, available later this year, will give organizations a way to store petabytes of data so it can be easily ingested by data analysis software, such as the company’s Power BI tool for data visualization, the Azure Data Factory for data orchestration, or the Azure Machine Learning service.
Unlike traditional in-house data warehouse systems, this cloud service can quickly be adjusted to fit the amount of data that actually needs to be stored, Guthrie said. Users can also specify the exact amount of processing power they’ll need to analyze the data. The service builds on the massively parallel processing architecture that Microsoft developed for its SQL Server database.
The Azure Data Lake is designed for organizations that need to store very large amounts of data so that the data can be processed by Hadoop and other “big data” analysis platforms. The service could be most useful for Internet of Things-based systems, which may amass large amounts of sensor data.
“It allows you to store literally an infinite amount of data, and it allows you to keep data in its original form,” Guthrie said. The Data Lake uses the Hadoop Distributed File System (HDFS), so it can be used by Hadoop or other big data analysis systems.
A preview of the Azure Data Lake will be available later this year.
In addition to these two new products, the company has updated its Azure SQL Database service so that customers can pool their Azure cloud databases to reduce storage costs and prepare for bursts of database activity.
“It allows you to manage lots of databases at lower cost,” Guthrie said. “You can maintain completely isolated databases, but [it] allows you to aggregate all of the resources necessary to run those databases.”
The new service would be particularly useful for running public-facing software services, where the amount of database storage needed can fluctuate greatly. Today, most Software-as-a-Service (SaaS) offerings must overprovision their databases to accommodate potential peak demand, which can be financially wasteful. The elastic option lets an organization pool the storage space available to all of its databases, so that if one database grows rapidly, it can draw unused space from the others.
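The savings from pooling can be illustrated with some simple arithmetic. The Python sketch below is not based on Azure's actual sizing or pricing; the usage figures and function names are hypothetical, chosen only to show why provisioning each database for its own peak needs more capacity than provisioning one shared pool for the combined peak.

```python
# Illustrative only: the numbers and function names are hypothetical
# and do not reflect Azure's real limits, tiers, or billing model.

def dedicated_capacity(peaks):
    """Capacity needed when every database is provisioned for its own peak."""
    return sum(peaks)


def pooled_capacity(usage_over_time):
    """Capacity needed when databases share one pool:
    the largest combined usage seen at any single point in time."""
    return max(sum(snapshot) for snapshot in usage_over_time)


# Each row is one point in time; each column is one database's usage in GB.
usage = [
    [10, 80, 15],
    [90, 20, 10],
    [15, 10, 70],
]

# Per-database peak: the maximum of each column.
peaks = [max(column) for column in zip(*usage)]

dedicated = dedicated_capacity(peaks)
pooled = pooled_capacity(usage)

print(f"Provisioned separately: {dedicated} GB")
print(f"Provisioned as a pool:  {pooled} GB")
```

With these made-up figures, the three databases peak at different times, so the shared pool needs half the capacity of three separately overprovisioned databases. The benefit shrinks if all databases tend to peak at once.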
The new elastic pooling feature is now available in preview mode.