Amazon Web Services hopes to entice more Hadoop users to its Elastic MapReduce service with new virtual servers, one of which has 262GB of memory and 6.4TB of storage for big-data analytics.
On Tuesday, the company launched 12 new virtual servers, or instances, that organizations can use to run their applications on Elastic MapReduce clusters. Potential applications include Web indexing, data mining, log file analysis, financial analysis, scientific simulation and bioinformatics research.
Hadoop is an open-source platform that allows for the distributed processing of large data sets across clusters of computers. The MapReduce framework assigns work to nodes in the cluster.
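As a rough illustration of the MapReduce model, the following Python sketch counts words in a few chunks of text. It runs in a single process and does not use Hadoop; the function names are hypothetical and chosen only for this example. On a real cluster, the framework would run the map step for each chunk on a different node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(chunks):
    # Here the chunks are mapped one after another in a single process;
    # a cluster would distribute them across nodes.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```

Calling `word_count(["big data", "big clusters"])` groups the two occurrences of "big" under one key before summing them.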
Amazon’s compute-optimized c3.8xlarge virtual server is aimed at tasks such as image processing. It has 32 vCPUs (virtual CPUs), 64GB of memory, two 320GB SSDs and 10Gbps network connectivity. The Elastic MapReduce charge is US$0.270 per hour, on top of the cost of the corresponding EC2 (Elastic Compute Cloud) server, which starts at $1.680 per hour.
The storage-optimized i2.8xlarge instance type is a good fit for analytics applications such as Impala, Spark and HBase, Amazon said. It has 32 vCPUs, 262GB of memory, eight 800GB SSDs and 10Gbps network connectivity. The Elastic MapReduce charge is $0.270 per hour, and the EC2 capacity starts at $6.820 per hour.
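Because the Elastic MapReduce charge is added to the EC2 charge, the minimum hourly cost of an instance is the sum of the two figures. The sketch below works that out using only the prices quoted above; the dictionary and function names are hypothetical, and the EC2 figures are starting prices, so actual bills may be higher.

```python
from decimal import Decimal

# Per-hour prices in USD as quoted in the article.
# The EC2 figures are "from" prices, so results are lower bounds.
EMR_FEE = {"c3.8xlarge": Decimal("0.270"), "i2.8xlarge": Decimal("0.270")}
EC2_FROM = {"c3.8xlarge": Decimal("1.680"), "i2.8xlarge": Decimal("6.820")}

def hourly_cost(instance_type, count=1):
    """Return the minimum combined EMR + EC2 cost per hour for `count` instances."""
    return (EMR_FEE[instance_type] + EC2_FROM[instance_type]) * count
```

On these figures, a single c3.8xlarge comes to at least $1.95 per hour and a single i2.8xlarge to at least $7.09 per hour.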
One effective way to determine the most appropriate instance type is to launch several small clusters and benchmark them, according to Amazon.
In total, Amazon now offers 25 Elastic MapReduce instance types to choose from, which cost from $0.011 to $0.270 per hour plus the charge for EC2. In the standard configuration, users are limited to 20 instances across all their clusters. Those who want more need to ask Amazon for permission.
On Tuesday, Amazon also lowered the cost of existing virtual Elastic MapReduce servers by 27 percent to 61 percent. The price change is part of a general price drop that Amazon announced last week after Google cut the cost of its services.
The price war between public cloud providers shows no signs of abating, as Microsoft on Monday cut Azure pricing and also introduced a new basic service configuration.
Users who want to run Hadoop in a hosted environment have alternatives to Amazon’s Elastic MapReduce, including Microsoft’s HDInsight, which runs on top of the company’s Azure cloud, and Rackspace’s Cloud Big Data Platform.
An HDInsight system includes a head node and one or more compute nodes, and Microsoft offers one size of each type. The head node is available in the Extra Large (A4) size and costs $0.64 per hour, while the compute node runs on the Large (A3) virtual server and is priced at $0.32 per hour. The latter has 4 vCPUs and 7GB of memory, according to Microsoft.
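The HDInsight pricing above reduces to simple arithmetic: one head node plus a number of compute nodes. The following sketch assumes a single head node per cluster, as the description suggests, and uses a hypothetical function name.

```python
from decimal import Decimal

# Per-hour prices in USD as quoted in the article.
HEAD_NODE = Decimal("0.64")     # Extra Large (A4)
COMPUTE_NODE = Decimal("0.32")  # Large (A3)

def hdinsight_hourly_cost(compute_nodes):
    """Hourly cost of one head node plus `compute_nodes` compute nodes."""
    if compute_nodes < 1:
        # The article describes a system as having one or more compute nodes.
        raise ValueError("a cluster needs at least one compute node")
    return HEAD_NODE + COMPUTE_NODE * compute_nodes
```

A cluster with four compute nodes would therefore cost $1.92 per hour on these prices.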
Rackspace has joined forces with Hadoop specialist Hortonworks to offer a service that can compete with Amazon. The Cloud Big Data Platform is currently in a so-called Limited Availability program, which is the last step before Rackspace makes a service generally available. Prospective users have two options: a shared virtual server with 2 vCPUs, 7.5GB of memory and 1.3TB of storage, or a dedicated node with 16 vCPUs, 60GB of memory and 11TB of storage. They cost $0.37 and $2.96 per hour, respectively.
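One way to compare the two Rackspace options is to divide each price by the resources it buys. The sketch below does that using only the figures quoted above; the names are hypothetical, and it ignores storage and any difference in performance between shared and dedicated hardware.

```python
from decimal import Decimal

# Node specs and per-hour prices in USD as quoted in the article.
NODES = {
    "shared": {"vcpus": 2, "memory_gb": Decimal("7.5"), "price": Decimal("0.37")},
    "dedicated": {"vcpus": 16, "memory_gb": Decimal("60"), "price": Decimal("2.96")},
}

def price_per_vcpu(kind):
    """Hourly price divided by the number of vCPUs."""
    node = NODES[kind]
    return node["price"] / node["vcpus"]

def price_per_gb_memory(kind):
    """Hourly price divided by gigabytes of memory."""
    node = NODES[kind]
    return node["price"] / node["memory_gb"]
```

Both options come to $0.185 per vCPU per hour, since the dedicated node has eight times the vCPUs and memory of the shared server at eight times the price.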
The thinking behind all these offerings is to take care of deployment and maintenance, making it easier for organizations to start using Hadoop. Companies can then focus on the core task: analyzing and extracting value from large amounts of data.