Big Data and Cloud Computing
The rise of Big Data would not have been possible without the on-demand offerings of Cloud Computing. Virtualization and cloud computing together make it possible to scale capacity while reducing the cost of data storage and processing. Clusters of computing and storage nodes are deployed in a private or public cloud to analyze large data streams whose volume fluctuates over time and may occasionally spike. Cloud computing offers the possibility of provisioning additional virtual machines on demand, ensuring that Big Data analysis is not delayed even by unusually large datasets, while avoiding the need to upgrade the database or permanently add expensive hardware.
While virtualization abstracts the underlying physical hardware and offers higher-level services such as cloning a data node, providing high availability for a specific node, or user-controlled provisioning, clouds offer a collection of virtualized hardware together with additional services: resources on demand (IaaS), computing platforms (PaaS), or catalogs of software (SaaS). A single Hadoop image can easily be cloned, and the storage and computing resources it needs expand on demand. Public clouds differ from private clouds in that they can provide cost benefits through scale, but at the price of reduced control, privacy, security, or data custody. Examples of public cloud offerings for Big Data are Amazon Elastic MapReduce and the Google Compute Engine cluster for Hadoop.
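As a sketch of this elasticity, the AWS command-line interface can launch a managed Hadoop cluster on Amazon Elastic MapReduce and later grow it when data volumes spike. The cluster name, release label, instance types, counts, and the cluster and instance-group IDs below are illustrative placeholders, not values taken from any real deployment.

```shell
# Launch a small managed Hadoop cluster on Amazon EMR.
# All names, types, counts, and IDs are illustrative placeholders.
aws emr create-cluster \
  --name "bigdata-analysis" \
  --release-label emr-6.15.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles

# When an unusually large dataset arrives, grow the core instance group
# on demand instead of permanently adding hardware.
aws emr modify-instance-groups \
  --cluster-id j-XXXXXXXXXXXXX \
  --instance-groups InstanceGroupId=ig-XXXXXXXXX,InstanceCount=8
```

The same commands run against a shrunken group when the spike subsides, which is what makes the pay-as-you-go cost model possible.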