Probably not. Hadoop is an open-source software project used to efficiently process large datasets. Instead of using one large computer to process and store the data, Hadoop clusters commodity hardware together to analyse massive datasets in parallel. The Hadoop ecosystem includes many applications and execution engines, providing a variety of tools to match the needs of your analytics workloads. AWS, by contrast, is a cloud computing service offered by Amazon. AWS lets users set up a Hadoop environment on rented infrastructure to perform the processing they need.
It’s like renting commercial space and then using it as an office, shop, warehouse or anything else commercial you want. When it comes to storage, however, AWS is far ahead of Hadoop. The following points explain why Amazon S3 wins over HDFS (the Hadoop Distributed File System):
Cost: - In terms of storage cost alone, S3 is 5X cheaper than HDFS. Based on our experience managing petabytes of data, S3’s human cost is virtually zero, whereas HDFS usually needs a team of Hadoop engineers or vendor support to maintain it. Once human cost is factored in, S3 is 10X cheaper than HDFS clusters of comparable capacity running on EC2.
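To see where a storage-only cost gap like this can come from, here is a minimal Python sketch of a per-terabyte cost model. It assumes HDFS keeps its default three replicas of every block (dfs.replication = 3) and that clusters are kept below full capacity for headroom. All prices, the utilization target and the ops cost are hypothetical placeholders, not AWS quotes and not figures from our own clusters.

```python
from dataclasses import dataclass


@dataclass
class HdfsCluster:
    # Hypothetical monthly price (USD) of 1 TB of raw disk on EC2 instances.
    raw_cost_per_tb_month: float
    # HDFS default replication factor (dfs.replication).
    replication_factor: int = 3
    # Assumed headroom: clusters are rarely filled to 100% of raw capacity.
    max_utilization: float = 0.75
    # Assumed monthly cost of engineers or vendor support.
    ops_cost_per_month: float = 0.0


def hdfs_cost_per_logical_tb(c: HdfsCluster, logical_tb: float) -> float:
    """Monthly cost per TB of logical (unreplicated) data stored on HDFS."""
    raw_tb_needed = logical_tb * c.replication_factor / c.max_utilization
    storage_cost = raw_tb_needed * c.raw_cost_per_tb_month
    return (storage_cost + c.ops_cost_per_month) / logical_tb


def s3_cost_per_logical_tb(price_per_tb_month: float) -> float:
    """S3 bills per unit of data stored; its internal redundancy is included in the price."""
    return price_per_tb_month


if __name__ == "__main__":
    storage_only = HdfsCluster(raw_cost_per_tb_month=25.0)
    with_people = HdfsCluster(raw_cost_per_tb_month=25.0, ops_cost_per_month=20_000.0)
    print(hdfs_cost_per_logical_tb(storage_only, 500))  # 100.0 per logical TB
    print(hdfs_cost_per_logical_tb(with_people, 500))   # 140.0 per logical TB
    print(s3_cost_per_logical_tb(23.0))                 # hypothetical S3 price
```

The point of the sketch is the multiplier: every logical terabyte on HDFS costs three raw terabytes plus headroom, and the ops cost is added on top, whereas the S3 price already covers redundancy.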
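Elasticity: - S3 is elastic, HDFS is not. S3 grows and shrinks with the data you store and bills only for what is stored, whereas HDFS capacity is fixed by the disks in the cluster, so it has to be provisioned for peak usage and expanded by adding nodes.
SLA (Availability and Durability): - S3 automatically replicates data across multiple Availability Zones (physically separate data centres), so its availability and durability are far superior to those of HDFS.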
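The elasticity point can be illustrated with a small Python sketch comparing a fixed-capacity cluster, which must be sized for the peak month and paid for every month, against elastic storage billed on what is actually stored each month. The monthly data volumes and the per-TB price are hypothetical, and both models deliberately use the same price so that only the effect of elasticity shows up.

```python
from typing import Sequence


def fixed_capacity_cost(monthly_tb: Sequence[float], price_per_tb_month: float) -> float:
    """Fixed-size cluster: capacity must cover the peak month and is paid for every month."""
    if not monthly_tb:
        return 0.0
    return max(monthly_tb) * price_per_tb_month * len(monthly_tb)


def elastic_cost(monthly_tb: Sequence[float], price_per_tb_month: float) -> float:
    """Elastic storage: each month is billed on the data actually stored that month."""
    return sum(tb * price_per_tb_month for tb in monthly_tb)


if __name__ == "__main__":
    # Hypothetical data volumes (TB) over four months, with a spike in month three.
    usage = [100, 120, 300, 150]
    print(fixed_capacity_cost(usage, 10.0))  # 12000.0
    print(elastic_cost(usage, 10.0))         # 6700.0
```

The spikier the workload, the wider the gap, because a fixed cluster pays for peak capacity even in the quiet months.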
To summarize, S3 and cloud storage provide elasticity, an order of magnitude better availability and durability, and 2X better performance, at 10X lower cost than traditional HDFS storage clusters. Hadoop and HDFS commoditized big data storage by making it cheap to store and distribute large amounts of data. In a cloud-native architecture, however, the benefit of HDFS is minimal and not worth the operational complexity. That is why many organizations do not run HDFS in the cloud and instead use S3 as their storage backend.
For more such updates, stay tuned to RightCloud Blog.