Greenplum vs Hadoop Disk Space

Posted by scottk on May 24, 2010 in Ramblings

I’ve been spending a whole lot of time calculating Greenplum vs Hadoop disk usage. So here’s the general equation:

(MaxAllocFactor * DiskSize * ( #Disk - RaidDisks ) ) / ReplicationFactor

MaxAllocFactor = Maximum recommended allocation: 70% for Greenplum and 75% for Hadoop

DiskSize = Size of your drive

#Disk = Number of drives

RaidDisks = Number of disks eaten up by RAID. For Hadoop this is 0.

ReplicationFactor = Number of copies of the data. In Greenplum everything is mirrored, so the replication factor is 2. Hadoop recommends three copies of data, so it gets a replication factor of 3.
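As a rough sketch, the equation and the per-system values above can be written as a small Python function. The function name, parameter names, and the two dictionaries are hypothetical and chosen only for illustration; the numbers are the ones listed in the definitions.

```python
def effective_space(max_alloc_factor, disk_size, num_disks, raid_disks, replication_factor):
    """Effective space, in the same unit as disk_size.

    Applies the allocation ceiling to the drives left after RAID overhead,
    then divides by the number of copies of the data.
    """
    data_disks = num_disks - raid_disks
    return (max_alloc_factor * disk_size * data_disks) / replication_factor


# Per-system values taken from the definitions above.
GREENPLUM = {"max_alloc_factor": 0.70, "replication_factor": 2}
HADOOP = {"max_alloc_factor": 0.75, "replication_factor": 3}
```

Called as `effective_space(disk_size=..., num_disks=..., raid_disks=..., **GREENPLUM)`, it returns the result in whatever unit the drive size is given in.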

So let’s look at a 24-drive attached storage array. We’ll use 500GB drives.

(MaxAllocFactor * DiskSize * ( #Disk - RaidDisks ) ) / ReplicationFactor

Greenplum: ( .70 * 500GB * ( 24 - 4 ) ) / 2 = 3.5 TB effective space

Hadoop: ( .75 * 500GB * ( 24 - 0 ) ) / 3 = 3.0 TB effective space

Next we’ll look at a single server, let’s say a 1U with four 3.5″ 2TB drives.

Greenplum: ( .70 * 2TB * ( 4 - 1 ) ) / 2 = 2.1 TB effective space

Hadoop: ( .75 * 2TB * ( 4 - 0 ) ) / 3 = 2 TB effective space

How about a single 2U server with twelve 1TB drives?

Greenplum: ( .70 * 1TB * ( 12 - 2 ) ) / 2 = 3.5 TB effective space

Hadoop: ( .75 * 1TB * ( 12 - 0 ) ) / 3 = 3 TB effective space
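To check the arithmetic, here is a minimal Python sketch that recomputes the three scenarios side by side. The scenario labels and variable names are hypothetical; the drive counts, drive sizes, RAID disk counts, allocation factors, and replication factors are the ones used in the examples above.

```python
def effective_space(max_alloc_factor, disk_size_tb, num_disks, raid_disks, replication_factor):
    """Effective space in TB for one system on one storage layout."""
    return (max_alloc_factor * disk_size_tb * (num_disks - raid_disks)) / replication_factor


# (label, drive size in TB, number of drives, drives lost to RAID under Greenplum)
scenarios = [
    ("24 x 500GB array", 0.5, 24, 4),
    ("1U, 4 x 2TB", 2.0, 4, 1),
    ("2U, 12 x 1TB", 1.0, 12, 2),
]

results = []
for label, size_tb, drives, gp_raid_disks in scenarios:
    # Greenplum: 70% allocation, RAID overhead, mirrored (2 copies).
    greenplum = effective_space(0.70, size_tb, drives, gp_raid_disks, 2)
    # Hadoop: 75% allocation, no RAID, 3 copies.
    hadoop = effective_space(0.75, size_tb, drives, 0, 3)
    results.append((label, greenplum, hadoop))
    print(f"{label}: Greenplum {greenplum:.1f} TB, Hadoop {hadoop:.1f} TB")
```

In every layout the gap between the two works out to half a terabyte or less.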

So what does this mean? It means that you shouldn’t run laughing to the bank on your backend savings by choosing Hadoop over Greenplum, assuming you plan to use the same storage architecture. Greenplum and Hadoop are two very different technologies, so comparing the two is kind of silly in the first place. They fall into the same category of processing large datasets in the same way that a Ford F350 and a Mazda Miata are both cars. They will both get you down the road, but in entirely different ways.

Don’t talk to me about compression factors. Everyone wants to say how their grandmother in Pensacola got 20x compression on system X. System X never happens to be my system, so I’ve stopped drinking the compression factor Kool-Aid.


Tags: BigSpace, Greenplum, Hadoop

