It’s all about the Big Data

Posted by scottk on March 1, 2012 in Ramblings

From what I’m seeing, the big companies have snatched up just about every person who has managed to touch the Hadoop stack, and the tech marketing machine is in overdrive. I’m spending the majority of my days (and some nights) dealing with Hadoop and Greenplum, so I’ve got some skin in the big data market. Things are going a little bit crazy. Someone mentioned to me that they are having a consultant bring big data to their company... on a 1U server with two 146GB drives. Come on now, someone needs to slow this bus down before it goes over the cliff.

We don’t need every hardware vendor to have a server setup that’s optimized for Hadoop. Cisco announced it now has a server targeted at Hadoop. Are you kidding me? Let’s remember that Hadoop is really meant to run on a large cluster of the cheapest servers you can put together from your local computer recycling center. Is your cluster not running fast enough? Drag that C64 out of the closet to add some task capacity, and be sure to add the tape drive as storage for a datanode.

There is huge buzz around Hadoop VMs and storage appliances for data stores. WAIT! The reason we are doing this in the first place is so the processing is close to the data and you don’t have to push data all around the network. Sure, I see the advantage of scaling NAS appliances; believe me, the ability to snapshot only block-level changes on a petabyte of data for backups would be REALLY nice. But doesn’t that kill the whole idea of gaining speed by colocating the data and the processing on the same hardware? Sure it does, but I’m sure the vendors selling those solutions don’t want to go too deeply into that conversation. They want to show you that $100k rack of disks with a really cool faceplate on it.
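To put that 1U box with two 146GB drives in numbers, here is a rough back-of-the-envelope sketch. The function name `usable_capacity_gb` and the ten-node comparison are made up for illustration. The only outside fact used is that HDFS keeps three copies of each block by default (`dfs.replication`). The sketch ignores space for the OS, temp files, and intermediate job output, so real usable capacity would be lower still.

```python
def usable_capacity_gb(nodes, drives_per_node, drive_gb, replication=3):
    """Rough usable HDFS capacity: raw disk divided by the replication factor.

    Hypothetical helper for illustration only. It ignores OS, temp, and
    intermediate-output space, so it overestimates what you really get.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raw_gb = nodes * drives_per_node * drive_gb
    return raw_gb / replication


# The 1U server from the anecdote: one node, two 146GB drives.
# With a single node there is nowhere to put extra replicas, so use 1.
single_box = usable_capacity_gb(nodes=1, drives_per_node=2, drive_gb=146, replication=1)

# Assumed comparison: ten of the same boxes with the HDFS default of 3 replicas.
ten_boxes = usable_capacity_gb(nodes=10, drives_per_node=2, drive_gb=146)

print(f"single 1U box: {single_box:.0f} GB")
print(f"ten boxes, 3x replication: {ten_boxes:.0f} GB")
```

Even ten of those servers land under a terabyte of usable space, which a single ordinary database server can hold without breaking a sweat.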

Big data is here, and not only can we process data we previously thought was unprocessable, but we can also persist data that was such a firehose we didn’t even think it was possible to hold on to it. This is all really cool shit, but it’s not the second coming, and the hype machine needs to pop a pill and calm down a little. Most companies will never collect data on that level. There are going to be a lot of small to midsize companies investing in big data infrastructure when a 2U open source SQL server with a ton of SSDs (which are actually getting to be reasonably priced) would do everything they need much quicker than a distributed data warehouse.

Copyright © 2006-2024 SimpIT.com All rights reserved.