End of Days
Greenplum Days has come to a close, and the last two sessions showed they saved the best for last. The Expansion/Fault Tolerance talk was far and away the pinnacle learning session, very much highlighting how 4.0 fixes some of the pain points in the 3.3.x versions, and you can see the foundational building blocks for some interesting things coming into place. The following talk was MAD Skills: Advanced Analytics – Driving the Future of Data Warehousing and Analytics. I had the notion I was biting off way more than I could chew by stepping into that session, since I address things more from a systems side. I was surprised to find, though, that I followed right along with how scalable vectors and K-means clustering could be applied immediately to the business we do at Adknowledge in a variety of ways.
I’m also on board with the tenets of the MAD (Magnetic, Agile, Deep) skills ideology, the Magnetic tenet being the one I identify with the most. This tenet really thrives on the idea of an approachable, omnivorous database that ingests all data that is thrown at it. Not because you bend and shape the data to shoehorn it into the database, but because the database is large and powerful enough to soak up all the data as is, internalize it and allow people to manipulate it. As someone who really likes data, I’m always suspicious of data that I know has been massaged before it’s put anywhere. I want my data in the rawest form so I can massage it and decide what is important and what isn’t, not be held to what someone who wrote an ETL process six months ago thought was important. Once you have all this raw data in a system, it is hard to resist getting in, screwing with it and trying to pick out insights. If the bar to access this data is low, it becomes a magnet for those of us who like to find insight in the data. In order to build even deeper insight, these victims that have been sucked in want to add more data to enhance their new models. This new data and the insights derived from it add to the magnetism, and you fall into a nice self-reinforcing cycle. Eventually nobody knows which came first, the people and their insights or the data. It doesn’t really matter, because they keep piling on and feeding off each other. I’ve seen this work, and I’ve seen it happen on a much smaller scale. It’s only now, as processing power, storage, bandwidth and great database implementations form a magical brew, that we can see this begin to take on life on a much grander scale.
Data manipulation, storage and sharing are going to see some big changes over the next few years. Core beliefs in how we store things and what we do with them are going to be repeatedly challenged. It’s going to be a very exciting few years for data. I’m looking forward to it.