by Justin Hayes (Red Hat)
There is a lot of buzz these days around Big Data, and rightfully so. The volume of data produced and the number of sources producing it are growing faster and faster. Similarly, the potential for organizations large and small to harness these data cannot be overstated, and should not be overlooked.
There is also a lot of noise when you look more closely at the Big Data question, or, to get right to the point, when you decide what your organization’s Big Data strategy should be. Here are some things to think about as you navigate the Big Data waters.
You are not Twitter
Do you really need to store and process data on the order of hundreds of terabytes or more? Many IT organizations easily convince themselves that they just cannot do without the hot new technology du jour. By all means, evaluate the applicability of Big Data to your environment. But realize that adopting radical new technology carries many costs, both actual and opportunity, in terms of infrastructure, skillset development, and IT process redesign. Make those investments only if you can show that you really need to store and process vast amounts of data, and that doing so will drive value.
Did You Really Mean to Say NoSQL?
Do you need to store and retrieve a lot of data (e.g. audit logging), or do you need to run heavy analytics on your large data sets? Do not spend your time and resources developing a world-class Hadoop infrastructure and MapReduce/Hive/Pig expertise if you don’t need analytics. And don’t buy expensive Business Intelligence software if it solves a problem you don’t have. Why not opt instead for a NoSQL implementation that offers less in terms of analytics and querying, but is entirely suitable for simpler storage and retrieval use cases?
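To make the distinction concrete, here is a minimal sketch in plain Python, not tied to any particular NoSQL product or to Hadoop itself. Part (a) models the put/get access pattern that simple storage-and-retrieval use cases (like audit logging) actually need; part (b) shows the map/shuffle/reduce data flow that analytics frameworks such as Hadoop distribute across a cluster. The keys, records, and helper names are illustrative assumptions, not any vendor's API.

```python
from collections import defaultdict

# (a) Simple storage and retrieval: the audit-logging style use case.
# A key-value store only needs put/get -- no query planner, no analytics engine.
store = {}

def put(key, record):
    store[key] = record

def get(key):
    return store.get(key)

put("audit:2013-01-15:42", {"user": "alice", "action": "login"})
record = get("audit:2013-01-15:42")  # retrieve one record by key

# (b) Analytics: a toy MapReduce-style aggregation over event records.
# A real framework runs map, shuffle, and reduce across many machines;
# the data flow is the same.
events = [
    {"user": "alice", "action": "login"},
    {"user": "bob",   "action": "login"},
    {"user": "alice", "action": "purchase"},
]

def map_phase(event):
    # Emit (key, value) pairs: one count per action observed.
    yield (event["action"], 1)

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

pairs = [pair for event in events for pair in map_phase(event)]
counts = reduce_phase(shuffle(pairs))
# counts == {"login": 2, "purchase": 1}
```

If all your workload ever does is part (a), the machinery behind part (b) is cost without benefit.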
Big Data Standalone or Big Data Integrated?
Some Big Data use cases require just the infrastructure (i.e. vanilla Hadoop, MongoDB, Cassandra, etc. installs) on which developers and data scientists can run their programs. Other use cases require integrating Big Data components into a larger, possibly legacy architecture. The former approach is certainly simpler, but it also leaves your Big Data infrastructure largely on its own island. The latter exposes more opportunities to leverage your investments, but it also introduces significant challenges: which existing components should you integrate with, how do you integrate them, and are the technologies involved standards-based or proprietary? Using middleware definitely helps here, and we of course prefer JBoss. The bottom line is that you should clearly lay out a plan for capitalizing on your Big Data investments, and that plan must take into account your existing systems. Otherwise you risk missing out on further opportunities to drive value.
Today’s Big Data environment is not unlike Cloud circa 2009: lots of promise, lots of murkiness, lots of players, lots of questions, lots of ill-informed decisions, and lots of spectacular successes. For that reason, your IT decision makers must filter out the noise, understand what Big Data actually is, and get to the heart of what it really means to you. Doing that, whether completely in-house or with the assistance of someone like Red Hat Consulting, is the best way forward with Big Data. In fact, it’s the only prudent way to make IT decisions regardless of the topic. Absent that, all you have is wasted effort and unfulfilled promises.