Big Data is one of those concepts that has been floating around for a few years now. Without needing to fully understand how it works, everyone grasps that the technology holds huge promise: conceptually, we all realize that a lot of data is being generated in our day-to-day jobs, and it would be great if we could use it more effectively. Perhaps because of this promise, Big Data has become a buzzword that sales personnel throw at their customers to project an image of a “cool”, “fresh” tech company, and this is contributing to massive confusion about what the technology can actually achieve. In fact, not without reason, many customers roll their eyes at the very mention of the concept.
Several factors contribute to the confusion in this space:
- First, Big Data is a big space. There is no single technology; there are over a hundred different tools: Hadoop, MongoDB, Spark, Impala, Storm…you name it. Many of them offer overlapping functionality, and it’s hard to differentiate between them, or keep up with the pace, if you’re not following the Big Data world every day.
- The technology is accessible and it’s relatively easy to build a prototype, but not so easy to productize. Almost anyone with some IT skills can deploy a “Big Data” cluster, but its performance may be quite poor, or it may simply not be maintainable. Proper design and implementation require a good understanding of which tools to use, how to tune the cluster, and how the data will be used.
- Big Data doesn’t do magic. Many people believe that just by hoarding data they will see fantastic results pop up the minute they deploy these solutions; this is a misconception.
There are two ways to use Big Data. The first is one-off analyses to discover insights. The second is continuous operation, where insights are generated automatically. Most Big Data players operate in the first category; the second still requires significant development work and subject matter expertise. Yet it is precisely the latter that is most relevant to telecom engineers.
Many service providers we speak with equate Big Data with the Data Lake concept. In other words, their use of Big Data is to store more data, more efficiently. This is a step in the right direction, but it’s not a goal in itself. An article published by MIT Technology Review noted that 99.5% of newly generated data is never analyzed. The value lies not so much in collecting data as in how that data is used.
As CSPs mature in their use of Big Data, they realize that they need to process this data continuously, and fast. They need mechanisms to automatically detect complex situations and react to them in (near) real time. The relevant data must be ready when a problem arises: the business cannot afford to wait minutes, or even hours, for the result of a query. Data needs to be pre-processed to make this possible.
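As a concrete illustration (not from the article itself), here is a minimal sketch of what that pre-processing might look like: instead of scanning raw records at query time, a stream of counter samples is continuously rolled up into per-cell, per-minute aggregates, so the answer is ready the moment the question is asked. All names here (`CounterAggregator`, `cell-42`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-aggregation: fold raw counter samples into
# per-(cell, minute) averages as they arrive, so a later query is a
# dictionary lookup instead of a scan over raw records.

class CounterAggregator:
    def __init__(self):
        # (cell_id, minute) -> [running sum, sample count]
        self._buckets = defaultdict(lambda: [0.0, 0])

    def ingest(self, cell_id, timestamp_s, value):
        """Fold one raw counter sample into its minute bucket."""
        minute = timestamp_s // 60
        bucket = self._buckets[(cell_id, minute)]
        bucket[0] += value
        bucket[1] += 1

    def average(self, cell_id, minute):
        """Answer a query from the pre-computed aggregate."""
        total, count = self._buckets.get((cell_id, minute), (0.0, 0))
        return total / count if count else None

agg = CounterAggregator()
for ts, v in [(0, 10.0), (30, 20.0), (90, 40.0)]:
    agg.ingest("cell-42", ts, v)

print(agg.average("cell-42", 0))  # minute 0 -> (10 + 20) / 2 = 15.0
print(agg.average("cell-42", 1))  # minute 1 -> 40.0
```

A production system would add windowing, persistence and many more counters, but the principle is the same: pay the processing cost as data arrives, not when the business asks the question.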
Generic Big Data distribution packages can’t offer much help at this stage. You still need to gather the data, parse it and build your processes around it. That means custom development work and services, which are expensive, if you are lucky enough to find resources with knowledge of all the different components.
At this point, Big Data is no longer a tool for a few specialized gurus, but a tool for a large number of users trying to solve business problems. In summary, a solution that operates at this stage would have the following requirements:
- A continuous processing framework that analyses data as it arrives, not on demand
- Embedded specialized knowledge of how to interpret the underlying data
- The ability to respond automatically to special situations in the data such as anomalies, faults and operational events
- Simplified management and operation
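The requirements above can be sketched in miniature. The following is not Tupl’s implementation but a hypothetical illustration, using a simple rolling mean/standard-deviation rule in place of real subject-matter logic: samples are analyzed as they arrive (not on demand), and an anomaly automatically triggers a response callback instead of waiting for someone to run a query. The names (`AnomalyDetector`, `on_anomaly`) are invented for this example.

```python
from collections import deque
from statistics import mean, stdev

# Hypothetical continuous detector: keep a sliding window of recent
# samples per counter and flag values that deviate strongly from that
# baseline, invoking a response callback automatically.

class AnomalyDetector:
    def __init__(self, window=20, threshold=3.0, on_anomaly=print):
        self.window = deque(maxlen=window)   # recent "normal" samples
        self.threshold = threshold           # deviation in std-devs
        self.on_anomaly = on_anomaly         # automatic response hook

    def observe(self, value):
        """Process one sample as it arrives; react if it is anomalous."""
        if len(self.window) >= 5:            # need a minimal baseline
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                self.on_anomaly(value)
                return True                  # keep the spike out of the baseline
        self.window.append(value)
        return False

det = AnomalyDetector(on_anomaly=lambda v: print("anomaly:", v))
for v in [10, 11, 9, 10, 12, 10, 11]:
    det.observe(v)      # normal traffic builds the baseline
det.observe(100)        # spike triggers the callback automatically
```

Embedded domain knowledge would replace the generic statistical rule here; knowing which counters matter, and what a deviation actually means for the network, is exactly the telecom expertise the text describes.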
Reaching this maturity level in the telecom space is much more complicated than in other disciplines, due to the complexity, volume and variety of the data sources. Typical data scientists are at a loss when it comes to interpreting protocol-level information, or understanding what a performance counter might be telling them. Telecom vendors, on the other hand, possess the specialized knowledge to understand the data, but cannot adopt the necessary solutions and architectures quickly enough.
Connecting the two worlds
At Tupl, we have built a platform that bridges the traditional IT and traditional telecom worlds. Our crew of engineers has that rare mix of Big Data and telecom expertise required to bring these technologies to life in an operational environment. And we plan to help develop these competencies further through a combination of R&D training programs and the creation of support solutions for engineers.
Stay tuned for our next blog, where we will be writing about our University Collaboration programs.