Framsteg Think Tank

Big Data in easy words

Written by Tristan Poetzsch

Preface

Big Data is a buzzword, and everybody has something to say about it. But to understand what really lies behind this often empty phrase, I would like to take you on a journey. Don’t be afraid: I will try to make everything understandable and memorable, even if you have never written a single line of code. Just to be clear: this article is written for non-IT people who are interested in the topic.

The first step on our journey is to understand what Big Data actually means for most people out there. The fact is that the amount of digitally computable data is increasing very fast (nothing new here). Understanding and using even a small part of this data can be quite an opportunity for a business. This data may let you understand customers better and provide them with fitting products, or even open up entirely new business models. And that is why Big Data is such a big deal. A 37 billion dollar market big deal.
Notice that I have not given a definition of Big Data, but rather a glimpse of its desired results. Most IT guys would define Big Data like this: “So much data it can’t be easily processed with state-of-the-art technology.” But that doesn’t help us much. All it tells us is that state-of-the-art technology has to change (which is exactly what is happening). These technological developments will be explained further down the line, but let us take one step at a time.

Laying out our travel plans – Gaining an overview of Big Data

Going forward, we first need to establish an understanding of the basics when talking about data in general. Let me share an anecdote from my own experience that shows why even a simple distinction can add enormous value to your understanding of Big Data and to your business decisions. Once upon a time in a major company, the executive board wanted to forge a ring to rule them all and make big money (obviously some ominous Big Data project). They hired an expensive computer company to help them. So the computer company came in on a big project, billed many hours and concluded with a simple statement: “You don’t have the data sources to provide the data that you want to be computed!” Millions were wasted, just because the company’s executives didn’t know that they lacked the data sources and had no need for a computation company yet. And sadly, no ring was forged. Sorry about that.

To keep that costly error from happening to you, think about data and its surroundings as a five-fold framework: data source, the actual data, data memory, computation, and output and visualization.

To understand what the five parts are made of, let me explain each of them a little further.

Data Source: The data we want to analyse has to come from somewhere. Classically, someone types data points into an Excel sheet and then sends that off into the nirvana of an SAP system. But in times of the Internet and the Internet of Things (IoT), data can come from a lot of places, like online databases, production machines, cameras or even space robots! Want to read more?
- Don’t know what to cook today? Ask your Fridge! About the Internet of Things and its drivers.

The actual data: Pretty much everything can be data. Literally every piece of digitized information is data. But there are very different kinds of data: an Excel sheet, for example, is something entirely different from a video file. To differentiate very roughly, we speak of structured and unstructured data. The first is what most people mean when they refer to data: sheets with numbers, neatly ordered in columns and rows. The latter is roughly everything else, ranging from magazine articles to patient records, video and audio files or reports of any kind. Want to read more?
- Numbers would never lie, would they? Yes, they would! The GIGO principle and statistics.
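
For readers who are curious what the difference between structured and unstructured data looks like in practice, here is a small optional sketch in Python. Everything in it is made up for illustration: the customers, the products, the amounts and the function names. It assumes the structured data arrives as a simple comma-separated table and the unstructured data as a short customer comment.

```python
import csv
import io
import re

# Structured data: rows and columns with a fixed meaning (made-up numbers).
structured = """customer,product,amount_eur
Anna,Coffee machine,120
Ben,Toaster,35
Anna,Kettle,25
"""

# Unstructured data: free text carrying similar information (made-up text).
unstructured = (
    "Anna wrote: I bought the coffee machine for 120 euros and love it. "
    "Later I also got a kettle, I think it was around 25 euros."
)


def total_from_table(text):
    """Add up the amount column; the column name tells us where to look."""
    rows = csv.DictReader(io.StringIO(text))
    return sum(int(row["amount_eur"]) for row in rows)


def guess_amounts_from_text(text):
    """Guess amounts by searching for numbers followed by the word 'euros'."""
    return [int(number) for number in re.findall(r"(\d+) euros", text)]


table_total = total_from_table(structured)
text_amounts = guess_amounts_from_text(unstructured)
print(table_total)
print(text_amounts)
```

With the table, the computer knows exactly where the amounts are. With the free text, it has to guess based on a pattern, and that guess breaks as soon as somebody writes "EUR" or "bucks" instead of "euros". That fragility is a big part of why unstructured data is harder to work with.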

Data memory: Here is where it gets technical, but I won’t go into details. You have to understand that all the data has to be stored somewhere to make it available for computing. These storage systems are called databases. The different techniques for storing and ordering data in these databases are currently among the hottest topics in IT (just to do some required namedropping: SAP HANA, Hadoop and all that stuff fall under this category). Want to read more?
- Anarchy in the UK – Understanding database technologies and why NoSQL became an alternative to relational databases
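
If you would like to see what "storing data in a database and asking it a question" can look like, here is a minimal optional sketch. It uses SQLite, a tiny database that ships with Python, purely as a stand-in; it is not how SAP HANA or Hadoop work internally. The table name, the column names and the sales figures are invented for this example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
connection = sqlite3.connect(":memory:")

# Create a table, which is the database's version of a sheet with columns.
connection.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, amount_eur INTEGER)"
)

# Store a few made-up rows.
connection.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Anna", "Coffee machine", 120),
        ("Ben", "Toaster", 35),
        ("Anna", "Kettle", 25),
    ],
)


def spending_per_customer(conn):
    """Ask the database for the total amount each customer has spent."""
    rows = conn.execute(
        "SELECT customer, SUM(amount_eur) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())


per_customer = spending_per_customer(connection)
print(per_customer)
```

The point is the division of labour: the database keeps the data in order, and you only describe what you want to know.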

Computation: All the data in the world is worth nothing if you don’t know what to do with it. Therefore the goal is to read through the data and make some sense of it. This is where math and statistics come into play. And since you don’t want to do all the nasty numbers stuff yourself, data gets computed by statistical algorithms on computers. Here is where all that Terminator-style AI stuff is happening (with buzzwords like neural networks or machine learning flying around like crazy).
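
To make "statistical algorithms" a little less mysterious, here is an optional sketch of one of the simplest ones: fitting a straight line through a handful of points and using it to make a prediction. The numbers are made up (advertising spend and sales, in thousand euros), and the function name is my own. Real machine learning is far more elaborate, but the basic idea of learning a pattern from data and then applying it is the same.

```python
from statistics import mean

# Made-up data: monthly advertising spend and resulting sales (thousand euros).
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]


def fit_line(xs, ys):
    """Least-squares straight line through the points: (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)

# Use the learned line to estimate sales for a spend we have not seen yet.
predicted_sales = slope * 6 + intercept
print(slope, intercept, predicted_sales)
```

In this toy data set, every extra thousand euros of advertising goes along with two thousand euros more in sales, so the line predicts sales of 22 for a spend of 6. Whether such a pattern holds in reality is exactly the kind of question the statistics are there to answer.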

Output & Visualization: Results of the computation have to be presented in some way. Depending on your software, this can look quite different. Whilst a statistical program like R or SPSS mostly provides you with cryptic tables, modern business software tries to make the data understandable in an easy-to-read dashboard.
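
As a last optional sketch, here is the difference between a bare result and a visualized one in miniature. The spending figures and the function name are made up, and the "chart" is only built from text characters; real dashboards are of course much prettier.

```python
# Made-up result of some earlier computation: total spending per customer.
spending = {"Anna": 145, "Ben": 35}


def text_bar_chart(values, width=20):
    """Turn numbers into text bars, scaled so the largest bar has `width` marks."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<5}|{bar} {value}")
    return lines


# The bare result, roughly what a plain table would give you.
print(spending)

# The same result as a simple chart.
chart = text_bar_chart(spending)
for line in chart:
    print(line)
```

Both outputs contain the same information, but the bars let you see at a glance who spends more.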

This concludes our little introduction to the subject. Further down the line, we will discuss the different parts and their subdomains in more detail, look at current trends and think about their implications.

Thanks to 8icons for the free icons!