Visualize an iceberg: the data problem to be solved is like that iceberg. The tip above the water is the solution, or part of the story; the larger mass beneath the surface is the data and the rest of the story. What's underwater is the part the analyst knows but no one else does – yet. The data and the story are both desirable, and both are needed.
Analysis
Big Data: Trough of Disillusionment
Gartner’s Hype Cycle for Emerging Technologies report evaluates a range of technologies that either have the potential to transform, or are already transforming, the way organizations do business. It is a careful analysis of how business views each technology; once a technology makes its way through the entire cycle, it is either consigned to the scrapheap or becomes an accepted part of the business tech landscape (think Windows, Office, smartphones, etc.).
The report covers dozens of far-out technologies, ranging from robots to 3D bio-printing systems to ‘software-defined anything’, but one of the biggest technological advances in recent years, Big Data, demands a closer look.
According to Gartner, Big Data officially passed the “peak of inflated expectations” in 2014 and is now on a one-way trip to the “trough of disillusionment”. Gartner says it has done so rather rapidly, because we already have consistency in the way we approach this technology, and because most new advances are additive rather than revolutionary. In 2014, Gartner estimated it would take 5-10 years for Big Data to reach the “plateau of productivity.”
Gartner Hype Cycle: Interpreting Technology Hype
When new technologies make bold promises, how do you discern the hype from what’s commercially viable? And when will such claims pay off, if at all? Gartner Hype Cycles provide a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new opportunities. Gartner Hype Cycle methodology gives you a view of how a technology or application will evolve over time, providing a sound source of insight to manage its deployment within the context of your specific business goals. –Gartner.com
Data and Your Gut May Both Be Needed for Decision Making
Have you ever experienced a nagging feeling before you’re about to finally make a big decision? You’ve weighed all the data, you’ve considered every angle, but something is keeping you from moving forward. Rather than ignore that nagging feeling and forge ahead, Shelley Row says we need to get to the bottom of it.
Row, an author and expert on executive decision-making, addressed a group of loss prevention professionals gathered for a recent NRF Conference. In researching and interviewing executives about their decision-making process, Row related what she heard to neuroscience and the mechanics of how we use different parts of our brain to make different kinds of decisions.
For most of us in this data-driven world, the biggest problem in the workplace is overthinking decisions, which wastes time and compounds stress. The key to relieving this pressure, Row said, is to find the balance between using data and intuition to make decisions.
Row explained that as we weigh all the data that goes into making a complex decision, the logic and language part of our brain is working hard, accessing our working memory, which is limited to a handful of things. But that nagging feeling? That’s a different part of the brain that’s shut down while we agonize over the pros and cons and facts and stats. Row described that nagging feeling as all the other experience and intelligence we’ve gained that we’re not able to access simultaneously.
This is not to say you should throw out the data just because it tells you something you don’t like. But intuition based on solid experience is something you shouldn’t ignore.
The important thing is to be self-aware enough to know when this is happening so you can discover the insight behind a gut feeling. “You have to resolve the nagging feeling to solve overthinking,” Row said. Investigate the root of that nagging feeling — does it feel like there’s something you’re missing, or is it simply that you’re afraid of something? It could be a fear worth overcoming or an invaluable insight locked away in your brain.
The next time you’re circling a complex decision with a lot of data involved, resist the urge to stall by gathering one more data point, and instead probe deeper to understand what your gut feeling is all about.
Based on an article by Jennifer Overstreet, June 2016
Reporting and Analysis Are Not the Same Thing
Businesses often confuse analysis with reporting. Some invest in complex analytical tools when they really need to streamline the reporting process; others pursue reporting capabilities when they should be seeking in-depth analysis for their data.
The reality is that both reporting and analysis are critical to the success of any business, and deploying an integrated approach to both is key to ensuring they complement each other in a way that generates the most valuable results.
| Reporting | Analysis |
| --- | --- |
| Reporting is a way of providing information about what’s happening in your business. Good reports enable you to ask the right questions about your business. | Analysis helps you answer questions by enabling a deeper dig into your data to understand the drivers and root causes behind performance metrics. |
| Reporting provides a 30,000-foot view of set metrics. | Analysis is flexible and gets deep into the weeds to uncover valuable insights. |
| Reporting takes time. Many companies allocate the first few days of their month and the first few weeks of their quarter for many of their employees to assemble reports. | Analysis requires more energy and motivation than time, and can be more valuable. Successful analysis demands an exhaustive dig through data to find hidden opportunities and areas needing improvement. |
| Reports are often considered urgent. Without careful planning, reporting often crowds out analysis. | Analysis is important, but because it often involves going the extra mile to find hidden insights, it rarely carries the same urgency or deadlines. |
| Often businesses confuse reporting for analysis, and vice versa. | Reporting and analysis must work together for a profitable business. |
| Reporting is routine. If you spend ten hours working on a report, the report gets finished. You have something to show for it at the end of the day. | Analysis, on the other hand, requires innovative thought, anticipation of future outcomes, and confidence to strive for change. If you spend ten hours on analysis, there’s no guarantee you will have an answer. |
Because companies are busy and often pressured into constant action, they trend toward the easy and familiar way out with reporting. This translates into spending time churning out report after report rather than trying to actually analyze what’s going on and make a difference within the company. Reporting raises questions; analysis answers them. You can’t have one without the other.
Reproducibility of Results
Many analytical models are built with the idea that the model built today will still be good in the future. If the results are not reproducible, then the predictive models are worthless. It is essential for the project team to identify the reasons why their models will or will not work in the future. Moreover, it is also a good idea to define boundaries within which the model will operate properly.
For instance, consider this fictitious model of the salary of professionals:
Salary = 1,000 * Years of Experience + 5,000
This equation says that someone with infinite years of experience will have an infinite salary. We know this is incorrect. The above model for salary is plausibly correct within a boundary of 0 to 30 years of experience. Yet most models in business systems are implemented without defining the boundaries within which those models are effective.
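The boundary idea can be made concrete in code: wrap the model so that inputs outside the validated range are rejected rather than silently extrapolated. This is a minimal sketch; the function name and the 0–30 limits simply echo the fictitious example above.

```python
def predict_salary(years_of_experience: float) -> float:
    """Fictitious linear model: salary = 1,000 * years of experience + 5,000,
    guarded by the boundary within which it is assumed to be valid."""
    MIN_YEARS, MAX_YEARS = 0, 30  # validity boundary from the text
    if not (MIN_YEARS <= years_of_experience <= MAX_YEARS):
        # Outside the boundary the model is untrusted: fail loudly
        raise ValueError(
            f"Model validated only for {MIN_YEARS}-{MAX_YEARS} years; "
            f"got {years_of_experience}"
        )
    return 1_000 * years_of_experience + 5_000

print(predict_salary(10))  # 15000 -- inside the boundary
```

Calling `predict_salary(50)` raises an error instead of returning a confidently wrong number, which is exactly the behavior most production models lack.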
Missing Data & Outliers
Missing data is a reality of virtually every business data set. In statistics classes, you are told to replace missing data with the average or some other, more sophisticated value generated through regression or other techniques. At times, this process of replacing missing values becomes so mechanical that analysts tend to forget there could be a reason why the data is missing.
Missing data, or the absence of something, can in certain cases be strong evidence in itself. This is particularly true in risk and fraud analytics. At the beginning of an analytics project, it is a good idea to scrutinize missing data and identify whether compelling clues are hiding within it.
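One simple way to keep the evidentiary value of missing data while still imputing it is to record an indicator flag alongside the imputed value. The sketch below assumes records held as plain dictionaries with an invented `income` field; it is illustrative, not a prescription.

```python
def impute_with_flag(records, field):
    """Mean-impute a numeric field, but keep a '<field>_missing' indicator
    so the absence itself remains available as a signal."""
    present = [r[field] for r in records if r[field] is not None]
    mean = sum(present) / len(present)
    for r in records:
        r[f"{field}_missing"] = r[field] is None  # absence as evidence
        if r[field] is None:
            r[field] = mean
    return records

customers = [
    {"income": 40_000},
    {"income": None},   # missing -- worth asking why, not just patching
    {"income": 60_000},
]
impute_with_flag(customers, "income")
print(customers[1])  # income imputed to 50000.0, income_missing is True
```

In a fraud model, a feature like `income_missing` can turn out to be more predictive than the imputed income itself.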
Another problem for analysis, highlighted by every statistics textbook, is outliers. Outliers are observations that are extremely dissimilar to the studied population. For instance, if you are studying the net wealth of individuals on the planet, then Bill Gates is an outlier.
One strategy for dealing with outliers is data transformation, i.e., taking the log or square root of all observations, which compresses the data into a ‘normal’ range. At other times, outliers can simply be removed from the data being analyzed. This is a good strategy in many cases but ineffective in others. A third option is segmentation: in several marketing analytics applications, for example, it is better to split the population into segments and build a separate model for each segment.
Not Identifying the Right Variables
After identification of the right question(s) for a business analytics problem, the next step is to identify the right data and variables to work with.
“Assume you want to build a model to predict job satisfaction for employees. In any human resources system, the easily available and highly quantifiable metrics are income, bonus, levels, promotions, etc. But we all know from our experience that job satisfaction is a highly complicated phenomenon and can barely be predicted with just these variables. However, when one builds this model there is a greater temptation to just use the easily available variables. The ability to identify the right set of variables at the beginning of the project differentiates a good analyst from the rest. Identification of variables requires a good understanding of the domain and lots of creativity. Creativity helps in generating derived variables from the available data in the business systems.”
– Roopam Upadhyay May 2, 2016
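The "derived variables" idea from the quote can be sketched as follows. Every field name here is invented for illustration; the point is that creatively combining the easily available fields (raises, promotions, tenure) can yield features more plausibly tied to job satisfaction than raw income alone.

```python
def derive_features(employee):
    """Derive illustrative new variables from easily available HR fields."""
    years = max(employee["years_at_company"], 1)  # avoid dividing by zero
    return {
        **employee,
        # Raw income may matter less than how pay and rank are growing
        "raises_per_year": employee["num_raises"] / years,
        "promotion_velocity": employee["num_promotions"] / years,
    }

emp = {"years_at_company": 4, "num_raises": 2,
       "num_promotions": 1, "income": 85_000}
print(derive_features(emp))  # adds raises_per_year=0.5, promotion_velocity=0.25
```

Generating such derived variables is where the domain knowledge and creativity the quote calls for actually enter the modeling process.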
Eagerness to Solve Problems
Thinking about data in a scientific way is at the core of a successful analytics project that produces a competitive edge for the organization. Yet, there are several reasons why analytics projects fail to create sound outcomes for an organization.
On Facebook people post something like:
Identify a word that starts and ends with the letter ‘r’
Almost always hundreds of users on this social media site immediately start answering this question. Every now and then someone asks, “Why is this an important question?” In this setting, if someone does ask this, he or she is considered a spoilsport.
Still, there is something extremely interesting happening here. Humans are wired, particularly by schooling, to answer questions without questioning the question. We see a problem and we need to solve it. This is a dangerous strategy for analytics projects.
Identification of the right business problem is at the core of successful analytics projects. Not every business problem is equally important, and many problems are not even worth putting any effort into. Always ask why the problem you are solving is important, and don’t start your project until you have a satisfactory answer.
Data Analytics is About Finding Facts
Data analytics is about finding facts. What we do with those facts is comparable to what we do with any other tool. I have a good hammer and a great screwdriver in my tool chest. They don’t make me a carpenter. The best pots and pans won’t make me a chef either.
A manager who gets a data report showing that he or she needs more sales reps can’t blame the data if he or she then hires poor sales reps. Another manager might ignore the portions of an analyst’s report that don’t fit his or her own preconceived notions and biases. And, there are bound to be managers who blindly follow the data without really testing the findings for validity and scalability.
These are human failings that lead us to blame the data when, in fact, the mistakes are ones that people make. It’s not about the data; it’s often about people who are not quantitatively strong.
As more companies embark on big data analytics strategies, have mercy on the good data that ends up in the hands of bad managers. Some of these projects will fail in grand fashion. Some will soar to mediocrity. Some will be home runs. Business is still populated by fallible humans, at least for today.
