Professor Ram Gopal, a visiting faculty member at the ISB, talks to Galit Shmueli, tenured Associate Professor of Statistics and Information Systems and SRITNE Chaired Professor of Data Analytics.
There is some confusion regarding the terminologies: data analytics, data mining, business analytics and business intelligence. Can you clarify if they are synonyms or if they have different meanings?
When I was growing up, there were no such words as there was hardly any data. The term “data mining” started creeping in only as more data started coming in. That is when computer scientists took up data mining, which involved writing more algorithms for sifting through the data. With progress occurring in data mining, the business community began to wonder about what they could do with all the data they had with them. This led to the coining of terms such as “business analytics” and “business intelligence”. Traditionally, data analytics encompassed fields such as statistics, data mining, artificial intelligence, and related areas. The area of business analytics brings all these fields together in a manner that relates directly to businesses and how to add value to them. If we look at “Google Insights for Search” and search for these terms over the last few years, we find some very interesting trends. The first thing you will find is that the terms “business intelligence” and “data mining” fluctuate almost together. Both these terms are also very highly searched. Other terms such as “business analytics” and “data analytics”, which are less popular, are also increasingly searched today. Geographically, India is number one in online search for these terms. Singapore takes the second spot. The United States already has a foothold in data analytics. There is huge potential in Asia because there is a lot of data that you can sift through and not too many people are doing that yet.
One debate that I see in the literature and in the community is that there seem to be two classes: business intelligence, which is more backward-looking as it is based on understanding what has transpired in the past, and business analytics, which is more forward-oriented as it tells us how we can improve our business processes in terms of strategy and operations. Is that a distinction that you see?
It is indeed a big distinction both in the academic world and in the practical world. For instance, many of the software tools that are out there are geared more towards exploration – they are used to describe what is going on and to report the findings. In businesses at present, if you ask people what kind of analytics they use in their companies, 99% of them will say Excel tables and graphs. So reporting has been there for a long time. But there is a lot more that you can do even just at the level of describing what is going on. This is definitely one part of data analytics or business analytics. The other part is about making predictions, and that is where the term “predictive analytics” comes in. A lot of software vendors use “predictive analytics” when they are incorporating data mining and statistical methods. Statistics actually started with people asking questions about “what causes what?” and how we can figure out which factors cause which outcome. This was the focus of statistics for a very long time. As we move into data analytics, we not only want stronger tools to ask these questions but we also want predictive tools for asking, for example, what the next best movie is going to be. A very interesting quote talks about this distinction. “Statistics is about proving what you expect. Visualisation, and I will add, business analytics and data mining, etc., are about discovering what you didn’t expect.”
One thought that comes to my mind is about the emerging trends of prediction. Is prediction somewhat different from forecasting? What is the relationship between the two?
Forecasting is basically a fancy word for prediction into the future. When we look at a time series, that is, a series of events or measurements over time, and try to predict its future values, this is termed “forecasting.” Forecasting differs from prediction in the structure of the data: in forecasting we are looking at a single, or sometimes multiple, series of measurements over time and predicting into the future. In contrast, “prediction” is typically used to denote modeling data from one sample of people to make predictions about another sample. Data of this kind, where each record is a different person rather than a different point in time, are called a cross-sectional data set. Since the data sets are of such different nature, we must have different types of algorithms and methods to carry out the prediction or the forecasting. Although forecasting and prediction follow the same general approach, since both fall under the same umbrella, the methodologies are going to be slightly different, taking into account the different nature of the data.
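The difference in data structure described here can be made concrete with a small sketch. It is only an illustration under stated assumptions: the sales figures, the customer records, and the helper names (`chronological_split`, `random_split`, `naive_forecast`, `mean_prediction`) are all hypothetical and do not come from the interview, and the two "models" are deliberately the simplest possible baselines. The point is only that a time series is held out by time order, while a cross-sectional sample is held out at random.

```python
import random


def chronological_split(series, n_holdout):
    """Forecasting setup: keep time order and hold out the most recent values."""
    if not 0 < n_holdout < len(series):
        raise ValueError("n_holdout must be between 1 and len(series) - 1")
    return series[:-n_holdout], series[-n_holdout:]


def random_split(records, holdout_fraction, seed=0):
    """Cross-sectional setup: hold out a random subset of records (e.g. people)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def naive_forecast(train, horizon):
    """Baseline forecast: repeat the last observed value for each future period."""
    return [train[-1]] * horizon


def mean_prediction(train_records):
    """Baseline prediction: use the training-sample mean outcome for any new record."""
    return sum(outcome for _, outcome in train_records) / len(train_records)


# Hypothetical monthly sales: one series measured over time.
monthly_sales = [100, 104, 109, 115, 118, 124, 131, 137]
train, future = chronological_split(monthly_sales, n_holdout=2)
forecast = naive_forecast(train, horizon=len(future))

# Hypothetical customers as (age, annual_spend): one snapshot of many people.
customers = [(23, 310), (35, 540), (41, 620), (29, 400),
             (52, 710), (38, 580), (47, 690), (31, 450)]
train_c, test_c = random_split(customers, holdout_fraction=0.25, seed=42)
baseline = mean_prediction(train_c)

print("forecast for held-out months:", forecast, "actual:", future)
print("baseline spend prediction:", baseline, "held-out customers:", test_c)
```

Shuffling the monthly series before splitting would let future values leak into the training data, which is one reason the two kinds of data call for different methods.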
I think in the business community and also amongst business students, there is a realisation that moving forward, data is going to be one of the key business resources. In fact, I have read reports that in most organisations, data is the most important resource after human resources. I see that the business community and business students see a lot of value in business analytics. The question that often comes up is whether these skills are technical skills that require companies to hire or consult with some technical experts, or are these business skills that business students and leaders need to have some knowledge of?
In the old days, you would have statisticians sitting in-house and collaborating with the local domain experts. But it was the statistician who took care of the data, especially in terms of analysis. That has been changing over time. In spite of all the rankings saying that statistics is one of the most desired jobs in the world, what businesses look for are people on the interface – who can basically see the analytics part and also understand the whole business context of things. From my experience with students at the University of Maryland, MBA students, who have taken a few courses in data analytics and data mining, are seen as unique in interviews, since they have business knowledge and familiarity with data analytics. Technical ability is required in data analytics but not the hard-core computer science type of technical ability. What is more crucial is an understanding of when things are going to work, what is out there, what are the options, etc. MBAs and business students can definitely gain a lot from taking data analytics courses – not the old paper-and-pencil type of courses but applied courses that involve real data sets and work on real projects. Through this, students will understand where their own limitations lie and the value they can bring to organisations with some knowledge in data analytics.
Essentially what you are saying is that students need to understand the domain so that they know how to leverage the analytics tools and figure out what kind of business issues can be dealt with using these tools. My follow-up question is: are there particular job functions, organisations and industries where these skills are particularly useful, or is the need more widespread?
I would say that today, almost every business, whether medium or large, in every possible area, has data. If you have the data, then it is worth seeing whether the data can bring value to the organisation. With that in mind, if you look at the types of jobs that are being offered in almost every industry, they are looking for data-savvy people. So I think it does not really matter which sector you are going into. It could be the government agencies, private sector, NGOs, etc – there is no distinction there. All these sectors have interesting data and there is almost always knowledge to be gained from the data, given that you have the skills. Because data by itself is not knowledge – it is a lot of noise and a little bit of signal.