In this article, Professor Galit Shmueli explores the salient characteristics of predictive analytics in retail, debunks some common myths and suggests new, untapped possibilities specific to the region.
Where Business Meets Statistics
Technological advancements, together with the availability and accessibility of data within companies, are facilitating fact-based and insightful decision-making. With closer connections among business experts, statisticians and data engineers, advanced analytics can be implemented effectively in the business context. A major business analytics technology is predictive analytics.
What exactly is predictive analytics, and what explains its recent prominence in driving business success? We already have business intelligence, which includes reporting tools and dashboards that leverage data by making it more visible to decision makers. Yet business intelligence is about looking at the past or the present. The key distinguishing feature of predictive analytics is its forward-looking approach: we use past data to predict future events.
A second distinguishing feature of predictive analytics is its focus on micro-level decisions. Whereas classical statistical models typically focus on “average behaviour” or high-level estimates of aggregate patterns, predictive analytics is designed to work at the granular level of individual customers, transactions, suppliers, employees and so on. Will a particular employee churn? Will a particular customer redeem an offer? Will a particular supplier fail to ship on time? Applied to large pools of customer data, predictive analytics holds the promise of personalisation: we can predict future behaviour and, based on those predictions, customise the offers, timing and other parameters of customer interaction and care. In a country as diverse as India, where variability is at the core, such personalisation to customer preferences carries especially great weight compared with many Western markets.
The granularity of data in predictive analytics is not restricted to understanding diverse customer needs; it also extends to the market side, down to store-level data, SKU-level data in retail and individual transactions in service operations. Mobile, e-commerce and even modern brick-and-mortar retailing are already data intensive, which makes predictive analytics easier to apply, because generating and collecting data is now far more feasible.
To gain useful results, it is critical to have a close connection among predictive analytics techniques, data, and business needs and requirements. The process of implementing predictive analytics therefore begins with its most difficult step: translating a business problem or challenge into a predictive analytics problem. Let us walk through the entire process that a company would go through in implementing a predictive analytics project. To make it concrete, consider an online retailer such as Flipkart.com that offers cash on delivery (COD).
Problem Identification: Once a business challenge or opportunity is identified, it must be translated into a precise predictive analytics formulation, which means defining specific, measurable desired outcomes. Recall the online retailer offering COD. COD carries a higher risk for the retailer than other payment options. Business goals might include reducing the number of COD transactions (by converting customers to another mode of payment), reducing failed cash collections, reducing turnaround time or reducing defective or wrong deliveries. Each of these objectives requires a different predictive analytics goal. To reduce failed cash collections, for example, we could formulate the task of predicting the probability of payment for each new transaction.
Measurement and Data: The choice of outcome and predictor measurements depends on the formulated predictive analytics problem. We must determine which measurements are of interest and are available not only now but also at the time of deployment, when predictions will be generated. The outcome measurement is the one we are trying to predict. In the COD example of predicting the probability of payment, the outcome is whether the customer paid or did not pay (a “pay/no-pay” measurement). The predictor variables are measurements that we suspect are correlated with the outcome in a predictive fashion, that is, measurements that give an indication of the outcome before it occurs. In the COD example, age, gender and prior purchase history might be indicative of the chance of payment in a future transaction.
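To make the distinction between outcome and predictors concrete, here is a minimal Python sketch of one COD order record. The field names (`customer_age`, `prior_failed_cod`, `delivery_attempts` and so on) are hypothetical, and the set of fields available at prediction time is an assumption for illustration. It shows the point made above: a field such as the number of delivery attempts is only known after dispatch, so it cannot serve as a predictor when a new order must be scored.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one COD order; field names are illustrative only.
@dataclass
class CodOrder:
    customer_age: int
    customer_gender: str
    prior_orders: int        # purchases before this order (known at order time)
    prior_failed_cod: int    # earlier COD refusals (known at order time)
    delivery_attempts: int   # only known AFTER dispatch -> must not be a predictor
    paid: Optional[bool]     # outcome: True = pay, False = no-pay, None = not yet known


# Assumed list of fields that exist at the moment a new order must be scored.
AVAILABLE_AT_PREDICTION_TIME = {
    "customer_age", "customer_gender", "prior_orders", "prior_failed_cod",
}


def predictors(order: CodOrder) -> dict:
    """Keep only the fields that will also exist at deployment time."""
    return {k: v for k, v in vars(order).items() if k in AVAILABLE_AT_PREDICTION_TIME}


def outcome(order: CodOrder) -> int:
    """Encode the pay/no-pay outcome as 1/0 for a classifier."""
    if order.paid is None:
        raise ValueError("outcome not yet observed")
    return int(order.paid)


order = CodOrder(34, "F", 5, 0, 2, True)
print(predictors(order))  # delivery_attempts and paid are excluded
print(outcome(order))     # 1
```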
Concerns about data have evolved from “Do we have sufficient data?” to “How do we get timely access to these data?” and “How do we integrate data from multiple sources?” Once a dataset is assembled, we randomly partition it into two subsets: a training set and a holdout set. The training data are used to build a predictive model; the model “learns” from them. The holdout data are then used to test the model's performance on data it “did not see.” Applying the model to the holdout data mimics reality, in which the model is deployed on completely new data. One should also not become obsessed with, or discouraged by, having too much or too little data. The focus should always be on the end goal of the analysis, and when a larger dataset is unavailable, a moderate one can still yield good insights.
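A random training/holdout partition takes only a few lines. The sketch below assumes a 70/30 split and a fixed seed for reproducibility; both are illustrative choices, not recommendations from the article.

```python
import random


def train_holdout_split(records, holdout_fraction=0.3, seed=42):
    """Randomly partition records into a training set and a holdout set."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's data are untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_holdout = round(len(shuffled) * holdout_fraction)
    # The model is built on the first part and evaluated on the part it never saw.
    return shuffled[n_holdout:], shuffled[:n_holdout]


train, holdout = train_holdout_split(range(1000))
print(len(train), len(holdout))  # 700 300
```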
Model: The model-building stage comprises trying different algorithms and models and selecting one or more that meet the required performance goals. Models and algorithms differ in the ways they search for patterns and correlations in the data. Some methods are more data-driven, in the sense that they make no assumptions about possible patterns, while others assume some underlying structure. All these methods fall under the term “data mining,” a field that combines methods from artificial intelligence, statistics and other disciplines. It is impossible to know in advance which predictive method will be best, and in fact, we often combine multiple models to obtain better predictions. It is therefore important that analysts be familiar with a range of methods, including classification and regression trees, nearest-neighbour algorithms, Naïve Bayes, regression models and more.
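The following sketch illustrates trying several models, comparing them on the holdout set and combining them. It uses made-up COD data with two hypothetical predictors (`prior_orders`, `prior_failed`), a naive baseline that predicts the training payment rate for every order, two nearest-neighbour models and a simple ensemble that averages the two nearest-neighbour probabilities. It is a plain-Python toy, not a production approach; for instance, the predictors are not rescaled, which nearest-neighbour methods normally require.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic, made-up COD orders: ((prior_orders, prior_failed), paid)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prior_orders = rng.randint(0, 20)
        prior_failed = rng.randint(0, 3)
        # Assumed relationship between predictors and payment, for illustration only.
        p_pay = 1 / (1 + math.exp(-(0.2 * prior_orders - 1.0 * prior_failed)))
        rows.append(((prior_orders, prior_failed), int(rng.random() < p_pay)))
    return rows


def baseline(train):
    rate = sum(y for _, y in train) / len(train)
    return lambda x: rate  # same payment probability for every order


def knn(train, k):
    def predict(x):
        nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
        return sum(y for _, y in nearest) / k  # share of neighbours who paid
    return predict


def average(*models):
    return lambda x: sum(m(x) for m in models) / len(models)  # simple ensemble


def accuracy(model, data, threshold=0.5):
    return sum(int(model(x) >= threshold) == y for x, y in data) / len(data)


data = make_data(600)
# Rows are generated independently, so slicing acts as a random split here.
train, holdout = data[:400], data[400:]
candidates = {
    "baseline": baseline(train),
    "knn_k3": knn(train, 3),
    "knn_k15": knn(train, 15),
}
candidates["ensemble"] = average(candidates["knn_k3"], candidates["knn_k15"])

scores = {name: accuracy(m, holdout) for name, m in candidates.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:9s} holdout accuracy = {score:.3f}")
print("selected:", best)
```

Because the data are synthetic, the printed accuracies only demonstrate the mechanics; which model wins on real data cannot be known in advance.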
Deployment: Once the model has been validated on the holdout dataset, it is ideally tested on some live data. Data patterns may have changed since the model was built, and/or new factors may have become material. This trial deployment also reveals practical issues, such as run time and cost, that might not have surfaced during model building. Once fully deployed, the model generates new data, which can then be used to improve it or to build future models. In the COD example, the deployed algorithm produces a probability of payment for each new transaction. Deliveries with a high likelihood of payment are then sent out, and the actual payment behaviour is observed.
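A minimal sketch of the deployment step for the COD example might look like the following. The 0.7 cut-off, the “review” alternative for low-probability orders (for example, asking for prepayment) and the order IDs are all assumptions; the article only says that high-likelihood deliveries are sent out. The log keeps each prediction so that the observed payment outcome can later be attached and used to improve the model.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentLog:
    """Collects scored orders and, later, their observed outcomes for retraining."""
    records: list = field(default_factory=list)

    def add(self, order_id, p_pay, decision):
        self.records.append(
            {"order_id": order_id, "p_pay": p_pay, "decision": decision, "paid": None}
        )

    def record_outcome(self, order_id, paid):
        for r in self.records:
            if r["order_id"] == order_id:
                r["paid"] = paid
                return
        raise KeyError(order_id)


def route(order_id, p_pay, log, ship_threshold=0.7):
    """Ship orders with a high predicted probability of payment; flag the rest.

    The 0.7 threshold and the 'review' route are illustrative assumptions.
    """
    decision = "ship" if p_pay >= ship_threshold else "review"
    log.add(order_id, p_pay, decision)
    return decision


log = DeploymentLog()
p_pay_by_order = {"A101": 0.92, "A102": 0.41, "A103": 0.75}  # made-up model outputs
decisions = {oid: route(oid, p, log) for oid, p in p_pay_by_order.items()}
print(decisions)  # {'A101': 'ship', 'A102': 'review', 'A103': 'ship'}

# Later, the actual payment behaviour is observed and stored next to the prediction.
log.record_outcome("A101", paid=True)
```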
Classic Uses of Predictive Analytics in Retail
Direct marketing campaigns are one of the most popular applications of predictive analytics in retail. Because predictive analytics is a powerful personalisation tool, it can help answer questions such as which product or coupon to offer a particular customer, which medium to use for sending the offer and when to send it. Each of these questions requires different outcome measurements and success criteria. Increasing customer spend, reducing the number of days since the last visit or purchase and increasing coupon redemption are all possible outcomes of interest.
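One common way to judge a targeting model for a direct marketing campaign is lift: how much more often the customers ranked highest by the model respond, compared with the campaign's overall response rate. The sketch below computes lift for the top fraction of customers on a small, made-up holdout set; the scores and redemption flags are invented.

```python
def lift_at_top(scores, responded, fraction):
    """Response rate among the top `fraction` of customers ranked by predicted
    score, divided by the overall response rate (i.e. random targeting)."""
    overall_rate = sum(responded) / len(responded)
    if overall_rate == 0:
        raise ValueError("no responders, lift is undefined")
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Made-up holdout results: model scores and whether each customer actually redeemed.
scores = [0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
redeemed = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(lift_at_top(scores, redeemed, 0.3))
```

Here the top 30% of customers redeem at a rate of 2/3 versus 0.4 overall, a lift of about 1.67; contacting everyone always gives a lift of 1.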
With the availability of structured and unstructured data, customer data can include demographic and purchase history information as well as behaviour from other sources such as social media or third-party data. Such data can be a rich source of marketing insights at the individual customer level. The use of predictive analytics for personalisation must be carried out carefully, keeping in mind ethical and cultural sensitivities. This powerful tool can lead to awkward deployments, as in the recent case of the US retailer Target, which used customer data to predict customer pregnancy and extend offers accordingly. Some customers (and their relatives) were extremely upset to discover what the retailer had inferred.
A word of caution: do not ignore data that yield negative results, such as defaulters in loan applications or non-purchasers in a marketing campaign. For example, suppose that in a marketing campaign customers were sent redemption coupons and redemption rates were recorded. While companies are typically interested only in the customers who redeemed, data on the non-redeemers are crucial for developing future campaigns. These data are used to study the relationship between the outcome (redemption, in this case) and the predictors (customer demographics, etc.). A database of only redeemed coupons would reveal nothing about which types of customers did not use the coupon. Knowing who did not redeem is as important as knowing who redeemed.
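The following sketch, using made-up campaign data and hypothetical age segments, shows why non-redeemers matter: redemption rates per segment can only be estimated when both outcomes are recorded. With redeemers alone, every segment appears to redeem 100% of the time.

```python
from collections import defaultdict


def redemption_rate_by_segment(rows):
    """rows: (segment, redeemed) pairs; returns the share of customers
    in each segment who redeemed their coupon."""
    sent = defaultdict(int)
    used = defaultdict(int)
    for segment, redeemed in rows:
        sent[segment] += 1
        used[segment] += int(redeemed)
    return {seg: used[seg] / sent[seg] for seg in sent}


# Made-up campaign outcomes: one row per customer who received a coupon.
campaign = [("18-25", True), ("18-25", False), ("18-25", False), ("18-25", False),
            ("26-40", True), ("26-40", True), ("26-40", True), ("26-40", False)]

print(redemption_rate_by_segment(campaign))        # {'18-25': 0.25, '26-40': 0.75}
redeemers_only = [row for row in campaign if row[1]]
print(redemption_rate_by_segment(redeemers_only))  # {'18-25': 1.0, '26-40': 1.0}
```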
Another important area in retail is professional development. A company can apply predictive analytics to employee and programme information to determine whom to send to a particular training programme. Information on employees’ past performance, course performance, demographics, training mode and other factors that correlate with post-training performance is used to predict each employee's chance of success or, alternatively, to choose the employees most likely to benefit from the programme.
Training programmes themselves can also be analysed as a further dimension, to weed out ineffective programmes and improve the factors that yield better end-performance of trainees.
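As one illustration of scoring employees for a training programme, here is a small categorical Naïve Bayes classifier (one of the methods mentioned earlier) with Laplace smoothing. The features (`past_rating`, `mode`) and the training history are invented for the example; a real application would use the richer employee information described above.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, alpha=1.0):
    """rows: list of (features_dict, success) with categorical features.

    Returns a function giving P(success=True | features) under the Naive Bayes
    assumption that features are independent given the outcome.
    """
    class_counts = Counter(y for _, y in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of values
    values_seen = defaultdict(set)       # feature -> values observed in training
    for x, y in rows:
        for feat, val in x.items():
            value_counts[(feat, y)][val] += 1
            values_seen[feat].add(val)

    def predict(x):
        joint = {}
        for y in (True, False):
            # Smoothed class prior.
            score = (class_counts[y] + alpha) / (len(rows) + 2 * alpha)
            for feat, val in x.items():
                n_vals = len(values_seen[feat]) or 1
                # Smoothed P(feature value | class).
                score *= (value_counts[(feat, y)][val] + alpha) / (
                    class_counts[y] + alpha * n_vals)
            joint[y] = score
        return joint[True] / (joint[True] + joint[False])

    return predict


# Invented history: past employees, their attributes and whether training paid off.
history = [
    ({"past_rating": "high", "mode": "classroom"}, True),
    ({"past_rating": "high", "mode": "online"}, True),
    ({"past_rating": "high", "mode": "classroom"}, True),
    ({"past_rating": "low", "mode": "online"}, False),
    ({"past_rating": "low", "mode": "classroom"}, True),
    ({"past_rating": "low", "mode": "online"}, False),
]
p_success = fit_naive_bayes(history)

candidates = {"emp_1": {"past_rating": "high", "mode": "online"},
              "emp_2": {"past_rating": "low", "mode": "online"}}
ranking = sorted(candidates, key=lambda e: p_success(candidates[e]), reverse=True)
print(ranking)  # ['emp_1', 'emp_2']
```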
Customer churn is a major concern for retailers. In many membership-driven business models, such as mobile phone plans, limited retention resources must be deployed judiciously. Whom should we target? What should the message be? Which channel and what product mix would help retain customers? Predictive analytics can tackle such questions by identifying the customers most likely to churn so that they can be contacted appropriately.
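With limited retention resources, one simple (and assumed) decision rule is to rank customers by the expected value of contacting them: predicted churn probability, multiplied by the share of contacted churners who are retained and by the value at stake, minus the contact cost. The sketch below applies that rule under a fixed budget. The churn probabilities would come from a predictive model; here they, the customer values and the 30% save rate are all made up.

```python
def select_for_retention(customers, budget, cost_per_contact, save_rate=0.3):
    """customers: list of (customer_id, p_churn, value_at_stake).

    Contact the customers whose expected saved value most exceeds the contact
    cost, without exceeding the budget. save_rate (the share of contacted
    would-be churners who stay) is an assumed figure, not an estimate.
    """
    scored = []
    for cid, p_churn, value in customers:
        expected_gain = p_churn * save_rate * value - cost_per_contact
        if expected_gain > 0:  # skip contacts expected to lose money
            scored.append((expected_gain, cid))
    scored.sort(reverse=True)  # highest expected gain first
    max_contacts = int(budget // cost_per_contact)
    return [cid for _, cid in scored[:max_contacts]]


customers = [("C1", 0.80, 500), ("C2", 0.10, 2000), ("C3", 0.60, 300),
             ("C4", 0.05, 100), ("C5", 0.90, 150)]
print(select_for_retention(customers, budget=100, cost_per_contact=40))  # ['C1', 'C2']
```

Note that C2, with a low churn probability but a high value at stake, is selected; ranking by churn probability alone would have chosen C5 and C1.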
Stocking Levels in Fashion Retail
For leading fashion brands, balancing customer satisfaction against inventory levels is tough. Stores selling fashion clothing must stay optimally stocked to ensure availability and variety while minimising both stockouts and “stale” merchandise. Predictive analytics offers methods for forecasting demand patterns, estimating cross-product elasticity and optimising store capacity and variety.
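Demand forecasting for a single SKU in a single store can start as simply as exponential smoothing. The sketch below uses simple exponential smoothing with an assumed smoothing constant of 0.3, then converts the forecast into an order quantity with an assumed safety stock. The sales figures are invented, and because recorded sales understate demand whenever a store stocks out, real implementations usually need to account for that.

```python
def exponential_smoothing(demand, alpha=0.3):
    """Simple exponential smoothing: one-step-ahead forecast of next period's demand."""
    if not demand:
        raise ValueError("need at least one observation")
    level = demand[0]
    for d in demand[1:]:
        level = alpha * d + (1 - alpha) * level  # blend new sales with previous level
    return level


def order_quantity(forecast, on_hand, safety_stock):
    """Units to order so stock covers the forecast plus a safety buffer (never negative)."""
    return max(0, round(forecast + safety_stock - on_hand))


weekly_sales = [120, 135, 128, 150, 160, 155]  # made-up weekly sales of one SKU
f = exponential_smoothing(weekly_sales)
print(round(f, 1), order_quantity(f, on_hand=60, safety_stock=25))  # 145.2 110
```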
Practitioners’ Tips for Predictive Analytics
As retailers gain experience with predictive analytics, the pitfalls that practitioners frequently encounter offer useful lessons.
While there is always a temptation to chase the “aha moment” of new insights born of predictive analytics, one must bear in mind that the ability to predict an outcome from a set of predictors establishes only correlation, not causation.
It is easy to be overwhelmed by the variety of models and algorithms, especially since different models may yield different results. They may well be designed to answer different questions, so it is crucial to have a specific goal formulated in predictive analytics terms.
One key determinant of success is support and even leadership from upper management. Predictive analytics can lead to successes, but there is often a learning period in which there will be failures. Managerial support must extend to allowing failures, as long as they result in lessons learnt.
As a concluding tip, we circle back to the original thought that business and statistics go together in predictive analytics. When applying findings from the analytics component, one must carefully align the outcomes with reality rather than apply them mechanically. Without business knowledge, deploying analytics can be useless at best and disastrous at worst.
Conclusion: Predictive Analytics for a More Personalised Service
Predictive analytics can be leveraged to reduce the burden on customers, employees, suppliers, society and nature in general by enabling more personalised and insightful decisions. In India, where diversity spans many dimensions, predictive analytics can help scale up “white glove” service to large numbers of people, transactions and events. Moreover, region-specific customs, realities and modes of operation (such as cash on delivery) open the door to many new questions that can be tackled with predictive analytics. Creativity is therefore the key to new predictive analytics applications. Using predictive analytics, companies can strengthen their relationships with their stakeholders and reduce the uncertainty that plagues so many of their operations.