Promotion Insights in Retail

Machine Learning (ML) can provide insights into data and processes that support business decisions in many industry domains. However, the high volume and velocity of data make it challenging to get those insights both proactively, based on already established processes, and reactively, accounting for the unknown. In this blog post, we give an overview of a methodology for identifying and addressing product cannibalization caused by promotional campaigns in the retail domain. The use case analyzes the impact of promotional products and specifically identifies cannibalization effects, i.e., cases where promotions dramatically decrease the sales of non-promotional products.

An example: A chain of supermarkets decides to run a promotion on 500 g of chicken breasts. This promotion has significant effects on the sales figures of certain other products. In our example, the sales of beef schnitzel and various turkey products dropped by more than 20% during the promotion. However, many of those associations are not discovered upfront. If retail management could anticipate the influence of a promotion on other products in advance, they could adjust their demand planning: order the products that will be in demand, and order fewer of the products that will be less popular during the promotional period.

Therefore, in this use case we predict the demand for products while accounting for promotional campaigns, so that stocks can be adjusted and waste is reduced. We have implemented this use case as a fully automated ML application that is capable of (i) learning ML models from data, (ii) providing valuable insights, and (iii) monitoring and assessing ML model competence over time.

The input data for this use case were several years of sales transactions for one category of products, i.e., meat products, from a retailer that has multiple supermarkets in different regions. The data also includes information on whether a product was on promotion over a certain period. The outputs of this use case can be used by two different roles: (i) a demand planner (operational outputs) and (ii) a business analyst (ML analytics). The operational output contains insights about the most negative impacts of promotional products on other products. The purpose of these insights is to adjust the stock in supermarkets accordingly and, for example, not to order foods that spoil fast if their sales are expected to decrease. The ML analytics output contains ML models, performance parameters, and evaluation results over time. These insights make it possible to assess the performance of ML models over time using incoming daily sales transaction data and to retrain those models when their performance decreases.

The methodology includes building ML models to get promotional cannibalization insights and creating workflows that support the ML model life cycle, monitoring, and evaluation. The vital part of this use case is the A1 Digital ML Platform powered by BigML. The platform takes care of the advanced work of hand-tuning ML models and of executing complex workflows. It allows us to fully automate the workflows and their execution.

ML models in this use case include regressions, association discovery, and anomaly detectors. Regression models predict the expected sales of products, so that we can estimate whether the actual sales of non-promotional products decrease or increase because of promotional campaigns.
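
The comparison between expected and actual sales can be illustrated with a minimal Python sketch. It assumes the expected value comes from a regression model that has already been trained; the function names and the example numbers are hypothetical, and the 20% threshold is the one used in the chicken breast example above.

```python
def sales_change_pct(expected: float, actual: float) -> float:
    """Relative change of actual vs. expected sales in percent (negative = decrease)."""
    if expected <= 0:
        raise ValueError("expected sales must be positive")
    return (actual - expected) / expected * 100.0


def is_cannibalized(expected: float, actual: float, threshold_pct: float = 20.0) -> bool:
    """True if actual sales fall short of expected sales by more than threshold_pct."""
    return sales_change_pct(expected, actual) < -threshold_pct


# Hypothetical numbers: the regression expected 100 units, 75 were sold.
change = sales_change_pct(expected=100.0, actual=75.0)   # a 25% decrease
flagged = is_cannibalized(expected=100.0, actual=75.0)   # exceeds the 20% threshold
```

A decrease of exactly 20% is not flagged here, because the use case looks for decreases of more than 20%.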

Association discovery finds ‘significant’ associations, or so-called association rules, between promotional and non-promotional products. We are interested in identifying the negative impact of promotions, i.e., cases where the sales of products fall more than 20% below the expected sales. For example, Figure 1 shows the associations found for pairs of products with a considerable sales decrease compared to the expected sales. The non-promotional product with id=141 is affected by the promotional product with id=89, the non-promotional products with id=44 and id=62 are affected by the promotional product with id=145, and so on.


Figure 1: Association rules between promotional and non-promotional products
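
How strong such a rule is can be expressed with the standard association-rule measures support, confidence, and lift. The sketch below computes them for a rule of the form "product A is on promotion, so sales of product B drop by more than 20%". It is a generic illustration and not the platform's association discovery; the `Observation` type, the `rule_metrics` function, and the toy observations are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Observation:
    """One promotional period in one store (hypothetical unit of analysis)."""
    promoted: FrozenSet[int]    # ids of products on promotion
    decreased: FrozenSet[int]   # ids of products whose sales fell >20% below expectation


def rule_metrics(observations: Sequence[Observation],
                 promo_id: int, affected_id: int) -> Optional[Dict[str, float]]:
    """Support, confidence and lift of the rule 'promo_id promoted -> affected_id decreased'."""
    n = len(observations)
    n_promo = sum(1 for o in observations if promo_id in o.promoted)
    n_decr = sum(1 for o in observations if affected_id in o.decreased)
    n_both = sum(1 for o in observations
                 if promo_id in o.promoted and affected_id in o.decreased)
    if n == 0 or n_promo == 0 or n_decr == 0:
        return None  # the rule cannot be evaluated on this data
    confidence = n_both / n_promo
    return {
        "support": n_both / n,
        "confidence": confidence,
        "lift": confidence / (n_decr / n),
    }


# Toy data only; the ids mirror the ones mentioned in the text.
toy = [
    Observation(frozenset({89}), frozenset({141})),
    Observation(frozenset({89}), frozenset({141})),
    Observation(frozenset({145}), frozenset({44, 62})),
    Observation(frozenset(), frozenset()),
]
metrics = rule_metrics(toy, promo_id=89, affected_id=141)
```

A lift well above 1 indicates that the sales decrease occurs more often during the promotion than it does overall, which is what makes a rule worth reporting to a demand planner.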

The results of the association discovery are converted into sales decrease percentages, which show how much the sales of certain products are expected to decrease for the period of a promotion, depending on the promotional products. These results can be used to analyze planned promotions in advance and to adjust the retail stock ahead of time. This is especially important for products that spoil fast, e.g., foods and drinks. For example, the promotional product PRODUCT-45 is expected to negatively affect the sales of products PRODUCT-98, PRODUCT-53, PRODUCT-144, and so on. This means that it is efficient to stock less of those non-promotional products during the promotional period of PRODUCT-45 to save money and reduce possible waste.


Figure 2: The impact of one promotional PRODUCT-45 on other non-promotional products
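
Turning an expected sales decrease into an order quantity is simple arithmetic. The following sketch shows one possible way to do it; the `adjusted_order` function, the baseline quantities, and the decrease percentages are hypothetical and are not taken from the retailer's data.

```python
import math


def adjusted_order(baseline_units: int, expected_decrease_pct: float) -> int:
    """Reduce a baseline order by the expected sales decrease, rounding up to whole units."""
    if not 0 <= expected_decrease_pct <= 100:
        raise ValueError("expected_decrease_pct must be between 0 and 100")
    return math.ceil(baseline_units * (100 - expected_decrease_pct) / 100)


# Hypothetical impact of a promotion on PRODUCT-45:
# product -> (baseline order in units, expected sales decrease in percent)
impact = {
    "PRODUCT-98": (200, 25.0),
    "PRODUCT-53": (10, 25.0),
}
orders = {product: adjusted_order(units, pct) for product, (units, pct) in impact.items()}
```

Rounding up is a deliberate choice in this sketch: it slightly favors availability over waste reduction, and a planner could just as well round down for products that spoil very fast.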

Alongside the association discovery, we need to monitor daily how well the association rules perform and to keep learning from the incoming sales transaction data. Therefore, we train anomaly detectors, which are a powerful tool for measuring the reliability of association rules. We build an anomaly detector every time association rules are produced. By calculating how anomalous the new daily sales transaction data is, we get a measure of how far the new data is from the data that was used to produce the association rules, as shown in Figure 3. This tells ML analysts when the association rules must be updated. A high anomaly score over a certain period means that the association rules no longer fit the new sales transactions, for example because of changes in customer behavior or a major crisis such as a coronavirus outbreak, which has most certainly changed sales figures dramatically. When the association rules are updated, a testing period is necessary to evaluate the new rules and to check whether they perform better on the new incoming sales transaction data.


Figure 3: The rate of anomalous sales transactions data incoming every day
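
A daily anomaly rate like this one can be derived from per-transaction anomaly scores. The sketch below assumes that each transaction already has a score between 0 and 1 from an anomaly detector. The score threshold of 0.6, the rate limit of 0.3, and the three-day window are illustrative assumptions rather than values from this use case, and both function names are hypothetical.

```python
from typing import Sequence


def anomalous_rate(scores: Sequence[float], score_threshold: float = 0.6) -> float:
    """Share of one day's transactions whose anomaly score exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > score_threshold) / len(scores)


def needs_retraining(daily_rates: Sequence[float],
                     rate_limit: float = 0.3,
                     consecutive_days: int = 3) -> bool:
    """True if the anomalous rate stayed above rate_limit for enough days in a row."""
    run = 0
    for rate in daily_rates:
        run = run + 1 if rate > rate_limit else 0
        if run >= consecutive_days:
            return True
    return False


# Hypothetical scores for one day and hypothetical rates for four days.
today = anomalous_rate([0.2, 0.7, 0.9, 0.4])       # 2 of 4 transactions are anomalous
retrain = needs_retraining([0.1, 0.4, 0.5, 0.6])   # three days in a row above 0.3
```

Requiring several consecutive days above the limit keeps a single unusual day, such as a public holiday, from triggering an update of the association rules.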

Calculating anomaly scores is not the only way to measure how well the association discovery performs. Once the models are in place, we can compare the daily sales transaction data with the predicted sales figures and calculate the error rate of the association rules. Together, error rates and anomaly scores show when to produce new association rules using the latest available sales transaction data. For example, Figure 4 shows the model performance overview with example evaluations of two association discovery models: ‘In Use’ and ‘In Testing’.


Figure 4: The model performance overview
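
One common way to express such an error rate is the mean absolute percentage error (MAPE) between predicted and actual sales. The sketch below uses MAPE as an assumption, since the post does not name the exact error measure, and compares an ‘In Use’ model with an ‘In Testing’ model on made-up daily figures. The function names are hypothetical.

```python
from typing import Dict, Sequence


def mean_abs_pct_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; days with zero actual sales are skipped."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != 0]
    if not pairs:
        raise ValueError("no days with non-zero actual sales")
    return sum(abs(p - a) / abs(a) for p, a in pairs) / len(pairs) * 100.0


def best_model(errors: Dict[str, float]) -> str:
    """Name of the model with the lowest error rate."""
    return min(errors, key=errors.get)


# Made-up daily sales of one product and the predictions of two models.
actual_sales = [100.0, 100.0, 50.0]
errors = {
    "In Use": mean_abs_pct_error([80.0, 120.0, 40.0], actual_sales),      # about 20%
    "In Testing": mean_abs_pct_error([90.0, 110.0, 45.0], actual_sales),  # about 10%
}
winner = best_model(errors)
```

In this toy comparison the ‘In Testing’ model has the lower error, so it would be the candidate to replace the model currently in use once the testing period ends.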

This use case is a good example of how Machine Learning can show the way promotional campaigns affect the sales of non-promotional products in the retail domain. Machine Learning helps to identify and tackle the negative effects of those campaigns, to adjust stock proactively and reactively, to reduce product waste, and, most importantly, to reduce a retailer's costs.