Alexei Chernobrovov, Consultant on Analytics and Data Monetization

How to speed up and reduce the cost of advertising experiments by predicting the behavior of users of an online store using proxy metrics

Conversion, average order value and return on investment (ROI) are perhaps the most important indicators in online marketing, and business strategies are built on them. To evaluate these metrics adequately, you need a sufficiently large body of statistics. But it is not always possible to collect the sample needed for a data-driven decision, because of industry seasonality, limited time, lack of funds or other external factors. Forecasting the required indicators from indirectly related (correlated) known factors helps to obtain a result sooner. For example, if a user viewed a product page and put the product in the basket but did not place an order, the product most likely interests that consumer. Therefore, on average, the ROI of personalized advertising of this product to this customer should be noticeably higher than for an uninterested user. This article considers how to predict the values of marketing indicators by analyzing user behavior events.

Problem Statement

To decide whether offering a particular product or service is worthwhile, you need sales statistics for it. In an online store with a wide assortment, conversion is typically about 1-2%, so a small online store (up to 50 thousand visits) needs at least a few months to collect enough statistics for a reliable evaluation of an experiment. The business does not have this time, because it needs to sell today. The target values therefore have to be predicted from existing data. For example, is it advisable to invest in remarketing, that is, in bringing a visitor back to the site with contextual advertising of the products he viewed? ROI helps to answer this question: the greater the return on advertising investment, the more effective the marketing campaign.

Once we know how the probability that a specific user buys a product is related to that user's actions on the site (page views, adding to the basket, etc.), we can judge whether investing in advertising for this customer is worthwhile. Indicators that approximate the target variable but have a much larger number of observations are called proxy metrics. From the available data on user behavior, we need to select the proxy metrics that determine the ROI of advertising campaigns and remarketing most quickly and accurately, and then build a model that calculates the target variable from the proxy metric values. Predictive tasks of this kind are well suited to machine learning (ML).


The Maths of Proxy Metrics 

Suppose there is a training sample of observations x ∈ X for which the individual values of the target metric F*: x → y are known. The task is to construct a function F: X → Y that generalizes. A proxy metric G approximates the target while needing fewer observations: for the same training sample x ∈ X, the values G*: x → z are known, and there exists a function T such that T(z) → y. The task is to construct a function G: X → Z that generalizes. The data for the proxy metrics are known in advance or can be obtained at a lower cost (in time or resources) than the information needed to predict the target metric directly, so the indirect route through the proxy, T(G): X → Y, is preferable (faster, cheaper, etc.) to F: X → Y.

The trivial case is T(z) = z. For example, “adding an item to the basket” can serve as a proxy metric for the “order” variable: everyone who places an order on the site necessarily adds goods to the basket first. Sometimes such a proxy metric gives a good result.
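To make the trivial case concrete, here is a minimal sketch that compares how often the proxy event and the target event occur in a session log. The log, the event names and the helper `event_rate` are hypothetical and serve only as an illustration.

```python
# Hypothetical session log: each session is the set of events it contained.
sessions = [
    {"view"},
    {"view", "add_to_cart"},
    {"view", "add_to_cart", "order"},
    {"view"},
    {"view", "add_to_cart"},
    {"view"},
    {"view", "add_to_cart", "order"},
    {"view"},
]


def event_rate(log, event):
    """Share of sessions in the log that contain the given event."""
    return sum(event in s for s in log) / len(log)


cart_rate = event_rate(sessions, "add_to_cart")  # proxy z
order_rate = event_rate(sessions, "order")       # target y

# In this toy log every order session also contains add_to_cart,
# so the proxy event is observed at least as often as the target.
orders_have_cart = all("add_to_cart" in s for s in sessions if "order" in s)

print(f"add-to-cart rate: {cart_rate:.2f}, order rate: {order_rate:.2f}")
```

Because the proxy event is more frequent than the order, the same traffic yields more positive observations for it, which is what makes the proxy faster to measure.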

In practice, the difficulty is that the task does not always reduce to a trivial proxy metric. Often a more complex predictor is required, composed of several indicators, for example, the presence of goods in the basket and the number of pages viewed (Fig. 1).

Fig. 1. Composite proxy metric (from several predictors)


Moreover, the ML model does not use “ready-made values” of proxy metrics, but predicts them.


Searching for proxy metrics

ML modeling is preceded by a data preparation stage, which comprises five operations: sampling, cleaning, feature generation, integration and formatting. The most complex, but also the most interesting, of these is feature generation. It covers extracting, transforming and selecting variables so that the dataset contains normalized numerical vectors of only those predictors that really affect the target indicator. The quality of the dataset determines the accuracy of the solution and the speed of computation, so data preparation deserves special attention.

The search for relevant proxy metrics is a typical feature generation task. The choice should be guided by the following considerations:

  • a sufficient number of observations to form training and test samples;
  • ease of data collection or calculation;
  • a transparent relationship to the target variable, for example, whether the viewed goods match the advertisement or search phrase.

Next, suitable proxy metrics should be converted to numerical form if necessary and normalized, that is, brought to a common range of values or a common probability distribution. ML models need this to work correctly: features on very different scales can make the algorithm unstable, worsen the learning results and slow down modeling.
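As an illustration of bringing values to a single range, the sketch below applies min-max scaling, one common normalization among several. The function name `min_max_scale` and the sample values are assumptions made for the example.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw features measured on very different scales.
pages_viewed = [1, 3, 5, 9]
minutes_on_site = [0.5, 2.0, 12.5, 6.5]

scaled_pages = min_max_scale(pages_viewed)
scaled_minutes = min_max_scale(minutes_on_site)

print(scaled_pages)    # [0.0, 0.25, 0.5, 1.0]
print(scaled_minutes)  # [0.0, 0.125, 1.0, 0.5]
```

After scaling, both features lie in [0, 1], so neither dominates the other merely because of its units.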

Next, cut off the “redundant” features using ML algorithms that allow you to estimate feature importance: greedy selection, logistic regression, random forest, gradient boosting, etc. It is worth checking how the quality of the target metric estimate changes as a result. For example, a proxy metric is considered acceptable if:

  • the decrease in the quality of the target metric estimate “pays off” through faster calculations, i.e. less time is needed to make a decision;
  • the quality of the target metric estimate improves.
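The idea of ranking candidate features by importance can be sketched with a deliberately simple stand-in: the absolute Pearson correlation between each candidate and the target. This is not one of the model-based methods named above (random forest, gradient boosting, etc.), only a lightweight assumption for illustration, and the feature names and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(features, target):
    """Sort candidate features by |correlation with the target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical binary candidates observed over eight sessions.
features = {
    "added_to_cart":   [0, 1, 1, 0, 1, 0, 1, 0],
    "viewed_delivery": [0, 0, 1, 0, 0, 0, 1, 1],
    "is_weekend":      [1, 0, 1, 0, 0, 1, 0, 1],
}
ordered = [0, 0, 1, 0, 0, 0, 1, 0]  # target: an order was placed

ranking = rank_features(features, ordered)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Candidates at the bottom of such a ranking are the first to consider dropping, after checking how the quality of the target estimate changes without them.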

Thus, the proxy metrics search comes down to the following algorithm:

  1. Select a set of candidates for the proxy metric G: X → Z.
  2. Keep those variables that can be predicted reasonably well, i.e. obtain the vector z from the proxy metrics.
  3. Refine the function T(z) → y to improve the approximation. Use a simple algorithm here (a shallow decision tree or logistic regression with regularization) to avoid overfitting the model.
  4. Retrain the models G: X → Z and T(z) → y.
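Step 3 can be sketched as follows: a small logistic regression with an L2 penalty, fitted by batch gradient descent, plays the role of T(z) → y. The data, hyperparameters and function names are assumptions; in a real project the vector z would come from the model G, and a library implementation would normally replace this hand-written fit.

```python
from math import exp


def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))


def fit_logistic(Z, y, l2=0.1, lr=0.5, epochs=2000):
    """Fit P(y=1 | z) = sigmoid(w.z + b) by batch gradient descent.

    The L2 penalty (strength l2) is applied to the weights w, not to b.
    """
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, target in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - target
            for j in range(d):
                grad_w[j] += err * z[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, z):
    """T(z): estimated probability of the target event given proxy vector z."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


# Hypothetical proxy vectors z = (added_to_cart, viewed_delivery) and targets y.
Z = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1)]
y = [0, 0, 1, 0, 0, 0, 1, 0]

w, b = fit_logistic(Z, y)
p_engaged = predict(w, b, (1, 1))  # both proxy events observed
p_idle = predict(w, b, (0, 0))     # neither proxy event observed
print(f"T(1,1) = {p_engaged:.2f}, T(0,0) = {p_idle:.2f}")
```

The regularization keeps the weights small on such a tiny sample, which is the point of choosing a simple model at this step.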

Implementation example

Suppose an online store runs an advertising campaign with a daily budget of X rubles, and testing the hypothesis that the campaign is beneficial takes N days. The actual average daily revenue from the campaign, R, must exceed a given threshold r, i.e. R > r. The cost of the experiment is the difference between the revenue received and the costs incurred over N days, (R - X) · N, taking into account the statistical error p. Proxy metrics make it possible to get the result sooner, i.e. to run the experiment in n < N days. The savings are then (R - X) · (N - n), with the statistical error p now referring to the proxy metric.
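The arithmetic of this paragraph can be written out as a short sketch. The function names and the numbers are hypothetical; the expressions are the ones from the text and are kept signed, so a negative value means a net loss, and the statistical error p is left out.

```python
def experiment_cost(R, X, N):
    """Revenue minus spend over N days: (R - X) * N. Negative means a net loss."""
    return (R - X) * N


def proxy_savings(R, X, N, n):
    """Difference between running N days and only n days: (R - X) * (N - n)."""
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= N")
    return (R - X) * (N - n)


# Hypothetical unsuccessful campaign: revenue below the daily budget.
R, X = 8_000, 10_000   # rubles per day
N, n = 30, 15          # days without and with the proxy metric

full = experiment_cost(R, X, N)    # -60000: loss over the full test
short = experiment_cost(R, X, n)   # -30000: loss over the shortened test
saved = proxy_savings(R, X, N, n)  # -30000: i.e. 30000 rubles of loss avoided

print(full, short, saved)
```

With these assumed numbers, halving the test duration halves the loss from an unsuccessful campaign.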

Before the ML models were introduced, testing an advertising campaign took N days, and obtaining statistically significant results took on average 7-8 times longer. The decision on effectiveness was made from the sample obtained. Remarketing was set up for users who had added a product to the basket but had not ordered it.

The proxy metric was built as a decision tree over the following predictive events:

  • adding to the cart;
  • viewing more than N products;
  • staying on the site for more than T minutes;
  • viewing the delivery page;
  • viewing goods offered at a dumping price;
  • average price of viewed products greater than p.
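Before a decision tree can be trained on these events, each session has to be encoded as a vector of binary indicators. The sketch below shows one possible encoding; the `Session` fields, the threshold parameter names and the example values are all assumptions, since the article does not give the actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session record; field names are illustrative."""
    added_to_cart: bool
    products_viewed: int
    minutes_on_site: float
    viewed_delivery_page: bool
    viewed_dumping_price: bool
    avg_viewed_price: float


def event_vector(s, min_products, min_minutes, min_avg_price):
    """Encode the six predictive events as 0/1 indicators, in list order."""
    return [
        int(s.added_to_cart),
        int(s.products_viewed > min_products),
        int(s.minutes_on_site > min_minutes),
        int(s.viewed_delivery_page),
        int(s.viewed_dumping_price),
        int(s.avg_viewed_price > min_avg_price),
    ]


example = Session(
    added_to_cart=True,
    products_viewed=7,
    minutes_on_site=3.5,
    viewed_delivery_page=False,
    viewed_dumping_price=True,
    avg_viewed_price=1200.0,
)
# Thresholds are placeholders for N, T and p from the list above.
vec = event_vector(example, min_products=5, min_minutes=5, min_avg_price=1000)
print(vec)  # [1, 1, 0, 0, 1, 1]
```

Vectors of this form are what a shallow decision tree, or any other classifier, would take as input.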

As a result of forecasting, decision-making time was almost halved, to N / 2 days on average, and the average loss from unsuccessful tests fell by about 50%, to (R - X) · N / 2. Remarketing was targeted at users who met the following conditions:

  • added goods to the basket, but did not place an order;
  • the total amount in the basket is less than X rubles;
  • have not made purchases in the last 3 months;
  • according to the forecast of the ML-model, they will view the delivery page.
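The four conditions combine into a single targeting predicate, sketched below. The `User` fields, the 0.5 probability threshold and the reading of “3 months” as 90 days are assumptions introduced for the example, not values from the case.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class User:
    """Hypothetical user record; field names are illustrative."""
    cart_total: float              # rubles in the basket, 0 if empty
    placed_order: bool             # whether the basket was ordered
    last_purchase: Optional[date]  # None if the user never purchased
    p_delivery_view: float         # model forecast of viewing the delivery page


def is_remarketing_target(u, today, max_cart_total, threshold=0.5, quiet_days=90):
    """True if the user meets all four targeting conditions."""
    abandoned_cart = u.cart_total > 0 and not u.placed_order
    small_cart = u.cart_total < max_cart_total
    no_recent_purchase = (
        u.last_purchase is None
        or today - u.last_purchase > timedelta(days=quiet_days)
    )
    likely_delivery_view = u.p_delivery_view >= threshold
    return abandoned_cart and small_cart and no_recent_purchase and likely_delivery_view


today = date(2024, 6, 1)
target = User(1500.0, False, date(2024, 1, 10), 0.8)
recent_buyer = User(1500.0, False, date(2024, 5, 20), 0.8)

print(is_remarketing_target(target, today, max_cart_total=3000))        # True
print(is_remarketing_target(recent_buyer, today, max_cart_total=3000))  # False
```

Only the last condition depends on the ML model; the other three can be computed directly from the store's data.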

As a result, remarketing ROI has more than doubled.


Proxy metrics are thus a useful predictive analytics tool that helps to cope with an insufficient training set and to obtain results faster. However, selecting relevant and reliable proxy metrics is an art in itself, because it involves all the feature generation tasks of machine learning, from extracting variables to cutting off “unnecessary” predictors. You should also always ask whether applying ML is appropriate, i.e. whether the potential profit or savings exceed the cost of collecting and preparing the data. If forecasting through proxy metrics is really profitable, use the method for samples that are not large enough. When more data has accumulated, it can be used to further train the ML models and to combine them into an ensemble (stacking) to improve the quality of the algorithms.