Alexei Chernobrovov, Consultant on Analytics and Data Monetization

Building a CJM through text analysis

Customer journey maps (CJM) are visuals or graphics that depict the customer's relationship with a company, its services, and its products over time. At the core of a great customer experience is the customer journey, and how an organization's processes, systems, and people interact with its customers.

Building a CJM does not require complex or expensive tools: you can build one in an Excel spreadsheet, as was demonstrated in an earlier example. However, the results of that approach are not clear enough and are difficult to automate. In particular, this simple method does not scale: all the work of segmenting consumers has to be redone whenever new regions are launched or new goods and services are introduced. In this article, we will consider a more technical way to build a CJM, based on automated analysis of site user behavior treated as a text clustering task.

Formulation of the task

It is necessary to gather log files of customer visits and treat the customer actions they describe as text to be analyzed: page transitions, product views, additions to cart, etc. By encoding each action as a letter, we get words that can be clustered and classified (Fig. 1).


Fig. 1. Coding of user actions

First, it is necessary to define the coding rules, i.e. to build an alphabet, for example:

  • A – search;
  • B – viewing a product card;
  • C – adding to cart;
  • D – processing an order;
  • Z – target action;
  • F – viewing a category;
  • G – clicking through to another page;
  • H – deleting a product from the cart;
  • Y – cancelling an action;
  • X – leaving the website.

Then, according to the alphabet, we transform the sequences of customer actions from the log files into words, for example: ABCDE, FGCHBGHX, and so on. After that, it is necessary to cluster the words, i.e. to group users with the same behavior. In this way we assign customers to segments according to their behavior model. Then we classify the segments by the CJM vertices. A CJM is a directed graph whose vertices represent contacts with the consumer and whose edges indicate the probabilities of transition to the next interaction (Fig. 2).
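The encoding step can be sketched in Python as follows. This is a minimal illustration, not the code used in the project: the event names, the (session_id, event) row layout, and the assumption that rows arrive in time order are all hypothetical. Only the letters come from the alphabet above.

```python
# Hypothetical event names mapped to the letters of the alphabet above.
ALPHABET = {
    "search": "A",
    "view_product": "B",
    "add_to_cart": "C",
    "process_order": "D",
    "target_action": "Z",
    "view_category": "F",
    "other_page": "G",
    "remove_from_cart": "H",
    "cancel": "Y",
    "leave": "X",
}


def encode_session(events):
    """Turn one session's ordered list of event names into a word."""
    return "".join(ALPHABET[event] for event in events)


def encode_log(rows):
    """Group (session_id, event) rows, assumed time-ordered, into words."""
    sessions = {}
    for session_id, event in rows:
        sessions.setdefault(session_id, []).append(event)
    return {sid: encode_session(events) for sid, events in sessions.items()}


# Illustrative rows; a real log would be read from the site's log files.
rows = [
    ("s1", "view_category"), ("s1", "other_page"), ("s1", "add_to_cart"),
    ("s1", "remove_from_cart"), ("s1", "view_product"), ("s1", "other_page"),
    ("s1", "remove_from_cart"), ("s1", "leave"),
    ("s2", "search"), ("s2", "view_product"), ("s2", "add_to_cart"),
]
words = encode_log(rows)
print(words)
```

Here session s1 becomes the word FGCHBGHX from the example above, and the resulting words are what the clustering step would take as input.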


Fig. 2. Schematic view of a CJM
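As a rough sketch of how the edge probabilities of such a graph could be estimated, one can count how often each letter is followed by each other letter in the encoded words. The function name and the sample words below are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def transition_probabilities(words):
    """Estimate P(next action | current action) from encoded words."""
    counts = defaultdict(Counter)
    for word in words:
        # Each pair of consecutive letters is one observed transition.
        for src, dst in zip(word, word[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Illustrative words, not real log data.
graph = transition_probabilities(["ABCD", "ABX", "AX"])
for src, edges in sorted(graph.items()):
    print(src, edges)
```

Each key of the resulting dictionary is a vertex, and its nested dictionary holds the outgoing edges with their estimated probabilities.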


Implementation complexity

Despite its apparent simplicity, this approach can be very laborious in practice.

First of all, it requires mathematical and computational Data Science skills: analysis, clustering, and classification of text information. At the same time, a Data Scientist usually does not know the product in detail, so close interaction with the business is necessary to compile the alphabet. Developing the alphabet takes a lot of time when there is a huge number of goods or services. Each item is a single letter, but if the website has several thousand goods, that volume is hard to work with. In that case it is first necessary to cluster the items and then to reduce the set of letters used.

Secondly, since a CJM is only a marketing strategy, we need to define how it can help the business, in particular how it can raise conversion. The task thus reduces to searching for the optimal sequence of user actions, one that increases the company's overall profit. To do this, we find a letter in the analyzed words (logs) of customer behavior and replace it with another one. This requires rules for replacing and deleting letters, which can be implemented by marketing and technical means, for example, showing the same goods or opening the card of a new product in another window. It takes considerable effort: the business has to develop the rules, while on the technical side the Data Scientist has to solve coding problems, develop error detection and correction algorithms [1], and calculate the Hamming distance [2] to evaluate the differences between the words under study.
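For reference, the Hamming distance between two words of equal length is the number of positions at which their letters differ. A minimal sketch, with illustrative words:

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires words of equal length")
    return sum(x != y for x, y in zip(a, b))


# Illustrative words: the two paths differ only in the third action.
print(hamming_distance("ABCD", "ABGD"))
```

Because it is defined only for words of equal length, it suits comparisons where one letter is replaced by another but not those where letters are added or deleted.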

Finally, the metric for evaluating the introduced rules must be chosen very carefully, because after the alphabet is changed one indicator can rise at the expense of another: for example, conversion increases, but so does the bounce rate.

Simple example

Let Z be the desired action and X be leaving the website or app. Suppose we have the following logs:

  • AAAX
  • ACDX
  • AAX
  • ADRZ
  • AFGZ

Our task is to change the app so that the letter Z occurs more often in the logs. It can be seen from the above set that wherever R occurs above P, the probability of conversion decreases significantly.
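One simple way to look for such patterns is to compare the overall share of words containing Z with the share among words that contain a given letter. The sketch below uses the five sample logs; the function names are hypothetical, and with so few words the numbers are purely illustrative.

```python
logs = ["AAAX", "ACDX", "AAX", "ADRZ", "AFGZ"]


def conversion_rate(words, target="Z"):
    """Share of words that contain the target action."""
    if not words:
        return 0.0
    return sum(target in word for word in words) / len(words)


def conversion_with_letter(words, letter, target="Z"):
    """Conversion rate among the words that contain the given letter."""
    subset = [word for word in words if letter in word]
    return conversion_rate(subset, target)


print(conversion_rate(logs))              # overall share of converting logs
print(conversion_with_letter(logs, "C"))  # share among logs containing C
```

On a real log, letters whose presence goes with a noticeably lower conversion rate are candidates for the modifications discussed next.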

We can then easily predict what would happen if the letter P were removed. When the number of logs is huge, a directed search can be run over word modifications that lie at a minimal distance from the original word. For example, you can use the Damerau-Levenshtein distance [3]. In this way the word modifiers can be found relatively quickly and the app architecture changed accordingly. The modifiers are easy to interpret:

  • To delete a letter - to remove the link (or the option in the application)
  • To add a letter - to highlight a link (to improve navigation)
  • To swap letters - to adapt the user's path (to review the menu, navigation)
  • To replace a letter - the same.
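The four modifiers above correspond to the edit operations counted by the Damerau-Levenshtein distance. Below is a minimal sketch of its restricted variant, known as the optimal string alignment distance, which allows each pair of adjacent letters to be swapped at most once. It is given as an illustration under that assumption, not as the implementation used in the project.

```python
def osa_distance(a, b):
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a letter
                d[i][j - 1] + 1,         # add a letter
                d[i - 1][j - 1] + cost,  # replace a letter
            )
            # Swap two adjacent letters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ACDX", "ADCX"))  # one swap of adjacent letters
print(osa_distance("AAAX", "AAX"))   # one deletion
```

A search for modifiers could then rank candidate words by this distance from an observed non-converting word and examine the closest ones first.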


Practical Implementation

The approach to building a CJM as a text analysis task was implemented for a well-known bank in Russia whose branches are spread across the country. The main aim was to increase the number of deposits opened through the site or mobile app. For this purpose, a detailed audit was carried out to reveal the main marketing problems and to propose optimal settings, structure, and content based on customer behavior analysis. The following steps were carried out:

  • Analysis of traffic distribution: leaders by bounces, visits, and engagement;
  • Customer behavior analysis;
  • Search analytics;
  • Detection of anomalies;
  • Analysis of the site menu and mobile app;
  • Creating a map of the website based on the analytics;
  • Developing the new structure of the website and mobile app.

The whole project lasted 6 months, 3 of which were spent on selecting the alphabet and creating the word optimization rules for the customer behavior analysis based on the log files. Finally, after several iterations, we managed to settle on a working alphabet and to start clustering and classifying the text.

The developed algorithm showed that the number of products on the deposit offer screen is inversely related to conversion: a client looks through many pages but chooses only one deposit or none. Customers who had seen only the first offer had higher conversion. Thus, within our alphabet, the decision was made to keep only one deposit visible.

As a result of this simple change, conversion into deposits through the mobile app increased up to 23%. However, the other solutions found by the developed algorithm turned out to be practically unrealizable, so most of the work on creating the alphabet is not in use now. Nonetheless, the investment in developing and implementing the algorithm paid off.


Building a CJM from user logs as a text analysis task is a viable method of describing the client's interaction with the site and of working on its optimization. The method can be used if the customer's aims are known and the customer's whole trajectory through the site or app can be tracked. Its advantage is objectivity: unlike manual development of a CJM, which you can read about in my previous article, it does not depend on expert opinion. Nevertheless, the results of this approach depend on the selected alphabet, which takes a lot of time to develop. How to avoid this disadvantage when building a CJM with the help of experts and machine learning models is something I will discuss in the next article.



  1. Error detection and correction
  2. Hamming distance
  3. Damerau–Levenshtein distance