Dating sites and dating apps are among the most interesting, and at the same time entirely typical, applications of modern machine learning algorithms. This article describes the main tasks and operating principles of dating systems from a Data Science perspective and gives original examples of users' attempts to "hack" the matching rules in the hope of finding love on the other side of the screen.
Internet dating is generally considered to have taken off in the 2010s, when the world entered the era of Big Data technology. For example, the most popular online dating service, Tinder, appeared in 2012, Pure in 2013, and Bumble in 2014. However, some widely used dating solutions such as Mamba, Badoo, and OkCupid emerged almost 10 years earlier: in 2004, 2006, and 2007, respectively. They started as dating websites and later became mobile applications. Large social networks such as Facebook and VKontakte also include some features for finding a partner, for example, the ability to like a photo, state one's relationship status, filter user profiles, and see suggested friends.
While dating services can be used for business purposes, the main purpose of these apps is to find a romantic partner. In simplified form, the process looks as follows:
However, today it is not necessary to look through the profile of every user you are interested in. Like the recommendation systems of online stores and streaming services, dating platforms perform the initial selection of candidates themselves and form a list of potential partners. This involves more than simple filters such as desired sex, orientation, age, geolocation, and other formal criteria. Most dating services analyze users' photos and show you people with the type of appearance you find most attractive. Users' interests and preferences from third-party services, such as Spotify playlists or recent Amazon purchases, are also taken into account. Tinder's algorithms, moreover, rely on your behavior on Facebook and Instagram, including virtual friends, posts, likes, and photos. This approach, based on the analysis of real data rather than on answers to generic questionnaires as in Match and OkCupid, is considered to reflect the user's real character more accurately, so the probability of offering him or her a suitable person is higher. None of these features of automated personal matching would be possible without machine learning methods, which we discuss next.
Among all dating services, Tinder is the most popular. It leads in the number of downloads across all mobile platforms and contributes the most to Match Group's capitalization (Fig. 2). This application can therefore serve as a clear example of Data Science applied to online dating.
Almost every Data Science task related to data analysis and machine learning starts with preparing a dataset. In dating services, the algorithms are trained during the first few days by examining a new user's behavior: activity, preferences, and matches. Tinder then assigns the user a score (the Elo score), which does not reflect attractiveness or desirability as such but is used to organize the data. This is needed, among other things, to prevent abuse such as liking every candidate in a row with an unlimited number of positive swipes. For this reason, the maximum number of likes was limited to 200 per day, and an algorithmic classification by formal attributes (gender, age, location, education, and other biographical data) was introduced. The greatest contribution to the final ML model, however, comes from the rating of the other person in the pair. Taking it into account increases the likelihood not only of a match but also of subsequent communication, including outside the app. User activity in the app, such as the frequency of logins, likes and dislikes, and conversations with people you like, also raises your internal rating.
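To make the idea of a rating that depends on the other person's rating more concrete, here is a minimal sketch of the classic Elo update as used in chess. Tinder's actual formula is not given in this article, so the function names, the K-factor of 32, and the treatment of a like as a "win" are all assumptions made purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected result for A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_on_swipe(swiped: float, swiper: float, liked: bool, k: float = 32.0) -> float:
    """Return the new rating of the profile that was swiped on.

    Assumption: a like counts as a win for the swiped profile,
    a dislike counts as a loss. k is the conventional chess K-factor.
    """
    outcome = 1.0 if liked else 0.0
    return swiped + k * (outcome - expected_score(swiped, swiper))


# The same like is worth more when it comes from a higher-rated user.
gain_from_high = update_on_swipe(1500.0, 1800.0, liked=True) - 1500.0
gain_from_low = update_on_swipe(1500.0, 1200.0, liked=True) - 1500.0
print(round(gain_from_high, 1), round(gain_from_low, 1))
```

In this sketch a like from a user rated 1800 raises a 1500-rated profile far more than a like from a user rated 1200, which mirrors the statement above that the rating of the other person matters most.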
As mentioned above, Tinder takes information about a user's interests from third-party services: Facebook and Instagram profiles, music streaming services, online store purchases, search history, and visits to other topical sites tracked through cookies. Other dating apps such as eHarmony, Match, and OkCupid build a portrait of the user and of his or her ideal partner from a detailed profile questionnaire. However, analyzing a user's behavior on the Internet identifies each client's interests more accurately, including true personal characteristics and lifestyle. It also rules out the deliberate or unintentional distortion of reality that users introduce when they try to present "the best version of themselves" in a questionnaire.
The work of a dating service is not limited to selecting the best candidates for each user. After a match, the platform continues to analyze how the two people communicate, recognizing the tone and meaning of text and audio messages. It also recognizes images automatically. Badoo, in particular, can detect images of an intimate nature in photos and warn the user about them.
The main monetization model of any dating service is a subscription that adds features and removes the restrictions of the free version. However, dating platforms also make money on users who do not pay them, for example by selling anonymized or even personal data to DSP exchanges (Demand Side Platforms) and other data providers that specialize in targeted advertising. A detailed explanation of this way to profit from user data is given in another article. Of course, this raises a number of questions for dating services about the security of such data use, which is especially important given the strict requirements of the GDPR. Moreover, when all the big and small data about you is aggregated in one place outside your control, it is alarming, to say the least. For example, back in 2017, a journalist from The Guardian received a detailed report of 800 A4 pages from Tinder after requesting all the information the company held about her. This case shows once again that Big Data technologies and Data Science methods allow an interested party to store and profit from all the digital traces each of us leaves behind.
After reviewing the typical capabilities of dating applications, we can list the main tasks that ML algorithms solve when implementing these functions:
Taken together, these tasks come down to collaborative filtering: predicting a user's preferences from the history of previous events and the interests of similar people. It combines elements of classification, clustering, and missing-data imputation.
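A toy illustration of user-based collaborative filtering is sketched below. The swipe history, the user and profile names, and the choice of cosine similarity are hypothetical and are not taken from any real dating service; the point is only to show how a missing reaction can be predicted from the reactions of similar users.

```python
from math import sqrt

# Hypothetical swipe history: 1 = like, -1 = dislike; a missing key = not seen yet.
swipes = {
    "ann":   {"p1": 1, "p2": -1, "p3": 1},
    "beth":  {"p1": 1, "p2": -1, "p4": 1},
    "carol": {"p1": -1, "p2": 1, "p4": -1},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two users over the profiles both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[p] * v[p] for p in common)
    norm_u = sqrt(sum(u[p] ** 2 for p in common))
    norm_v = sqrt(sum(v[p] ** 2 for p in common))
    return dot / (norm_u * norm_v)


def predict(user: str, profile: str) -> float:
    """Similarity-weighted average of other users' reactions to a profile."""
    num = den = 0.0
    for other, history in swipes.items():
        if other == user or profile not in history:
            continue
        sim = cosine(swipes[user], history)
        num += sim * history[profile]
        den += abs(sim)
    return num / den if den else 0.0


# Ann has not seen p4 yet; her taste matches Beth's and is opposite to Carol's.
score = predict("ann", "p4")
print(score)
```

Here the prediction fills in a missing cell of the user-profile matrix, which is the "missing data" element mentioned above. Production recommenders typically use matrix factorization or neural models rather than this direct neighbor average.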
Among specific ML algorithms, neural networks are actively used, including deep learning models. For example, Badoo lets you search for users with a certain appearance by supplying a photo of a famous athlete, actor, or other well-known person whom the desired partner should resemble. The algorithm finds the face in the photo, determines its features and characteristics, compares them with the data in its database (Badoo has more than 335 million users), and returns similar candidates. In the UK, for example, the most popular requests were Robert Pattinson and Cara Delevingne. In Russia, women who look like Vera Brezhneva, Irina Shayk, and Natalia Vodianova, as well as men who resemble Sergei Lazarev, Alexander Kerzhakov, and Roman Abramovich, are in demand. Dmitry Medvedev and Vladimir Putin are also in the Russian top 20. Kim Kardashian, Selena Gomez, and Donald Trump open the world's top 10. Deep learning models such as BERT and XLNet are used to solve NLP tasks.
Despite the high-profile claims of Tinder and other dating services about their mission to satisfy every user's desires, real users of these apps are often dissatisfied with the results, in particular with the lack of matches or with irrelevant recommendations. Therefore, many people try to cheat the system by "hacking" its ML algorithms with scripts or special techniques.
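The lookalike search described above can be thought of as a nearest-neighbor search over face feature vectors. The sketch below assumes that some face recognition network has already turned each photo into a numeric vector (an embedding); the three-dimensional vectors, the user names, and the use of cosine similarity are illustrative assumptions, not Badoo's actual implementation.

```python
from math import sqrt


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def most_similar(query: list, gallery: dict, top_k: int = 2) -> list:
    """Return the names of the top_k gallery entries closest to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embeddings; real ones usually have hundreds of dimensions.
gallery = {
    "user_a": [0.9, 0.1, 0.3],
    "user_b": [0.1, 0.8, 0.5],
    "user_c": [0.8, 0.2, 0.4],
}
celebrity = [0.85, 0.15, 0.35]

lookalikes = most_similar(celebrity, gallery)
print(lookalikes)
```

With hundreds of millions of profiles, an exhaustive comparison like this would be too slow, so real systems would normally rely on an approximate nearest-neighbor index instead.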
For example, there is the interesting story of Chris McKinlay, 35, from Los Angeles, who for a long time could not find a match on OkCupid because the topics he was interested in did not resonate with most of the women on the site. So Chris created 12 fake accounts and wrote a Python script to manage them automatically. Each account answered the questions randomly. In this way he covered all the questions with different answers and gained access to the entire database of Los Angeles women registered on the site. McKinlay then built a dataset for his own search model, reducing the original sample by about a factor of 4. He identified 7 groups of women with similar interests, habits, and outlook on life, and chose the 2 most attractive to him: younger women (25 to 35 years old) and more mature women (35 to 45 years old). Chris then created 2 profiles of his own, each tailored as closely as possible to one target group. As a result, he got a huge number of matches. After going on 87 dates, McKinlay proposed to the 88th woman among the candidates selected in this way.
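The grouping step in this story is a clustering task. The article does not say which algorithm was used, so the sketch below shows plain k-means as a stand-in; the two-feature profiles (age and the share of "yes" answers on some topic), the starting centroids, and the choice of two clusters instead of seven are hypothetical and chosen only to keep the example small.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from the point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical profiles: (age, share of "yes" answers on some topic).
profiles = [(26, 0.9), (28, 0.8), (31, 0.85), (38, 0.2), (41, 0.3), (44, 0.1)]
centroids, clusters = kmeans(profiles, centroids=[[25, 0.5], [45, 0.5]])

for center, members in zip(centroids, clusters):
    print([round(x, 2) for x in center], members)
```

Note that age dominates the distance here because the features are not scaled; with real questionnaire data the features would need to be normalized, or a method designed for categorical answers would be used instead.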
A similar experiment can be carried out with Tinder using data labeling and Python scripts from a repository on GitHub. Tinder, however, is not fond of such initiatives. For example, the company forced Canadian programmer Justin Long to stop using a bot he had written to automate the initial screening of the candidates offered by Tinder's own ML algorithms. Like the dating system itself, Long's Bernie app was linked to a user's Tinder account and trained on the user's likes and dislikes. After training, Bernie swiped candidates' photos on the user's behalf and even started a chat when the interest was mutual.
Today, dating apps are one of the most common ways of meeting people. For example, in the U.S., 3 million people who met online got married between 2005 and 2012, which is about 30% of all registered marriages. The number of divorces among such couples is about 10 times lower than among people who met in other ways. However, despite the active development of Data Science methods, modern ML algorithms and Big Data technologies cannot yet cover all the nuances of interpersonal communication. For example, so-called "chemistry", the spontaneous sympathy that arises when a man and a woman meet in person, remains outside the scope of a dating service. It is therefore not worth relying entirely on the recommendation systems of dating services in the hope of finding a life partner while forgetting about traditional offline dating.