I have already written that continuous data analysis is necessary for every data-driven company. In this article, we'll look at the fundamental differences between a Data Scientist (data researcher) and a Data Analyst, and at why even a very advanced Data Analyst will not replace a Data Scientist.
Before comparing these two disciplines, let us define what exactly each term means.
Data analysis is a field of knowledge at the intersection of mathematics and computer science. It develops and studies general mathematical methods and computational algorithms for extracting knowledge from experimental data, including the exploration, filtering, transformation and modeling of data in order to obtain information useful for applied and managerial decisions. This field includes Data Mining, which focuses on modeling and knowledge discovery rather than on describing the data, and Business Analytics, which is aimed at aggregated visualization of applied information from various sources [1].
Data Science is a field of computer science that studies the analysis, processing and presentation of digital information. It covers technologies for processing large volumes of data with a high degree of parallelism (Big Data), statistical methods, Data Mining tools and artificial intelligence applications for working with data, as well as tools for designing and developing databases [2].
From these definitions it follows that data analytics and Data Science draw on almost the same areas of knowledge: mathematics, computer science and systems analysis. However, the responsibilities of a Data Analyst and a Data Scientist are different (Fig. 1).
A Data Analyst, like a Data Scientist, works with datasets in order to extract business-relevant information from "raw data" that will allow the business to make optimal management decisions and improve its target metrics. Both specialists process data, build predictive models and test them by simulation in specialized software. In bank lending, for example, this may be a hypothesis that a potential borrower's solvency depends on their area of interest. Such assumptions are tested with statistical methods as well as with artificial intelligence tools such as Machine Learning. However, although their overall goals are similar, the analyst and the Data Scientist differ in the results they deliver and in the means they use to achieve them.
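The lending hypothesis above can be checked with a classical test of independence. The following is a minimal sketch, not a description of any particular bank's method: the contingency table is invented for illustration, the interest categories are hypothetical, and the chi-square test is simply one common statistical choice for this kind of question.

```python
# Hypothetical counts of borrowers (illustrative only, not real data).
# Rows: loan outcome; columns: declared area of interest.
interests = ["sports", "travel", "gaming"]
table = [
    [90, 80, 60],  # repaid on time
    [10, 20, 40],  # defaulted
]

def chi_square_statistic(observed):
    """Return (statistic, degrees_of_freedom) for a test of independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if outcome and interest were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

statistic, dof = chi_square_statistic(table)

# Critical value of the chi-square distribution for 2 degrees of freedom
# at the 5% significance level.
CRITICAL_VALUE_DF2 = 5.991
reject_independence = dof == 2 and statistic > CRITICAL_VALUE_DF2

print(f"chi-square = {statistic:.3f}, dof = {dof}, reject independence: {reject_independence}")
```

With these made-up counts the statistic exceeds the critical value, so the hypothesis that solvency is independent of the area of interest would be rejected. Real data would also need checks on sample size and on confounding factors.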
Although the analyst and the Data Scientist may work with the same source data, they can approach the problem from completely different points of view. In particular, the Data Scientist pays much more attention than the analyst to automating the collection and preparation of information and to building data pipelines, which partly overlaps with the responsibilities of the Data Engineer. For this, unlike the data analyst and the business analyst, the Data Scientist should be well versed in Big Data technologies (the Apache Hadoop stack), cloud computing and software development tools (Fig. 2). The latter, in turn, requires the Data Scientist to be able to build distributed applications and to have experience with rapid deployment of software solutions, which already falls within the competencies of a DevOps engineer [3].
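To make the idea of a data pipeline more concrete, here is a minimal sketch in plain Python. It assumes a pipeline is just a chain of preparation steps applied to records in order. The record fields, the step names and the income threshold are all hypothetical; a production pipeline would normally run on the Big Data tooling mentioned above rather than on in-memory lists.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Step = Callable[[Iterable[Record]], Iterator[Record]]

def drop_incomplete(records: Iterable[Record]) -> Iterator[Record]:
    """Cleaning step: skip records with a missing income value."""
    return (r for r in records if r.get("income") is not None)

def add_income_band(records: Iterable[Record]) -> Iterator[Record]:
    """Enrichment step: derive a coarse feature (threshold is an assumption)."""
    for r in records:
        band = "high" if r["income"] >= 100_000 else "standard"
        yield {**r, "band": band}

def run_pipeline(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Apply each step to the output of the previous one."""
    for step in steps:
        records = step(records)
    return list(records)

# Hypothetical raw input, as it might arrive from a source system.
raw = [
    {"id": 1, "income": 120_000},
    {"id": 2, "income": None},
    {"id": 3, "income": 85_000},
]

prepared = run_pipeline(raw, [drop_incomplete, add_income_band])
print(prepared)
```

Keeping each step a small function makes it easy to test steps in isolation and to reorder or replace them once the preparation logic is automated.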
In addition, the results of their work differ significantly. A Data Scientist, like a Data Analyst, formulates applied hypotheses, conducts experiments, builds and validates forecasts, and draws conclusions that support decision-making. However, after completing the research, the Data Analyst offers the business theoretical options for solving the problem in the form of calculations, charts and similar results [4]. The Data Scientist, on the other hand, creates an applied product by developing software that makes the business easier to understand and helps optimize it (Fig. 3). Examples include an application for predicting customer churn, a recommendation system, or a program that calculates the probability that a consumer loan will be repaid on time. For this, the Data Scientist, just like the analyst, needs knowledge of the subject area and the specifics of the business, as well as some systems analysis practices: lean manufacturing methods, project management, economic calculation models, and so on.
Besides tools, methods and results, the Data Scientist and the Data Analyst also differ in pay: the work of a Data Scientist is valued more highly (Fig. 4). In August 2019, according to a review of vacancies on the HeadHunter recruitment portal, employers in Russia offered data analysts 80-100 thousand rubles per month, and Data Scientists 100-200 thousand rubles [4]. A similar trend is observed in the foreign labor market: according to the annual Stack Overflow report, a Data Scientist or machine learning specialist earns about 61 thousand dollars a year (more than 300 thousand rubles a month), while a data analyst or BI specialist earns 59 thousand dollars a year (a little less than 300 thousand rubles a month) [5].
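The conversion from annual dollar salaries to monthly ruble amounts can be reproduced with a few lines of arithmetic. The exchange rate below is an assumption: the article does not state the rate used, and 60 rubles per dollar is chosen only because it is consistent with the rounded figures quoted above.

```python
# Assumed exchange rate; the source does not state the rate it used.
RUB_PER_USD = 60

def monthly_rub(annual_usd: int, rub_per_usd: int = RUB_PER_USD) -> float:
    """Convert an annual salary in dollars to a monthly salary in rubles."""
    return annual_usd * rub_per_usd / 12

scientist = monthly_rub(61_000)  # Data Scientist / machine learning specialist
analyst = monthly_rub(59_000)    # data analyst / BI specialist

print(f"Data Scientist: {scientist:,.0f} rubles per month")
print(f"Data Analyst:   {analyst:,.0f} rubles per month")
```

Under this assumed rate the two figures land just above and just below 300 thousand rubles per month, matching the article's description. A different rate shifts both numbers but not the gap between them in dollar terms.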
Summing up the work tasks and professional competencies of the Data Analyst and the Data Scientist, we note that these specialists, for all their similarities, are not interchangeable.
Although Data Science and Data Analytics cover the same set of disciplines (Fig. 1), these areas of knowledge carry different weight for the Data Scientist and the Data Analyst. In particular, the data analyst relies more on "classical" mathematics (statistical methods), while the Data Scientist relies more on the applied disciplines of software development. For both professionals, however, understanding the business and mastering modern data processing tools is very important. Since a Data Scientist is more expensive and works at a higher level of maturity of corporate business processes according to the CMMI model (I talked about this in more detail here), it is better to start data analysis projects with an analyst. In any case, the business ultimately evaluates both the analyst and the Data Scientist by the benefit they can bring to it. Therefore, for professional growth (and to increase personal value in the labor market), the Data Analyst learns Computer Science tools, including Machine Learning methods, and the Data Scientist learns statistical models and mathematical calculation tools.
Sources