What is Data Science – told by a Data Scientist

September 14, 2021

Interview with Jasmin, Data Scientist at Holisticon

I have been a Data Scientist at Holisticon AG for almost a year. In my everyday work, I deal with data: structured and unstructured, clean and unclean, in large and small amounts. Sometimes the data at hand even yields more data, when combining values creates new, relevant information. This happens, for example, when an average is calculated to describe a group of data points.
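To make the point about derived data concrete, here is a minimal sketch in Python. The order records and the function name `average_by_group` are invented for illustration; the idea is only that the per-group average is new information that was not stored in any single record.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (customer segment, order value in EUR).
orders = [
    ("retail", 20.0),
    ("retail", 40.0),
    ("wholesale", 300.0),
    ("wholesale", 500.0),
    ("wholesale", 400.0),
]


def average_by_group(records):
    """Return the mean value per group as a new, derived data set."""
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)
    return {group: mean(values) for group, values in grouped.items()}


averages = average_by_group(orders)
print(averages)
```

Each average describes its whole group, even though no individual order carries that number.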

Data Scientists collect data from various sources, such as databases or Excel spreadsheets, and prepare it so that analyses can be performed. This is particularly difficult when data is incomplete or contains incorrect values. The quality of such unclean data must first be improved through various procedures before it can be used.
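One such procedure could look like the following sketch. It assumes a hypothetical list of ages in which `None` marks a missing entry and anything outside 0 to 120 counts as incorrect; both kinds of entry are replaced by the median of the valid values. This is only one of many possible strategies, and the function name and thresholds are made up for the example.

```python
from statistics import median


def clean_ages(raw_ages, lowest=0, highest=120):
    """Replace missing or implausible ages with the median of the valid ones."""

    def is_valid(age):
        return age is not None and lowest <= age <= highest

    valid = [age for age in raw_ages if is_valid(age)]
    if not valid:
        raise ValueError("no valid values to impute from")
    fill_value = median(valid)
    return [age if is_valid(age) else fill_value for age in raw_ages]


# Hypothetical input: one missing value and two implausible ones.
raw = [34, None, 29, -5, 41, 230]
cleaned = clean_ages(raw)
print(cleaned)
```

In a real project, deciding whether to fill in, correct, or drop such records depends on the use case and on how much data is affected.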

After the necessary preparation, machine learning models and neural networks can be trained to predict future behavior, trends or a specific outcome. Often, new or as yet unknown relationships in the data can be discovered through such an approach. This is done by using programming languages such as Python or R, which provide ready-made packages or code libraries to implement algorithms. The results of the generated models are then tested for accuracy and sometimes displayed using a graph that shows the ratio of correctly to incorrectly detected values of the model.
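The accuracy check can be sketched without any library. The example below assumes that the graph in question is something like a confusion matrix, which is one common choice; the labels and predictions are invented. It counts correct and incorrect predictions and derives the accuracy from those counts.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=True):
    """Count true/false positives and negatives for a binary prediction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Hypothetical ground truth and model output for eight cases.
actual = [True, True, False, False, True, False, False, True]
predicted = [True, False, False, False, True, True, False, True]

counts = confusion_counts(actual, predicted)
print(dict(counts), accuracy(counts))
```

The four counts say more than the single accuracy figure, because they show which kind of mistake the model makes.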

Data Science: same process model, diverse use cases

In short, the discipline of Data Science always follows the same sequence of steps. First, data must be collected and processed. Then, models are trained on the basis of this data. Finally, the validity of these models and their results is tested. All of this is done using code that can be adapted to a wide variety of use cases and data types. These use cases range from customer analytics for improving products and services, to fraud detection for spotting fraudulent or anomalous transactions, to predictive maintenance, where maintenance requirements and defects in machines are predicted.

How does one become a Data Scientist?

There are now dedicated degree programs for Data Science, but the specialized young talent is not expected to be sufficient to cover the job openings on the labor market for the next few years. So, at least in the medium term, career changers should have no problems finding a job in this industry and also have a good chance of keeping up with other applicants, provided the skills are comparable.

If you try to become a data scientist as a career changer, the first step in almost all cases is a degree in a technical or scientific field. This can be done with a bachelor’s degree, but is often rounded off with a master’s degree. This is followed by the development of specialized theoretical knowledge about important algorithms and methodologies as well as the necessary programming know-how. By taking courses and earning certificates, necessary key skills such as knowledge of SQL, R and Python, machine learning, and data visualization can be built.

However, Data Science has not been an isolated discipline for some time now. In addition to programming skills and technical knowledge, knowledge of middleware and hardware is increasingly needed, as well as practical know-how on how to develop APIs and frameworks for consolidating data. This expertise, which actually belongs to the discipline of data engineering, is so important because it is no longer just about creating machine learning models, but also about running them in real time in such a way that decisions can be made based on current data. Data engineers are as responsible for creating these automated pipelines as they are for optimizing them and ensuring consistent performance.

For example, it’s no good if a production floor is dependent on predicting component quantities for the supplier, and then suddenly the pipeline doesn’t work properly. On the one hand, data-driven business processes are known for being efficient. On the other hand, they depend on an infrastructure that works at all times and on carefully generated and versioned machine learning models. In this context, hardware and software requirements as well as the topics of IT security and data protection become all the more important. As a Data Scientist, it is therefore also worthwhile to have a basic knowledge of other related disciplines.

What does the daily routine of a Data Scientist consist of?

As already mentioned, I deal with data in all possible variations. Sometimes potential customers provide anonymized data sets so that hidden patterns or possible use cases can be found. Often, though, I use freely available datasets from all kinds of industries and use cases, found on public platforms like Kaggle, to develop proofs of concept or practice methodologies.

Besides preparing the data, the most important step is choosing a suitable algorithm. Not the most suitable one, because often several methods can be considered for use and provide comparable results. Depending on what the goal of the underlying use case is and what kind of data is available, different groups of procedures are used.

In fraud detection, for example, the goal may be to classify all available records into fraudulent and non-fraudulent transactions. This is done using classification algorithms, but is only possible if at least some of the records are labeled as fraudulent or not, so that the model can learn to distinguish between them. If such information about the transactions is not available, classification algorithms cannot be used, and clustering methods are used instead to try to identify patterns and find anomalies in, for example, financial flows.
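The unlabeled case can be illustrated with a very small clustering sketch. It assumes a hypothetical list of transaction amounts without any fraud labels and runs a basic one-dimensional k-means with two clusters, treating the smaller cluster as candidates for review. Real fraud detection uses many more features and more robust methods; the function name `two_means` and the data are invented for this example.

```python
from statistics import mean


def two_means(values, iterations=20):
    """Split 1-D values into two clusters with a basic k-means loop."""
    # Start with the smallest and largest value as cluster centers.
    low_center, high_center = min(values), max(values)
    low, high = [], []
    for _ in range(iterations):
        low, high = [], []
        for value in values:
            # Assign each value to the nearer of the two centers.
            if abs(value - low_center) <= abs(value - high_center):
                low.append(value)
            else:
                high.append(value)
        if not low or not high:
            break  # everything fell into one cluster
        new_centers = (mean(low), mean(high))
        if new_centers == (low_center, high_center):
            break  # centers stopped moving
        low_center, high_center = new_centers
    return low, high


# Hypothetical transaction amounts, no labels available.
amounts = [12.5, 9.9, 14.0, 11.2, 13.3, 980.0, 10.8, 1020.0]
low, high = two_means(amounts)

# Assumption: the smaller cluster holds the unusual transactions.
candidates = min((low, high), key=len)
print(candidates)
```

The result is only a list of unusual transactions, not a verdict. Without labels, someone still has to check whether the flagged records are actually fraudulent.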

Why is Data Science interesting as a career field?

Anyone who enjoys dealing with numbers and is enthusiastic about finding hidden connections and doing a bit of detective work will feel right at home in the field of Data Science. The wide variety of application areas and use cases keeps the subject fresh, and the many possibilities arising from technological progress create a strong incentive to always try something new. In short, you never get bored. Because of the shortage of skilled workers, it is comparatively easy to hold your own against graduates of specialized programs, and since Data Scientists are now needed almost everywhere, the pay is usually worth it, too.

Skills such as statistics, machine learning and data visualization are needed today in almost all companies where large amounts of data are generated, whether in production, sales or customer care. And now, thanks to the use of cloud solutions, there are also sufficient technical resources to run Data Science for small and medium-sized companies. Data Scientists have therefore already been in high demand on the job market for several years and generally have the opportunity to pick and choose their jobs.

Looking ahead, the amount of data and the performance of the technical resources required to process it will continue to increase exponentially. In light of the data-driven business processes that are evolving to support this, disciplines such as Data Science and Data Engineering will only become more integral to successful businesses.

About Holisticon

Holisticon AG is a management and IT consulting company headquartered in Hamburg. Outstanding minds, individualists, characters and pioneers work here. All of them pursue the same bold goal: to offer honest technological and methodical management and IT consulting at the highest possible level, and to maintain this level, while remaining as casual as they are curious.

You can find more information about Holisticon, their approach and vacancies here.

If you want to learn the basics of Data Science, have a look at our courses (for example, the Python courses).