What Does a Data Scientist Do?

A data scientist is a person who has the knowledge and skills to conduct sophisticated, systematic analyses of data. A data scientist extracts insights from data sets for product development, and identifies and evaluates strategic opportunities.
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data,” — DJ Patil

“What is data science?”

Most people hyping data science have focused on the first word: data. They care about volume and velocity and whatever other buzzwords describe data that is too big for you to analyze in Excel. This hype about the size (relative or absolute) of the data being collected fed into the second category of hype: hype about tools. People threw around EC2, Hadoop, and Pig, and had huge debates about Python versus R. But the key word in data science is not “data”; it is “science”. Data science is only useful when the data are used to answer a question. That is the science part of the equation.

What Is the Structure of a Data Science Project?

Before looking at the core activities of a data scientist, it helps to understand the phases of a data science project and the outputs of a data science experiment.

A data science project has five main phases:
1. Question
2. Exploratory data analysis
3. Formal modeling
4. Interpretation
5. Communication

Output of a Data Science Experiment

The possible outputs of a data science experiment are nearly limitless. However, four general types of output are used most frequently:

  • Reports
  • Presentations
  • Interactive web pages
  • Data products or data apps

What Does a Data Scientist Do?

Now you know what data science is, how a data science project is structured, and what a data science experiment produces. A data scientist performs a set of core activities that make up the data analysis epicycle. They include:

1. Define the question

The first step is setting expectations: what question am I going to answer for my business? Define that question first, then look for the answer using whatever methods fit.

2. Define the ideal data set for the experiment

The next step is to find out what kind of data you need to answer the question. In this step a data scientist usually works out what the ideal data set for the experiment would be.

3. Get the data

You now know what kind of data can answer your question. Go ahead and collect that data from diverse sources.

4. Clean the data

In the real world, the data you’re analyzing is often messy, poorly maintained, and difficult to work with. A data scientist takes part in cleaning the data and making it useful for the analysis.
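
As a rough illustration of what cleaning can involve, here is a minimal Python sketch. The records, the field names (name, age, city), and the list of missing-value markers are all made up for this example; real data will need its own rules.

```python
# Hypothetical raw records; the field names and values are assumptions.
RAW_ROWS = [
    {"name": " Alice ", "age": "34", "city": "Paris"},
    {"name": "Bob", "age": "", "city": "london"},
    {"name": " Alice ", "age": "34", "city": "Paris"},  # exact duplicate
    {"name": "Carol", "age": "n/a", "city": " Berlin"},
]

# Strings treated as "no value" in this sketch (an assumption, not a standard).
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}


def clean_rows(rows):
    """Trim whitespace, normalize missing ages, fix city casing, drop duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        name = row["name"].strip()
        city = row["city"].strip().title()
        age_text = row["age"].strip().lower()
        # Missing ages become None instead of crashing int().
        age = None if age_text in MISSING_MARKERS else int(age_text)
        key = (name, age, city)
        if key in seen:
            continue  # skip rows that are identical after cleaning
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

Keeping the cleaning rules in one small function makes it easier to rerun them when new data arrives.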

5. Do exploratory analysis to understand the data better

Do some exploratory analysis to understand the data and get more insights. Often this means presenting the data in a pictorial or graphical format so that it can be analyzed easily.
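
A first pass at exploration might look like the following sketch, which uses only the Python standard library. The order values are invented for illustration, and the text histogram is a stand-in for a real chart.

```python
import statistics
from collections import Counter

# Hypothetical sample of order values; the numbers are made up.
ORDER_VALUES = [12, 15, 15, 18, 22, 22, 22, 30, 45, 120]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def text_histogram(values, bin_width):
    """Build a crude text histogram: one line per bin, one '#' per value."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start:>4}-{start + bin_width - 1:<4} {'#' * bins[start]}"
        for start in sorted(bins)
    ]


if __name__ == "__main__":
    print(summarize(ORDER_VALUES))
    for line in text_histogram(ORDER_VALUES, bin_width=20):
        print(line)
```

On this made-up sample the mean (32.1) sits well above the median (22), which hints that the single large value of 120 is pulling the average up. That is the kind of observation exploratory analysis is meant to surface.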

6. Perform feature engineering / feature selection

Feature engineering is the process of using domain knowledge of the data to create new features or select appropriate ones, so that machine learning algorithms work better.
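
To make the idea concrete, here is a small Python sketch that derives new features from a raw customer record. The record layout and the three derived features are assumptions chosen for illustration; which features actually help depends on the question and the domain.

```python
from datetime import date


def engineer_features(record):
    """Derive model-ready features from a raw (hypothetical) customer record."""
    signup = date.fromisoformat(record["signup_date"])
    last_order = date.fromisoformat(record["last_order_date"])
    orders = record["order_count"]
    return {
        # How long the customer has been active, in days.
        "tenure_days": (last_order - signup).days,
        # Average spend per order; 0.0 when there are no orders.
        "avg_order_value": record["total_spent"] / orders if orders else 0.0,
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "signup_on_weekend": signup.weekday() >= 5,
    }


if __name__ == "__main__":
    raw = {
        "signup_date": "2023-01-07",
        "last_order_date": "2023-03-08",
        "order_count": 4,
        "total_spent": 200.0,
    }
    print(engineer_features(raw))
```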

7. Do prediction / modeling

The next step is building a model. A data scientist may create many models, and choosing the right statistical model from a set of candidate models is called model selection. The data scientist is also responsible for picking the appropriate model for the analysis.
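
One simple way to do model selection is to fit each candidate on training data and compare their errors on data held out from training. The sketch below does this for two deliberately simple candidates, a constant-mean baseline and a least-squares line. The data points, the candidate set, and the use of mean squared error as the score are assumptions for illustration, not the only reasonable choices.

```python
def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate: ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_model(candidates, train, holdout):
    """Fit every candidate on train and return the name with the lowest holdout MSE."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *holdout)
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up data that roughly follows y = 2x.
    train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
    holdout = ([7, 8], [14.1, 15.9])
    best, scores = select_model({"mean": fit_mean, "line": fit_line}, train, holdout)
    print(best, scores)
```

Scoring on held-out data rather than on the training data is what keeps this comparison honest: a model that merely memorizes the training points gets no credit for it.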

8. Interpreting the results

Analyzing the data and interpreting results is another important part of the data science process.

9. Create a dashboard

Visualizing and communicating data is really important. Creating reports and dashboards helps people understand the data and make data-driven decisions.

10. Show the results to other people

Now, show the results to the world. It’s important that your manager, VP, or colleagues understand what insights you have derived from the data and why they matter. Poor communication can fail to convince people, and that can make the difference between action and inaction on your analysis.

These steps are not always followed in order. It is possible to go back and forth between them to get a better result.

Skill Matrix for a Data Scientist


Also read: Data Scientist’s Toolkits