Who is a Data Scientist?

Posted in Data Science

A data scientist is a person who extracts insights from data sets. They have the knowledge and skills to perform sophisticated, systematic analysis of data that supports product development and helps evaluate and identify strategic opportunities for an organization.

“A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization,” — IBM researchers

What Do Data Scientists Do?

Data scientists are data experts who use their knowledge of statistics and modeling to convert data into actionable insights. Their technical skill in solving complex problems and their curiosity to explore data yield insights about everything from product development to customer retention to new business opportunities.

A data scientist is responsible for the following work:

Pulling and cleaning data

Data scientists often collect the raw data needed to solve a problem and prepare it for analysis. In the real world, the data you are analyzing is often messy, poorly maintained, and difficult to work with. A data scientist takes part in cleaning the data and making it usable for analysis.
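As a rough illustration of what cleaning can involve, here is a minimal sketch using only the Python standard library. The CSV content, the column names, and the `clean_rows` helper are all hypothetical; real projects often use a data-frame library instead, and the right cleaning rules depend on the data.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing,
# missing or non-numeric values, and a duplicated customer.
RAW_CSV = """customer,country,monthly_spend
 Alice ,us,120.50
Bob,US,
alice,US,120.50
Carol,de,not available
Dave,DE,80
"""

def clean_rows(raw_text):
    """Normalize text fields, parse spend as a number, drop duplicates."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        name = row["customer"].strip().title()
        country = row["country"].strip().upper()
        try:
            spend = float(row["monthly_spend"])
        except (TypeError, ValueError):
            spend = None  # keep the row, but mark the value as missing
        key = (name, country)
        if key in seen:
            continue  # same customer already recorded
        seen.add(key)
        cleaned.append(
            {"customer": name, "country": country, "monthly_spend": spend}
        )
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Keeping unparseable values as `None` rather than dropping the row is one possible choice; whether to drop, flag, or impute missing values is a decision the analyst makes case by case.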

Designing experiments

Data scientists often do experimental design. Most observational data has severe problems with collinearity: the explanatory variables move together, so their individual effects cannot be separated. Without an experimental design that breaks this collinearity, there is very little systematic variation to study.
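The collinearity point can be made concrete with a small sketch. The numbers below are invented for illustration: in the "observed" data, discounts and ad spend rise together, while a full factorial design (every ad level paired with every discount level) removes that correlation by construction. The `pearson` helper is written out here so the example has no dependencies.

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observational data: bigger ad budgets came with bigger
# discounts, so the two variables are almost perfectly collinear.
observed_ad_spend = [1, 2, 3, 4, 5, 6]
observed_discount = [5, 10, 16, 19, 26, 30]

# Designed experiment: a full factorial over chosen levels, so each
# ad level is tested with each discount level.
ad_levels = [1, 3, 5]
discount_levels = [5, 15, 25]
design = list(itertools.product(ad_levels, discount_levels))
designed_ad_spend = [ad for ad, _ in design]
designed_discount = [discount for _, discount in design]

r_observed = pearson(observed_ad_spend, observed_discount)
r_designed = pearson(designed_ad_spend, designed_discount)
print(f"observational correlation: {r_observed:.3f}")
print(f"designed correlation:      {r_designed:.3f}")
```

With the factors uncorrelated, the effect of ad spend and the effect of discount can be estimated separately, which is not possible when they always change together.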

Feature Engineering and Model Selection

Feature engineering is the process of creating new features or selecting appropriate ones, using domain knowledge of the data, to help machine learning algorithms perform well. A data scientist may build many models; choosing the right statistical model from a set of candidate models is called model selection. The data scientist is also responsible for picking the appropriate model for the analysis.
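A minimal sketch of both ideas, under invented data: the raw feature is the side length of a plot, and domain knowledge suggests the price depends on area, so a squared feature is engineered. Two candidate models (a straight line on each feature) are then compared on held-out data. The data, the names, and the simple holdout split are assumptions chosen for illustration; cross-validation or information criteria are common alternatives.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: price grows with the area of a square plot.
side = [1, 2, 3, 4, 5, 6, 7, 8]
price = [3, 9, 19, 33, 51, 73, 99, 129]

# Feature engineering: derive area from the raw side length.
candidates = {
    "side": side,
    "side_squared": [s ** 2 for s in side],
}

# Holdout split: even positions for training, odd positions for validation.
train_idx = range(0, len(side), 2)
valid_idx = range(1, len(side), 2)

scores = {}
for name, feature in candidates.items():
    model = fit_line([feature[i] for i in train_idx], [price[i] for i in train_idx])
    scores[name] = mse(model, [feature[i] for i in valid_idx], [price[i] for i in valid_idx])

# Model selection: keep the candidate with the lowest validation error.
best_name = min(scores, key=scores.get)
print(scores, "->", best_name)
```

Scoring on data the models were not fitted to is the key design choice here; comparing training error alone would tend to favor whichever model is most flexible.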

Analyzing data and communicating results

Analyzing data and communicating results is the final part of the data science process. It is important that your manager or VP understands why the insights you have uncovered matter. Clear communication that convinces people can make the difference between action and inaction on your analysis.

Skills Needed To Succeed

Data scientists should have good technical, analytical, and presentation skills. They understand statistics and applied mathematics. They test hypotheses with experiments they design. They know enough programming to engineer methods for sourcing, processing, and storing data. And the crucial part is that they communicate their findings through data visualizations and stories to convince people of them.
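To give a flavor of hypothesis testing on a designed experiment, here is a small sketch of an exact two-sided permutation test on the difference in group means. The measurements and the `permutation_p_value` helper are hypothetical, and a permutation test is only one of several tests that could be used here.

```python
import itertools

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way to split the pooled values into groups of the
    original sizes and counts splits at least as extreme as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    count = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical experiment: a metric measured for users randomly
# assigned to a control or a treatment variant.
control = [12, 14, 11, 13, 15]
treatment = [18, 20, 17, 19, 21]

p_value = permutation_p_value(control, treatment)
print(f"p-value: {p_value:.4f}")
```

Because every treatment value exceeds every control value, only the observed split and its mirror image are as extreme, out of 252 possible splits, so the p-value is small. Exhaustive enumeration is feasible only for small groups; larger samples would typically use random shuffles instead.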

A data scientist should have stronger statistics and presentation skills than a data analyst or data engineer. A data scientist should have strong skills in inferential statistics, machine learning, data analysis, data communication, neural networks, and big data technology.

Data scientists are “analytically-minded, statistically and mathematically sophisticated data engineers who can infer insights into business and other complex systems out of large quantities of data,” — Steve Hillion

This job role requires a wide variety of skills. In the real world, a person may not be very good in every field mentioned, but they should have fair knowledge of all of the above technologies.

“A data scientist is an engineer who employs the scientific method and applies data-discovery tools to find new insights in data. The scientific method—the formulation of a hypothesis, the testing, the careful design of experiments, the verification by others—is something they take from their knowledge of statistics and their training in scientific disciplines. The application (and tweaking) of tools comes from their engineering, or more specifically, computer science and programming background. The best data scientists are product and process innovators and sometimes, developers of new data-discovery tools,” — Gil Press

Skills Matrix:

Python, R, Scala, Apache Spark, Hadoop, machine learning, deep learning and statistics.


Data Science Experience, Jupyter, RStudio
