Structure of a Data Science Project | Different Phases in Data Science Project

Posted in Data Science, Data Science with Python

A typical data science project is structured in phases. There are roughly five phases that
we can think about in a data science project.

Phase 1: Defining A Question

The first phase is the most important one: it is where you ask the question and specify what
you’re interested in learning from the data. Specifying the question and refining it over time
is really important because it will ultimately guide the data that you obtain and the type of analysis that you do. There are six types of questions that you can ask: descriptive, exploratory, inferential, causal, predictive, and mechanistic.

So, figuring out which type of question you’re asking, and exactly what that question is, has a big influence on everything that follows. You should spend a lot of time thinking about this.

Phase 2: Exploratory data analysis

There are two main goals of exploratory data analysis. The first is to find out whether the data you have is suitable for answering your question:

“Is there enough data?”
“Are there too many missing values?”
“Am I missing certain variables, or do I need to collect more data to get those variables?”
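The three questions above can be turned into quick checks in Python. What follows is only a minimal sketch under stated assumptions: the function name `check_suitability`, the thresholds `min_rows` and `max_missing_frac`, and the sample records are all hypothetical, and missing values are assumed to be stored as `None`.

```python
def check_suitability(rows, required_vars, min_rows=30, max_missing_frac=0.2):
    """Run three quick suitability checks on a list of dict records.

    The thresholds are illustrative assumptions, not recommendations.
    """
    # Collect every variable name that appears in at least one record
    seen = set()
    for row in rows:
        seen.update(row.keys())

    # "Am I missing certain variables?"
    absent_vars = sorted(set(required_vars) - seen)

    # "Are there too many missing values?" (None or absent counts as missing)
    missing_frac = {}
    for var in required_vars:
        if var in seen and rows:
            n_missing = sum(1 for row in rows if row.get(var) is None)
            missing_frac[var] = n_missing / len(rows)

    return {
        # "Is there enough data?"
        "enough_data": len(rows) >= min_rows,
        "too_many_missing": sorted(
            var for var, frac in missing_frac.items() if frac > max_missing_frac
        ),
        "absent_vars": absent_vars,
    }


# Hypothetical sample data: four records, one with a missing "income" value
records = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": None},
    {"age": 29, "income": 48000},
    {"age": 56, "income": 61000},
]
report = check_suitability(records, ["age", "income", "region"], min_rows=3)
print(report)
```

What counts as "enough" data or "too many" missing values depends on the question, so in practice the thresholds come from the question you defined in Phase 1 rather than from fixed defaults like these.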

The second goal of exploratory data analysis is to start to develop a sketch of the solution.

Phase 3: Formal modeling

The third phase is formal modeling. If your sketch seems to work and you’ve got the right data, it is appropriate to move on to this phase. Formal modeling is where you write down specifically what question you’re asking and what parameters you’re trying to estimate. Challenging your model within a formal framework is really important for making sure that you develop robust evidence for answering your question. It also helps you examine how sensitive your findings are to different assumptions.
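One simple way to examine sensitivity to assumptions is to estimate the same parameter under two different assumptions and compare the results. The sketch below is only an illustration: it assumes a simple linear model, the helper `ols_slope` and the data are hypothetical, and the assumption being challenged is that every observation belongs in the model.

```python
def ols_slope(xs, ys):
    """Least-squares slope b for the simple linear model y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: the last point is a suspected outlier
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 30.0]

# Assumption A: all observations are valid
slope_all = ols_slope(xs, ys)

# Assumption B: the last observation is an error and should be excluded
slope_trimmed = ols_slope(xs[:-1], ys[:-1])

print(f"slope with all points:    {slope_all:.2f}")
print(f"slope without last point: {slope_trimmed:.2f}")
```

If the two estimates differ a lot, the conclusion depends heavily on that one assumption, and that is worth knowing before you move on to interpreting your results.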

Phase 4: Interpretation

Once you’ve done your analysis and your formal modeling, you want to think about how to interpret your results. You’ve probably done many different analyses and fit many different models, so you have many different bits of information to think about. Part of the challenge of the interpretation phase is to assemble all of that information and weigh each of the different pieces of evidence. You need to work out which pieces are more reliable, which are more uncertain, and which are more important than others, so that you get a sense of the totality of the evidence with respect to answering the question.

Phase 5: Communication

The last phase is the communication phase. Any successful data science project will need to communicate
its findings to some sort of audience. That audience may be internal to your organization or external to it, and it
may be a large audience or just a few people.

Output of a Data Science Experiment

The outputs of a data science experiment are pretty much limitless. However, four general types of
outputs pop up most frequently. Those are:

  • Reports
  • Presentations
  • Interactive web pages
  • Apps