Define a Question and Get Your Data Science Project Started
A typical data science project is structured in five main phases.
The first phase is always the most important. It is where you ask the question and specify what you want to learn from the data. I would like to examine the Gapminder dataset to see how gross domestic product (GDP) is related to urbanization. Income per person may depend on urbanization and on the employment rate. Alternatively, GDP may fall when the unemployment rate is high. So I would like to explore the relationship between income per person and two other variables:
1. Urbanization (Urban Rate)
2. Employment Rate
Data Science Questions:
1. Is GDP associated with urbanization (urban rate)?
2. Does GDP have any relationship with the employment rate? Does a bigger GDP imply a higher employment rate?
My variables of interest:
The dataset contains many variables, but I am interested in only the three given below:
1. incomeperperson (income per person)
2. urbanrate (urban rate)
3. employrate (employment rate)
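The following is a minimal sketch of how the dataset could be narrowed to these three variables before any analysis. It makes several assumptions. The data is a CSV file whose headers are exactly `incomeperperson`, `urbanrate` and `employrate`, and missing values appear as blank cells. The function name `load_variables` is hypothetical. The rows in the usage example are made-up toy values and are not Gapminder data.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, TextIO

# Assumed CSV header names, taken from the variable names in the hypothesis.
VARIABLES = ("incomeperperson", "urbanrate", "employrate")


def load_variables(source: TextIO, columns: Iterable[str] = VARIABLES) -> list[dict[str, float]]:
    """Hypothetical helper: keep only `columns` and drop rows with missing values."""
    columns = tuple(columns)
    reader = csv.DictReader(source)
    # Fail loudly if the assumed headers are not present in the file.
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        try:
            # float() raises ValueError on blank cells such as "" or " ".
            rows.append({name: float(record[name]) for name in columns})
        except (TypeError, ValueError):
            # Skip any row where at least one of the variables is missing.
            continue
    return rows


# Made-up toy rows (NOT Gapminder data), only to show the call.
sample = io.StringIO(
    "country,incomeperperson,urbanrate,employrate\n"
    "A,1000,40,50\n"
    "B,,60,55\n"
    "C,3000,80,60\n"
)
data = load_variables(sample)
print(len(data))  # 2: row B has no income value and is dropped
```

Dropping incomplete rows is the simplest choice here. It means each country that remains has all three values, so later pairwise comparisons use the same set of countries.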
My hypothesis is that the answer to both questions is yes. In other words, both "employrate" and "urbanrate" are positively associated with "incomeperperson".
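A positive association can be checked with Pearson's correlation coefficient r. The post does not say which test it will use, so this is one common choice, shown here as a sketch. If r is positive, the data are consistent with the hypothesis. A value near zero or below it is not. On its own, r says nothing about causation or statistical significance. The function `pearson_r` is a hypothetical helper. The numbers in the example are made-up toy values and are not Gapminder data.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Hypothetical helper: Pearson correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and each sample's squared deviations from its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up toy values (NOT Gapminder data), only to show the calls.
income = [500.0, 1500.0, 4000.0, 12000.0, 30000.0]
urban = [20.0, 35.0, 50.0, 70.0, 85.0]
employ = [70.0, 60.0, 55.0, 58.0, 62.0]

for name, other in (("urbanrate", urban), ("employrate", employ)):
    r = pearson_r(income, other)
    # Positive r is consistent with the hypothesis; r <= 0 is not.
    print(f"incomeperperson vs {name}: r = {r:.2f}")
```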