For the purpose of Data Science with Python tutorial, I would like to work with a data set called Gapminder and I will provide some sample python codes for learning data analysis fundamentals. This portion of the GapMinder data includes one year of numerous country-level indicators of health, wealth and development.
Content Overview
Download GapMinder Data Set : gapminder.csv
Visit http://www.gapminder.org for more information
GapMinder
Founded in Stockholm by Ola Rosling, Anna Rosling Rönnlund and Hans Rosling, GapMinder is a non-profit venture promoting sustainable global development and achievement of the United Nations Millennium Development Goals. It seeks to increase the use and understanding of statistics about social, economic, and
environmental development at local, national, and global levels. Since its conception in 2005, Gapminder has grown to include over 200 indicators, including gross domestic product, total employment rate, and estimated HIV prevalence. Gapminder contains data for all 192 UN members, aggregating data for Serbia and Montenegro. Additionally, it includes data for 24 other areas, generating a total of 215 areas. GapMinder collects data from a handful of sources, including the Institute for Health Metrics and Evaulation, US Census Bureau’s International Database, United Nations Statistics Division, and the World Bank.
Gapminder Codebook
Variable Name & Description of Indicator:
1. incomeperperson
The variable is for 2010 Gross Domestic Product per capita in constant 2000 US$. The inflation but not the differences in the cost of living between countries has been taken into account.
2. alcconsumption
It represents 2008 alcohol consumption per adult (age 15+), litres Recorded and estimated average alcohol consumption, adult (15+) per capita consumption in litres pure alcohol
3. armedforcesrate
This variable is about Armed forces personnel (% of total labor force)
4. breastcancerper100TH
It does represent 2002 breast cancer new cases per 100,000 female Number of new cases of breast cancer in 100,000 female residents during the certain year.
5. co2emissions
It is 2006 cumulative CO2 emission (metric tons), Total amount of CO2 emission in metric tons since 1751.
6. femaleemployrate
2007 female employees age 15+ (% of population) Percentage of female population, age above 15, that has been
employed during the given year.
7. employrate
This is for 2007 total employees age 15+ (% of population) Percentage of total population, age above 15, that has been employed during the given year.
8. HIVrate
It is 2009 estimated HIV Prevalence % – (Ages 15-49) Estimated number of people living with HIV per 100 population of age group 15-49.
9. Internetuserate
This variable for 2010 Internet users (per 100 people) Internet users are people with access to the worldwide network.
10. lifeexpectancy
This variable for 2011 life expectancy at birth (years) The average number of years a newborn child would live if current mortality patterns were to stay the same.
11. oilperperson
It represents in 2010 what was the oil Consumption per capita (tonnes per year and person)
12. polityscore
2009 Democracy score (Polity) Overall polity score from the Polity IV dataset, calculated by subtracting an autocracy score from a democracy score. The summary measure of a country’s democratic and free nature. -10 is the lowest
value, 10 the highest.
13. relectricperperson
This is about in 2008 residential electricity consumption, per person (kWh) The amount of residential electricity consumption per person during the given year, counted in kilowatt-hours (kWh).
14. suicideper100TH
It represents 2005 Suicide, age adjusted, per 100 000 Mortality due to self-inflicted injury, per 100 000 standard population, age adjusted
15. urbanrate
2008 urban population (% of total) Urban population refers to people living in urban areas as defined by
national statistical offices (calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects)