Data to Analytics: Data Analysis and Visualisation with Python

Python has a huge number of libraries and functions that make data profiling, data analysis and data visualisation easy.

Data analysts and data architects face a common day-to-day challenge: analysing and profiling large volumes of data that sit on heterogeneous data sources. Some of the data is available in flat files, some has to be copied from the internet, and some has to be pulled from a relational database. Analysis can only be performed after all of this data has been joined together.
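As a rough illustration of that situation, here is a minimal sketch that pulls three tiny tables from three kinds of sources and joins them with pandas. Everything in it is made up for the example: the column names, the values, the in-memory CSV text standing in for a flat file, and the in-memory SQLite table standing in for a relational database.

```python
import io
import sqlite3

import pandas as pd

# Source 1: a flat file (an in-memory CSV stands in for a file on disk).
csv_text = "code,name\nAAA,Alpha\nBBB,Beta\n"
names = pd.read_csv(io.StringIO(csv_text))

# Source 2: values copied by hand into Python lists.
scores = pd.DataFrame({"code": ["AAA", "BBB"], "score": [1.5, 2.5]})

# Source 3: a relational table (an in-memory SQLite database for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (code TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)",
    [("AAA", "North"), ("BBB", "South")],
)
regions = pd.read_sql_query("SELECT code, region FROM regions", conn)
conn.close()

# Join all three tables on the shared key before analysing them.
combined = names.merge(scores, on="code").merge(regions, on="code")
print(combined)
```

Once every source is a DataFrame, the origin of the data no longer matters: the join and the analysis work the same way.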

Use case :

We have one dataset copied from the internet and another provided as a CSV file. We need to join them and show life expectancy and fertility rate statistics by country.

Data sources :

Datasets in List format :

Dataset_1:
Country_Code = ["ABW","AFG","AGO","ALB","ARE","ARG"]
Life_Expectancy_At_Birth_2001 = [65.5693658536586,32.328512195122,32.9848292682927,62.2543658536585,52.2432195121951,65.2155365853659]

Dataset_2:
Countries_2001_Dataset = ["Aruba","Afghanistan","Angola","Albania","United Arab Emirates","Argentina"]
Codes_2001_Dataset = ["ABW","AFG","AGO","ALB","ARE","ARG"]

Dataset_3 (CSV format): File Link
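The two sample lists above can already be turned into tables and joined on the country code. The following is a minimal sketch that uses only the six rows shown; the names facts, dims and sample_joined are my own choices for the illustration.

```python
import pandas as pd

# The six sample rows listed under Dataset_1 and Dataset_2.
Country_Code = ["ABW", "AFG", "AGO", "ALB", "ARE", "ARG"]
Life_Expectancy_At_Birth_2001 = [
    65.5693658536586, 32.328512195122, 32.9848292682927,
    62.2543658536585, 52.2432195121951, 65.2155365853659,
]
Countries_2001_Dataset = [
    "Aruba", "Afghanistan", "Angola", "Albania",
    "United Arab Emirates", "Argentina",
]
Codes_2001_Dataset = ["ABW", "AFG", "AGO", "ALB", "ARE", "ARG"]

# One DataFrame per dataset: each dictionary key becomes a column.
facts = pd.DataFrame({
    "country_code": Country_Code,
    "Life_Expectancy_At_Birth_2001": Life_Expectancy_At_Birth_2001,
})
dims = pd.DataFrame({
    "Countries_2001_Dataset": Countries_2001_Dataset,
    "Codes_2001_Dataset": Codes_2001_Dataset,
})

# Inner join on the country code, which has a different column name on each side.
sample_joined = pd.merge(
    facts, dims, left_on="country_code", right_on="Codes_2001_Dataset"
)
print(sample_joined)
```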

Code :

import pandas as pd
import numpy as np

# Replace this with the local path where you saved the downloaded .csv file
demographic = pd.read_csv("/Users/perx/desktop/vikas/learning/source data/P4-Demographic-Data.csv")

# Validate the data
demographic.head(4)
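Beyond looking at the first rows, a few pandas calls give a quick profile of a table. The sketch below runs them on a tiny made-up DataFrame (the values are invented; only the column names echo the CSV used in this post), and the same calls can be applied to demographic.

```python
import pandas as pd

# Made-up sample with one missing value and one fully duplicated row.
sample = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC", "CCC"],
    "Birth rate": [10.5, None, 30.2, 30.2],
})

column_types = sample.dtypes                      # data type of each column
missing_per_column = sample.isna().sum()          # number of missing values per column
duplicate_rows = int(sample.duplicated().sum())   # rows that repeat an earlier row
summary = sample.describe()                       # count, mean, min, max, quartiles

print(column_types)
print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(summary)
```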

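# Collect the lists into dictionaries (column name -> list of values).
# The list names used here (Life_Expectancy_At_Birth_1960, Life_Expectancy_At_Birth_2013,
# Countries_2012_Dataset, Codes_2012_Dataset, Regions_2012_Dataset) differ from the
# _2001 sample names shown under "Data sources"; use the names you actually defined.
# A list of regions is also required.
matrix_facts = {}
matrix_facts["country_code"] = Country_Code
matrix_facts["Life_Expectancy_At_Birth_1960"] = Life_Expectancy_At_Birth_1960
matrix_facts["Life_Expectancy_At_Birth_2013"] = Life_Expectancy_At_Birth_2013

matrix_dim = {}
matrix_dim["Countries_2012_Dataset"] = Countries_2012_Dataset
matrix_dim["Codes_2012_Dataset"] = Codes_2012_Dataset
matrix_dim["Regions_2012_Dataset"] = Regions_2012_Dataset

# Convert the dictionaries into DataFrames (tables)
dataset_facts = pd.DataFrame(matrix_facts)
dataset_dim = pd.DataFrame(matrix_dim)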

# Validate the two list-based datasets
dataset_facts

dataset_dim

# Join the facts and dimension tables on the country code (inner join by default)
joined_dataset_1 = pd.merge(dataset_facts, dataset_dim, left_on="country_code", right_on="Codes_2012_Dataset")

# Joined result of the datasets that were given as lists
joined_dataset_1.head(10)

# Dataset loaded from the CSV file
demographic.head(4)

# Join the file dataset with the list-based dataset.
# how="outer" keeps countries that appear in only one of the two tables.
final_dataset = pd.merge(demographic, joined_dataset_1, left_on="Country Code", right_on="country_code", how="outer")

final_dataset.head(5)
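With an outer join it is worth checking how many rows actually matched. One way, sketched here on made-up codes, is the indicator and validate parameters of pd.merge; the two small DataFrames and their values are hypothetical and only mimic the key column names used above.

```python
import pandas as pd

# Made-up tables that share two of their three codes.
left = pd.DataFrame({"Country Code": ["AAA", "BBB", "CCC"], "Birth rate": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"country_code": ["BBB", "CCC", "DDD"], "life_exp": [70.0, 60.0, 50.0]})

merged = pd.merge(
    left,
    right,
    left_on="Country Code",
    right_on="country_code",
    how="outer",
    indicator=True,          # adds a _merge column: left_only, right_only or both
    validate="one_to_one",   # raises an error if a key is repeated on either side
)

# Rows that matched on both sides versus rows found in only one table.
matched = int((merged["_merge"] == "both").sum())
unmatched = merged[merged["_merge"] != "both"]
print(matched)
print(unmatched)
```

Rows marked left_only or right_only usually point to codes that are spelled differently or missing in one of the sources.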

# Visualisation

import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot with one regression line per region
viz1 = sns.lmplot(data=final_dataset, x="Birth rate", y="Life_Expectancy_At_Birth_1960", hue="Regions_2012_Dataset", scatter_kws={"s": 50})

Now we are ready to visualise our data.
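Outside a notebook, the grid returned by lmplot can be labelled and written to an image file. This is a minimal sketch on made-up numbers; the toy DataFrame, the axis label text and the output file name are all assumptions for the illustration, and ci=None simply switches off the confidence band.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for two hypothetical regions.
toy = pd.DataFrame({
    "Birth rate": [10.0, 12.0, 15.0, 30.0, 35.0, 40.0],
    "Life expectancy": [80.0, 78.0, 75.0, 60.0, 55.0, 52.0],
    "Region": ["A", "A", "A", "B", "B", "B"],
})

# Scatter plot with one regression line per region.
grid = sns.lmplot(
    data=toy, x="Birth rate", y="Life expectancy",
    hue="Region", ci=None, scatter_kws={"s": 50},
)
grid.set_axis_labels("Birth rate", "Life expectancy at birth (years)")

# Save the chart as a PNG in the system temporary directory.
output_path = os.path.join(tempfile.gettempdir(), "life_expectancy_vs_birth_rate.png")
grid.savefig(output_path)
plt.close("all")
```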

As the example above shows, you can combine any number of datasets and analyse them with Python alone.


Saturday, 11 April 2020
