Saturday 11 April 2020

Data Analysis and Visualisation with Python




Python has a huge number of libraries and functions with which we can easily do data profiling, data analysis and data visualisation.

For Data Analysts and Data Architects in particular, analysing and profiling large volumes of data that sit on heterogeneous data sources is a common, day-to-day challenge. Some of the data is available in flat files, some needs to be copied from the internet and some has to be taken from a relational database. Only after joining all of this data together can the analysis be performed.
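As a rough illustration of pulling data from two kinds of sources, the sketch below reads a small piece of CSV text and a SQLite table into pandas and joins them. The in-memory CSV, the in-memory SQLite database, the table name `life_expectancy` and the column names are assumptions made for this example; the values are taken from the sample rows used later in this post.

```python
import io
import sqlite3

import pandas as pd

# Flat-file source: CSV text stands in for a downloaded file (assumption).
csv_text = """country_code,country_name
ABW,Aruba
AFG,Afghanistan
AGO,Angola
"""
countries = pd.read_csv(io.StringIO(csv_text))

# Relational source: an in-memory SQLite table (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE life_expectancy (country_code TEXT, value_2001 REAL)")
conn.executemany(
    "INSERT INTO life_expectancy VALUES (?, ?)",
    [
        ("ABW", 65.5693658536586),
        ("AFG", 32.328512195122),
        ("AGO", 32.9848292682927),
    ],
)
conn.commit()
facts = pd.read_sql_query(
    "SELECT country_code, value_2001 FROM life_expectancy", conn
)
conn.close()

# Join the two sources on the shared country code.
combined = countries.merge(facts, on="country_code", how="inner")
print(combined)
```

Once both sources are in DataFrames, the join itself no longer depends on where the data came from.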

Use case:

We have one dataset copied from the internet and another dataset given as a CSV file. We need to join them together and show "Life expectancy and fertility rate statistics by country".

Data sources:

Datasets in list format:

Dataset_1:
Country_Code = ["ABW","AFG","AGO","ALB","ARE","ARG"]
Life_Expectancy_At_Birth_2001 = [65.5693658536586,32.328512195122,32.9848292682927,62.2543658536585,52.2432195121951,65.2155365853659]

Dataset_2:
Countries_2001_Dataset = ["Aruba","Afghanistan","Angola","Albania","United Arab Emirates","Argentina"]
Codes_2001_Dataset = ["ABW","AFG","AGO","ALB","ARE","ARG"]

Dataset_3 (CSV format): File Link

Code:

import pandas as pd
import numpy as np

# Replace this with the local path where you downloaded the .csv file
demographic = pd.read_csv("/Users/perx/desktop/vikas/learning/source data/P4-Demographic-Data.csv")

# Validate the data: preview the first 4 rows
demographic.head(4)

# Collect the lists into dictionaries (column name -> list of values).
# Note: the sample lists shown above carry 2001 in their names; the lists used here
# (1960, 2013 and 2012) must be defined under exactly these names before this step.
matrix_facts = {}
matrix_facts["country_code"] = Country_Code
matrix_facts["Life_Expectancy_At_Birth_1960"] = Life_Expectancy_At_Birth_1960
matrix_facts["Life_Expectancy_At_Birth_2013"] = Life_Expectancy_At_Birth_2013

matrix_dim = {}
matrix_dim["Countries_2012_Dataset"] = Countries_2012_Dataset
matrix_dim["Codes_2012_Dataset"] = Codes_2012_Dataset
matrix_dim["Regions_2012_Dataset"] = Regions_2012_Dataset

# Convert the dictionaries into DataFrames (tables)
dataset_facts = pd.DataFrame(matrix_facts)
dataset_dim = pd.DataFrame(matrix_dim)

# Validate the two DataFrames built from the lists
dataset_facts

dataset_dim

# Join the facts and dimension DataFrames on the country code
joined_dataset_1 = pd.merge(dataset_facts, dataset_dim, left_on="country_code", right_on="Codes_2012_Dataset")

# Preview the joined dataset built from the lists
joined_dataset_1.head(10)
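To see what this join produces, here is a minimal, self-contained sketch that uses only the six sample rows listed under Dataset_1 and Dataset_2. The names `sample_facts`, `sample_dim` and `sample_joined` are made up for the example, and the `validate="one_to_one"` argument is an addition that is not in the original code; it asks pandas to raise an error if a country code appears more than once on either side.

```python
import pandas as pd

# Sample rows from Dataset_1 and Dataset_2 in this post.
Country_Code = ["ABW", "AFG", "AGO", "ALB", "ARE", "ARG"]
Life_Expectancy_At_Birth_2001 = [
    65.5693658536586, 32.328512195122, 32.9848292682927,
    62.2543658536585, 52.2432195121951, 65.2155365853659,
]
Countries_2001_Dataset = [
    "Aruba", "Afghanistan", "Angola",
    "Albania", "United Arab Emirates", "Argentina",
]
Codes_2001_Dataset = ["ABW", "AFG", "AGO", "ALB", "ARE", "ARG"]

# One DataFrame per dataset; each dictionary key becomes a column.
sample_facts = pd.DataFrame({
    "country_code": Country_Code,
    "Life_Expectancy_At_Birth_2001": Life_Expectancy_At_Birth_2001,
})
sample_dim = pd.DataFrame({
    "Countries_2001_Dataset": Countries_2001_Dataset,
    "Codes_2001_Dataset": Codes_2001_Dataset,
})

# Inner join (the pd.merge default) on differently named key columns.
sample_joined = pd.merge(
    sample_facts,
    sample_dim,
    left_on="country_code",
    right_on="Codes_2001_Dataset",
    validate="one_to_one",  # extra safety check, not in the original code
)
print(sample_joined)
```

Because the key columns have different names, both of them are kept in the result, so the joined table has four columns.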

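# Dataset loaded from the CSV file
demographic.head(4)

# Join the file dataset with the list-based dataset.
# how="outer" keeps rows from both sides even when a country code has no match.
final_dataset = pd.merge(demographic, joined_dataset_1, left_on="Country Code", right_on="country_code", how="outer")

final_dataset.head(5)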
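An outer join keeps rows that have no partner on the other side and fills the gaps with NaN. One quick way to find such rows is the `indicator=True` option of `pd.merge`, sketched below. The two small frames, including the code "XXX" and the birth rate numbers, are made up for the illustration and are not part of the datasets in this post.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data (values invented for the example).
file_side = pd.DataFrame({
    "Country Code": ["ABW", "AFG", "XXX"],
    "Birth rate": [10.0, 35.0, 20.0],
})
# Hypothetical stand-in for the list-based data.
list_side = pd.DataFrame({
    "country_code": ["ABW", "AFG", "AGO"],
    "country_name": ["Aruba", "Afghanistan", "Angola"],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
checked = pd.merge(
    file_side,
    list_side,
    left_on="Country Code",
    right_on="country_code",
    how="outer",
    indicator=True,
)

# Rows that exist on one side only.
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["Country Code", "country_code", "_merge"]])
```

Checking the unmatched rows before plotting helps explain any missing points in the chart.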

# Visualisation

import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot with one regression line per region
viz1 = sns.lmplot(
    data=final_dataset,
    x="Birth rate",
    y="Life_Expectancy_At_Birth_1960",
    hue="Regions_2012_Dataset",
    scatter_kws={"s": 50},
)

Now we are ready to visualise our data.
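`sns.lmplot` draws a scatter plot with a fitted linear regression line for each `hue` group. If you want the numbers behind such lines, one option is to fit a straight line per group yourself, as in this sketch. The small table is invented purely for illustration (it is not the real demographic data), and `np.polyfit` is used here as a stand-in for the fit, not as a reproduction of seaborn's internals.

```python
import numpy as np
import pandas as pd

# Invented toy data: two regions, three countries each.
toy = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "birth_rate": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    "life_expectancy": [70.0, 65.0, 60.0, 80.0, 78.0, 76.0],
})


def fit_line(group):
    """Fit life_expectancy = slope * birth_rate + intercept for one group."""
    slope, intercept = np.polyfit(
        group["birth_rate"].to_numpy(),
        group["life_expectancy"].to_numpy(),
        deg=1,
    )
    return {"slope": slope, "intercept": intercept}


# One fitted line per region, collected into a small table.
fits = {region: fit_line(group) for region, group in toy.groupby("region")}
fits_table = pd.DataFrame(fits).T
print(fits_table)
```

A negative slope means that, within that group, higher birth rates go together with lower life expectancy.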





As the example above shows, you can combine any number of datasets and analyse them with just Python.

