DATA SCRAPING/STREAMING WITH VISUALIZATIONS (PART 1): USING BEAUTIFUL SOUP TO SCRAPE DATA FROM WIKIPEDIA PAGE AND VISUALIZE WITH MATPLOTLIB|SEABORN|PLOTLY & TABLEAU
1. INTRODUCTION
GOAL: We will use the Beautiful Soup library to scrape static data from a Wikipedia page that lists Nigeria’s population in 2006 and 2016 in table format, then visualize it and compare the Python libraries Matplotlib, Seaborn, and Plotly with Tableau.
Page: https://en.wikipedia.org/wiki/List_of_Nigerian_states_by_population
2. DATA-SCRAPING PROCESS
2.1 Import libraries
2.2 Define the URL and test whether the request is successful. NB: a status code of 200 means a successful request
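A minimal sketch of steps 2.1 and 2.2, assuming the requests, beautifulsoup4, and pandas packages are installed. The helper name fetch_page and the timeout value are hypothetical and only for illustration.

```python
import pandas as pd  # used later to build the DataFrame
import requests
from bs4 import BeautifulSoup  # used later to parse the HTML

URL = "https://en.wikipedia.org/wiki/List_of_Nigerian_states_by_population"


def fetch_page(url, timeout=10):
    """Request the page, print the status code, and raise on a 4xx or 5xx status."""
    response = requests.get(url, timeout=timeout)
    # A status code of 200 means the request succeeded.
    print("Status code:", response.status_code)
    response.raise_for_status()
    return response
```

Calling fetch_page(URL) should print 200 when the page is reachable. The returned response.text is what gets handed to Beautiful Soup in the next step.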
2.3 Get the page title and find all tables on the Wikipedia page
2.4 Next, we locate the correct table by searching for the HTML tag that identifies the table containing the Nigerian states and their populations. To find it, press F12 on the Wikipedia page to open the browser's developer tools. We see that the table has the class “wikitable sortable”
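Steps 2.3 and 2.4 can be sketched as follows. To keep the example runnable offline, it parses a tiny stand-in page instead of the real download. The markup, column names, and population numbers in SAMPLE_HTML are placeholders, not the real Wikipedia content.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (placeholder numbers).
SAMPLE_HTML = """
<html><head><title>List of Nigerian states by population - Wikipedia</title></head>
<body>
<table class="infobox"><tr><td>Not the table we want</td></tr></table>
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2006)</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,000,000</td><td>1,500,000</td></tr>
<tr><td>Lagos State</td><td>2,000,000</td><td>2,500,000</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Page title
print(soup.title.string)

# Every table on the page
all_tables = soup.find_all("table")
print("Tables found:", len(all_tables))

# The table whose class attribute is exactly "wikitable sortable"
population_table = soup.find("table", class_="wikitable sortable")
print(population_table["class"])
```

Passing a string with a space to class_ matches the exact value of the class attribute. If the real page lists the classes in a different order or adds more, a CSS selector such as soup.select_one("table.wikitable.sortable") is a more forgiving alternative.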
2.5 Get the number of rows and columns of the table above, then print the headers.
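A small sketch of this step, again on a stand-in table with placeholder values. The column names are assumptions and may differ from the headers on the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the "wikitable sortable" table (placeholder numbers).
SAMPLE_TABLE = """
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2006)</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,000,000</td><td>1,500,000</td></tr>
<tr><td>Lagos State</td><td>2,000,000</td><td>2,500,000</td></tr>
</table>
"""

table = BeautifulSoup(SAMPLE_TABLE, "html.parser").find("table")

rows = table.find_all("tr")
# The header cells sit in the first row as <th> tags.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

n_rows = len(rows) - 1  # data rows, excluding the header row
n_cols = len(headers)

print("Rows:", n_rows, "Columns:", n_cols)
print("Headers:", headers)
```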
2.6 Loop through the rows of the table and append to a list
2.7 Print the HTML of each table record (the “tr” tag), then extract the table body, i.e. the rows.
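The row loop and body extraction could look like the sketch below. It reuses a stand-in table with placeholder values, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the "wikitable sortable" table (placeholder numbers).
SAMPLE_TABLE = """
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2006)</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,000,000</td><td>1,500,000</td></tr>
<tr><td>Lagos State</td><td>2,000,000</td><td>2,500,000</td></tr>
</table>
"""

table = BeautifulSoup(SAMPLE_TABLE, "html.parser").find("table")
rows = table.find_all("tr")

# Print the raw HTML of every <tr> record to see what each row contains.
for row in rows:
    print(row)

# Extract the body: one list of cell texts per data row.
body = []
for row in rows[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:  # ignore rows that have no <td> cells
        body.append(cells)

print(body)
```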
2.8 Create a dictionary from the header and body
2.9 We now have the exact Wikipedia table as a DataFrame/dictionary
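One way to sketch steps 2.8 and 2.9 is to turn the list of rows into a dictionary keyed by header and then hand it to pandas. The headers and body below are placeholder values standing in for the scraped ones.

```python
import pandas as pd

# Placeholder header and body, standing in for the scraped values.
headers = ["State", "Population (2006)", "Population (2016)"]
body = [
    ["Kano State", "1,000,000", "1,500,000"],
    ["Lagos State", "2,000,000", "2,500,000"],
]

# Map each header to the list of values in its column.
table_dict = {
    header: [row[i] for row in body] for i, header in enumerate(headers)
}

df = pd.DataFrame(table_dict)
print(df)
```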
2.10 Save as a CSV, and remove the “,” from the population values so they can be treated as numbers for visualization
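A minimal sketch of this step. As an assumption for illustration, the commas are stripped before saving so that the CSV already holds numeric values. The file name nigeria_population.csv and the population numbers are placeholders.

```python
import pandas as pd

# Placeholder data, standing in for the scraped table.
df = pd.DataFrame(
    {
        "State": ["Kano State", "Lagos State"],
        "Population (2006)": ["1,000,000", "2,000,000"],
        "Population (2016)": ["1,500,000", "2,500,000"],
    }
)

# Remove the thousands separators and convert the text to integers.
for column in ["Population (2006)", "Population (2016)"]:
    df[column] = df[column].str.replace(",", "", regex=False).astype(int)

# Save without the index so the file holds only the table columns.
df.to_csv("nigeria_population.csv", index=False)
print(df.dtypes)
```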
3. VISUALIZATIONS
We will visualize Nigeria's 2016 population only; however, the same process applies to the 2006 population.
3.1 BAR CHARTS
3.1.1 Using Matplotlib
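A bar chart in Matplotlib might be sketched like this. The DataFrame holds placeholder numbers, not the real 2016 figures, and the styling choices are only illustrative.

```python
import matplotlib

# Non-interactive backend so the sketch also runs without a display;
# omit this line when working in a notebook.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values, not the real 2016 figures.
df = pd.DataFrame(
    {
        "State": ["Kano", "Lagos", "Kaduna"],
        "Population (2016)": [1_500_000, 2_500_000, 1_200_000],
    }
)

# Sort so the most populous state comes first.
df_sorted = df.sort_values("Population (2016)", ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df_sorted["State"], df_sorted["Population (2016)"], color="seagreen")
ax.set_xlabel("State")
ax.set_ylabel("Population (2016)")
ax.set_title("Population by state, 2016")
ax.tick_params(axis="x", labelrotation=90)  # keep long state names readable
fig.tight_layout()
```

In a notebook, plt.show() displays the figure. With all the states on the x-axis, rotating the tick labels keeps the names from overlapping.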
3.1.2 Using Seaborn
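The same chart in Seaborn, sketched with sns.barplot on placeholder data. The color and figure size are arbitrary choices.

```python
import matplotlib

# Non-interactive backend so the sketch also runs without a display;
# omit this line when working in a notebook.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder values, not the real 2016 figures.
df = pd.DataFrame(
    {
        "State": ["Kano", "Lagos", "Kaduna"],
        "Population (2016)": [1_500_000, 2_500_000, 1_200_000],
    }
)
df_sorted = df.sort_values("Population (2016)", ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
# Seaborn draws onto a Matplotlib axes, so the usual axes methods still apply.
sns.barplot(data=df_sorted, x="State", y="Population (2016)", color="steelblue", ax=ax)
ax.set_title("Population by state, 2016")
ax.tick_params(axis="x", labelrotation=90)
fig.tight_layout()
```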
3.1.3 Using Plotly
3.2 PIE CHARTS
3.2.1 Using Matplotlib
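A pie chart in Matplotlib can be sketched with ax.pie. The values are placeholders, and the percentage format is just one reasonable choice.

```python
import matplotlib

# Non-interactive backend so the sketch also runs without a display;
# omit this line when working in a notebook.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values, not the real 2016 figures.
df = pd.DataFrame(
    {
        "State": ["Kano", "Lagos", "Kaduna"],
        "Population (2016)": [1_500_000, 2_500_000, 1_200_000],
    }
)

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    df["Population (2016)"].tolist(),
    labels=df["State"].tolist(),
    autopct="%1.1f%%",  # show each slice's share as a percentage
    startangle=90,
)
ax.set_title("Share of population by state, 2016")
```

With dozens of states the slices become hard to read, so it may help to plot only the largest few and group the rest.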
3.2.2 Using Plotly
3.2.3 Pie charts are not available in Seaborn.
3.3 MAPS
3.3.1 Using Plotly
We first load the GeoJSON file for Nigeria.
Then we create a dictionary called “state_id_map” from the features of the GeoJSON, so it can be used to identify states
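The two steps above can be sketched as follows. The real GeoJSON file is not reproduced here, so the example parses a tiny hand-made stand-in with two square placeholder polygons that are not real borders. The file name in the comment and the property keys "name" and "id" are assumptions, so check the keys in your own file.

```python
import json

# In practice the GeoJSON is loaded from a file, for example:
#     with open("nigeria.geojson") as f:   # hypothetical file name
#         nigeria_geojson = json.load(f)
# A hand-made stand-in is parsed here so the sketch runs on its own.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Kano", "id": 1},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[8, 11], [9, 11], [9, 12], [8, 12], [8, 11]]]}},
    {"type": "Feature",
     "properties": {"name": "Lagos", "id": 2},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[3, 6], [4, 6], [4, 7], [3, 7], [3, 6]]]}}
  ]
}
"""
nigeria_geojson = json.loads(GEOJSON_TEXT)

# Map each state name to an id, and copy that id to the top level of the
# feature, where Plotly looks for it by default.
state_id_map = {}
for feature in nigeria_geojson["features"]:
    feature["id"] = feature["properties"]["id"]
    state_id_map[feature["properties"]["name"]] = feature["id"]

print(state_id_map)
```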
Read the edited DataFrame (Federal Capital Territory was changed to FCT, Abuja), then remove whitespace and the word “State” from the state names:
Create a new id column that uses the ids in the state_id_map dictionary to identify each state:
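A sketch of the cleaning and id-mapping steps. The DataFrame would normally come from pd.read_csv on the edited file. Here it is built inline with placeholder numbers, and the ids in state_id_map are hypothetical.

```python
import pandas as pd

# Hypothetical ids, standing in for the dictionary built from the GeoJSON.
state_id_map = {"Kano": 1, "Lagos": 2, "FCT, Abuja": 3}

# Placeholder values, standing in for the edited CSV.
df = pd.DataFrame(
    {
        "State": [" Kano State", "Lagos State ", "FCT, Abuja"],
        "Population (2016)": [1_500_000, 2_500_000, 800_000],
    }
)

# Drop the word "State" and any surrounding whitespace so the names match
# the GeoJSON names exactly.
df["State"] = df["State"].str.replace("State", "", regex=False).str.strip()

# Look up each cleaned name in the dictionary to get its id.
df["id"] = df["State"].map(state_id_map)

print(df)
```

Any name that is missing from the dictionary ends up as NaN in the id column, which is a quick way to spot names that still do not match.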
Use the choropleth to plot:
Use Mapbox for a better visual:
NB: Seaborn and Matplotlib offer no suitable built-in method for plotting maps
4. VISUALIZATIONS IN TABLEAU
Tableau is a data visualization and analytics tool that aims to help people see and understand data.
It is widely used in data analysis and is a powerful visualization tool in the business intelligence industry.
4.1 Using Bar Charts
4.2 Using Pie Charts
4.3 Using Maps
5. CONCLUSION
This project demonstrated a simple way to get static data from a Wikipedia page using Beautiful Soup. It also compared the Matplotlib, Seaborn, and Plotly libraries with Tableau. Of the tools compared, Tableau was the fastest way to get visualizations from the data, and Plotly was the most interactive of the Python libraries.
WRITER: OLUYEDE SEGUN . A(jnr)
Tableau Dashboard Link: https://public.tableau.com/profile/oluyede.segun#!/
Explanatory Notebook and dataset: https://github.com/juniorboycoder/DATASCRAPING
LinkedIn profile: https://www.linkedin.com/in/oluyede-segun-jr-a-a5550b167/
Twitter profile: https://twitter.com/oluyedejun1
TAGS: #TABLEAU #DATASCIENCE #DATAVISUALIZATION #DATASCRAPING #PYTHON #PANDAS #BEAUTIFULSOUP #SEABORN #MATPLOTLIB #PLOTLY