DATA SCRAPING/STREAMING WITH VISUALIZATIONS (PART 1): USING BEAUTIFUL SOUP TO SCRAPE DATA FROM A WIKIPEDIA PAGE AND VISUALIZE IT WITH MATPLOTLIB, SEABORN, PLOTLY AND TABLEAU

oluyede Segun (jr)
5 min read · Dec 10, 2020

1. INTRODUCTION

GOAL: We will use the Beautiful Soup library to scrape static data from a Wikipedia page that lists Nigeria’s population by state in 2006 and 2016 in a table, then visualize the data and compare the Python libraries Matplotlib, Seaborn and Plotly with Tableau.

Page: https://en.wikipedia.org/wiki/List_of_Nigerian_states_by_population

2. DATA-SCRAPING PROCESS

2.1 Import libraries

Python Libraries needed for project

2.2 Define the URL and test whether your request is successful. NB: a status code of 200 means a successful request
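
A minimal sketch of this step, assuming the requests library. The helper names fetch_page and is_success and the User-Agent value are placeholders for illustration, not taken from the notebook.

```python
import requests

URL = "https://en.wikipedia.org/wiki/List_of_Nigerian_states_by_population"


def fetch_page(url: str, timeout: float = 10.0) -> requests.Response:
    """Send a GET request and return the response object."""
    # Placeholder User-Agent; some sites reject requests that send none.
    headers = {"User-Agent": "population-scraper-tutorial/0.1"}
    return requests.get(url, headers=headers, timeout=timeout)


def is_success(status_code: int) -> bool:
    """200 is the HTTP status code for a successful request."""
    return status_code == 200
```

With network access, response = fetch_page(URL) followed by print(response.status_code) should print 200 when the request succeeds, and response.text then holds the HTML of the page.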

Make request and response

2.3 Get the title and find all tables in the Wikipedia page
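
A small sketch of this step. To keep it runnable offline, SAMPLE_HTML is a made-up miniature page standing in for the HTML returned by the request; on the real page the same two calls apply.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the HTML of the Wikipedia page.
SAMPLE_HTML = """
<html>
<head><title>List of Nigerian states by population - Wikipedia</title></head>
<body>
<table class="wikitable sortable"><tr><th>State</th></tr></table>
<table class="navbox"><tr><td>Navigation</td></tr></table>
</body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
page_title = soup.title.string       # text inside the <title> tag
all_tables = soup.find_all("table")  # every <table> element in the page

print(page_title)
print(len(all_tables))
```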

2.4 Next, we locate the correct table by searching for the HTML class that identifies the table containing the Nigerian states and their populations. Press F12 in your browser on the Wikipedia page to inspect it. We see that the table has the class “wikitable sortable”

locating wikitable sortable html tag by pressing f12
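
Once the class is known, the table can be picked out with Beautiful Soup. This is a sketch on made-up HTML; the placeholder row is not real data.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page: one unrelated table and the one we want.
SAMPLE_HTML = """
<table class="navbox"><tr><td>Navigation</td></tr></table>
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,500,000</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches the class attribute exactly as it is written in the HTML.
table = soup.find("table", class_="wikitable sortable")

# A CSS selector also matches if the classes come in another order
# or if the table carries extra classes.
same_table = soup.select_one("table.wikitable.sortable")

print(table["class"])
```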

2.5 Get the number of rows and columns of the table above, then print the headers.
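
A sketch of counting rows and columns and reading the headers. The column names and the numbers in the sample table are placeholders; the real table may be laid out differently.

```python
from bs4 import BeautifulSoup

# Made-up table with placeholder population values.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2006)</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,000,000</td><td>1,500,000</td></tr>
<tr><td>Lagos State</td><td>900,000</td><td>1,200,000</td></tr>
</table>
"""

table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")

rows = table.find_all("tr")            # header row plus data rows
header_cells = rows[0].find_all("th")  # cells of the header row
headers = [th.get_text(strip=True) for th in header_cells]

print(len(rows))     # number of rows, header included
print(len(headers))  # number of columns
print(headers)
```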

length of rows and columns of table

2.6 Loop through the rows of the table and append to a list
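
One way to do this, sketched on a made-up table with placeholder values. Each element appended to the list is a “tr” tag, and printing one shows its raw HTML.

```python
from bs4 import BeautifulSoup

# Made-up table with placeholder population values.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,500,000</td></tr>
<tr><td>Lagos State</td><td>1,200,000</td></tr>
</table>
"""

table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")

table_rows = []
for tr in table.find_all("tr"):
    table_rows.append(tr)  # keep the whole <tr> tag for later steps

print(len(table_rows))
print(table_rows[1])  # raw HTML of the first data row
```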

2.7 Print the HTML of each table record (“tr” tag)

Then extract the table body, i.e. the rows.
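
A sketch of pulling the cell text out of every data row, again on a made-up table with placeholder values. The header row is skipped because it holds “th” cells rather than “td” cells.

```python
from bs4 import BeautifulSoup

# Made-up table with placeholder population values.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>State</th><th>Population (2006)</th><th>Population (2016)</th></tr>
<tr><td>Kano State</td><td>1,000,000</td><td>1,500,000</td></tr>
<tr><td>Lagos State</td><td>900,000</td><td>1,200,000</td></tr>
</table>
"""

table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")

body = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:                        # ignore rows without data cells
        body.append(cells)

print(body)
```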

2.8 Create a dictionary from the header and body
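
A possible way to build the dictionary, assuming the headers and body rows are already plain Python lists. The values are placeholders, and the notebook may build its dictionary differently.

```python
# Placeholder header and body, shaped like the output of the earlier steps.
headers = ["State", "Population (2006)", "Population (2016)"]
body = [
    ["Kano State", "1,000,000", "1,500,000"],
    ["Lagos State", "900,000", "1,200,000"],
]

# zip(*body) turns the list of rows into a sequence of columns.
columns = list(zip(*body))

# Pair each header with its column of values.
table_dict = {header: list(column) for header, column in zip(headers, columns)}

print(table_dict)
```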

2.9 We now have the exact Wikipedia table as a dataframe/dictionary

2.10 Save as a CSV file, and replace the “,” in the population figures so they can be used as numbers for visualization
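
A sketch of this step with pandas. The values are placeholders, and the output file name and the temporary directory are assumptions made so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder dictionary, shaped like the scraped table.
table_dict = {
    "State": ["Kano State", "Lagos State"],
    "Population (2006)": ["1,000,000", "900,000"],
    "Population (2016)": ["1,500,000", "1,200,000"],
}
df = pd.DataFrame(table_dict)

# Remove the thousands separators, then convert the text to integers.
for col in ["Population (2006)", "Population (2016)"]:
    df[col] = df[col].str.replace(",", "", regex=False).astype(int)

# Hypothetical output location; any writable path works.
out_path = Path(tempfile.gettempdir()) / "nigeria_population.csv"
df.to_csv(out_path, index=False)

print(df.dtypes)
```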

save to csv and replace

3. VISUALIZATIONS

We will visualize Nigeria’s population in 2016 only; however, the same process applies to the 2006 population.

3.1 BARCHARTS

3.1.1 Using Matplotlib
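
A minimal Matplotlib sketch. The dataframe holds placeholder numbers rather than the scraped figures, and the styling choices are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data, not the real population figures.
df = pd.DataFrame({
    "State": ["Kano", "Lagos", "Kaduna"],
    "Population (2016)": [1500000, 1200000, 900000],
})
df_sorted = df.sort_values("Population (2016)", ascending=False)

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(df_sorted["State"], df_sorted["Population (2016)"])
ax.set_xlabel("State")
ax.set_ylabel("Population (2016)")
ax.set_title("Nigeria population by state, 2016")
ax.tick_params(axis="x", rotation=90)  # keeps long state names readable
fig.tight_layout()
```

Call plt.show() to display the chart, or fig.savefig("barchart.png") to write it to a file.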

matplotlib barchart

3.1.2 Using Seaborn
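
A comparable Seaborn sketch, again on placeholder numbers. Drawing the bars horizontally is a choice made here for readability, not necessarily what the original chart did.

```python
import pandas as pd
import seaborn as sns

# Placeholder data, not the real population figures.
df = pd.DataFrame({
    "State": ["Kano", "Lagos", "Kaduna"],
    "Population (2016)": [1500000, 1200000, 900000],
})

# Numeric x and categorical y give horizontal bars.
ax = sns.barplot(data=df, x="Population (2016)", y="State", color="steelblue")
ax.set_title("Nigeria population by state, 2016")
```

Seaborn draws on a Matplotlib axes, so matplotlib.pyplot.show() displays the result.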

seaborn barchart

3.1.3 Using Plotly

plotly barchart

3.2 PIECHARTS

3.2.1 Using Matplotlib
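
A Matplotlib pie chart sketch on placeholder numbers; autopct prints each state's share of the total on its wedge.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data, not the real population figures.
df = pd.DataFrame({
    "State": ["Kano", "Lagos", "Kaduna"],
    "Population (2016)": [1500000, 1200000, 900000],
})

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    df["Population (2016)"],
    labels=df["State"],
    autopct="%1.1f%%",  # percentage label on each wedge
    startangle=90,
)
ax.axis("equal")        # draw the pie as a circle
ax.set_title("Share of population by state, 2016")
```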

matplotlib piechart

3.2.2 Using Plotly

plotly piechart

3.2.3 Piecharts are not available in Seaborn.

3.3 MAPS

3.3.1 Using Plotly

We first load the GeoJSON file for Nigeria.

Load geojson

Then create a dictionary called “state_id_map” from the features of the GeoJSON, so it can be used to identify states.
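
A sketch of loading the GeoJSON and building the dictionary. The property names "state" and "id" and the square geometries are assumptions for illustration; the real file may use other keys and will contain the actual state boundaries.

```python
import json

# Made-up GeoJSON with two square "states"; not real boundaries.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"state": "Kano", "id": 1},
     "geometry": {"type": "Polygon",
       "coordinates": [[[8, 11.5], [9, 11.5], [9, 12.5], [8, 12.5], [8, 11.5]]]}},
    {"type": "Feature",
     "properties": {"state": "Lagos", "id": 2},
     "geometry": {"type": "Polygon",
       "coordinates": [[[3, 6.3], [4, 6.3], [4, 6.8], [3, 6.8], [3, 6.3]]]}}
  ]
}
"""

# For a file on disk, json.load(open_file) does the same job.
nigeria_geo = json.loads(GEOJSON_TEXT)

state_id_map = {}
for feature in nigeria_geo["features"]:
    # Copy the id to the top level of the feature, where Plotly looks by default.
    feature["id"] = feature["properties"]["id"]
    state_id_map[feature["properties"]["state"]] = feature["id"]

print(state_id_map)
```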

state_id_map dictionary

Read the edited dataframe (Federal Capital Territory was changed to FCT, Abuja), then remove whitespace and the word “State” from the state names:
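
One way to clean the names, sketched on placeholder rows. It assumes the word “State” only needs removing at the end of a name.

```python
import pandas as pd

# Placeholder rows with stray whitespace, as scraped names often have.
df = pd.DataFrame({
    "State": [" Kano State", "Lagos State ", "FCT, Abuja"],
    "Population (2016)": [1500000, 1200000, 300000],
})

# Strip the surrounding whitespace first, then drop a trailing " State".
df["State"] = (
    df["State"]
    .str.strip()
    .str.replace(r"\s+State$", "", regex=True)
)

print(df["State"].tolist())
```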

strip states and remove whitespaces to match the dictionary

Create a new id column that uses the ids in the state_id_map dictionary to identify each state:
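
A sketch using pandas map, with a placeholder dictionary and placeholder data.

```python
import pandas as pd

# Placeholder mapping and data.
state_id_map = {"Kano": 1, "Lagos": 2}
df = pd.DataFrame({
    "State": ["Kano", "Lagos"],
    "Population (2016)": [1500000, 1200000],
})

# Look up each state name in the dictionary.
df["id"] = df["State"].map(state_id_map)

# Names missing from the dictionary come back as NaN, so check for them.
unmatched = df[df["id"].isna()]

print(df)
print(len(unmatched))
```

Any rows listed in unmatched point to a state name that is spelled differently in the dataframe and in the GeoJSON.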

Use a choropleth to plot:

Plotly map

Using mapbox for a better visual

Plotly mapbox

NB: Seaborn and Matplotlib have no suitable built-in method for plotting maps

4. VISUALIZATIONS IN TABLEAU

Tableau is a fast-growing data visualization and analytics tool that aims to help people see and understand data.

It is widely used in data analysis, as it is a powerful visualization tool in the business intelligence industry.

4.1 Using Barcharts

Tableau Barcharts

4.2 Using Piecharts

Tableau Piecharts

4.3 Using MAP

Tableau MAP

5. CONCLUSION

This project demonstrated a simple way to get static data from a Wikipedia page using Beautiful Soup. It also compared the Matplotlib, Seaborn and Plotly libraries with Tableau. In this project, Tableau was the fastest way to get visualizations from the data, and Plotly was the most interactive of the Python libraries.

WRITER: OLUYEDE SEGUN . A(jnr)

Tableau Dashboard Link: https://public.tableau.com/profile/oluyede.segun#!/

Explanatory Notebook and dataset: https://github.com/juniorboycoder/DATASCRAPING

LinkedIn profile: https://www.linkedin.com/in/oluyede-segun-jr-a-a5550b167/

Twitter profile: https://twitter.com/oluyedejun1

TAGS: #TABLEAU #DATASCIENCE #DATAVISUALIZATION #DATASCRAPING #PYTHON #PANDAS #BEAUTIFULSOUP #SEABORN #MATPLOTLIB #PLOTLY

oluyede Segun (jr)

Certified I.T. specialist | Computer Network Admin | Cloud | Artificial Intelligence (Machine Learning & Data Science) & web dev | Python/JavaScript