DATA INSIGHTS (PART 2): DATA VISUALIZATIONS WITH SEABORN
DATA VISUALIZATION USING SEABORN (Python)
DATASET: STUDENT PERFORMANCE DATASET
link: https://www.kaggle.com/spscientist/students-performance-in-exams?select=StudentsPerformance.csv
GOAL: SHOWING THE EFFECT OF PARENTAL BACKGROUND, TEST PREPARATION, ETC. ON STUDENT PERFORMANCE.
SEABORN VISUALIZATIONS USED: relplots, line charts, scatterplots, boxplots, bar charts
ORGANIZING DATA FOR VISUALIZATION
(a) Quick look at data
(b) Create the student performance column as the average of the math, reading and writing scores
(c) Count the values in each column to see how many times each one appears
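The three steps above can be sketched with pandas as follows. This is a minimal sketch, not the notebook's exact code: the rows are made up so the snippet runs on its own, and the column names are assumed to match those in StudentsPerformance.csv.

```python
import pandas as pd

# Hypothetical rows standing in for StudentsPerformance.csv; with the real
# file you would load it using pd.read_csv("StudentsPerformance.csv").
df = pd.DataFrame({
    "parental level of education": [
        "some high school", "bachelor's degree", "master's degree", "some high school",
    ],
    "test preparation course": ["none", "completed", "completed", "none"],
    "math score": [40, 72, 90, 55],
    "reading score": [50, 78, 95, 61],
    "writing score": [45, 75, 93, 58],
})

# (a) Quick look at the data
print(df.head())

# (b) Student performance = row-wise mean of the three scores
score_cols = ["math score", "reading score", "writing score"]
df["student performance"] = df[score_cols].mean(axis=1)

# (c) How many times each value appears in a categorical column
prep_counts = df["test preparation course"].value_counts()
print(prep_counts)
```

The same value_counts() call can be repeated for the other categorical columns, such as parental level of education.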
DATA VISUALIZATIONS
1. USING SEABORN RELPLOTS:
For visualizing statistical relationships; here, to compare math score and student performance by the students' test preparation
Observations: the scores below 20 belong to students with no test preparation
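A relplot of this kind might look like the sketch below. The data frame is hypothetical (a few invented rows with column names assumed to match the dataset), and the choice of hue for test preparation is an assumption about how the comparison was drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical rows; column names assumed to match StudentsPerformance.csv
df = pd.DataFrame({
    "test preparation course": ["none", "completed", "none", "completed", "none"],
    "math score": [18, 65, 47, 88, 59],
    "reading score": [24, 70, 52, 91, 63],
    "writing score": [21, 72, 50, 90, 61],
})
score_cols = ["math score", "reading score", "writing score"]
df["student performance"] = df[score_cols].mean(axis=1)

# One point per student, coloured by test preparation status
g = sns.relplot(
    data=df,
    x="math score",
    y="student performance",
    hue="test preparation course",
    kind="scatter",
)
```

relplot returns a FacetGrid, so adding col="test preparation course" instead of hue would split the two groups into side-by-side panels.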
2. USING LINE CHARTS
Line charts use data point “markers” connected by straight lines to aid visualization
Goal: to find the lowest reading score against student performance, grouped by test preparation
Observations: students performed better in reading when they had completed the test preparation
NB: Other columns, such as math score, can be compared similarly
2(ii) Student performance vs reading score by parental level of education
Observation: students whose parents hold a master's or bachelor's degree have the highest scores, and those whose parents have some high school education have the lowest. This suggests that parental level of education affects student performance in reading
3. USING SCATTERPLOTS (MOST EFFICIENT FOR THIS PROBLEM)
A scatter plot is a set of points plotted on horizontal and vertical axes. Scatter plots show the extent of correlation between two variables.
3(i) Test preparation course vs student performance
Observations: among students who completed their test preparation, the lowest performance was 38, while among students with no preparation the lowest was 4
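The scatterplot and the per-group minimum quoted in the observation can be reproduced roughly as below. The numbers are hypothetical and will not match the real dataset; only the column names are assumed to be the same.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; "student performance" would normally be the computed average
df = pd.DataFrame({
    "test preparation course": ["none", "none", "none",
                                "completed", "completed", "completed"],
    "student performance": [12.0, 45.0, 77.0, 41.0, 66.0, 93.0],
})

# Categorical x axis: one column of points per preparation group
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="test preparation course", y="student performance", ax=ax)

# Lowest performance within each group, to read alongside the plot
lowest = df.groupby("test preparation course")["student performance"].min()
print(lowest)
```

Replacing the x column with "race/ethnicity" or "parental level of education" gives the other two scatterplots in this section.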
3(ii) race/ethnicity vs student performance
3(iii) parental level of education vs student performance
(optional) SCATTERPLOT WHEN SCORE IS LESS THAN 50 for student performance vs test preparation course
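Restricting the plot to scores below 50 only needs a boolean filter before plotting. A small sketch with invented values, assuming the student performance column already exists:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; "student performance" would normally be the computed average
df = pd.DataFrame({
    "test preparation course": ["none", "none", "none",
                                "completed", "completed", "completed"],
    "student performance": [12.0, 45.0, 77.0, 41.0, 66.0, 93.0],
})

# Keep only the students whose performance is below 50
low = df[df["student performance"] < 50]

fig, ax = plt.subplots()
sns.scatterplot(
    data=low,
    x="test preparation course",
    y="student performance",
    hue="test preparation course",
    ax=ax,
)
```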
4. USING BOXPLOTS:
A box plot (or boxplot) is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles, which also gives a rough view of how the data are spread.
Student performance vs test preparation course
Observations: most students who completed their test preparation scored between 60 and 80, while most students with no preparation scored between 45 and 80.
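To make the quartiles concrete, the sketch below draws the boxplot and also prints the quartiles that each box is built from. The values are hypothetical, chosen so the quartiles are easy to check by hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; "student performance" would normally be the computed average
df = pd.DataFrame({
    "test preparation course": ["completed"] * 5 + ["none"] * 5,
    "student performance": [50.0, 60.0, 70.0, 80.0, 90.0,
                            20.0, 45.0, 60.0, 80.0, 95.0],
})

# One box per preparation group
fig, ax = plt.subplots()
sns.boxplot(data=df, x="test preparation course", y="student performance", ax=ax)

# The 25%, 50% and 75% columns are the box edges and the median line
quartiles = df.groupby("test preparation course")["student performance"].describe()
print(quartiles[["25%", "50%", "75%"]])
```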
5. BAR CHARTS:
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.
Observations: the "completed" group scored highest
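A bar chart for this comparison might be drawn as below. I am assuming the bars show mean student performance per test preparation group, which is what seaborn's barplot plots by default; the rows are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; "student performance" would normally be the computed average
df = pd.DataFrame({
    "test preparation course": ["completed", "completed", "completed",
                                "none", "none", "none"],
    "student performance": [60.0, 75.0, 90.0, 40.0, 55.0, 70.0],
})

# Bar height is the mean of each group (barplot's default estimator)
fig, ax = plt.subplots()
sns.barplot(data=df, x="test preparation course", y="student performance", ax=ax)

# The same means computed directly, for reference
means = df.groupby("test preparation course")["student performance"].mean()
print(means)
```

Passing the category column as y and the score as x turns the bars horizontal.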
6. OTHER USEFUL SEABORN VISUALIZATIONS INCLUDE:
(a) Heatmaps
import numpy as np
import seaborn as sns
np.random.seed(0)
sns.set()
uniform_data = np.random.rand(10, 12)
ax = sns.heatmap(uniform_data)
(b) Distribution plot:
x = np.random.normal(size=100)
sns.histplot(x, kde=True)  # histplot with kde=True supersedes the deprecated distplot
(c) Joint distribution:
tips = sns.load_dataset("tips")  # seaborn's example tips dataset, downloaded on first use
with sns.axes_style('white'):
    sns.jointplot(x="total_bill", y="tip", data=tips, kind='hex')
REFER TO THE DOCUMENTATION FOR MORE EXAMPLES: https://seaborn.pydata.org/examples/index.html
7. CONCLUSION
We successfully compared seaborn visualizations of student performance using relplots, line charts, scatterplots, boxplots, and bar charts.
WRITER: OLUYEDE SEGUN . A(jnr)
Explanatory Notebook and dataset:
https://github.com/juniorboycoder/DATA-VISUALIZATION-WITH-SEABORN
LinkedIn profile: https://www.linkedin.com/in/oluyede-segun-jr-a-a5550b167/
Twitter profile: https://twitter.com/oluyedejun1
TAGS: #SEABORN #ML #DATASCIENCE #DATAVISUALIZATION