US State Incomes and Literacy Rates

Learn to analyze the relationship between literacy rates and income levels across US states using Python. This lesson walks through importing the data, classifying state incomes by quartile, merging the datasets, and plotting the result. It also explains why literacy may correlate with income without being the cause of the differences between income groups.

We'll cover the following...

US Literacy rates
Incomes of each state
Classifying income of each state
Merging and plotting of the data
Jupyter notebook in action

Classifying income of each state

The income dataset contains the mean wage of each state. We will use it to classify the states into four groups. First, we convert the Mean wage column to an integer type. Next, we use dfv.quantile to find the three quartile boundaries (the 25%, 50%, and 75% quantiles), which divide the data into four equal parts, and group the wages into the following classes:

Low income: The first quarter (0% to 25%).
Lower middle income: The second quarter (25% to 50%).
Upper middle income: The third quarter (50% to 75%).
High income: The fourth quarter (75% to 100%).
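
To make the quartile boundaries concrete, here is a minimal sketch on a toy series. The wages below are hypothetical values chosen only so that the cut points come out round; they are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical mean wages, not real state data
wages = pd.Series([40000, 45000, 50000, 55000, 60000,
                   65000, 70000, 75000, 80000])

# The three boundaries that split the data into four quarters
q = wages.quantile([0.25, 0.50, 0.75])
print(q)  # boundaries at 50000.0, 60000.0 and 70000.0

# A wage exactly on a boundary belongs to the upper class,
# so the tests are "<" for the top edge and ">=" for the bottom edge
low = wages[wages < q.loc[0.25]]
high = wages[wages >= q.loc[0.75]]
print(list(low))   # [40000, 45000]
print(list(high))  # [70000, 75000, 80000]
```

Note that the quarters are only approximately equal in size when the number of rows is not a multiple of four, as with the nine values here.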

Python 3.5

import pandas as pd

# Load the income data and shorten the column names
dfv = pd.read_csv('US_annual_income_by_states.csv')
dfv = dfv.rename(columns={'Stateor territory': 'State',
                          'Mean wage in US$[4]': 'Mean wage'})
dfv = dfv[['State', 'Mean wage']].copy()

# Strip "$" and "," from the wages; regex=False treats "$" as a literal character
dfv['Mean wage'] = dfv['Mean wage'].str.replace('$', '', regex=False)
dfv['Mean wage'] = dfv['Mean wage'].str.replace(',', '', regex=False)

# Drop states with no wage data, then convert the wages to integers
dfv['Mean wage'] = dfv['Mean wage'].replace('No data', float('NaN'))
dfv = dfv.dropna(subset=['Mean wage'])
dfv['Mean wage'] = dfv['Mean wage'].astype(int)

# Quartile boundaries of the numeric column
q = dfv.quantile([0.25, 0.50, 0.75], numeric_only=True)

# Assign each state to an income group; .loc avoids chained assignment
col = 'Mean wage'
dfv['Income group'] = None
for row in dfv.index:
    wage = dfv.loc[row, col]
    if wage < q.loc[0.25, col]:
        dfv.loc[row, 'Income group'] = 'Low income'
    elif wage < q.loc[0.50, col]:
        dfv.loc[row, 'Income group'] = 'Lower middle income'
    elif wage < q.loc[0.75, col]:
        dfv.loc[row, 'Income group'] = 'Upper middle income'
    else:
        dfv.loc[row, 'Income group'] = 'High income'

dfv = dfv[['State', 'Income group']]
print(dfv)
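
As an alternative to the row-by-row loop, the same classification can be sketched with pd.cut, which bins a whole column at once. This is one possible approach, not the lesson's own method, and the small data frame below uses hypothetical wages in place of the cleaned CSV data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned income data
dfv = pd.DataFrame({
    'State': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'Mean wage': [40000, 45000, 50000, 55000, 60000,
                  65000, 70000, 75000, 80000],
})

q = dfv['Mean wage'].quantile([0.25, 0.50, 0.75])

# Bin edges: below Q1, Q1 to Q2, Q2 to Q3, Q3 and above
edges = [-np.inf, q.loc[0.25], q.loc[0.50], q.loc[0.75], np.inf]
labels = ['Low income', 'Lower middle income',
          'Upper middle income', 'High income']

# right=False makes each bin include its lower edge,
# matching the ">=" / "<" comparisons used in the loop
dfv['Income group'] = pd.cut(dfv['Mean wage'], bins=edges,
                             labels=labels, right=False)
print(dfv)
```

pd.cut returns an ordered categorical column, so the groups keep their low-to-high order when sorted. It requires strictly increasing bin edges, so it would raise an error if two quartile boundaries happened to be equal.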

Python 3.5

import os
import pandas as pd
import plotly.express as px

# Load the literacy data and keep only the columns we need
df = pd.read_csv('US_literacy_rate_by_states.csv')
df = df.rename(columns={'Literacy Rate (%)': 'Literacy Rate'})
df = df[['State', 'Literacy Rate']]
df = df.dropna(subset=['Literacy Rate'])

# Load and clean the income data, as in the previous step
dfv = pd.read_csv('US_annual_income_by_states.csv')
dfv = dfv.rename(columns={'Stateor territory': 'State',
                          'Mean wage in US$[4]': 'Mean wage'})
dfv = dfv[['State', 'Mean wage']].copy()
dfv['Mean wage'] = dfv['Mean wage'].str.replace('$', '', regex=False)
dfv['Mean wage'] = dfv['Mean wage'].str.replace(',', '', regex=False)
dfv['Mean wage'] = dfv['Mean wage'].replace('No data', float('NaN'))
dfv = dfv.dropna(subset=['Mean wage'])
dfv['Mean wage'] = dfv['Mean wage'].astype(int)

# Classify each state by the quartile its mean wage falls into
q = dfv.quantile([0.25, 0.50, 0.75], numeric_only=True)
col = 'Mean wage'
dfv['Income group'] = None
for row in dfv.index:
    wage = dfv.loc[row, col]
    if wage < q.loc[0.25, col]:
        dfv.loc[row, 'Income group'] = 'Low income'
    elif wage < q.loc[0.50, col]:
        dfv.loc[row, 'Income group'] = 'Lower middle income'
    elif wage < q.loc[0.75, col]:
        dfv.loc[row, 'Income group'] = 'Upper middle income'
    else:
        dfv.loc[row, 'Income group'] = 'High income'
dfv = dfv[['State', 'Income group']]

# Merge on the state name; only states present in both tables are kept
merged_data = pd.merge(dfv, df, on='State')
print(merged_data)

# Plot literacy rate against income group, ordered from low to high income
fig = px.scatter(merged_data, x="Literacy Rate", y="Income group",
                 log_x=False,
                 hover_data=["Literacy Rate", "Income group", "State"])
fig.update_yaxes(categoryorder='array',
                 categoryarray=['Low income', 'Lower middle income',
                                'Upper middle income', 'High income'])

# Saving a static image requires the kaleido package
os.makedirs("output", exist_ok=True)
fig.write_image("output/graph.png")
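
Because pd.merge keeps only the rows whose State value appears in both tables, it can be worth checking which rows were silently dropped before reading anything into the plot. The sketch below shows one way to do that with the indicator option. The row names and numbers are hypothetical placeholders, not values from the two CSV files.

```python
import pandas as pd

# Hypothetical placeholder rows, not real statistics
income = pd.DataFrame({
    'State': ['State A', 'State B', 'Territory X'],
    'Income group': ['Low income', 'High income', 'Low income'],
})
literacy = pd.DataFrame({
    'State': ['State A', 'State B', 'State C'],
    'Literacy Rate': [85.0, 90.0, 87.0],
})

# An outer merge keeps every row; the _merge column records where it came from
checked = pd.merge(income, literacy, on='State', how='outer', indicator=True)

# Rows found in only one table would vanish in the default inner merge
unmatched = checked[checked['_merge'] != 'both']
print(unmatched[['State', '_merge']])

# Keep only the matched rows, which is what the default merge returns
merged_data = checked[checked['_merge'] == 'both'].drop(columns='_merge')
print(merged_data)
```

Mismatches of this kind usually come from spelling differences in the state names or from rows, such as territories, that appear in only one of the datasets.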
