# Analyzing Income Levels in US Census Data

This article was published as a part of the Data Science Blogathon.

In this article, we will predict the income of US residents from US census data and then determine whether an individual earned more or less than 50,000 dollars a year. If you want to know more about the dataset, visit this link.

1. Exploratory data analysis: Learn exploratory data analysis on a complex dataset.
2. Data insights: Visualize the data and draw business-related insights from the charts.
3. Visualization libraries: Learn about two powerful visualization libraries, Plotly and Dexplot.

```import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot```

```df = pd.read_csv(r"D:\Data Science projects\US census income prediction\Predicting the Income Level- US census data\adult.csv")```

`df.columns`

```Index(['39', ' State-gov', ' 77516', ' Bachelors', ' 13', ' Never-married',
' Adm-clerical', ' Not-in-family', ' White', ' Male', ' 2174', ' 0',
' 40', ' United-States', ' <=50K'],
dtype='object')```
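The column names printed above look like a data row rather than a header: `pd.read_csv` treats the first line of a file as column names unless told otherwise, and this file appears to ship without a header line. A minimal sketch of an alternative load, assuming a header-less file, is shown below. The two sample rows and the `raw_names` placeholder labels are illustrative and are not part of the original code; `header=None`, `names`, and `skipinitialspace` are standard `read_csv` parameters.

```python
import io
import pandas as pd

# Two illustrative rows in the same comma-plus-space layout, with no header line
sample = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

# Default behaviour: the first row is consumed as column names
default_df = pd.read_csv(io.StringIO(sample))

# Placeholder names (hypothetical), one per field
raw_names = [f"col_{i}" for i in range(15)]

# header=None keeps every row as data;
# skipinitialspace=True drops the blank that follows each comma
full_df = pd.read_csv(
    io.StringIO(sample), header=None, names=raw_names, skipinitialspace=True
)

print(default_df.shape)
print(full_df.shape)
print(full_df.loc[0, "col_1"])
```

The rest of this walkthrough keeps the original load, so the column names and the row count reported later are unchanged.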

```df.drop([' 2174', ' 0', ' 40'], axis = 'columns', inplace = True)```

```df.columns = ['Age', 'Type_of_Owner', 'id', 'Education', 'No_of_Projects_Done',
'Marital_Status', 'Job_Designation', 'Family_Relation', 'Race', 'Gender',
'Country', 'Salary']```

`df.head()`

`df.shape`

`(32560, 12)`

`df.info()`

`df.describe()`

`df.isnull().sum()`
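Note that `df.isnull().sum()` only counts true `NaN` values. In the commonly distributed Adult census file, unknown entries are written as `?` rather than left empty, so they would not show up in that count. A small sketch of how such placeholders could be surfaced, assuming that convention holds for this file and using a hypothetical three-row frame that borrows this article's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; ' ?' mimics the placeholder with the file's leading space
demo = pd.DataFrame({
    "Type_of_Owner": [" Private", " ?", " State-gov"],
    "Job_Designation": [" Sales", " ?", " Adm-clerical"],
    "Country": [" United-States", " United-States", " ?"],
})

# No NaN yet, so isnull() reports zero for every column
before = demo.isnull().sum()

# Strip surrounding whitespace, then turn the '?' placeholder into a real missing value
cleaned = demo.apply(lambda col: col.str.strip()).replace("?", np.nan)
after = cleaned.isnull().sum()

print(before.to_dict())
print(after.to_dict())
```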

```labels = df['Type_of_Owner'].value_counts().index
values = df['Type_of_Owner'].value_counts().values

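# Pass the category labels and their counts to the pie trace;
# Plotly assigns its default colors to the slices
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])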

fig.show()```

```labels = df['Age'].value_counts()[:10].index
values = df['Age'].value_counts()[:10].values

# Pie chart of the 10 most frequent ages
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Education'].value_counts().index
values = df['Education'].value_counts().values

# Pie chart of education levels
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['No_of_Projects_Done'].value_counts().index
values = df['No_of_Projects_Done'].value_counts().values

# Pie chart of the No_of_Projects_Done values
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Marital_Status'].value_counts().index
values = df['Marital_Status'].value_counts().values

# Pie chart of marital status
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Job_Designation'].value_counts().index
values = df['Job_Designation'].value_counts().values

# Pie chart of job designations
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Family_Relation'].value_counts().index
values = df['Family_Relation'].value_counts().values

# Pie chart of family relations
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

`df['Race'].unique()`

```array([' White', ' Black', ' Asian-Pac-Islander', ' Amer-Indian-Eskimo',
' Other'], dtype=object)```

```labels = df['Race'].value_counts().index
values = df['Race'].value_counts().values

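# Custom slice colors; slices beyond this list fall back to Plotly's default palette
colors = ['#1d4466',
'#2678bf',
'#2c6699']

fig = go.Figure(data = [go.Pie(labels = labels, values = values,
marker = dict(colors = colors))])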

fig.show()```

```labels = df['Gender'].value_counts().index
values = df['Gender'].value_counts().values

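# Custom slice color; slices beyond this list fall back to Plotly's default palette
colors = ['#2c6699']

fig = go.Figure(data = [go.Pie(labels = labels, values = values,
marker = dict(colors = colors))])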

fig.show()```

```labels = df['Salary'].value_counts().index
values = df['Salary'].value_counts().values

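# Custom slice color; slices beyond this list fall back to Plotly's default palette
colors = ['#2c6699']

fig = go.Figure(data = [go.Pie(labels = labels, values = values,
marker = dict(colors = colors))])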

fig.show()```
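The salary pie chart shows the class split visually. The same shares can also be read off as numbers, and the label can be turned into a 0/1 column for later modelling. A minimal sketch, assuming the `Salary` values are ` <=50K` and ` >50K` with the leading space seen in the column output earlier; the `High_Income` column name and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample standing in for the Salary column
demo = pd.DataFrame({"Salary": [" <=50K", " >50K", " <=50K", " <=50K"]})

# Share of each class as a fraction of all rows
shares = demo["Salary"].str.strip().value_counts(normalize=True)

# 1 when the person earns more than 50K, otherwise 0
demo["High_Income"] = (demo["Salary"].str.strip() == ">50K").astype(int)

print(shares)
print(demo["High_Income"].tolist())
```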

```import dexplot as dxp

dxp.count(
val="Age",
data = df,
split="Type_of_Owner",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Marital_Status",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Job_Designation",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Race",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Gender",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Salary",
stacked = True,
figsize=(12,12))```
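The stacked count plots give a visual impression of how one category is distributed within another. A cross-tabulation gives the same kind of breakdown as numbers. The sketch below uses `pd.crosstab` on a hypothetical five-row frame that borrows this article's column names; it is an illustration and not part of the original analysis:

```python
import pandas as pd

# Hypothetical rows using the article's column names and value format
demo = pd.DataFrame({
    "Gender": [" Male", " Female", " Male", " Male", " Female"],
    "Salary": [" <=50K", " <=50K", " >50K", " <=50K", " >50K"],
})

# Rows: Gender, columns: Salary, cells: number of people
counts = pd.crosstab(demo["Gender"], demo["Salary"])

# Row-wise shares, so the values for each gender sum to 1
row_shares = pd.crosstab(demo["Gender"], demo["Salary"], normalize="index")

print(counts)
print(row_shares.round(2))
```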
