# Analyzing Income Levels in US Census Data

D10001 Database/Cache 2022-1-19 10:35 153 views

This article was published as a part of the Data Science Blogathon.

In this article, we will work with US census data to predict income levels, and ultimately to determine whether an individual American earns more or less than 50,000 dollars a year. If you want to know more about the dataset, visit this link.

1. Exploratory data analysis: Learn exploratory data analysis on a complex dataset.
2. Data insights: Visualize the data and draw business-related insights from the visualizations.
3. Visualization libraries: Learn about two powerful visualization libraries, Plotly and Dexplot.

```import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot```

```df = pd.read_csv(r"D:Data Science projectsUS census income predictionPredicting the Income Level- US census dataadult.csv")```

`df.columns`

```Index(['39', ' State-gov', ' 77516', ' Bachelors', ' 13', ' Never-married',
' Adm-clerical', ' Not-in-family', ' White', ' Male', ' 2174', ' 0',
' 40', ' United-States', ' <=50K'],
dtype='object')```
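The labels printed above are data values (`'39'`, `' State-gov'`, ...), which suggests the file has no header row and pandas used the first record as the header. A minimal sketch of one alternative, assuming a headerless file: pass `header=None` with explicit `names`, plus `skipinitialspace=True` to drop the leading spaces. The two-row sample and the shortened column list are illustrative, not the real file.

```python
import io
import pandas as pd

# Two illustrative rows in the same comma-plus-space layout, trimmed to five fields.
sample = io.StringIO(
    "39, State-gov, Bachelors, Male, <=50K\n"
    "50, Self-emp-not-inc, Bachelors, Male, <=50K\n"
)

# Hypothetical column names for the trimmed sample.
names = ["Age", "Type_of_Owner", "Education", "Gender", "Salary"]

# header=None keeps the first record as data; names supplies the labels;
# skipinitialspace=True removes the space that follows each comma.
sample_df = pd.read_csv(sample, header=None, names=names, skipinitialspace=True)

print(sample_df.shape)
print(sample_df.columns.tolist())
```

Read this way, no record is consumed as a header, and the category values carry no leading whitespace.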

```df.drop([' 2174', ' 0', ' 40'], axis = 'columns', inplace = True)```

```df.columns = ['Age', 'Type_of_Owner', 'id', 'Education', 'No_of_Projects_Done',
'Marital_Status', 'Job_Designation', 'Family_Relation', 'Race', 'Gender',
'Country', 'Salary']```

`df.head()`

`df.shape`

`(32560, 12)`

`df.info()`

`df.describe()`

`df.isnull().sum()`
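`isnull()` only counts values that pandas parsed as NaN. If the file follows the UCI Adult convention of marking unknown entries with a `?` placeholder, those entries are ordinary strings and need a string comparison to be found. A small sketch on a made-up frame (the rows are illustrative; the column names follow the renaming used in this article):

```python
import pandas as pd

# Made-up rows; the leading spaces mimic the raw values shown in df.columns.
toy = pd.DataFrame({
    "Type_of_Owner": [" Private", " ?", " State-gov"],
    "Job_Designation": [" Sales", " ?", " Adm-clerical"],
    "Country": [" United-States", " United-States", " ?"],
})

# isnull() finds no NaN here because "?" is an ordinary string.
nan_counts = toy.isnull().sum()

# Strip whitespace, then count the "?" placeholders in each column.
placeholder_counts = toy.apply(lambda col: col.str.strip().eq("?").sum())

print(nan_counts)
print(placeholder_counts)
```

Running the same `apply` on the text columns of `df` would show whether any placeholders are hiding behind a zero null count.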

```labels = df['Type_of_Owner'].value_counts().index
values = df['Type_of_Owner'].value_counts().values

# Pass the category labels and their counts to the pie trace
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Age'].value_counts()[:10].index
values = df['Age'].value_counts()[:10].values

# Pie chart of the ten most frequent ages
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Education'].value_counts().index
values = df['Education'].value_counts().values

# Pass the category labels and their counts to the pie trace
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['No_of_Projects_Done'].value_counts().index
values = df['No_of_Projects_Done'].value_counts().values

# Pass the category labels and their counts to the pie trace
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Marital_Status'].value_counts().index
values = df['Marital_Status'].value_counts().values

# Pass the category labels and their counts to the pie trace
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Job_Designation'].value_counts().index
values = df['Job_Designation'].value_counts().values

# Pass the category labels and their counts to the pie trace
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

```labels = df['Family_Relation'].value_counts().index
values = df['Family_Relation'].value_counts().values

# Pass the category labels and their counts to the pie trace
fig = go.Figure(data = [go.Pie(labels = labels, values = values)])

fig.show()```

`df['Race'].unique()`

```array([' White', ' Black', ' Asian-Pac-Islander', ' Amer-Indian-Eskimo',
' Other'], dtype=object)```

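```labels = df['Race'].value_counts().index
values = df['Race'].value_counts().values

# The opening of this colour list was lost in the source, so it may be incomplete;
# slices without an explicit colour fall back to Plotly's defaults.
colors = ['#1d4466',
'#2678bf',
'#2c6699']

fig = go.Figure(data = [go.Pie(labels = labels, values = values, marker = dict(colors = colors))])

fig.show()```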

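```labels = df['Gender'].value_counts().index
values = df['Gender'].value_counts().values

# Only the last entry of the colour list survives in the source;
# slices without an explicit colour fall back to Plotly's defaults.
colors = ['#2c6699']

fig = go.Figure(data = [go.Pie(labels = labels, values = values, marker = dict(colors = colors))])

fig.show()```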

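```labels = df['Salary'].value_counts().index
values = df['Salary'].value_counts().values

# Only the last entry of the colour list survives in the source;
# slices without an explicit colour fall back to Plotly's defaults.
colors = ['#2c6699']

fig = go.Figure(data = [go.Pie(labels = labels, values = values, marker = dict(colors = colors))])

fig.show()```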

```import dexplot as dxp

dxp.count(
val="Age",
data = df,
split="Type_of_Owner",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Marital_Status",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Job_Designation",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Race",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Gender",
stacked = True,
figsize=(12,12))```

```dxp.count(
val="Age",
data = df,
split="Salary",
stacked = True,
figsize=(12,12))```
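The stacked count plots above can be cross-checked numerically. A minimal sketch using `pd.crosstab` on a made-up frame (the rows are illustrative; the column names follow this article):

```python
import pandas as pd

# Made-up rows standing in for the Age and Salary columns of df.
toy = pd.DataFrame({
    "Age": [25, 25, 40, 40, 40],
    "Salary": ["<=50K", ">50K", "<=50K", ">50K", ">50K"],
})

# Rows: age values; columns: salary bands; cells: number of people.
age_by_salary = pd.crosstab(toy["Age"], toy["Salary"])

# Row-normalised shares, comparable across ages with different totals.
age_by_salary_share = pd.crosstab(toy["Age"], toy["Salary"], normalize="index")

print(age_by_salary)
print(age_by_salary_share.round(2))
```

The first table holds the same numbers a stacked bar encodes as segment heights; the normalised version makes it easier to compare ages that have very different sample sizes.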

Greetings to everyone. I’m currently working at TCS; previously, I worked as a Data Science Analyst at Zorba Consulting India. Alongside full-time work, I have an immense interest in the same field, i.e. Data Science, along with other subsets of Artificial Intelligence such as Computer Vision, Machine Learning, and Deep Learning. Feel free to collaborate with me on any project in the domains mentioned above (LinkedIn).

You can access my other articles, published on Analytics Vidhya as part of the Blogathon, here (link).
