🌟 Hit the star button to save this repo to your profile

Pandas

Pandas is a popular Python library for data manipulation and analysis. It offers many functions that are particularly useful for Exploratory Data Analysis (EDA). The sections below list common pandas syntax for EDA. In the snippets, `df` is the DataFrame loaded in the first step, and `column_name` and `category_column` are placeholders for your own column names.

Loading Data:

Read data from various file formats (e.g., CSV, Excel, SQL database):

```
import pandas as pd

# The "!" prefix runs a shell command; it works only in Jupyter/Colab notebooks.
!wget https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv -O train.csv
df = pd.read_csv('train.csv')
```
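
To try `read_csv` without downloading anything, it can also read from a file-like object. The sketch below uses a tiny, made-up CSV string; the column names mimic the Titanic file, but the rows and the names `csv_text` and `sample_df` are illustrative assumptions, not real data.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the values are illustrative only.
csv_text = """PassengerId,Survived,Age,Fare
1,0,22,7.25
2,1,38,71.28
3,1,,7.92
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # Age becomes float64 because one value is missing
```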

Data Summary:
- Get basic information about the dataset:
```
df.info()
```
- Display summary statistics for numerical columns:
```
df.describe()
```
- View the first few rows of the dataset:
```
df.head()
```
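
As a small illustration of the summary calls above, the sketch below runs them on a hypothetical four-row frame (`df_demo` and its values are made up). It also shows that `describe()` covers only numeric columns unless `include="all"` is passed.

```python
import pandas as pd

# Hypothetical sample with one numeric and one text column.
df_demo = pd.DataFrame(
    {
        "Age": [22.0, 38.0, 26.0, None],
        "Sex": ["male", "female", "female", "male"],
    }
)

n_rows, n_cols = df_demo.shape                  # quick size check
numeric_summary = df_demo.describe()            # numeric columns only
full_summary = df_demo.describe(include="all")  # adds text columns as well

print(n_rows, n_cols)
print(numeric_summary)
print(full_summary)
```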

Data Cleaning and Handling:

Handle missing values:

```
df.isna().sum()   # Count missing values per column
df.dropna()       # Return a copy without rows that contain missing values
df.fillna(0)      # Return a copy with missing values replaced (0 is a placeholder)
```
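
In practice the fill value often depends on the column. The following is a minimal sketch, assuming a numeric column is filled with its median and rows are dropped only when one specific column is missing; the frame `demo` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in both columns.
demo = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 26.0, 35.0],
        "Cabin": ["C85", None, None, "E46"],
    }
)

missing_per_column = demo.isna().sum()

# Fill the numeric column with its median; `demo` itself is left unchanged.
filled = demo.assign(Age=demo["Age"].fillna(demo["Age"].median()))

# Drop rows only where "Cabin" is missing, ignoring gaps elsewhere.
with_cabin = demo.dropna(subset=["Cabin"])

print(missing_per_column)
print(filled)
print(with_cabin)
```

Both `fillna` and `dropna` return new objects here, so the result has to be assigned to a name to be kept.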

Remove duplicates:
```
df.drop_duplicates()
```
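
Duplicates can be defined on all columns or on a subset. The sketch below, on a hypothetical `demo` frame, counts exact duplicates first and then keeps one row per key column.

```python
import pandas as pd

# Hypothetical sample: rows 0 and 1 are identical, rows 2 and 3 share a ticket.
demo = pd.DataFrame(
    {
        "Ticket": ["A1", "A1", "B2", "B2"],
        "Fare": [7.25, 7.25, 8.05, 9.00],
    }
)

# Rows that repeat an earlier row in every column.
n_exact_duplicates = demo.duplicated().sum()

# Remove exact duplicates only.
deduped = demo.drop_duplicates()

# Keep the first row for each ticket, even if other columns differ.
first_per_ticket = demo.drop_duplicates(subset=["Ticket"], keep="first")

print(n_exact_duplicates)
print(deduped)
print(first_per_ticket)
```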

Data Selection and Slicing:
- Select specific columns:
```
df['column_name']
```
- Select rows based on conditions:
```
df[df['column_name'] > 50]
```
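
Conditions can be combined, and rows and columns can be selected in one step with `.loc`. A minimal sketch on a hypothetical `demo` frame:

```python
import pandas as pd

# Hypothetical sample data.
demo = pd.DataFrame(
    {
        "Age": [22, 38, 4, 58],
        "Sex": ["male", "female", "female", "male"],
        "Fare": [7.25, 71.28, 16.70, 26.55],
    }
)

# A list of names selects several columns and returns a DataFrame.
two_columns = demo[["Age", "Fare"]]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mask = (demo["Age"] > 18) & (demo["Sex"] == "female")

# .loc takes the row mask and the column list together.
adult_women = demo.loc[mask, ["Age", "Fare"]]

print(two_columns)
print(adult_women)
```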

Data Visualization:

Create basic visualizations:

```
import matplotlib.pyplot as plt
df['column_name'].plot(kind='hist')
plt.show()
```

Pair plots for exploring relationships between multiple variables:
```
import seaborn as sns
sns.pairplot(df)
```

Grouping and Aggregation:
- Group data by a column and calculate statistics:
```
df.groupby('category_column').mean(numeric_only=True)  # skip non-numeric columns
```
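
When different statistics are needed per column, named aggregation with `agg` is one option. The sketch below assumes a hypothetical `demo` frame with Titanic-style column names; the values are made up.

```python
import pandas as pd

# Hypothetical sample data.
demo = pd.DataFrame(
    {
        "Pclass": [1, 1, 3, 3, 3],
        "Fare": [80.0, 60.0, 8.0, 7.0, 9.0],
        "Survived": [1, 1, 0, 1, 0],
    }
)

# Each keyword names an output column: (input column, aggregation).
summary = demo.groupby("Pclass").agg(
    mean_fare=("Fare", "mean"),
    survival_rate=("Survived", "mean"),
    passengers=("Survived", "size"),
)

print(summary)
```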

Correlation Analysis:
- Compute the correlation matrix:
```
df.corr(numeric_only=True)  # text columns raise an error in pandas 2.x otherwise
```
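
A common follow-up is to rank the other columns by their correlation with one column of interest. This is a minimal sketch on a hypothetical `demo` frame; selecting numeric columns explicitly is an alternative to passing `numeric_only=True`.

```python
import pandas as pd

# Hypothetical sample data; "Sex" is text and cannot be correlated directly.
demo = pd.DataFrame(
    {
        "Age": [20, 30, 40, 50],
        "Fare": [10.0, 20.0, 30.0, 40.0],
        "Pclass": [3, 3, 2, 1],
        "Sex": ["male", "female", "female", "male"],
    }
)

# Keep numeric columns only, then compute Pearson correlations.
numeric = demo.select_dtypes(include="number")
corr = numeric.corr()

# Correlation of every other numeric column with "Fare", strongest first.
fare_corr = corr["Fare"].drop("Fare").sort_values(ascending=False)

print(corr)
print(fare_corr)
```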

Outlier Detection:

Identify outliers using z-scores:

```
import numpy as np
from scipy import stats

z_scores = np.abs(stats.zscore(df['column_name']))  # absolute z-score per row
```
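
To turn z-scores into a list of outliers, a threshold is needed. The sketch below computes the population z-score (`ddof=0`, which is also SciPy's default) with plain pandas instead of SciPy and flags values above 3. The threshold of 3 is a common convention, not a rule, and the `fares` data is hypothetical.

```python
import pandas as pd

# Hypothetical fares: eleven ordinary values and one extreme value.
fares = pd.Series(
    [7.0, 8.0, 9.0, 8.5, 7.5, 8.0, 9.5, 7.0, 8.0, 8.5, 9.0, 500.0],
    name="Fare",
)

# z-score: distance from the mean in units of the standard deviation.
z = (fares - fares.mean()) / fares.std(ddof=0)

# Flag values more than 3 standard deviations away from the mean.
threshold = 3
outliers = fares[z.abs() > threshold]

print(outliers)
```

With `ddof=0`, no z-score in a sample of `n` values can exceed `sqrt(n - 1)`, so a threshold of 3 can only flag something when there are more than 10 rows.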

Data Transformation:

Apply functions to columns:

```
df['column_name'] = df['column_name'].apply(function)  # `function` is any callable applied to each value
```
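
As a concrete case, the sketch below uses `apply` to derive a label from a numeric column. The helper `age_group` and the cutoff of 18 are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical sample data.
demo = pd.DataFrame({"Age": [4, 22, 38, 15]})


def age_group(age):
    """Hypothetical helper: label a passenger as child or adult."""
    return "child" if age < 18 else "adult"


# apply calls age_group once per value and collects the results.
demo["AgeGroup"] = demo["Age"].apply(age_group)

print(demo)
```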

Apply transformations (e.g., log transformation):
```
import numpy as np
df['column_name'] = np.log(df['column_name'])  # requires strictly positive values
```
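
`np.log` returns `-inf` for zeros, which can occur in columns such as fares. One possible workaround, sketched here on hypothetical data, is `np.log1p`, which computes `log(1 + x)` and can be inverted with `np.expm1`.

```python
import numpy as np
import pandas as pd

# Hypothetical fares, including a zero.
demo = pd.DataFrame({"Fare": [0.0, 7.25, 71.28, 512.33]})

# log1p(x) = log(1 + x), so a fare of 0 maps to 0 instead of -inf.
demo["Fare_log"] = np.log1p(demo["Fare"])

# expm1 is the inverse and recovers the original scale.
restored = np.expm1(demo["Fare_log"])

print(demo)
```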

Categorical Variables:
- Get frequency counts of unique values:
```
df['category_column'].value_counts()
```
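
`value_counts` can also return proportions or include missing values. A minimal sketch on a hypothetical `embarked` series:

```python
import pandas as pd

# Hypothetical port codes with one missing entry.
embarked = pd.Series(["S", "C", "S", "Q", "S", None], name="Embarked")

counts = embarked.value_counts()                 # missing values are excluded
shares = embarked.value_counts(normalize=True)   # proportions instead of counts
counts_with_missing = embarked.value_counts(dropna=False)  # count missing too

print(counts)
print(shares)
print(counts_with_missing)
```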

Data Export:
- Save the modified DataFrame to a new file:
```
df.to_csv('new_data.csv', index=False)
```
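
To check what `to_csv` writes without touching the disk, the output can go to an in-memory buffer and be read back. This is a sketch with a hypothetical `demo` frame; with a real file, a path replaces the buffer.

```python
import io

import pandas as pd

# Hypothetical sample data.
demo = pd.DataFrame({"PassengerId": [1, 2], "Fare": [7.25, 8.5]})

# Write to an in-memory text buffer; index=False omits the row labels.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)

# Rewind and read the same text back into a new DataFrame.
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(buffer.getvalue())
print(round_trip)
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then reappears as `Unnamed: 0` when the file is read back.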

These are some of the common pandas functions you can use for EDA. Depending on your dataset and analysis goals, you may need additional pandas functions and techniques to explore your data effectively.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn for any other queries or feedback.
