This section walks through calculating summary statistics for numerical columns with the pandas methods describe(), mean(), median(), and std(). We will also create frequency tables for categorical columns.
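- Load the Titanic Dataset:
  - Ensure the dataset is loaded into a pandas DataFrame. The !wget line is a notebook shell command (Jupyter or Colab); outside a notebook, download the file separately before calling read_csv().

        import pandas as pd

        # Notebook shell command: download the Titanic training data
        !wget https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv -O train.csv

        df = pd.read_csv('train.csv')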
- Use the describe() Method:
  - The describe() method provides a summary of statistics for numerical columns: count, mean, standard deviation, minimum, quartiles, and maximum.

        df.describe()
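- Calculate Mean, Median, and Standard Deviation:
  - Use the mean(), median(), and std() methods to calculate these statistics for numerical columns. Pass numeric_only=True so that text columns are skipped; in pandas 2.0 and later, calling these methods on a DataFrame that contains text columns raises a TypeError.

        mean_values = df.mean(numeric_only=True)
        median_values = df.median(numeric_only=True)
        std_values = df.std(numeric_only=True)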
- Identify Categorical Columns:
  - Use the select_dtypes() method to select columns with the object data type (typically used for categorical data).

        categorical_columns = df.select_dtypes(include=['object']).columns
- Create Frequency Tables:
  - Use the value_counts() method to create a frequency table for each categorical column.

        # Map each categorical column name to its frequency table
        frequency_tables = {col: df[col].value_counts() for col in categorical_columns}
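To see what the steps above produce without downloading anything, here is a minimal sketch on a small in-memory DataFrame. The `toy` data and its values are invented for illustration and are not taken from the Titanic file; only the column names mimic it. Using `agg()` to collect the three statistics in one table is a convenience chosen here, not part of the original steps.

```python
import pandas as pd

# Hypothetical toy data: made-up values, Titanic-style column names
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.0, 71.0, 8.0, 53.0],
    "Sex": ["male", "female", "female", "female"],
})

# Keep only numerical columns, then compute the three statistics at once
numeric = toy.select_dtypes(include="number")
stats = numeric.agg(["mean", "median", "std"])
print(stats)

# Frequency table for the single categorical column
sex_counts = toy["Sex"].value_counts()
print(sex_counts)
```

Each row of `stats` is one statistic and each column is one numerical column, so `stats.loc["mean", "Age"]` reads off a single value.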
The complete script, with each result printed:

- Load the Titanic Dataset:

      import pandas as pd

      df = pd.read_csv('train.csv')
- Use the describe() Method:

      summary_statistics = df.describe()
      print(summary_statistics)
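- Calculate Mean, Median, and Standard Deviation:

      # numeric_only=True skips text columns, which would otherwise raise a TypeError in pandas 2.0+
      mean_values = df.mean(numeric_only=True)
      median_values = df.median(numeric_only=True)
      std_values = df.std(numeric_only=True)
      print("Mean Values:\n", mean_values)
      print("Median Values:\n", median_values)
      print("Standard Deviation Values:\n", std_values)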
- Identify Categorical Columns:

      categorical_columns = df.select_dtypes(include=['object']).columns
      print("Categorical Columns:\n", categorical_columns)
- Create Frequency Tables:

      frequency_tables = {col: df[col].value_counts() for col in categorical_columns}
      for col, freq_table in frequency_tables.items():
          print(f"Frequency Table for {col}:\n{freq_table}\n")
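One detail worth knowing about the frequency tables: value_counts() leaves out missing values by default, so the counts for a column with gaps will not add up to the number of rows. The sketch below shows one way to include missing entries and add a proportion column. The `embarked` Series is a hypothetical example with invented values, not data read from train.csv.

```python
import pandas as pd

# Hypothetical column with one missing entry, for illustration only
embarked = pd.Series(["S", "C", "S", None, "S", "Q", "C", "S"], name="Embarked")

# Default behaviour: missing values are dropped from the table
default_counts = embarked.value_counts()

# dropna=False keeps a separate row for missing values
counts = embarked.value_counts(dropna=False)

# Build a table with raw counts and each category's share of all rows
freq_table = counts.to_frame("count")
freq_table["share"] = freq_table["count"] / freq_table["count"].sum()
print(freq_table)
```

Passing normalize=True to value_counts() returns the proportions directly if the raw counts are not needed.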
By following these steps, you will have calculated summary statistics for numerical columns and created frequency tables for categorical columns in the Titanic dataset.
Please create an issue for any improvements, suggestions, or errors in the content.
You can also contact me on LinkedIn with any other queries or feedback.