Applying different machine learning algorithms on the famous Titanic dataset
Source of the dataset: https://www.kaggle.com/c/titanic/data
The purpose of this repository is to demonstrate different classification algorithms on the same dataset. Since it is a well-known dataset, I did not perform any exploratory data analysis. More notebooks will be added in the future.
Let's have a look at the dataset.
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
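A minimal sketch of loading the data, assuming Kaggle's `train.csv` has been downloaded next to the notebook. It uses only the standard-library `csv` module, and `load_passengers` and `_to_float` are hypothetical helper names, not code from the notebooks. Empty fields (shown as NaN in the table above) become `None`.

```python
import csv
from typing import Any, Dict, List, Optional


def _to_float(value: str) -> Optional[float]:
    """Convert a CSV field to float, mapping empty fields (NaN in pandas) to None."""
    return float(value) if value.strip() else None


def load_passengers(path: str) -> List[Dict[str, Any]]:
    """Read a Kaggle Titanic CSV (e.g. train.csv) into a list of dicts with typed values."""
    passengers = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # handles quoted names such as "Braund, Mr. Owen Harris"
            passengers.append({
                "PassengerId": int(row["PassengerId"]),
                # test.csv has no Survived column, so the label is optional
                "Survived": int(row["Survived"]) if row.get("Survived") else None,
                "Pclass": int(row["Pclass"]),
                "Name": row["Name"],
                "Sex": row["Sex"],
                "Age": _to_float(row["Age"]),
                "SibSp": int(row["SibSp"]),
                "Parch": int(row["Parch"]),
                "Ticket": row["Ticket"],
                "Fare": _to_float(row["Fare"]),
                "Cabin": row["Cabin"] or None,
                "Embarked": row["Embarked"] or None,
            })
    return passengers
```

For example, `load_passengers("train.csv")[0]["Name"]` should give `"Braund, Mr. Owen Harris"`. With pandas installed, `pandas.read_csv("train.csv").head()` produces a preview like the table above, with missing values shown as NaN.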
Variable Notes (Source: Kaggle)
pclass: A proxy for socio-economic status (SES), where 1st = Upper, 2nd = Middle and 3rd = Lower.
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.
sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).
parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch = 0 for them.
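The variable notes above suggest a few derived features. The sketch below is one common way to encode them, not necessarily what the notebooks do; `is_estimated_age`, `family_size` and `pclass_one_hot` are hypothetical helpers. Treating ages below 1 as never estimated is an assumption, since the notes only say such ages are fractional.

```python
import math
from typing import List, Optional


def is_estimated_age(age: Optional[float]) -> bool:
    """Flag ages of the form xx.5, which Kaggle's notes describe as estimated.

    Assumption: ages below 1 are genuine fractional infant ages, never estimates.
    """
    if age is None or age < 1:
        return False
    return math.isclose(age % 1, 0.5)


def family_size(sibsp: int, parch: int) -> int:
    """Siblings/spouses plus parents/children aboard, plus the passenger themself."""
    return sibsp + parch + 1


def pclass_one_hot(pclass: int) -> List[int]:
    """One-hot encode the ticket class (1 = Upper, 2 = Middle, 3 = Lower)."""
    if pclass not in (1, 2, 3):
        raise ValueError(f"unexpected Pclass: {pclass}")
    return [int(pclass == c) for c in (1, 2, 3)]
```

One-hot encoding is a choice rather than a requirement: because Pclass is ordinal, keeping it as a single integer is also reasonable, especially for tree-based models such as a random forest. Note that `family_size` undercounts children who travelled only with a nanny, since their parch is 0.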
History:
2019.12.22.: Random forest classification: Titanic_random_forest_classification.ipynb
2019.12.26.: K-nearest neighbors classification: Titanic_K-nearest_neighbors_classification.ipynb
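To illustrate the idea behind K-nearest neighbors classification, here is a from-scratch sketch in plain Python: a passenger is classified by majority vote among the k training passengers closest to it in feature space. This is not the notebook's code; in practice a library class such as scikit-learn's `KNeighborsClassifier` is the usual choice. The function names and the min-max scaling step are assumptions added for illustration. Scaling matters because KNN compares distances, and Fare spans a much wider range than Pclass or SibSp.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def fit_min_max(rows: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return (min, max) for every feature column, computed on training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(row: Sequence[float], bounds: List[Tuple[float, float]]) -> List[float]:
    """Rescale a row to [0, 1] per feature; constant columns map to 0.0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                query: Sequence[float], k: int = 5) -> int:
    """Predict Survived (0 or 1) for one passenger by majority vote of its k nearest neighbours."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort training rows by distance to the query and keep the k closest
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Equal counts keep first-seen order, so a tie goes to the closest neighbour's label
    return votes.most_common(1)[0][0]
```

Each row of `train_X` would be a numeric vector per passenger (for example Pclass, an encoded Sex, Age and Fare, with missing ages imputed first), scaled with bounds fitted on the training rows only. An odd k avoids tied votes for the binary Survived label.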