Project 2 - Cancer Data Analysis

Description:

This project applies supervised machine learning models and algorithms to cancer data. The goal is to predict whether a patient has cancer. The dataset has 30 attributes and 1 class, which takes the values B (benign) and M (malignant). The class is treated as a boolean (0 or 1), where 1 denotes a cell with cancer. The dataset contains 570 instances (cells) and is available in the file Cancer_Data.csv.
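As an illustration of the B/M to 0/1 encoding described above, here is a minimal sketch using pandas. It is not taken from the notebook: the column names `diagnosis` and `radius_mean`, the helper `encode_diagnosis`, and the four sample rows are assumptions made up for the example, so the snippet runs without Cancer_Data.csv.

```python
import pandas as pd

# Tiny made-up stand-in for Cancer_Data.csv.
# The column names and values are illustrative assumptions, not real data.
sample = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [18.0, 12.5, 11.9, 20.1],
})


def encode_diagnosis(df, column="diagnosis"):
    """Return a copy of df with the class mapped to 0 (benign) / 1 (malignant)."""
    out = df.copy()
    out[column] = out[column].map({"B": 0, "M": 1})
    return out


encoded = encode_diagnosis(sample)
print(encoded)
```

With the real file, the same helper could be applied to the result of `pd.read_csv("Cancer_Data.csv")`, provided the class column is named as assumed here.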

Supervised learning includes the following steps: dataset analysis to check for the need for data pre-processing, identification of the target concept, definition of the training and test sets, selection and parameterization of the learning algorithms to employ, and evaluation of the learning process (in particular on the test set). At least 3 supervised learning (classification) algorithms should be employed (Decision Trees, Neural Networks, K-NN, SVM, ...) but more may be employed and compared using the Scikit-Learn Python library and considering the characteristics of the dataset. Results should be compared using tables or plots (e.g., using Seaborn or Matplotlib libraries).
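The steps listed above can be sketched with Scikit-Learn roughly as follows. This is only an outline under stated assumptions, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification` (570 rows, 30 attributes), and the 80/20 split, the three chosen classifiers, and their parameters are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cancer data (assumption): 570 rows, 30 attributes.
X, y = make_classification(
    n_samples=570, n_features=30, n_informative=10, random_state=0
)

# Define the training and test sets; stratify keeps the class ratio in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Three classifiers; K-NN and SVM are distance-based, so they get scaled inputs.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Train each model and evaluate it on the held-out test set.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    })

# Collect the scores in a table for comparison.
results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```

The `results` table can be passed to Seaborn or Matplotlib for a bar plot; the scores printed here describe the synthetic data only and say nothing about the real dataset.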

How to run:

Run the cells of the Jupyter Notebook cancer_notebook.ipynb, located in the root directory of the project. The results we achieved are also visible in the output already saved in the notebook. The following libraries are needed: pandas, seaborn, numpy, copy, sklearn, matplotlib, time, tensorflow (copy and time are part of the Python standard library).
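Before opening the notebook, it may help to check that the libraries listed above can be found in the current environment. The snippet below is a small sketch for that purpose; the helper `find_missing` is a hypothetical name introduced here and is not part of the project.

```python
import importlib.util

# Import names as listed in this README.
REQUIRED = [
    "pandas", "seaborn", "numpy", "copy",
    "sklearn", "matplotlib", "time", "tensorflow",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
print("Missing libraries:", ", ".join(missing) if missing else "none")
```

Note that `sklearn` is the import name; the corresponding package on PyPI is called `scikit-learn`.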

Group Members: