[Outreachy applications] Startup task: Train and test a classification model #2

dzeber · 2020-03-04T20:02:52Z

This is a good way to get started with the environment and the problem domain. It will also provide the basis for a test case for future work. At a minimum, you should:

load a dataset from the repo
train a classification model from scikit-learn
compute an evaluation metric on a held-out test set
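
As a rough illustration of these three steps, here is a minimal sketch. It makes several assumptions not stated in this issue: a synthetic table stands in for a repo dataset, the column name `label`, the helper `train_and_evaluate`, the choice of `LogisticRegression`, and the accuracy metric are all illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(df, target_column, test_size=0.25, random_state=0):
    """Fit a classifier on a training split and return held-out accuracy.

    Hypothetical helper for illustration; not part of the repo.
    """
    # Features are every column except the target; the target is the label.
    X = df.drop(columns=[target_column])
    y = df[target_column]

    # Hold out part of the data so the metric is computed on unseen rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in for a repo dataset (assumption). With a real file,
# this would be replaced by a pd.read_csv(...) call on that file.
features, labels = make_classification(n_samples=200, n_features=5, random_state=0)
df = pd.DataFrame(features, columns=[f"f{i}" for i in range(5)])
df["label"] = labels

accuracy = train_and_evaluate(df, target_column="label")
print(f"held-out accuracy: {accuracy:.3f}")
```

Any other scikit-learn classifier or metric could be swapped in; the point is only that the metric is computed on rows the model did not see during training.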

Feel free to include any additional steps you feel are relevant or you are interested in trying out, such as:

basic exploratory analysis of the dataset
data preprocessing
hyperparameter tuning
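
For the optional preprocessing and hyperparameter tuning steps, one possible approach is a scikit-learn `Pipeline` combined with `GridSearchCV`. The sketch below is an assumption-laden example: the synthetic data, the k-nearest-neighbours model, and the candidate values of `n_neighbors` are chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for one of the repo datasets (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so it is re-fit on each CV training fold,
# which keeps validation-fold statistics from leaking into preprocessing.
pipeline = Pipeline(
    [("scale", StandardScaler()), ("knn", KNeighborsClassifier())]
)

# Candidate hyperparameter values (illustrative only).
param_grid = {"knn__n_neighbors": [3, 5, 7]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
# The held-out test set is used once, after tuning has finished.
test_accuracy = search.score(X_test, y_test)
print(f"best n_neighbors: {best_k}, test accuracy: {test_accuracy:.3f}")
```

Tuning on cross-validation folds of the training data, rather than on the test set, keeps the final test metric an honest estimate.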

When you start work on this task, please post a comment here indicating which dataset and model you will be working with so that other contributors can avoid duplicating your work.

KaairaGupta · 2020-03-05T16:53:13Z

Hey! Can I work on this?

tab1tha · 2020-03-05T17:02:35Z

I'll use the generated.csv dataset.

dzeber · 2020-03-05T17:04:01Z

@KaairaGupta Yes! In fact, everyone is encouraged to work on this issue first.

KaairaGupta · 2020-03-05T17:09:15Z

Okay. I'll use the winequality.csv dataset.

mhmohona · 2020-03-05T17:30:00Z

Hello @dzeber! I want to work on defaults.csv.

dzeber · 2020-03-05T17:37:11Z

Sounds good! To be clear, it's also fine to work on the same dataset as someone else, as there are only 5 datasets currently. The goal is just to collect a diversity of models and approaches across these datasets.

shreyagupta30 · 2020-03-05T17:50:24Z

Hello @dzeber, I'd like to work on eeg.csv.

Clare-Joyce · 2020-03-05T18:02:19Z

Hello @dzeber, I will work with the winequality.csv dataset.

elie-wanko · 2020-03-05T19:57:55Z

Hello @dzeber, I will be working on the defaults.csv dataset.

BBimie · 2020-03-05T20:04:43Z

Hello @dzeber, I'll work with the winequality.csv dataset.

hammedb197 · 2020-03-05T20:10:27Z

Hello @dzeber, I will be working on vehicles.csv.

shiza16 · 2020-03-05T20:59:15Z

Hello @dzeber, I'd like to work on vehicles.csv.

ghost · 2020-03-06T03:50:42Z

I would like to work on the eeg.csv dataset.

alberginia · 2020-03-06T06:02:26Z

I have been working on the generated.csv dataset with SVMs.

Soniyanayak51 · 2020-03-06T08:15:48Z

I would like to work on defaults.csv @dzeber

silvererudite · 2020-03-06T09:18:43Z

I will be working on vehicles.csv. @dzeber, are we required to submit a PR for this issue?

blancadesal · 2020-03-06T10:12:21Z

Hi all! I will be working on the winequality.csv dataset.

SanchiMittal · 2020-03-06T10:24:27Z

I will be working with the winequality.csv dataset.

aakankshadhurandhar · 2020-03-06T13:07:25Z

@dzeber I will be working on the defaults.csv dataset.

Bolaji61 · 2020-03-06T14:07:42Z

Hi @dzeber, I will be working on the vehicles.csv dataset.

NailaNeena · 2020-03-06T16:53:03Z

Hello @dzeber, how can we submit the task after completion?

dzeber · 2020-03-06T19:21:16Z

When you are done with this task, please submit a PR following the guidelines listed in the README.

maxtastu · 2020-03-07T11:39:37Z

Hello everyone, I am working on the wine quality dataset. I haven't picked a model yet.

pratyush-ragini · 2020-03-07T12:16:48Z

Hi @dzeber, I shall be working on the vehicles.csv dataset.

NailaNeena · 2020-03-07T12:33:57Z

Hello @dzeber, I have a question regarding the training dataset. I am using the vehicles dataset and have loaded it. My question is: should we train the model against some of the rows or against some of the columns?

janvi04 · 2020-03-07T12:40:05Z

Hi @dzeber, I will be working on the wine quality dataset.

asthad16 · 2020-03-07T15:48:23Z

Hello, I am beginning my learning and exploration by working on the wine quality dataset.

NailaNeena · 2020-03-08T10:03:41Z

Is there anyone who could tell me how to identify the dependent and independent variables in a dataset? For example, in the vehicles dataset, what would the dependent variable be?

These committed changes address issue #3 (traversal of the space of train-test splits) using a KNN model. In #2 I used a decision tree and further recommended an outlier detection algorithm for classification, so in this PR I have used KNN and compared the results with the previous classification. This PR uses the modules already defined in #2.

asthad16 · 2020-03-25T09:11:34Z

I have made the requested modifications in the same PR, #26. Please review it. I'm sorry for the delay; I was not well.

* Create Readme.md
* Create files for exploring issue #2
* Format using black
* Remove notebook from master
* Increase modularization
* Create file for issue 6
* Remove file added by mistake
* Create notebook for issue 6
* Re-upload to the right folder
* Delete file from the incorrect folder

* Update .gitignore
* Preliminary Analysis
* Helper modules (Bar and Hist graph)
* Rough KNN algorithm implemented
* Delete libraries.py
* KNN classifier refactored and polished. Returns only the variables of interest for use in the metrics calculations.
* Refactored for performance: just the required functions imported
* Draft mlp classifier implemented, to be reviewed
* ...
* Threshold conversion logic implemented. Since knn.predict calculates a probability, we implement a logic for binary classification
* Preliminary cleaning and knn model classification implemented!
* Adjusted plot error with title placement
* ...
* Files reformatted with 'Black'
* Logistic Regression classifier
* Refactored modules to improve modularity
* Implemented Log Reg
* Deleted mpl module to focus on knn and log reg
* Refactored .gitignore to my personal folder
* Refactored for readability
* Implementation to add counts and relative percentages on bar graphs
* Refactored name #2, Completed Preliminary Analysis and Interpreted Results
* Update Issue #2 - Train and test a classification model (PRESC).ipynb
* Files reformatted with 'Black'
* Display Error corrected
* Interpreted choice of hyper-parameters
* Refactored and Added Modules used for Issue 3
* Preliminary Analysis - Traversal of the space of train_test splits
* Issue #3 complete
* Removed Issues #2 and #3 ipynb
* Issue #4 - completed. Issue #4 - Traversal of the space of cross-validation folds
* Delete defaults_data.csv. Removing duplication of the existing data set, which can be loaded from the repo's root directory.

Co-authored-by: mlopatka <[email protected]>

…el (Stochastic Gradient Descent) on winequality.csv (#58)

* adds incomplete files
* adds .ipynb, .py and updates environment.yml
* Delete winequality.ipynb: removing duplicate files
* Delete winequality_modules.py: removing duplicate files
* Delete winequality.ipynb: removing incomplete files
* Delete winequality_modules.py: removing incomplete files
* adds .ipynb, .py and updates environment.yml
* adds description and detailed reasoning for the methods, models and parameters used
* drops quality column
* updates .py file
* adds files in a new folder
* updates .yml

* WIP: #2 on the dataset 'eeg.csv'
* Add files via upload
* Delete WIP: #2 on the dataset 'eeg.csv'
* Delete #2 Train and test a classification model, eeg.csv-checkpoint.ipynb
* WIP: #2 on the dataset 'eeg.csv'
* Delete #2 Train and test a classification model, eeg.csv-checkpoint.ipynb
* WIP: #2 Train and test a classification model, eeg.csv dataset
* Delete #2 Train and test a classification model, eeg.csv.ipynb
* Create README
* WIP: #2 Train and test a classification model, eeg.csv dataset
* Delete README

* WIP: Issue #2 KNN Classifier for eeg.csv
  - Added separate modules for preprocessing eeg.csv
  - Added a notebook with the results
  - This commit addresses the startup task - Issue #2
* Updated Notebook results. Updated the results in the notebook for review

Co-authored-by: swatik718

* Create Readme.md
* Create files for exploring issue #2
* Format using black
* Remove notebook from master
* Increase modularization
* Create file for issue 6
* Remove file added by mistake
* WIP: Importance score for datapoints
* Re-upload to the correct directory
* Delete file from wrong directory

* Update .gitignore
* Preliminary Analysis
* Helper modules (Bar and Hist graph)
* Rough KNN algorithm implemented
* Delete libraries.py
* KNN classifier refactored and polished. Returns only the variables of interest for use in the metrics calculations.
* Refactored for performance: just the required functions imported
* Draft mlp classifier implemented, to be reviewed
* ...
* Threshold conversion logic implemented. Since knn.predict calculates a probability, we implement a logic for binary classification
* Preliminary cleaning and knn model classification implemented!
* Adjusted plot error with title placement
* ...
* Files reformatted with 'Black'
* Logistic Regression classifier
* Refactored modules to improve modularity
* Implemented Log Reg
* Deleted mpl module to focus on knn and log reg
* Refactored .gitignore to my personal folder
* Refactored for readability
* Implementation to add counts and relative percentages on bar graphs
* Refactored name #2, Completed Preliminary Analysis and Interpreted Results
* Update Issue #2 - Train and test a classification model (PRESC).ipynb
* Files reformatted with 'Black'
* Display Error corrected
* Interpreted choice of hyper-parameters
* Exported clean data to csv
* Update Issue #2 - Train and test a classification model.ipynb
* Update Issue #2 - Train and test a classification model.ipynb
* Create Issue #2 - Train and test a classification model.ipynb
* Issue #2 complete. Interpretation of hyperparameters and html file added.
* Updates on Issue #2. Next attempting PCA and WoE, to investigate model performance
* Merge conflicts fixed, Update observations

* Create Readme.md
* Create files for exploring issue #2
* Format using black
* Remove notebook from master
* Create file for cross-validation exploration
* Resolve conflict and update
* Attempt to resolve conflict

…cles dataset (#162)

* Removed an imported package not used in the code
* Removed files not meant to be on the master branch
* Combined _model() functions into a single function
* Update files to adhere to black formatting
* Updating files to pass black formatting
* Presenting the results of evaluation in place of hard-coding
* Deleted a module not in use

dzeber added the "good first issue" label Mar 5, 2020

mhmohona mentioned this issue Mar 7, 2020

Classification of default of Credit Card clients Data Set #17

Merged

arizzogithub mentioned this issue Mar 22, 2020

WIP: #2 Train and test a classification model, eeg.csv dataset #111

Merged

pratyush-ragini added a commit to pratyush-ragini/PRESC that referenced this issue Mar 24, 2020

mozilla#2 on vehicles.csv dataset

1aefcfc

urvigodha mentioned this issue Mar 24, 2020

Startup task: issue#2 attempt#1 #120

Merged

asthad16 mentioned this issue Mar 25, 2020

Asthad16 issue3 train test split #122

Merged

mlopatka pushed a commit that referenced this issue Mar 27, 2020

issue #2 Training classification model for winequality.csv dataset (#92)

5dc5a98

* Classification model wine.csv * Classification model wine.csv * Merging modifications

dzeber added a commit that referenced this issue Mar 27, 2020

Merge pull request #40 from simran0117/master

885e7b2

#2 Dropped quality, shifted the logic to python file, shifted imports to the top, added confusion_matrix and classification_report

mlopatka closed this as completed in #58 Mar 27, 2020

mlopatka reopened this Mar 27, 2020

dzeber pushed a commit that referenced this issue Mar 28, 2020

Merge pull request #2 from mozilla/master

b13e89f

Updating KaairaGupta/master

dzeber added a commit that referenced this issue Mar 28, 2020

Merge pull request #18 from KaairaGupta/master

d08cd19

For #2: on the dataset 'winequality.csv'

dzeber pushed a commit that referenced this issue Mar 30, 2020

Merge pull request #2 from asthad16/asthad16-issue3-train_test_split

f9f64f4

#3 traversal of train_test_split

dzeber added a commit that referenced this issue Mar 30, 2020

Merge pull request #77 from opeyemiferanmi1/My-contributions

a5b823e

Exploration of the Vehicles dataset based on Startup task #2

msmelo mentioned this issue Apr 1, 2020

Traversal of train test splits and cross validation and Visualization for misclassifications #142

Merged

dzeber added a commit that referenced this issue Apr 2, 2020

Merge pull request #87 from ishagarg06/master

9f7a293

fixed issue #2

dzeber pushed a commit that referenced this issue Jul 13, 2020

Contribution to issue #2

1c44553

dzeber changed the title ~~Startup task: Train and test a classification model~~ [Outreachy applications] Startup task: Train and test a classification model Jul 13, 2020

dzeber closed this as completed Jul 14, 2020

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Aug 27, 2020

Update mozilla#2 Train and test a classification model, eeg.csv.ipynb

af55fae

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Jul 3, 2022

Update mozilla#2 Train and test a classification model, eeg.csv.ipynb

476100b
