[Outreachy applications] Startup task: Train and test a classification model #2

Closed
dzeber opened this issue Mar 4, 2020 · 49 comments · Fixed by #19, #53 or #58
Labels
good first issue Good for newcomers

Comments

@dzeber
Contributor

dzeber commented Mar 4, 2020

This is a good way to get started with the environment and the problem domain. It will also provide the basis for a test case for future work. At a minimum, you should:

Feel free to include any additional steps you feel are relevant or you are interested in trying out, such as:

  • basic exploratory analysis of the dataset
  • data preprocessing
  • hyperparameter tuning

When you start work on this task, please post a comment here indicating which dataset and model you will be working with so that other contributors can avoid duplicating your work.
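As a rough illustration of the kind of workflow this task asks for (preprocessing, a train/test split, hyperparameter tuning, and evaluation), here is a minimal sketch using scikit-learn. It runs on synthetic data from `make_classification` rather than on any of the repo's CSV files, and the choice of KNN, the parameter grid, and the split ratio are assumptions made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a dataset (assumption: with a real CSV you would
# load it and separate the feature columns X from the label column y).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Hold out 25% of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing (scaling) and the classifier live in one pipeline so the
# scaler is fit only on the training folds during cross-validation.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Hyperparameter tuning: 5-fold cross-validated search over k.
search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 9, 15]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test rows.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
print(cm)
print(classification_report(y_test, y_pred))
```

Any other scikit-learn classifier can be dropped into the pipeline in place of `KNeighborsClassifier`, with the parameter grid adjusted to match.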

@dzeber dzeber added the "good first issue" (Good for newcomers) label Mar 5, 2020
@KaairaGupta
Contributor

Hey!
Can I work on this??

@tab1tha
Contributor

tab1tha commented Mar 5, 2020

I'll use the generated.csv dataset.

@dzeber
Contributor Author

dzeber commented Mar 5, 2020

@KaairaGupta Yes! In fact, everyone is encouraged to work on this issue first.

@KaairaGupta
Contributor

Okay.

I'll use the winequality.csv dataset.

@mhmohona
Contributor

mhmohona commented Mar 5, 2020

Hello @dzeber! I want to work on defaults.csv.

@dzeber
Contributor Author

dzeber commented Mar 5, 2020

Sounds good! To be clear, it's also fine to work on the same dataset as someone else, as there are only 5 datasets currently. The goal is just to collect a diversity of models and approaches across these datasets.

@shreyagupta30

Hello @dzeber, I'd like to work on eeg.csv.

@Clare-Joyce
Contributor

Hello @dzeber, I will work with the winequality.csv

@elie-wanko
Contributor

Hello @dzeber, I will be working on the defaults.csv dataset.

@BBimie
Contributor

BBimie commented Mar 5, 2020

Hello @dzeber, I'll work with the winequality.csv

@hammedb197
Contributor

Hello @dzeber, I will be working on vehicles.csv.

@shiza16
Contributor

shiza16 commented Mar 5, 2020

Hello @dzeber, I'd like to work on vehicles.csv.

@ghost

ghost commented Mar 6, 2020

I would like to work on eeg.csv dataset.

@alberginia
Collaborator

I have been working on the generated.csv dataset with SVMs.

@Soniyanayak51
Contributor

I would like to work on defaults.csv @dzeber

@silvererudite

silvererudite commented Mar 6, 2020

I will be working on vehicles.csv. @dzeber, are we required to open a PR for this issue?

@blancadesal

Hi all! I will be working on the winequality.csv dataset.

@SanchiMittal
Contributor

I will be working with winequality.csv dataset.

@aakankshadhurandhar

@dzeber I will be working on the defaults.csv dataset.

@Bolaji61
Contributor

Bolaji61 commented Mar 6, 2020

Hi @dzeber , I will be working on the vehicles.csv dataset.

@NailaNeena

Hello @dzeber, How can we submit the task after completion?

@dzeber
Contributor Author

dzeber commented Mar 6, 2020

When you are done with this task, please submit a PR following the guidelines listed in the README.

@maxtastu

maxtastu commented Mar 7, 2020

Hello everyone, I am working on the wine quality dataset. I haven't picked a model yet.

@pratyush-ragini

Hi @dzeber , I shall be working on the vehicles.csv dataset

@NailaNeena

Hello @dzeber, I have a question about the training dataset. I am using the Vehicles dataset and have loaded it. Should the model be trained on a subset of the rows or on a subset of the columns?

@janvi04
Contributor

janvi04 commented Mar 7, 2020

Hi @dzeber , I will be working on wine quality dataset.

@asthad16
Contributor

asthad16 commented Mar 7, 2020

Hello, I am beginning my learning and exploration by working on the wine quality dataset.

@NailaNeena

Could anyone tell me how to identify the dependent and independent variables in a dataset?
For example, in Vehicles, what would the dependent variable be?
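
On the two questions above: in a classification dataset, the dependent variable is the column the model is asked to predict (the class label), and the remaining columns are the independent variables (features). Training then uses a subset of the rows, with the feature columns as inputs and the label column as the target. A minimal sketch on a made-up toy table follows; the column names `length`, `height`, and `label` and all values are hypothetical and are not taken from vehicles.csv.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy table; column names and values are invented for illustration only.
df = pd.DataFrame(
    {
        "length": [4.1, 4.3, 10.2, 11.0, 4.0, 10.8, 4.4, 11.5],
        "height": [1.5, 1.4, 3.1, 3.3, 1.6, 3.0, 1.5, 3.2],
        "label": ["car", "car", "bus", "bus", "car", "bus", "car", "bus"],
    }
)

# Column-wise split: the label column is the dependent variable (y),
# every other column is an independent variable (X).
target_column = "label"
X = df.drop(columns=[target_column])
y = df[target_column]

# Row-wise split: some rows are used for training, the rest for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print("features:", list(X.columns))
print("train rows:", len(X_train), "test rows:", len(X_test))
```

For a real dataset, the label column is usually identified from the dataset's documentation or from the column whose values are the classes to be predicted.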

pratyush-ragini added a commit to pratyush-ragini/PRESC that referenced this issue Mar 24, 2020
asthad16 referenced this issue in asthad16/PRESC Mar 25, 2020
These committed changes fix issue #3 (traversal of the space of train-test splits) using a KNN model. In #2 I used a decision tree and further recommended an outlier detection algorithm for classification, so in this PR I have used KNN and compared the results with the previous classification. This PR uses the modules already defined in #2.
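
One possible reading of "traversal of the space of train-test splits" with KNN is to retrain the model at several test-set fractions and compare the resulting test accuracy. The sketch below assumes that reading; the helper `accuracy_by_test_size`, the synthetic data, and the chosen fractions are illustrative and not taken from the referenced PR.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real dataset (assumption).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)


def accuracy_by_test_size(X, y, test_sizes, n_repeats=5):
    """Return the mean KNN test accuracy for each test-set fraction.

    Hypothetical helper: each fraction is evaluated over `n_repeats`
    random splits to smooth out the variation between single splits.
    """
    results = {}
    for test_size in test_sizes:
        scores = []
        for seed in range(n_repeats):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=test_size, stratify=y, random_state=seed
            )
            model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
            scores.append(model.score(X_te, y_te))
        results[test_size] = sum(scores) / len(scores)
    return results


results = accuracy_by_test_size(X, y, [0.1, 0.2, 0.3, 0.4, 0.5])
for test_size, mean_accuracy in results.items():
    print(f"test_size={test_size:.1f}  mean accuracy={mean_accuracy:.3f}")
```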
@asthad16
Contributor

I have made the requested modifications in the same PR, #26. Please review it. I am sorry for the delay; I was not well.

mlopatka pushed a commit that referenced this issue Mar 27, 2020
* Create Readme.md

* Create files for exploring issue #2

* Format using black

* Remove notebook from master

* Increase modularization

* create file for issue 6

* remove file added by mistake

* Create notebook for issue 6

* Re-upload to the right folder

* Delete file from the incorrect folder
mlopatka added a commit that referenced this issue Mar 27, 2020
* Update .gitignore

* Preliminary Analysis

* Helper modules (Bar and Hist graph)

* Rough KNN algorithm implemented

* Delete libraries.py

* KNN classifier refactored and polished

Returns only the variables of interest for use in the metrics calculations.

* refactored for performance

just the required functions imported

* draft mlp classifier implemented

to be reviewed

* ...

* Threshold conversion logic implemented

Since knn.predict calculates a probability, we implement a logic for binary classification

* Preliminary cleaning and knn model classification implemented!

* Adjusted plot error with title placement

* ...

* Files reformatted with 'Black'

* Logistic Regression classifier

* Refactors modules to improve modularity

* Implemented Log Reg

* Deleted mlp module to focus on knn and log reg

* Refactors .gitignore to my personal folder

* refactored for readability

* Implementation to add counts and relative percentages on bars graph

* Refactored name #2, Completed Preliminary Analysis and Interpreted Results

* Update Issue #2 - Train and test a classification model (PRESC).ipynb

* Files reformatted with 'Black'

* Display Error corrected

* Interpreted choice of hyper-parameters

* Refactored and Added Modules used for Issue 3

* Preliminary Analysis - Traversal of the space of train_test splits

* Issue#3 complete

* Removed Issues #2 and #3 ipynb

* Issue #4 - completed

Issue #4 - Traversal of the space of cross-validation folds

* Delete defaults_data.csv

Removing a duplicate of the existing dataset, which can be loaded from the repo's root directory.

Co-authored-by: mlopatka <[email protected]>
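
The "threshold conversion" step listed in the commit above can be sketched as follows. Note that in scikit-learn, `KNeighborsClassifier.predict` returns class labels and `predict_proba` returns class probabilities, so this sketch thresholds the output of `predict_proba`; whether the commit's code did the same is not shown in this thread. The helper `predict_with_threshold` and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data with labels 0 and 1 (assumption).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)


def predict_with_threshold(model, X, threshold=0.5):
    """Label a row 1 when its predicted probability of class 1 >= threshold.

    Hypothetical helper, not part of scikit-learn.
    """
    positive_index = list(model.classes_).index(1)
    proba = model.predict_proba(X)[:, positive_index]
    return (proba >= threshold).astype(int)


labels_default = predict_with_threshold(model, X, threshold=0.5)
labels_strict = predict_with_threshold(model, X, threshold=0.8)

print("positives at 0.5:", int(labels_default.sum()))
print("positives at 0.8:", int(labels_strict.sum()))
```

Raising the threshold can only reduce the number of rows labeled 1, which trades recall for precision on the positive class.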
mlopatka pushed a commit that referenced this issue Mar 27, 2020
* Classification model wine.csv

* Classification model wine.csv

* Merging modifications
dzeber added a commit that referenced this issue Mar 27, 2020
#2 Dropped quality, shifted the logic to python file, shifted imports to the top, added confusion_matrix and classification_report
mlopatka pushed a commit that referenced this issue Mar 27, 2020
…el (Stochastic Gradient Descent) on winequality.csv (#58)

* adds incomplete files

* adds .ipynb, .py and updates environment.yml

* Delete winequality.ipynb

removing duplicate files

* Delete winequality_modules.py

removing duplicate files

* Delete winequality.ipynb

removing incomplete files

* Delete winequality_modules.py

removing incomplete files

* adds .ipynb, .py and updates environment.yml

* adds description and detailed reasoning for the methods, models and parameters used

* drops quality column

* updates .py file

* adds files in a new folder

* updates .yml
@mlopatka mlopatka reopened this Mar 27, 2020
mlopatka pushed a commit that referenced this issue Mar 27, 2020
* WIP: #2 on the dataset 'eeg.csv'

WIP: #2 on the dataset 'eeg.csv'

* Add files via upload

* Delete WIP: #2 on the dataset 'eeg.csv'

* Delete #2  Train and test a classification model, eeg.csv-checkpoint.ipynb

* WIP: #2 on the dataset 'eeg.csv'

* Delete #2  Train and test a classification model, eeg.csv-checkpoint.ipynb

* WIP:  #2 Train and test a classification model, eeg.csv dataset

* Delete #2  Train and test a classification model, eeg.csv.ipynb

* Create README

* WIP: #2 Train and test a classification model, eeg.csv dataset

* Delete README
dzeber pushed a commit that referenced this issue Mar 28, 2020
Updating KaairaGupta/master
dzeber added a commit that referenced this issue Mar 28, 2020
For #2: on the dataset 'winequality.csv'
dzeber pushed a commit that referenced this issue Mar 30, 2020
mlopatka pushed a commit that referenced this issue Mar 30, 2020
* WIP:Issue #2 KNN Classifier for eeg.csv

-Added separate modules for preprocessing eeg.csv
-Added a notebook with the results
-This commit addresses the startup task - Issue #2

* Updated Notebook results

Updated the results in the notebook for review

Co-authored-by: swatik718 <>
dzeber added a commit that referenced this issue Mar 30, 2020
Exploration of the Vehicles dataset based on Startup task #2
mlopatka pushed a commit that referenced this issue Mar 30, 2020
* Create Readme.md

* Create files for exploring issue #2

* Format using black

* Remove notebook from master

* Increase modularization

* create file for issue 6

* remove file added by mistake

* WIP: Importance score for datapoints

* Re-upload to the correct directory

* Delete file from wrong directory
mlopatka pushed a commit that referenced this issue Mar 31, 2020
* Update .gitignore

* Preliminary Analysis

* Helper modules (Bar and Hist graph)

* Rough KNN algorithm implemented

* Delete libraries.py

* KNN classifier refactored and polished

Returns only the variables of interest for use in the metrics calculations.

* refactored for performance

just the required functions imported

* draft mlp classifier implemented

to be reviewed

* ...

* Threshold conversion logic implemented

Since knn.predict calculates a probability, we implement a logic for binary classification

* Preliminary cleaning and knn model classification implemented!

* Adjusted plot error with title placement

* ...

* Files reformatted with 'Black'

* Logistic Regression classifier

* Refactors modules to improve modularity

* Implemented Log Reg

* Deleted mlp module to focus on knn and log reg

* Refactors .gitignore to my personal folder

* refactored for readability

* Implementation to add counts and relative percentages on bars graph

* Refactored name #2, Completed Preliminary Analysis and Interpreted Results

* Update Issue #2 - Train and test a classification model (PRESC).ipynb

* Files reformatted with 'Black'

* Display Error corrected

* Interpreted choice of hyper-parameters

* Exported clean data to csv

* Update Issue #2 - Train and test a classification model.ipynb

* Update Issue #2 - Train and test a classification model.ipynb

* Create Issue #2 - Train and test a classification model.ipynb

* Issue#2 complete

Interpretation of hyperparameters and html file added.

* Updates on Issue#2

Next attempting PCA and WoE, to investigate model performance

* Merge conflicts fixed, Update observations
dzeber added a commit that referenced this issue Apr 2, 2020
mlopatka pushed a commit that referenced this issue Apr 6, 2020
* Create Readme.md

* Create files for exploring issue #2

* Format using black

* Remove notebook from master

* Create file for cross-validation exploration

* Resolve conflict and update

* Attempt to resolve conflict
mlopatka pushed a commit that referenced this issue Jul 13, 2020
…cles dataset (#162)

* Removed an imported package not used in the code

* removed files not meant to be on the master branch

* Combined _model() functions into a single function

* Update files to adhere to black formatting

* Updating files to pass black formatting

* Presenting the results of evaluation in place of hard-coding

* Deleted a module not in use
dzeber pushed a commit that referenced this issue Jul 13, 2020
@dzeber dzeber changed the title from "Startup task: Train and test a classification model" to "[Outreachy applications] Startup task: Train and test a classification model" Jul 13, 2020
@dzeber dzeber closed this as completed Jul 14, 2020
arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Aug 27, 2020
arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Jul 3, 2022