[Outreachy applications] Startup task: Train and test a classification model #2
Comments
Hey!
I'll use the generated.csv dataset.
@KaairaGupta yes! In fact, everyone is encouraged to work on this issue first.
Okay. I'll use the winequality.csv dataset.
Hello @dzeber! I want to work on defaults.csv.
Sounds good! To be clear, it's also fine to work on the same dataset as someone else, as there are only 5 datasets currently. The goal is just to collect a diversity of models and approaches across these datasets.
Hello @dzeber, I'd like to work on eeg.csv.
Hello @dzeber, I will work with winequality.csv.
Hello @dzeber, I will be working on the defaults.csv dataset.
Hello @dzeber, I'll work with winequality.csv.
Hello @dzeber, I will be working on vehicles.csv.
Hello @dzeber, I'd like to work on vehicles.csv.
I would like to work on the eeg.csv dataset.
I have been working on the generated.csv dataset with SVMs.
I would like to work on defaults.csv @dzeber
I will be working on vehicles.csv. @dzeber, are we required to push a PR for this issue?
Hi all! I will be working on the winequality.csv dataset.
I will be working with the winequality.csv dataset.
@dzeber I will be working on the defaults.csv dataset.
Hi @dzeber, I will be working on the vehicles.csv dataset.
Hello @dzeber, how can we submit the task after completion?
When you are done with this task, please submit a PR following the guidelines listed in the README.
Hello everyone, I am working on the wine quality dataset. I haven't picked a model yet.
Hi @dzeber, I shall be working on the vehicles.csv dataset.
Hello @dzeber, I have a question regarding the training dataset. I am using the vehicles dataset and have loaded it. My question is: should we train the model against some rows or against some columns?
Hi @dzeber, I will be working on the wine quality dataset.
Hello, I am beginning my learning and exploration by working on the wine quality dataset.
Is there anyone who could tell me how to identify the dependent and independent variables in a dataset?
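As a general rule of thumb (not specific to this repo's datasets): the dependent variable is the single column you want the model to predict, and the independent variables are the remaining feature columns; each row is one sample. A minimal sketch with a hypothetical wine-quality-style frame (column names are illustrative, not the actual CSV schema):

```python
import pandas as pd

# Hypothetical stand-in for a dataset like winequality.csv.
df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 6.9, 7.2],
    "alcohol": [9.4, 9.8, 10.1, 9.5],
    "quality": [5, 5, 6, 5],
})

y = df["quality"]                 # dependent variable: the column to predict
X = df.drop(columns=["quality"])  # independent variables: all feature columns

print(X.shape)  # (4, 2): 4 samples (rows), 2 features (columns)
print(y.shape)  # (4,)
```

So training happens across rows (samples), using the non-target columns as inputs.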
These committed changes fix issue #3 (traversal of the space of train-test splits) using a KNN model. In #2 I used a decision tree and further recommended an outlier detection algorithm for classification, so in this PR I have used KNN and compared the results with the previous classification. This PR uses the modules already defined in #2.
I have made the modifications requested in the same PR #26. Please review it. I'm sorry for the delay; I was not well.
* Create Readme.md
* Create files for exploring issue #2
* Format using black
* Remove notebook from master
* Increase modularization
* Create file for issue #6
* Remove file added by mistake
* Create notebook for issue #6
* Re-upload to the right folder
* Delete file from the incorrect folder
* Update .gitignore
* Preliminary analysis
* Helper modules (bar and hist graphs)
* Rough KNN algorithm implemented
* Delete libraries.py
* KNN classifier refactored and polished: returns only the variable of interest for use in the metrics calculations
* Refactored for performance: just the required functions imported
* Draft MLP classifier implemented, to be reviewed
* ...
* Threshold conversion logic implemented: since knn.predict calculates a probability, we implement logic for binary classification
* Preliminary cleaning and KNN model classification implemented!
* Adjusted plot error with title placement
* ...
* Files reformatted with 'Black'
* Logistic regression classifier
* Refactored modules to improve modularity
* Implemented logistic regression
* Deleted MLP module to focus on KNN and logistic regression
* Refactored .gitignore to my personal folder
* Refactored for readability
* Implementation to add counts and relative percentages on bar graphs
* Refactored name #2, completed preliminary analysis and interpreted results
* Update Issue #2 - Train and test a classification model (PRESC).ipynb
* Files reformatted with 'Black'
* Display error corrected
* Interpreted choice of hyperparameters
* Refactored and added modules used for issue #3
* Preliminary analysis: traversal of the space of train-test splits
* Issue #3 complete
* Removed issue #2 and #3 .ipynb files
* Issue #4 completed: traversal of the space of cross-validation folds
* Delete defaults_data.csv: removing duplication of the existing dataset, which can be loaded from the repo's root directory

Co-authored-by: mlopatka <[email protected]>
#2: Dropped the quality column, moved the logic to a Python file, moved imports to the top, and added confusion_matrix and classification_report.
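For reference, `confusion_matrix` and `classification_report` both come from `sklearn.metrics`. A minimal sketch with toy labels (not the PR's actual data):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy true/predicted labels standing in for a real model's output.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall, F1, and support.
print(classification_report(y_true, y_pred))
```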
…el (Stochastic Gradient Descent) on winequality.csv (#58)
* Adds incomplete files
* Adds .ipynb, .py and updates environment.yml
* Delete winequality.ipynb: removing duplicate files
* Delete winequality_modules.py: removing duplicate files
* Delete winequality.ipynb: removing incomplete files
* Delete winequality_modules.py: removing incomplete files
* Adds .ipynb, .py and updates environment.yml
* Adds description and detailed reasoning for the methods, models and parameters used
* Drops quality column
* Updates .py file
* Adds files in a new folder
* Updates .yml
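A hedged sketch of the approach named in that PR title (an SGD-trained linear classifier), using synthetic data from `make_classification` in place of winequality.csv; the data, split sizes, and pipeline choices here are illustrative assumptions, not the PR's actual code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the winequality features and labels.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature scaling matters for SGD convergence, hence the pipeline.
model = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print("test accuracy:", acc)
```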
* WIP: #2 on the dataset 'eeg.csv'
* Add files via upload
* Delete WIP: #2 on the dataset 'eeg.csv'
* Delete #2 Train and test a classification model, eeg.csv-checkpoint.ipynb
* WIP: #2 on the dataset 'eeg.csv'
* Delete #2 Train and test a classification model, eeg.csv-checkpoint.ipynb
* WIP: #2 Train and test a classification model, eeg.csv dataset
* Delete #2 Train and test a classification model, eeg.csv.ipynb
* Create README
* WIP: #2 Train and test a classification model, eeg.csv dataset
* Delete README
For #2: on the dataset 'winequality.csv'
Exploration of the Vehicles dataset based on Startup task #2
* Create Readme.md
* Create files for exploring issue #2
* Format using black
* Remove notebook from master
* Increase modularization
* Create file for issue #6
* Remove file added by mistake
* WIP: Importance score for datapoints
* Re-upload to the correct directory
* Delete file from wrong directory
* Update .gitignore
* Preliminary analysis
* Helper modules (bar and hist graphs)
* Rough KNN algorithm implemented
* Delete libraries.py
* KNN classifier refactored and polished: returns only the variable of interest for use in the metrics calculations
* Refactored for performance: just the required functions imported
* Draft MLP classifier implemented, to be reviewed
* ...
* Threshold conversion logic implemented: since knn.predict calculates a probability, we implement logic for binary classification
* Preliminary cleaning and KNN model classification implemented!
* Adjusted plot error with title placement
* ...
* Files reformatted with 'Black'
* Logistic regression classifier
* Refactored modules to improve modularity
* Implemented logistic regression
* Deleted MLP module to focus on KNN and logistic regression
* Refactored .gitignore to my personal folder
* Refactored for readability
* Implementation to add counts and relative percentages on bar graphs
* Refactored name #2, completed preliminary analysis and interpreted results
* Update Issue #2 - Train and test a classification model (PRESC).ipynb
* Files reformatted with 'Black'
* Display error corrected
* Interpreted choice of hyperparameters
* Exported clean data to CSV
* Update Issue #2 - Train and test a classification model.ipynb
* Update Issue #2 - Train and test a classification model.ipynb
* Create Issue #2 - Train and test a classification model.ipynb
* Issue #2 complete: interpretation of hyperparameters and HTML file added
* Updates on Issue #2: next attempting PCA and WoE, to investigate model performance
* Merge conflicts fixed, update observations
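The threshold-conversion step mentioned in that commit log can be sketched as follows; note that in scikit-learn it is `predict_proba` (not `predict`) that returns class probabilities, and the data and threshold value here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data standing in for a real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# P(class == 1) for each sample, then a custom decision threshold
# (predict() effectively uses 0.5 for binary problems).
proba = knn.predict_proba(X)[:, 1]
threshold = 0.6
labels = (proba >= threshold).astype(int)
print(labels[:10])
```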
* Create Readme.md
* Create files for exploring issue #2
* Format using black
* Remove notebook from master
* Create file for cross-validation exploration
* Resolve conflict and update
* Attempt to resolve conflict
…cles dataset (#162)
* Removed an imported package not used in the code
* Removed files not meant to be on the master branch
* Combined _model() functions into a single function
* Update files to adhere to black formatting
* Updating files to pass black formatting
* Presenting the results of evaluation in place of hard-coding
* Deleted a module not in use
This is a good way to get started with the environment and the problem domain. It will also provide the basis for a test case for future work. At a minimum, you should:
Feel free to include any additional steps you feel are relevant or you are interested in trying out, such as:
When you start work on this task, please post a comment here indicating which dataset and model you will be working with so that other contributors can avoid duplicating your work.
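The task described above can be sketched end to end: load a dataset, hold out a test set, fit a classifier, and report accuracy. The sketch below uses synthetic data and a random forest purely as assumptions for illustration; with one of the repo's actual CSVs you would load the frame with `pd.read_csv` instead:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one of the repo's CSVs; with real data you
# would instead load the frame from disk with pd.read_csv.
features, targets = make_classification(n_samples=400, n_features=6, random_state=42)
df = pd.DataFrame(features, columns=[f"f{i}" for i in range(6)])
df["label"] = targets

X = df.drop(columns=["label"])
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print("test accuracy:", acc)
```

Any scikit-learn classifier (KNN, logistic regression, SVM, etc.) slots into the same train/test skeleton, which matches the variety of models the thread above describes.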