fix: handling of missing values when dropping rows with outliers #101

Merged
merged 5 commits into from
Mar 27, 2023

Conversation

lars-reimann
Member

@lars-reimann lars-reimann commented Mar 27, 2023

Closes #7.

Summary of Changes

Previously, calling `drop_rows_with_outliers` on a `Table` that had at least one missing value in a numerical column caused the resulting table to be completely empty. This PR introduces two changes:

  1. Missing values are never considered outliers.
  2. Missing values are ignored when computing the standard deviation.
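The two rules above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: the function name `drop_rows_with_outliers_sketch`, the list-of-dicts row representation, the use of the population standard deviation, and the z-score threshold of 3 are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_rows_with_outliers_sketch(rows, numeric_columns, threshold=3.0):
    """Hypothetical helper: drop rows whose z-score exceeds `threshold`.

    `rows` is a list of dicts; `threshold=3.0` is an assumed default.
    """
    # Rule 2: compute mean and standard deviation over present values only.
    stats = {}
    for column in numeric_columns:
        present = [row[column] for row in rows if not is_missing(row[column])]
        stats[column] = (fmean(present), pstdev(present)) if present else None

    def is_outlier(value, column):
        # Rule 1: a missing value is never considered an outlier.
        if is_missing(value) or stats[column] is None:
            return False
        mean, std = stats[column]
        if std == 0:
            return False  # all present values are identical
        return abs(value - mean) / std > threshold

    return [
        row
        for row in rows
        if not any(is_outlier(row[column], column) for column in numeric_columns)
    ]


# Eleven ordinary rows, one extreme value, and one row with a missing value.
rows = [{"id": i, "value": 1.0} for i in range(11)]
rows.append({"id": 11, "value": 1000.0})
rows.append({"id": 12, "value": None})

kept = drop_rows_with_outliers_sketch(rows, ["value"])
```

In this sketch only the row with the extreme value is dropped, and the row with the missing value is kept. If the missing value were allowed to propagate into the standard deviation, every z-score comparison would fail and no row would pass the filter, which matches the empty-table symptom described above.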

@lars-reimann lars-reimann requested a review from a team as a code owner March 27, 2023 16:39
@lars-reimann lars-reimann linked an issue Mar 27, 2023 that may be closed by this pull request
@lars-reimann
Member Author

lars-reimann commented Mar 27, 2023

🦙 MegaLinter status: ✅ SUCCESS

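| Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
|---|---|---|---|---|---|
| ✅ PYTHON | black | 2 | 0 | 0 | 0.9s |
| ✅ PYTHON | flake8 | 2 | | 0 | 0.58s |
| ✅ PYTHON | isort | 2 | 0 | 0 | 0.28s |
| ✅ PYTHON | mypy | 2 | | 0 | 2.51s |
| ✅ PYTHON | pylint | 2 | | 0 | 3.65s |
| ✅ REPOSITORY | git_diff | yes | | no | 0.03s |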

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security

@codecov

codecov bot commented Mar 27, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@a0c56ad). Learn more about missing BASE report.
Report is 507 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #101   +/-   ##
=======================================
  Coverage        ?   92.04%           
=======================================
  Files           ?       36           
  Lines           ?     1219           
  Branches        ?        0           
=======================================
  Hits            ?     1122           
  Misses          ?       97           
  Partials        ?        0           


@lars-reimann lars-reimann merged commit 0a5e853 into main Mar 27, 2023
@lars-reimann lars-reimann deleted the 7-drop_rows_with_outliers-currently-not-working branch March 27, 2023 16:44
lars-reimann pushed a commit that referenced this pull request Mar 27, 2023
## [0.6.0](v0.5.0...v0.6.0) (2023-03-27)

### Features

* allow calling `correlation_heatmap` with non-numerical columns ([#92](#92)) ([b960214](b960214)), closes [#89](#89)
* function to drop columns with non-numerical values from `Table` ([#96](#96)) ([8f14d65](8f14d65)), closes [#13](#13)
* function to drop columns/rows with missing values ([#97](#97)) ([05d771c](05d771c)), closes [#10](#10)
* remove `list_columns_with_XY` methods from `Table` ([#100](#100)) ([a0c56ad](a0c56ad)), closes [#94](#94)
* rename `keep_columns` to `keep_only_columns` ([#99](#99)) ([de42169](de42169))
* rename `remove_outliers` to `drop_rows_with_outliers` ([#95](#95)) ([7bad2e3](7bad2e3)), closes [#93](#93)
* return new model when calling `fit` ([#91](#91)) ([165c97c](165c97c)), closes [#69](#69)

### Bug Fixes

* handling of missing values when dropping rows with outliers ([#101](#101)) ([0a5e853](0a5e853)), closes [#7](#7)
@lars-reimann
Member Author

🎉 This PR is included in version 0.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Mar 27, 2023
Labels
released Included in a release
Development

Successfully merging this pull request may close these issues.

drop_rows_with_outliers() currently not working
1 participant