FIX BorderlineSMOTE-2 use the full dataset to generate new samples #1023

Merged 5 commits on Jul 10, 2023

Conversation

glemaitre
Member

@glemaitre glemaitre commented Jul 10, 2023

closes #861

Make sure that we use the full dataset to generate new samples in BorderlineSMOTE version 2.
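The change described above can be illustrated with a toy example. This is only a sketch under assumptions, not the imbalanced-learn implementation: `k_nearest` is a hypothetical brute-force helper, and the data and labels are made up. It shows how the candidate neighbours of a "danger" sample differ when the search runs over the minority class only versus the full dataset.

```python
import math


def k_nearest(query, candidates, k):
    """Return indices of the k candidates closest to `query` (brute force)."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: math.dist(query, candidates[i]),
    )
    return order[:k]


# Toy data (hypothetical): label 1 is the minority class, 0 the majority.
X = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [0.2, 0.1], [0.3, 0.0]]
y = [1, 0, 1, 0, 1]

danger = X[0]  # a minority sample assumed to be in the "danger" set
minority_only = [x for x, label in zip(X, y) if label == 1]

# Search restricted to the minority class; [1:] drops the query point itself.
nn_minority = k_nearest(danger, minority_only, 3)[1:]

# Search over the full dataset, so majority samples can be selected too.
nn_full = k_nearest(danger, X, 3)[1:]
neighbour_labels = [y[i] for i in nn_full]
```

With the full dataset, both selected neighbours of the danger sample belong to the majority class, which is the case Borderline-SMOTE 2 is meant to handle.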

@glemaitre glemaitre marked this pull request as draft July 10, 2023 19:24
@glemaitre glemaitre marked this pull request as ready for review July 10, 2023 20:39
@glemaitre glemaitre merged commit 2859cb0 into scikit-learn-contrib:master Jul 10, 2023

self.nn_k_.fit(X_to_sample_from)
nns = self.nn_k_.kneighbors(X_danger, return_distance=False)[:, 1:]
X_new, y_new = self._make_samples(
Contributor

This implementation does not fully reflect the description of Borderline-SMOTE 2 in the paper. According to the paper, when a synthetic sample is created by interpolating between a minority sample and one of its majority-class neighbours, the difference is multiplied by a random factor between 0 and 0.5 (instead of between 0 and 1), so that the synthetic sample stays closer to the minority sample.

If I understand this code correctly, we multiply everything by a factor between 0 and 1. Please correct me if I am wrong.

Member Author

No, you are right. I forgot to look at the next page of the article. I will try to propose a fix.
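The point raised in this thread can be sketched as follows. This is a minimal illustration, not the fix that was eventually proposed: `interpolate` is a hypothetical helper, and the assumption (taken from the review comment) is that the random factor is drawn from [0, 0.5) for a majority-class neighbour and from [0, 1) for a minority-class neighbour.

```python
import random


def interpolate(sample, neighbour, neighbour_is_minority, rng):
    """Create one synthetic point between `sample` and `neighbour`."""
    # Assumed rule: halve the range of the random factor when the
    # neighbour belongs to the majority class, so the synthetic point
    # stays closer to the minority sample.
    upper = 1.0 if neighbour_is_minority else 0.5
    gap = rng.random() * upper
    return [s + gap * (n - s) for s, n in zip(sample, neighbour)]


rng = random.Random(0)
danger = [0.0, 0.0]        # minority sample in the "danger" set (toy value)
majority_nn = [2.0, 4.0]   # one of its majority-class neighbours (toy value)

synthetic = [
    interpolate(danger, majority_nn, False, rng) for _ in range(1000)
]
```

Every point in `synthetic` lies in the first half of the segment from `danger` to `majority_nn`, whereas a factor drawn from [0, 1) would allow points anywhere along the segment.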
