Skip to content

Commit

Permalink
Minor touch
Browse files Browse the repository at this point in the history
  • Loading branch information
EssamWisam committed Mar 4, 2024
1 parent 0b8d6ae commit b6ecc3c
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,9 +114,10 @@ The `Imbalance.jl` documentation indeed satisfies this set of design principles.

A substantial body of literature in the field of machine learning and statistics is devoted to addressing the class imbalance issue. This predicament has often been aptly labeled the "curse of class imbalance," as noted in [@Picek:2018] and [@Kubt:1997] which follows from the pervasive nature of the issue across diverse real-world applications and its pronounced severity; a classifier may incur an extraordinarily large performance penalty in response to training on imbalanced data.

The literature encompasses a myriad of oversampling and undersampling techniques to approach the class imbalance issue. These include SMOTE [@Chawla:2002] which operates by generating synthetic examples along the lines joining existing points, SMOTE-N and SMOTE-NC [@Chawla:2002] which are variants of SMOTE that can deal with categorical data. The sheer number of SMOTE variants makes them a body of literature on their own. Notably, the most widely cited variant of SMOTE is BorderlineSMOTE [@Han:2005]. Other well-established oversampling techniques include RWO [@Zhang:2014] and ROSE [@Menardi:2012] which operate by estimating probability densities and sampling them to generate synthetic points. On the other hand, the literature also encompasses many undersampling techniques. Cluster undersampling [@Lin:2016] and condensed nearest neighbors [@Hart:1968] are two prominent examples which attempt to reduce the number of points while preserving the structure or classification of the data. Furthermore, methods that combine oversampling and undersampling [@Zeng:2016] or resampling with ensemble learning [@Liu:2009] are also present.
The literature encompasses a myriad of oversampling and undersampling techniques to approach the class imbalance issue. These include SMOTE [@Chawla:2002] which operates by generating synthetic examples along the lines joining existing points, SMOTE-N and SMOTE-NC [@Chawla:2002] which are variants of SMOTE that can deal with categorical data. The sheer number of SMOTE variants makes them a body of literature on their own. Notably, the most widely cited variant of SMOTE is BorderlineSMOTE [@Han:2005]. Other well-established oversampling techniques include RWO [@Zhang:2014] and ROSE [@Menardi:2012] which operate by estimating probability densities and sampling from them to generate synthetic points. On the other hand, the literature also encompasses many undersampling techniques. Cluster undersampling [@Lin:2016] and condensed nearest neighbors [@Hart:1968] are two prominent examples which attempt to reduce the number of points while preserving the structure or classification of the data. Furthermore, methods that combine oversampling and undersampling [@Zeng:2016] or resampling with ensemble learning [@Liu:2009] are also present.

The existence of a toolbox with techniques that harness this wealth of research is imperative for the development of novel approaches to the class imbalance problem and for machine learning research in general. Aside from addressing class imbalance in a general machine learning research setting, the toolbox can help in class imbalance research settings by making it possible to juxtapose different methods, compose them together, or form variants of them without having to reimplement them anew. In prevalent programming languages, such as Python, a variety of such toolboxes already exist, such as imbalanced-learn [@Lematre:2016] and SMOTE-variants [@Kovács:2019]. Meanwhile, Julia, a well known programming language with over 40M downloads [@DataCamp:2023], has been lacking a similar toolbox to address the class imbalance issue in general multi-class and heterogeneous data settings. This has served as the primary motivation for the creation of the `Imbalance.jl` toolbox.

The existence of a toolbox with techniques that harness this wealth of research is imperative to the development of novel approaches to the class imbalance problem and for machine learning research broadly. Aside from addressing class imbalance in a general machine learning research setting, the toolbox can help in class imbalance research settings by making it possible to juxtapose different methods, compose them together, or form variants of them without having to reimplement them from scratch. In prevalent programming languages, such as Python, a variety of such toolboxes already exist, such as imbalanced-learn [@Lematre:2016] and SMOTE-variants [@Kovács:2019]. Meanwhile, Julia, a well known programming language with over 40M downloads [@DataCamp:2023], has been lacking a similar toolbox to address the class imbalance issue in general multi-class and heterogeneous data settings. This has served as the primary motivation for the creation of the `Imbalance.jl` toolbox.


## Author Contributions
Expand Down

0 comments on commit b6ecc3c

Please sign in to comment.