diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index ae7ea28..0617216 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2024-09-28T23:25:27","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2024-09-28T23:27:05","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/about/index.html b/dev/about/index.html index fba4ea6..786593a 100644 --- a/dev/about/index.html +++ b/dev/about/index.html @@ -1,2 +1,2 @@ -About · Imbalance.jl

Credits

This package was created by Essam Wisam as a Google Summer of Code project, under the mentorship of Anthony Blaom. Special thanks also go to Rik Huijzer for his friendliness and the binary SMOTE implementation in Resample.jl.

+About · Imbalance.jl

Credits

This package was created by Essam Wisam as a Google Summer of Code project, under the mentorship of Anthony Blaom. Special thanks also go to Rik Huijzer for his friendliness and the binary SMOTE implementation in Resample.jl.

diff --git a/dev/algorithms/extra_algorithms/index.html b/dev/algorithms/extra_algorithms/index.html index c883150..176d663 100644 --- a/dev/algorithms/extra_algorithms/index.html +++ b/dev/algorithms/extra_algorithms/index.html @@ -72,4 +72,4 @@ julia> Imbalance.checkbalance(y; ref="minority") 1: ▇▇▇▇▇▇▇▇▇▇▇▇▇ 10034 (100.0%) -0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 39966 (398.3%) source +0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 39966 (398.3%) source diff --git a/dev/algorithms/implementation_notes/index.html b/dev/algorithms/implementation_notes/index.html index 20a360c..31add8e 100644 --- a/dev/algorithms/implementation_notes/index.html +++ b/dev/algorithms/implementation_notes/index.html @@ -1,2 +1,2 @@ -Implementation Notes · Imbalance.jl

Generalizing to Multiclass

Papers often propose the resampling algorithm for binary classification only. In many cases, the algorithm only expects a set of points to resample and does not depend on the existence of a majority class (e.g., it estimates the distribution of the points and then generates new samples from it), so it can be generalized by simply applying it to each class. In other cases, there is an interaction with the majority class (e.g., a point is borderline in BorderlineSMOTE1 if the majority, but not all, of its neighbors are from the majority class). In this case, a one-vs-rest scheme is used as proposed in [1]: a point is now borderline if most, but not all, of its neighbors are from a different class.

Generalizing to Real Ratios

Papers often propose the resampling algorithm using integer ratios only. For instance, a ratio of 2 means doubling the amount of data in a class, while a ratio of $2.2$ is either not allowed or gets rounded. In Imbalance.jl, any appropriate real ratio can be used, and the ratio is relative to the size of the majority or minority class depending on whether the algorithm oversamples or undersamples. The generalization works by randomly choosing points instead of looping over each point: if a ratio of $2.2$ corresponds to $227$ new examples, then $227$ existing examples are chosen randomly with replacement and the resampling logic is applied to each. For an integer ratio $k$, this is on average equivalent to looping over the points $k$ times.

[1] Fernández, A., López, V., Galar, M., Del Jesus, M. J., and Herrera, F. (2013). Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowledge-Based Systems, 42:97–110.

+Implementation Notes · Imbalance.jl

Generalizing to Multiclass

Papers often propose the resampling algorithm for binary classification only. In many cases, the algorithm only expects a set of points to resample and does not depend on the existence of a majority class (e.g., it estimates the distribution of the points and then generates new samples from it), so it can be generalized by simply applying it to each class. In other cases, there is an interaction with the majority class (e.g., a point is borderline in BorderlineSMOTE1 if the majority, but not all, of its neighbors are from the majority class). In this case, a one-vs-rest scheme is used as proposed in [1]: a point is now borderline if most, but not all, of its neighbors are from a different class.
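As a hedged illustration of the one-vs-rest rule (the function name and the exact neighbor convention are illustrative, not the package's internals), the multiclass borderline check amounts to something like:

# illustrative sketch: a point is "borderline" when most, but not all, of its
# k nearest neighbors carry a label different from its own (one-vs-rest view)
function is_borderline(label, neighbor_labels)
    k = length(neighbor_labels)
    n_other = count(!=(label), neighbor_labels)
    return k / 2 <= n_other < k
end

is_borderline(1, [1, 2, 2, 3, 2])   # true: 4 of 5 neighbors differ
is_borderline(1, [2, 2, 2, 3, 2])   # false: all neighbors differ, so the point is treated as noise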

Generalizing to Real Ratios

Papers often propose the resampling algorithm using integer ratios only. For instance, a ratio of 2 means doubling the amount of data in a class, while a ratio of $2.2$ is either not allowed or gets rounded. In Imbalance.jl, any appropriate real ratio can be used, and the ratio is relative to the size of the majority or minority class depending on whether the algorithm oversamples or undersamples. The generalization works by randomly choosing points instead of looping over each point: if a ratio of $2.2$ corresponds to $227$ new examples, then $227$ existing examples are chosen randomly with replacement and the resampling logic is applied to each. For an integer ratio $k$, this is on average equivalent to looping over the points $k$ times.
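As a rough sketch of the mechanics (the class sizes below are hypothetical, chosen only so that the arithmetic lands on $227$):

using Random

# illustrative only: turn a real-valued oversampling ratio into a concrete number of new
# points, then choose the source points randomly with replacement
n_majority, n_class, ratio = 150, 103, 2.2
n_new = round(Int, ratio * n_majority) - n_class   # 330 - 103 = 227 new points needed
rng = Random.Xoshiro(42)
source_idx = rand(rng, 1:n_class, n_new)           # source points chosen randomly with replacement
# the per-point resampling logic (e.g., SMOTE-style interpolation) is then applied to each index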

[1] Fernández, A., López, V., Galar, M., Del Jesus, M. J., and Herrera, F. (2013). Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowledge-Based Systems, 42:97–110.

diff --git a/dev/algorithms/mlj_balancing/index.html b/dev/algorithms/mlj_balancing/index.html index e1e7a78..0f0841e 100644 --- a/dev/algorithms/mlj_balancing/index.html +++ b/dev/algorithms/mlj_balancing/index.html @@ -20,4 +20,4 @@ logistic_model = LogisticClassifier() bagging_model = BalancedBaggingClassifier(model=logistic_model, T=10, rng=Random.Xoshiro(42))

Now you can fit, predict, cross-validate, and fine-tune it like any other probabilistic MLJ model, where X must be a table (e.g., a dataframe).

mach = machine(bagging_model, X, y)
 fit!(mach)
-pred = predict(mach, X)
+pred = predict(mach, X) diff --git a/dev/algorithms/oversampling_algorithms/index.html b/dev/algorithms/oversampling_algorithms/index.html index cc3ec86..c49f6b8 100644 --- a/dev/algorithms/oversampling_algorithms/index.html +++ b/dev/algorithms/oversampling_algorithms/index.html @@ -373,4 +373,4 @@ oversampler = SMOTENC(y_ind; k=5, ratios=Dict(1=>1.0, 2=> 0.9, 3=>0.9), rng=42) Xyover = Xy |> oversampler # equivalently if TableTransforms is used -Xyover, cache = TableTransforms.apply(oversampler, Xy) # equivalently

Illustration

A full basic example along with an animation can be found here. You may find more practical examples in the tutorial section which also explains running code on Google Colab.

References

[1] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, 16:321–357, 2002.

source +Xyover, cache = TableTransforms.apply(oversampler, Xy) # equivalently

Illustration

A full basic example along with an animation can be found here. You may find more practical examples in the tutorial section which also explains running code on Google Colab.

References

[1] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, 16:321–357, 2002.

source diff --git a/dev/algorithms/undersampling_algorithms/index.html b/dev/algorithms/undersampling_algorithms/index.html index b174dee..2ebbc24 100644 --- a/dev/algorithms/undersampling_algorithms/index.html +++ b/dev/algorithms/undersampling_algorithms/index.html @@ -185,4 +185,4 @@ # Initiate TomekUndersampler model undersampler = TomekUndersampler(y_ind; min_ratios=1.0, rng=42) Xy_under = Xy |> undersampler -Xy_under, cache = TableTransforms.apply(undersampler, Xy) # equivalently

The reapply(undersampler, Xy, cache) method from TableTransforms simply falls back to apply(undersampler, Xy), and revert(undersampler, Xy, cache) is not supported.

Illustration

A full basic example along with an animation can be found here. You may find more practical examples in the tutorial section which also explains running code on Google Colab.

References

[1] Ivan Tomek. Two modifications of CNN. IEEE Trans. Systems, Man and Cybernetics, 6:769–772, 1976.

source +Xy_under, cache = TableTransforms.apply(undersampler, Xy) # equivalently

The reapply(undersampler, Xy, cache) method from TableTransforms simply falls back to apply(undersampler, Xy), and revert(undersampler, Xy, cache) is not supported.

Illustration

A full basic example along with an animation can be found here. You may find more practical examples in the tutorial section which also explains running code on Google Colab.

References

[1] Ivan Tomek. Two modifications of CNN. IEEE Trans. Systems, Man and Cybernetics, 6:769–772, 1976.

source diff --git a/dev/contributing/index.html b/dev/contributing/index.html index ee3a692..0e72765 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -12,4 +12,4 @@ └── extras.jl # extra functions like generating data or checking balance

The purpose of each file is further documented at the beginning of that file. The files are listed here in the recommended order for checking them.

Any resampling method implemented in the oversampling_methods or undersampling_methods folder takes the following structure:

├── resample_method          # contains implementation and interfaces for a resampling method
 │   ├── interface_mlj.jl     # implements MLJ interface for the method
 │   ├── interface_tables.jl  # implements Tables.jl interface for the method
-│   └── resample_method.jl   # implements the method itself (pure functional interface)

Contribution

Reporting Problems or Seeking Support

Adding New Resampling Methods

Of course, you can ignore the third step if the algorithm you are implementing does not operate in a "per-class" sense.

🔥 Hot algorithms to add

Adding New Tutorials

+│ └── resample_method.jl # implements the method itself (pure functional interface)
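For orientation only, the pure functional file of a new method might expose something along these lines (the name, keyword arguments, and rows-as-observations layout are illustrative; follow the conventions of the existing methods):

using Random

# hypothetical skeleton: naive per-class random oversampling up to the size of the largest class
function my_random_oversample(X::AbstractMatrix, y::AbstractVector; rng = Random.default_rng())
    counts = Dict(c => count(==(c), y) for c in unique(y))
    n_max = maximum(values(counts))
    Xover, yover = copy(X), collect(y)
    for (c, n) in counts
        extra = rand(rng, findall(==(c), y), n_max - n)   # indices sampled with replacement
        Xover = vcat(Xover, X[extra, :])
        append!(yover, y[extra])
    end
    return Xover, yover
end

The MLJ and Tables.jl interface files then merely wrap such a function.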

Contribution

Reporting Problems or Seeking Support

Adding New Resampling Methods

Of course, you can ignore the third step if the algorithm you are implementing does not operate in a "per-class" sense.

🔥 Hot algorithms to add

Adding New Tutorials

diff --git a/dev/examples/Colab/index.html b/dev/examples/Colab/index.html index 494b2b3..57a83d5 100644 --- a/dev/examples/Colab/index.html +++ b/dev/examples/Colab/index.html @@ -9,4 +9,4 @@ rm /tmp/julia.tar.gz fi julia -e 'using Pkg; pkg"add IJulia; precompile;"' -echo 'Done'

Sincere thanks to Julia-on-Colab for making this possible.

+echo 'Done'

Sincere thanks to Julia-on-Colab for making this possible.

diff --git a/dev/examples/cerebral_ensemble/cerebral_ensemble/index.html b/dev/examples/cerebral_ensemble/cerebral_ensemble/index.html index b72bb65..9563939 100644 --- a/dev/examples/cerebral_ensemble/cerebral_ensemble/index.html +++ b/dev/examples/cerebral_ensemble/cerebral_ensemble/index.html @@ -194,4 +194,4 @@ │ BalancedAccuracy( │ predict_mode │ 0.772 │ 0.0146 │ [0.738, 0.769, ⋯ │ adjusted = false) │ │ │ │ ⋯ └─────────────────────┴──────────────┴─────────────┴─────────┴────────────────── - 1 column omitted

Assuming the scores are normally distributed, the 95% confidence interval for the balanced accuracy is 77.2±1.4%.

+ 1 column omitted

Assuming the scores are normally distributed, the 95% confidence interval for the balanced accuracy is 77.2±1.4%.
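As a rough sketch of where such an interval comes from (assuming e is the PerformanceEvaluation object returned by evaluate! above, and that the per-fold scores are approximately normal):

using Statistics

# approximately: the interval is the mean score plus/minus 1.96 times its standard error over the folds
scores = e.per_fold[1]
half_width = 1.96 * std(scores) / sqrt(length(scores))
(e.measurement[1] - half_width, e.measurement[1] + half_width)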

diff --git a/dev/examples/effect_of_k_enn/effect_of_k_enn/index.html b/dev/examples/effect_of_k_enn/effect_of_k_enn/index.html index db668c3..77cea2a 100644 --- a/dev/examples/effect_of_k_enn/effect_of_k_enn/index.html +++ b/dev/examples/effect_of_k_enn/effect_of_k_enn/index.html @@ -272,4 +272,4 @@ end plot!(dpi = 150) end -
gif(anim, "./assets/enn-k-animation.gif", fps=1)

enn-gif-hyperparameter

As we can see, the most constraining condition is all. It deletes any point whose label differs from that of any of its k nearest neighbors, which also explains why it is the most sensitive to the hyperparameter k.

+
gif(anim, "./assets/enn-k-animation.gif", fps=1)

enn-gif-hyperparameter

As we can see, the most constraining condition is all. It deletes any point whose label differs from that of any of its k nearest neighbors, which also explains why it is the most sensitive to the hyperparameter k.
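As a hedged sketch of why all is the most constraining (the function names are illustrative, not the package's internals), compare it with a mode-based condition:

using StatsBase: mode

# illustrative keep-conditions: a point with label `label` and neighbor labels `nbrs`
# survives undersampling only if the condition holds
keep_all(label, nbrs)  = all(==(label), nbrs)    # "all":  every neighbor must agree
keep_mode(label, nbrs) = mode(nbrs) == label     # "mode": only the most common neighbor label must agree

keep_all(1, [1, 1, 2]), keep_mode(1, [1, 1, 2])  # (false, true): "all" deletes more points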

diff --git a/dev/examples/effect_of_ratios/effect_of_ratios/index.html b/dev/examples/effect_of_ratios/effect_of_ratios/index.html index 279333f..c2baccf 100644 --- a/dev/examples/effect_of_ratios/effect_of_ratios/index.html +++ b/dev/examples/effect_of_ratios/effect_of_ratios/index.html @@ -213,4 +213,4 @@ plot!(dpi = 150) end
gif(anim, "./assets/smote-animation.gif", fps=6)
-println()

Ratios Parameter Effect

Notice how setting ratios greedily can lead to overfitting.

+println()

Ratios Parameter Effect

Notice how setting ratios greedily can lead to overfitting.

diff --git a/dev/examples/effect_of_s/effect_of_s/index.html b/dev/examples/effect_of_s/effect_of_s/index.html index 446efbd..22013e2 100644 --- a/dev/examples/effect_of_s/effect_of_s/index.html +++ b/dev/examples/effect_of_s/effect_of_s/index.html @@ -203,4 +203,4 @@ plot!(dpi = 150) end
gif(anim, "./assets/rose-animation.gif", fps=6)
-println()

ROSE Effect of S

As we can see, the larger s is, the more spread out the oversampled points are. This is expected because ROSE oversamples by sampling from the distribution obtained by placing a Gaussian on each existing point, and s is a hyperparameter proportional to the bandwidth of those Gaussians. When s=0, the only points that can be generated lie on top of existing ones; i.e., ROSE becomes equivalent to random oversampling.

The decision boundary is mainly unstable because we used a small number of epochs with the perceptron to generate this animation. It still took plenty of time.

+println()

ROSE Effect of S

As we can see, the larger s is, the more spread out the oversampled points are. This is expected because ROSE oversamples by sampling from the distribution obtained by placing a Gaussian on each existing point, and s is a hyperparameter proportional to the bandwidth of those Gaussians. When s=0, the only points that can be generated lie on top of existing ones; i.e., ROSE becomes equivalent to random oversampling.
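A minimal sketch of that idea (not the package's exact implementation or its bandwidth formula):

using Random, Statistics

# sketch: a ROSE-like sample is an existing observation plus Gaussian noise whose scale grows
# with s; with s = 0 the generated point coincides with an existing one (random oversampling)
function rose_like_point(X::AbstractMatrix, s::Real; rng = Random.default_rng())
    x = X[rand(rng, 1:size(X, 1)), :]      # pick a random existing observation (row)
    h = s .* vec(std(X; dims = 1))         # per-feature bandwidth proportional to s
    return x .+ h .* randn(rng, length(x))
end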

The decision boundary is mainly unstable because we used a small number of epochs with the perceptron to generate this animation. It still took plenty of time.

diff --git a/dev/examples/fraud_detection/fraud_detection/index.html b/dev/examples/fraud_detection/fraud_detection/index.html index f74fb4e..c968650 100644 --- a/dev/examples/fraud_detection/fraud_detection/index.html +++ b/dev/examples/fraud_detection/fraud_detection/index.html @@ -144,4 +144,4 @@ │ BalancedAccuracy( │ predict_mode │ 0.908 │ 0.00932 │ [0.903, 0.898, ⋯ │ adjusted = false) │ │ │ │ ⋯ └─────────────────────┴──────────────┴─────────────┴─────────┴────────────────── - 1 column omitted

Assuming normally distributed scores, the 95% confidence interval was 90.8±0.9% before resampling and has become 93±0.7% after it, which corresponds to a small improvement in accuracy.

+ 1 column omitted

Assuming normally distributed scores, the 95% confidence interval was 90.8±0.9% before resampling and has become 93±0.7% after it, which corresponds to a small improvement in accuracy.

diff --git a/dev/examples/index.html b/dev/examples/index.html index ed84a45..7c062d3 100644 --- a/dev/examples/index.html +++ b/dev/examples/index.html @@ -88,4 +88,4 @@ - + diff --git a/dev/examples/smote_churn_dataset/smote_churn_dataset/index.html b/dev/examples/smote_churn_dataset/smote_churn_dataset/index.html index 232921c..786be34 100644 --- a/dev/examples/smote_churn_dataset/smote_churn_dataset/index.html +++ b/dev/examples/smote_churn_dataset/smote_churn_dataset/index.html @@ -163,4 +163,4 @@ │ BalancedAccuracy( │ predict_mode │ 0.552 │ 0.0145 │ [0.549, 0.563, ⋯ │ adjusted = false) │ │ │ │ ⋯ └─────────────────────┴──────────────┴─────────────┴─────────┴────────────────── - 1 column omitted

The improvement is about 5.2% after cross-validation. If we further assume the scores to be normally distributed, then the 95% confidence interval is a 5.2±1.45% improvement. Let's see if this gets any better when we use SMOTE-NC instead in a later example.

+ 1 column omitted

The improvement is about 5.2% after cross-validation. If we further assume the scores to be normally distributed, then the 95% confidence interval is a 5.2±1.45% improvement. Let's see if this gets any better when we use SMOTE-NC instead in a later example.

diff --git a/dev/examples/smoten_mushroom/smoten_mushroom/index.html b/dev/examples/smoten_mushroom/smoten_mushroom/index.html index c80c9e2..a6ce44e 100644 --- a/dev/examples/smoten_mushroom/smoten_mushroom/index.html +++ b/dev/examples/smoten_mushroom/smoten_mushroom/index.html @@ -253,4 +253,4 @@ │ BalancedAccuracy( │ predict │ 0.4 │ 0.00483 │ [0.398, 0.405, 0.3 ⋯ │ adjusted = false) │ │ │ │ ⋯ └─────────────────────┴───────────┴─────────────┴─────────┴───────────────────── - 1 column omitted

Fair enough. After oversampling, the interval under the same assumptions is 40±0.5%; this agrees with our earlier observations using simple point estimates: oversampling here delivers approximately an 18% improvement in balanced accuracy.

+ 1 column omitted

Fair enough. After oversampling, the interval under the same assumptions is 40±0.5%; this agrees with our earlier observations using simple point estimates: oversampling here delivers approximately an 18% improvement in balanced accuracy.

diff --git a/dev/examples/smotenc_churn_dataset/smotenc_churn_dataset/index.html b/dev/examples/smotenc_churn_dataset/smotenc_churn_dataset/index.html index f578570..70b7295 100644 --- a/dev/examples/smotenc_churn_dataset/smotenc_churn_dataset/index.html +++ b/dev/examples/smotenc_churn_dataset/smotenc_churn_dataset/index.html @@ -190,4 +190,4 @@ │ BalancedAccuracy( │ predict_mode │ 0.677 │ 0.0124 │ [0.678, 0.688, ⋯ │ adjusted = false) │ │ │ │ ⋯ └─────────────────────┴──────────────┴─────────────┴─────────┴────────────────── - 1 column omitted

Fair enough. After oversampling, the interval under the same assumptions is 67.7±1.2%, which is still a meaningful improvement over the 56.5±0.62% we had prior to oversampling or the 55.2±1.5% we had with logistic regression and SMOTE in an earlier example.

+ 1 column omitted

Fair enough. After oversampling, the interval under the same assumptions is 67.7±1.2%, which is still a meaningful improvement over the 56.5±0.62% we had prior to oversampling or the 55.2±1.5% we had with logistic regression and SMOTE in an earlier example.

diff --git a/dev/examples/walkthrough/index.html b/dev/examples/walkthrough/index.html index 119b2fe..c413d10 100644 --- a/dev/examples/walkthrough/index.html +++ b/dev/examples/walkthrough/index.html @@ -239,4 +239,4 @@ │ BalancedAccuracy( │ predict_mode │ 0.7 │ 0.0717 │ [0.7, 0.536, 0. ⋯ │ adjusted = false) │ │ │ │ ⋯ └─────────────────────┴──────────────┴─────────────┴─────────┴────────────────── - 1 column omitted

This results in an interval of 70±7.2%, which can be viewed as a reasonable improvement over 62.1±9.13%. The uncertainty in the intervals can be explained by the fact that the dataset is small and has many classes.

+ 1 column omitted

This results in an interval of 70±7.2%, which can be viewed as a reasonable improvement over 62.1±9.13%. The uncertainty in the intervals can be explained by the fact that the dataset is small and has many classes.

diff --git a/dev/index.html b/dev/index.html index 5aa6403..eccbfd7 100644 --- a/dev/index.html +++ b/dev/index.html @@ -53,4 +53,4 @@ oversampler = SMOTE(y_ind; k=5, ratios=Dict(0=>1.0, 1=> 0.9, 2=>0.8), rng=42) Xyover = Xy |> oversampler # can chain with other table transforms # equivalently if TableTransforms is used -Xyover, cache = TableTransforms.apply(oversampler, Xy) # equivalently

The reapply(oversampler, Xy, cache) method from TableTransforms simply falls back to apply(oversampler, Xy), and revert(oversampler, Xy, cache) reverts the transform by removing the oversampled observations from the table.

Notice that because the MLJ and TableTransforms interfaces use the same model names, you will have to qualify the model name if both are used in the same file (e.g., Imbalance.TableTransforms.SMOTE for the example above).

Features

Rationale

Most, if not all, machine learning algorithms can be viewed as a form of empirical risk minimization, where the objective is to find the parameters $\theta$ that, for some loss function $L$, minimize

\[\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(f_{\theta}(x_i), y_i)\]

The underlying assumption is that minimizing this empirical risk approximately minimizes the true risk, which considers all examples in the population; this would imply that $f_\theta$ is approximately the true target function $f$ that we seek to model.

In a multi-class setting with $K$ classes, where class $C_k$ contains $N_k$ examples, one can split the sum by class and write

\[\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \left( \sum_{i \in C_1} L(f_{\theta}(x_i), y_i) + \sum_{i \in C_2} L(f_{\theta}(x_i), y_i) + \ldots + \sum_{i \in C_K} L(f_{\theta}(x_i), y_i) \right)\]

Class imbalance occurs when some classes have far fewer examples than others. In this case, the terms corresponding to the smaller classes contribute minimally to the sum, which makes it possible for a learning algorithm to find an approximate minimizer of the empirical risk that mostly minimizes only the significant sums. This yields a hypothesis $f_\theta$ that may be very different from the true target $f$ on the minority classes, which may be the most important ones for the application in question.

One obvious remedy is to weight the smaller sums so that a learning algorithm can no longer exploit their insignificance; this can be seen to be equivalent to repeating the observations of the minority classes. Such repetition is exactly naive random oversampling, which this package offers along with more advanced methods that either generate synthetic data (oversampling) or delete existing observations (undersampling). You can read more about the class imbalance problem and the various algorithms implemented in this package in this series of articles on Medium.

To our knowledge, there are no existing maintained Julia packages that implement resampling algorithms for multi-class classification problems or that handle both nominal and continuous features. This has served as a primary motivation for the creation of this package.

+Xyover, cache = TableTransforms.apply(oversampler, Xy) # equivalently

The reapply(oversampler, Xy, cache) method from TableTransforms simply falls back to apply(oversampler, Xy), and revert(oversampler, Xy, cache) reverts the transform by removing the oversampled observations from the table.
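Continuing the snippet above, the three entry points can be exercised as follows (a sketch; the return shapes follow the standard TableTransforms.jl apply/reapply/revert API):

Xyover, cache = TableTransforms.apply(oversampler, Xy)        # oversample
Xyover2 = TableTransforms.reapply(oversampler, Xy, cache)     # falls back to apply
Xy_orig = TableTransforms.revert(oversampler, Xyover, cache)  # removes the oversampled observations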

Notice that because the MLJ and TableTransforms interfaces use the same model names, you will have to qualify the model name if both are used in the same file (e.g., Imbalance.TableTransforms.SMOTE for the example above).

Features

Rationale

Most, if not all, machine learning algorithms can be viewed as a form of empirical risk minimization, where the objective is to find the parameters $\theta$ that, for some loss function $L$, minimize

\[\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(f_{\theta}(x_i), y_i)\]

The underlying assumption is that minimizing this empirical risk approximately minimizes the true risk, which considers all examples in the population; this would imply that $f_\theta$ is approximately the true target function $f$ that we seek to model.

In a multi-class setting with $K$ classes, where class $C_k$ contains $N_k$ examples, one can split the sum by class and write

\[\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \left( \sum_{i \in C_1} L(f_{\theta}(x_i), y_i) + \sum_{i \in C_2} L(f_{\theta}(x_i), y_i) + \ldots + \sum_{i \in C_K} L(f_{\theta}(x_i), y_i) \right)\]

Class imbalance occurs when some classes have far fewer examples than others. In this case, the terms corresponding to the smaller classes contribute minimally to the sum, which makes it possible for a learning algorithm to find an approximate minimizer of the empirical risk that mostly minimizes only the significant sums. This yields a hypothesis $f_\theta$ that may be very different from the true target $f$ on the minority classes, which may be the most important ones for the application in question.

One obvious remedy is to weight the smaller sums so that a learning algorithm can no longer exploit their insignificance; this can be seen to be equivalent to repeating the observations of the minority classes. Such repetition is exactly naive random oversampling, which this package offers along with more advanced methods that either generate synthetic data (oversampling) or delete existing observations (undersampling). You can read more about the class imbalance problem and the various algorithms implemented in this package in this series of articles on Medium.
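Concretely, and only as a sketch (the weights $w_k$ below are one common choice, not something prescribed by the package), the weighted objective can be written as

\[\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \left( w_1 \sum_{i \in C_1} L(f_{\theta}(x_i), y_i) + \ldots + w_K \sum_{i \in C_K} L(f_{\theta}(x_i), y_i) \right), \qquad w_k = \frac{N}{K N_k}\]

With $w_k \propto 1/N_k$, every class contributes comparably to the objective, and choosing integer weights amounts to replicating each observation of class $k$ roughly $w_k$ times, which is precisely what naive random oversampling does.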

To our knowledge, there are no existing maintained Julia packages that implement resampling algorithms for multi-class classification problems or that handle both nominal and continuous features. This has served as a primary motivation for the creation of this package.