DIBclust Package

DIBclust is an R package for clustering datasets using the Deterministic Information Bottleneck (DIB) method. This package supports datasets with mixed-type variables (nominal, ordinal, and continuous), as well as datasets that are purely continuous or categorical. The DIB approach preserves the most relevant information while forming concise and interpretable clusters, guided by principles from information theory.
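To make the information-theoretic idea concrete, the sketch below computes the two quantities that the Deterministic Information Bottleneck trades off: the entropy of a hard cluster assignment, H(T), and the mutual information between the clusters and a variable, I(T;Y). It uses base R only and is not the package's implementation. The helper names (`entropy_bits`, `mutual_info_bits`), the toy data, the weight `beta`, and the choice of base-2 logarithms are all assumptions made for illustration; DIBclust's internal estimators and units may differ.

```r
# Entropy (in bits) of a discrete distribution given as a probability vector.
entropy_bits <- function(p) {
  p <- p[p > 0]  # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Mutual information (in bits) between two discrete label vectors,
# estimated from their empirical joint distribution:
# I(A;B) = H(A) + H(B) - H(A,B).
mutual_info_bits <- function(a, b) {
  joint <- table(a, b) / length(a)
  entropy_bits(rowSums(joint)) +
    entropy_bits(colSums(joint)) -
    entropy_bits(as.vector(joint))
}

# Hypothetical toy data: a hard cluster assignment and a categorical variable.
clusters <- c(1, 1, 1, 2, 2, 2, 3, 3)
variable <- c("a", "a", "b", "b", "b", "c", "c", "c")

h_t  <- entropy_bits(as.vector(table(clusters)) / length(clusters))
i_ty <- mutual_info_bits(clusters, variable)

# Illustrative DIB-style score H(T) - beta * I(T;Y) for an assumed weight beta.
# Lower values mean a more compressed clustering that still retains information.
beta <- 2
dib_score <- h_t - beta * i_ty

print(c(entropy = h_t, mutual_info = i_ty, score = dib_score))
```

A clustering with few, unbalanced clusters has low entropy, while one that tracks the variable closely has high mutual information; the weight controls how much the second property is favored over the first.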

Installation

You can install the latest version of the package directly from GitHub using devtools:

install.packages("devtools")  # Install devtools if not already installed
devtools::install_github("amarkos/DIBclust")  # Install DIBclust from GitHub

Getting Started

The example below clusters a mixed-type, a continuous, and a categorical dataset in turn, and prints the cluster assignments, entropy, and mutual information for each.

library(DIBclust)
set.seed(123)  # Fix the RNG seed so the simulated data are reproducible

# Example Mixed-Type Data
data <- data.frame(
  cat_var = factor(sample(letters[1:3], 100, replace = TRUE)),      # Nominal categorical variable
  ord_var = factor(sample(c("low", "medium", "high"), 100, replace = TRUE),
                   levels = c("low", "medium", "high"),
                   ordered = TRUE),                                # Ordinal variable
  cont_var1 = rnorm(100),                                          # Continuous variable 1
  cont_var2 = runif(100)                                           # Continuous variable 2
)

# Perform Mixed-Type Clustering
result_mix <- DIBmix(X = data, ncl = 3, catcols = 1:2, contcols = 3:4)
cat("Mixed-Type Clustering Results:\n")
print(result_mix$Cluster)
print(result_mix$Entropy)
print(result_mix$MutualInfo)

# Example Continuous Data
X_cont <- matrix(rnorm(1000), ncol = 5)  # 200 observations, 5 features

# Perform Continuous Data Clustering
result_cont <- DIBcont(X = X_cont, ncl = 3, s = -1, nstart = 50)
cat("Continuous Clustering Results:\n")
print(result_cont$Cluster)
print(result_cont$Entropy)
print(result_cont$MutualInfo)

# Example Categorical Data
X_cat <- data.frame(
  Var1 = factor(sample(letters[1:3], 200, replace = TRUE)),  # Nominal variable
  Var2 = factor(sample(letters[4:6], 200, replace = TRUE)),  # Nominal variable
  Var3 = factor(sample(c("low", "medium", "high"), 200, replace = TRUE),
                levels = c("low", "medium", "high"), ordered = TRUE)  # Ordinal variable
)

# Perform Categorical Data Clustering
result_cat <- DIBcat(X = X_cat, ncl = 3, lambda = -1, nstart = 50)
cat("Categorical Clustering Results:\n")
print(result_cat$Cluster)
print(result_cat$Entropy)
print(result_cat$MutualInfo)
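Printing a long vector of cluster labels is hard to read, so a summary is often more useful. The sketch below tabulates cluster sizes and cross-tabulates the clusters against a reference variable using base R. It assumes that the `Cluster` element is a vector with one label per observation; to stay runnable without the package, it uses a simulated stand-in (`cluster_labels`) and a hypothetical `reference` factor rather than real output.

```r
set.seed(1)

# Stand-in for result_mix$Cluster (assumed: one integer label per row).
cluster_labels <- sample(1:3, 100, replace = TRUE)

# Hypothetical reference variable, e.g. a known class or one of the inputs.
reference <- factor(sample(letters[1:3], 100, replace = TRUE))

# Number of observations assigned to each cluster.
cluster_sizes <- table(cluster_labels)

# Contingency table of cluster membership against the reference variable.
cross_tab <- table(Cluster = cluster_labels, Reference = reference)

print(cluster_sizes)
print(cross_tab)
```

With real results, replacing `cluster_labels` with `result_mix$Cluster` and `reference` with a column of the input data should give the same kind of summary, provided the assumption about the output structure holds.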

This GitHub repository also contains ten classification data sets from the UCI Machine Learning Repository, together with the scripts needed to run them. These can be used to reproduce the results presented in the paper.

Contributing

Contributions are welcome! If you encounter issues, have suggestions, or would like to enhance the package, please feel free to submit an issue or a pull request on the GitHub repository.

License

This package is distributed under the GPL-3 License. See the GNU General Public License version 3 for details.
