Commit
Small doc fix (#5375)
Authors:
  - Tarang Jain (https://github.com/tarang-jain)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5375
tarang-jain authored Apr 24, 2023
1 parent 2485650 commit 452f90f
Showing 2 changed files with 17 additions and 15 deletions.
8 changes: 5 additions & 3 deletions cpp/src/hdbscan/detail/soft_clustering.cuh
@@ -105,7 +105,7 @@ void dist_membership_vector(const raft::handle_t& handle,
distance<raft::distance::DistanceType::CosineExpanded, value_t, value_t, value_t, int>(
handle, query + batch_offset * n, exemplars_dense.data(), dist.data(), samples_per_batch, n_exemplars, n, true);
break;
-default: ASSERT(false, "Incorrect metric passed!");
+default: RAFT_EXPECTS(false, "Incorrect metric passed!");
}

// compute the minimum distances to exemplars of each cluster
@@ -396,7 +396,8 @@ void all_points_membership_vectors(const raft::handle_t& handle,
size_t n = prediction_data.n_cols;

if (batch_size > m) batch_size = m;
-RAFT_EXPECTS(0 < batch_size && batch_size <= m, "Invalid batch_size. batch_size should be > 0 and <= the number of samples in the training data");
+RAFT_EXPECTS(0 < batch_size && batch_size <= m,
+"Invalid batch_size. batch_size should be > 0 and <= the number of samples in the training data");

auto parents = condensed_tree.get_parents();
auto children = condensed_tree.get_children();
@@ -522,7 +523,8 @@ void membership_vector(const raft::handle_t& handle,
value_t* lambdas = condensed_tree.get_lambdas();

if (batch_size > n_prediction_points) batch_size = n_prediction_points;
-RAFT_EXPECTS(0 < batch_size && batch_size <= n_prediction_points, "Invalid batch_size. batch_size should be > 0 and <= the number of samples in the training data");
+RAFT_EXPECTS(0 < batch_size && batch_size <= n_prediction_points,
+"Invalid batch_size. batch_size should be > 0 and <= the number of prediction points");

rmm::device_uvector<value_t> dist_membership_vec(n_prediction_points * n_selected_clusters,
stream);
24 changes: 12 additions & 12 deletions python/cuml/cluster/hdbscan/prediction.pyx
@@ -146,12 +146,12 @@ def all_points_membership_vectors(clusterer, batch_size=4096):
had ``prediction_data=True`` set.
batch_size : int, optional, default=min(4096, n_rows)
-Lowers memory requirement by computing distance-based membership in
-smaller batches of points in the training data. Batch size of 0 uses
-all of the training points, batch size of 1000 computes distances for
-1000 points at a time. The default batch_size is 4096. If the number
-of rows in the original dataset is less than 4096, this defaults to
-the number of rows.
+Lowers memory requirement by computing distance-based membership
+in smaller batches of points in the training data. A batch size
+of 1000 computes distance based memberships for 1000 points at a
+time. The default batch size is 4096. If the number of rows in
+the original dataset is less than 4096, this defaults to the
+number of rows.
Returns
-------
@@ -251,12 +251,12 @@ def membership_vector(clusterer, points_to_predict, batch_size=4096, convert_dty
clusterer was fit.
batch_size : int, optional, default=min(4096, n_points_to_predict)
-Lowers memory requirement by computing distance-based membership in
-smaller batches of points in the training data. Batch size of 0 uses
-all of the training points, batch size of 1000 computes distances for
-1000 points at a time. The default batch_size is 4096. If the number
-of rows in the original dataset is less than 4096, this defaults to
-the number of rows.
+Lowers memory requirement by computing distance-based membership
+in smaller batches of points in the prediction data. A batch size
+of 1000 computes distance based memberships for 1000 points at a
+time. The default batch_size is 4096. If the number of rows in
+the prediction dataset is less than 4096, this defaults to the
+number of rows.
Returns
-------
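
The batch_size rule described in the updated docstrings and enforced by the RAFT_EXPECTS checks above can be sketched in plain Python. This is an illustration only, not cuML code: `clamp_batch_size` and `batch_ranges` are hypothetical helper names, and the batching loop is an assumption about how the clamped value would be consumed.

```python
from typing import Iterator, Tuple


def clamp_batch_size(batch_size: int, n_points: int) -> int:
    """Hypothetical helper mirroring the checks shown in the diff.

    A batch_size larger than n_points is lowered to n_points, after which
    0 < batch_size <= n_points must hold.
    """
    if batch_size > n_points:
        batch_size = n_points
    if not (0 < batch_size <= n_points):
        raise ValueError(
            "Invalid batch_size. batch_size should be > 0 and <= the number of points"
        )
    return batch_size


def batch_ranges(n_points: int, batch_size: int = 4096) -> Iterator[Tuple[int, int]]:
    """Yield (offset, samples_in_batch) pairs covering all points.

    Assumed loop shape; the last batch may be smaller than batch_size.
    """
    batch_size = clamp_batch_size(batch_size, n_points)
    for offset in range(0, n_points, batch_size):
        yield offset, min(batch_size, n_points - offset)
```

With 10 points and a batch size of 4, the sketch yields three batches of 4, 4, and 2 points. With fewer than 4096 points and the default batch size, it yields a single batch covering every point. A batch size of 0 is rejected, which matches the removal of the "Batch size of 0 uses all of the training points" sentence from the docstrings.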
