Small doc fix #5375

Merged
merged 4 commits into from
Apr 24, 2023
8 changes: 5 additions & 3 deletions cpp/src/hdbscan/detail/soft_clustering.cuh
@@ -105,7 +105,7 @@ void dist_membership_vector(const raft::handle_t& handle,
distance<raft::distance::DistanceType::CosineExpanded, value_t, value_t, value_t, int>(
handle, query + batch_offset * n, exemplars_dense.data(), dist.data(), samples_per_batch, n_exemplars, n, true);
break;
-default: ASSERT(false, "Incorrect metric passed!");
+default: RAFT_EXPECTS(false, "Incorrect metric passed!");
}

// compute the minimum distances to exemplars of each cluster
@@ -396,7 +396,8 @@ void all_points_membership_vectors(const raft::handle_t& handle,
size_t n = prediction_data.n_cols;

if (batch_size > m) batch_size = m;
-RAFT_EXPECTS(0 < batch_size && batch_size <= m, "Invalid batch_size. batch_size should be > 0 and <= the number of samples in the training data");
+RAFT_EXPECTS(0 < batch_size && batch_size <= m,
+"Invalid batch_size. batch_size should be > 0 and <= the number of samples in the training data");

auto parents = condensed_tree.get_parents();
auto children = condensed_tree.get_children();
@@ -522,7 +523,8 @@ void membership_vector(const raft::handle_t& handle,
value_t* lambdas = condensed_tree.get_lambdas();

if (batch_size > n_prediction_points) batch_size = n_prediction_points;
-RAFT_EXPECTS(0 < batch_size && batch_size <= n_prediction_points, "Invalid batch_size. batch_size should be > 0 and <= the number of samples in the training data");
+RAFT_EXPECTS(0 < batch_size && batch_size <= n_prediction_points,
+"Invalid batch_size. batch_size should be > 0 and <= the number of prediction points");

rmm::device_uvector<value_t> dist_membership_vec(n_prediction_points * n_selected_clusters,
stream);
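The clamp-then-check order used in the two `batch_size` hunks above can be sketched in Python as follows. This is only an illustration under stated assumptions: `resolve_batch_size` is a hypothetical helper, not part of cuML or RAFT, and a `ValueError` stands in for whatever `RAFT_EXPECTS` raises. It shows that an oversized batch is silently reduced, while a batch size of 0 is rejected.

```python
def resolve_batch_size(batch_size: int, n_points: int) -> int:
    """Hypothetical helper mirroring the clamp-then-validate order in the diff."""
    # Clamp first: a batch larger than the data is reduced to the data size.
    if batch_size > n_points:
        batch_size = n_points
    # Then validate: after clamping, only non-positive values can still fail.
    if not (0 < batch_size <= n_points):
        raise ValueError(
            "Invalid batch_size. batch_size should be > 0 and <= "
            "the number of prediction points"
        )
    return batch_size
```

Under this sketch, `resolve_batch_size(4096, 100)` returns 100 and `resolve_batch_size(0, 100)` raises.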
24 changes: 12 additions & 12 deletions python/cuml/cluster/hdbscan/prediction.pyx
@@ -146,12 +146,12 @@ def all_points_membership_vectors(clusterer, batch_size=4096):
had ``prediction_data=True`` set.

batch_size : int, optional, default=min(4096, n_rows)
-Lowers memory requirement by computing distance-based membership in
-smaller batches of points in the training data. Batch size of 0 uses
-all of the training points, batch size of 1000 computes distances for
-1000 points at a time. The default batch_size is 4096. If the number
-of rows in the original dataset is less than 4096, this defaults to
-the number of rows.
+Lowers memory requirement by computing distance-based membership
+in smaller batches of points in the training data. A batch size
+of 1000 computes distance based memberships for 1000 points at a
+time. The default batch size is 4096. If the number of rows in
+the original dataset is less than 4096, this defaults to the
+number of rows.

Returns
-------
@@ -251,12 +251,12 @@ def membership_vector(clusterer, points_to_predict, batch_size=4096, convert_dty
clusterer was fit.

batch_size : int, optional, default=min(4096, n_points_to_predict)
-Lowers memory requirement by computing distance-based membership in
-smaller batches of points in the training data. Batch size of 0 uses
-all of the training points, batch size of 1000 computes distances for
-1000 points at a time. The default batch_size is 4096. If the number
-of rows in the original dataset is less than 4096, this defaults to
-the number of rows.
+Lowers memory requirement by computing distance-based membership
+in smaller batches of points in the prediction data. A batch size
+of 1000 computes distance based memberships for 1000 points at a
+time. The default batch_size is 4096. If the number of rows in
+the prediction dataset is less than 4096, this defaults to the
+number of rows.

Returns
-------
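To make the docstring's "1000 points at a time" concrete, here is a minimal pure-Python sketch of how a point set might be split into batches. `batch_ranges` is a hypothetical name chosen for illustration, and the handling of the final partial batch is an assumption; the actual kernel-side batching in `soft_clustering.cuh` may differ in detail.

```python
from typing import Iterator, Tuple


def batch_ranges(n_points: int, batch_size: int = 4096) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover n_points in batches.

    Hypothetical illustration; not the cuML implementation.
    """
    # Documented default: fall back to the number of rows when it is smaller.
    batch_size = min(batch_size, n_points)
    if batch_size <= 0:
        raise ValueError("batch_size must be > 0")
    # Only one batch of distances needs to be held in memory at a time.
    for start in range(0, n_points, batch_size):
        yield start, min(start + batch_size, n_points)


if __name__ == "__main__":
    # 2500 points with batch_size=1000: two full batches and one partial batch.
    print(list(batch_ranges(2500, 1000)))
```

With 2500 points and a batch size of 1000, the sketch yields `(0, 1000)`, `(1000, 2000)`, and `(2000, 2500)`, so peak memory for the distance matrix scales with the batch size rather than with the full number of points.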