point_utils is a Python package for computing offset points for a 3-D point cloud.
You can install the package directly from source using pip. Clone the repository and run the installation command:
# Clone the repository
git clone https://github.com/RenkeHuang/point_utils.git
cd point_utils
# Install the package
python -m pip install .
If you plan to modify the package, install it in editable mode or use the Makefile:
python -m pip install -e .
# or
make install
python -m scripts.main --help
# Alternatively, use the provided script entry point
RUN --help
RUN -config examples/config.yaml
# Run in the root directory where the Dockerfile is located
docker build --no-cache -t point_utils:latest .
Pull the latest image from Docker Hub:
docker pull renkeh/point_utils:0.1.1
See the Docker Hub repository overview page for more details.
# Run the container and persist the output with a bind mount (maps the examples directory on the local host to /app/examples inside the container)
docker run --rm --name point_utils_container -v "$(pwd)/examples:/app/examples" point_utils:0.1.1
# If you start the container with an interactive shell, run the entry script manually inside that shell
# -i: keep STDIN open, -t: allocate a pseudo-TTY for the shell session.
# /bin/bash: open a bash shell inside the container.
docker run --rm -it --name point_utils_container point_utils:0.1.1 /bin/bash
RUN -config examples/config.yaml
# Override the default CMD:
# The following command generates a plot of the data and writes it to the mounted examples directory, so it is available on the local host
docker run --rm -v "$(pwd)/examples:/app/examples" point_utils:0.1.1 python scripts/visualize.py examples/cdd.txt -o examples/fig.png
python -m pip install -r requirements-dev.txt
# Alternatively, install the test extras defined in pyproject.toml (quoted so that shells such as zsh do not expand the brackets)
python -m pip install ".[test]"
# Run pytest from the repository root, with coverage reporting
python -m pytest --cov=point_utils
# Confirm that all test files and functions are discovered
python -m pytest --collect-only
# Clean up build and test artifacts using the Makefile, useful during development
make clean
The primary functionality of this package is implemented in the offsetter module. Its main objective is to augment a three-dimensional point cloud dataset by adding offset points corresponding to a selected subset of existing points.
For example, suppose we select a subset of points labeled "B" (this selection can be performed using data processing techniques such as SQL queries and tagging).
Each "B" point will have an associated offset point, which we will label "C" later.
The "C" points are positioned at a fixed distance
Note: The task described above is analogous to and serves as a simplified abstraction of the challenges encountered in generative machine learning models and chemoinformatics, particularly in exploring chemical space and enumerating molecular libraries. In these fields, generating new data points (e.g., molecular structures) that are meaningful and diverse, while avoiding overcrowding existing data regions, is a common objective. For example, in chemoinformatics, scientists often seek to generate new molecular structures by adding atoms or functional groups to existing molecules. This process must consider spatial configurations to prevent unfavorable interactions, such as atomic clashes when two atoms are positioned within their van der Waals radii, and to ensure that the new structures are chemically valid.
Several numerical methods can be used to determine the directions of these offset vectors. While all implemented methods technically work out of the box, the "optimality" of the computed offset vectors depends on the specific dataset, method-dependent parameters, and other factors, so further scientific validation is required. Here is a brief overview of these methods:
Local methods focus on local data characteristics: they rely on the immediate surroundings of each target point to determine the direction of its offset vector.
- Nearest-Neighbor via K-D Tree: Compute the mean displacement vector from each "B" point to its nearest neighbors and use the opposite direction of this mean vector as the direction of the offset vector for the "B" point.
- Surface Normals via Local Surface Fitting: Fit a local surface around each "B" point using techniques such as least squares fitting. Compute the surface normal from this fitted surface and use it as the direction of the offset vector.
- Density Gradient: Use Kernel Density Estimation (KDE) to model the density of points around each "B" point. Compute the gradient of the density function to identify the direction of decreasing density, and use this direction for the offset vector.
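As an illustration of the nearest-neighbor idea, here is a minimal sketch in plain Python. It is not the package's KDTreeOffsets implementation: the function name `nearest_neighbor_offset` and its parameters `k` and `distance` are hypothetical, and a brute-force neighbor search stands in for the K-D tree so that the example needs only the standard library.

```python
import math

def nearest_neighbor_offset(points, b_index, k=3, distance=1.0):
    """Return a hypothetical "C" point for the "B" point at points[b_index].

    The direction is the opposite of the mean displacement from the "B"
    point to its k nearest neighbors. Neighbors are found by brute force
    here; a K-D tree would only speed up this search.
    """
    b = points[b_index]
    # Sort all other points by Euclidean distance to b and keep the k closest.
    others = [p for i, p in enumerate(points) if i != b_index]
    neighbors = sorted(others, key=lambda p: math.dist(p, b))[:k]
    if not neighbors:
        raise ValueError("need at least one neighbor")
    # Mean displacement vector from b to its neighbors.
    mean = [sum(p[axis] - b[axis] for p in neighbors) / len(neighbors)
            for axis in range(3)]
    norm = math.sqrt(sum(c * c for c in mean))
    if norm == 0.0:
        raise ValueError("neighbors are balanced around the point; direction undefined")
    # Step away from the neighbors: negate, normalize, scale by distance.
    return tuple(b[axis] - distance * mean[axis] / norm for axis in range(3))

# Toy cloud: the "B" point sits at the origin with one neighbor on each axis.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
c_point = nearest_neighbor_offset(cloud, b_index=0, k=3, distance=2.0)
```

In this toy cloud the neighbors all lie on the positive axes, so the resulting "C" point lands in the negative octant, at the requested distance from the "B" point.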
Global methods consider the overall structure or properties of the entire dataset.
- Surface Normals via Convex Hull Method: Construct a convex hull for the entire point cloud to determine the global geometric boundaries. Compute the normals of the convex hull to define the directions of the offset vectors.
- Radial Expansion: Calculate the centroid of the entire point cloud. For each "B" point, compute the vector pointing from the centroid to the point and use this direction for the offset vector.
- Principal Component Analysis (PCA): Perform PCA on the entire point cloud to identify the principal directions of variance. For each "B" point, the direction corresponding to the smallest eigenvalue (least variance) can be considered as pointing "away" from the densest part of the data.
- Voronoi Diagram: Construct a 3D Voronoi diagram of the entire point cloud. For each "B" point, identify its Voronoi cell and determine the direction towards its farthest vertex, which likely points away from neighboring points.
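Radial expansion is the simplest of the global methods to write down. The sketch below uses only the standard library and is an illustration under assumptions, not the package's CentroidOffsets code: the function name `radial_offset` and the `distance` parameter are hypothetical.

```python
import math

def radial_offset(points, b_index, distance=1.0):
    """Return a hypothetical "C" point for the "B" point at points[b_index].

    The offset direction points from the centroid of the whole cloud
    towards the "B" point, so the "C" point moves outwards.
    """
    n = len(points)
    # Centroid of the entire point cloud.
    centroid = [sum(p[axis] for p in points) / n for axis in range(3)]
    b = points[b_index]
    # Vector from the centroid to the "B" point.
    direction = [b[axis] - centroid[axis] for axis in range(3)]
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("point coincides with the centroid; direction undefined")
    # Move the given distance along the normalized direction.
    return tuple(b[axis] + distance * direction[axis] / norm for axis in range(3))

# Toy cloud: four corners of a square in the z = 0 plane, centroid at (1, 1, 0).
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
c_radial = radial_offset(square, b_index=1, distance=1.0)
```

The corner at (2, 0, 0) is pushed diagonally outwards, away from the centre of the square, and stays in the z = 0 plane. A point located exactly at the centroid has no defined radial direction, which the sketch reports as an error rather than guessing.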
- Nearest-Neighbor via K-D Tree, implemented in the KDTreeOffsets class
- Convex Hull, implemented in the ConvexHullOffsets class
- Radial Expansion, implemented in the CentroidOffsets class
Version 0.1.1
- Support two new methods for offset vector computations: convex hull and radial expansion
Version 0.1.0
- Support KDTree for offset vector computations