Update PCA_Embedder Saving and Loading #2204

Open
wants to merge 11 commits into
base: master

acreegan

The main goal of this update is to save a PCA model to disk and load it again later without re-running the PCA analysis on the original data. I wanted to do this from a Python program that imports ShapeWorks as a library. Of the two sets of PCA functionality in the ShapeWorks repository, the PCA_Embedder class from the pure-Python DataAugmentationUtils module was closest to having these features and easiest to extend in Python, so this update extends that class.

Changes made:

  • Allow __init__ method to accept None as a data_matrix parameter, in which case the PCA analysis is not run.
  • Add load_PCA to allow loading raw PCA attributes from arrays
  • Add from_directory class method, a factory function to create a PCA_Embedder instance from a model saved to disk
  • Tidy the write_PCA function to make it more consistent, add an option to skip saving subject scores (in case these are privileged information), and ensure its output matches what from_directory expects
  • Change the project function to use pre-calculated mean data instead of raw data, so that only the mean data needs to be saved
  • Change the percent variability calculation to use greater than or equal to (>=), which allows a percent_variability of 1
  • Remove the self.num_dim attribute, instead inferring this from the length of the PCA scores array passed to the project function.
  • Add documentation in code clarifying method parameters
  • Add tests to ensure:
    • PCA functionality is the same as ParticleShapeStatistics and sklearn
    • Loading and saving works as intended
    • Percent variability works as intended
    • (These tests are now passing in GitHub Actions)
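
The save/load round trip described in the list above can be sketched roughly as follows. This is a minimal illustration, not the ShapeWorks implementation: the class name `SketchPCAEmbedder`, its attribute names, and the `.npy` file names are all hypothetical, and only NumPy is assumed. It shows the three ideas from the change list: a `None` data matrix skips the analysis, projection needs only the stored mean, and the mode cutoff uses `>=` so that a `percent_variability` of 1 is accepted.

```python
import os
import tempfile

import numpy as np


class SketchPCAEmbedder:
    """Hypothetical stand-in for illustration; not the ShapeWorks PCA_Embedder."""

    def __init__(self, data_matrix=None, percent_variability=0.95):
        self.percent_variability = percent_variability
        self.mean_data = None
        self.eigen_vectors = None
        self.eigen_values = None
        self.scores = None
        # Passing None skips the analysis so a saved model can be loaded instead.
        if data_matrix is not None:
            self.run_pca(np.asarray(data_matrix, dtype=float))

    def run_pca(self, data_matrix):
        n = data_matrix.shape[0]
        flat = data_matrix.reshape(n, -1)
        self.mean_data = flat.mean(axis=0)
        centered = flat - self.mean_data
        # Rows of vt are the principal directions of the centered data.
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        self.eigen_vectors = vt.T  # shape: (features, modes)
        self.eigen_values = s**2 / (n - 1)
        self.scores = centered @ self.eigen_vectors

    def num_modes(self):
        cumulative = np.cumsum(self.eigen_values) / np.sum(self.eigen_values)
        # ">=" lets percent_variability == 1 select a mode instead of none.
        idx = np.nonzero(cumulative >= self.percent_variability)[0]
        # Fall back to all modes if rounding keeps the sum just below the target.
        return int(idx[0]) + 1 if idx.size else len(self.eigen_values)

    def project(self, pca_scores):
        pca_scores = np.asarray(pca_scores, dtype=float)
        k = pca_scores.shape[0]  # number of modes inferred from the scores
        # Only the stored mean is needed here, not the raw training data.
        return self.mean_data + self.eigen_vectors[:, :k] @ pca_scores

    def write(self, out_dir, save_scores=True):
        os.makedirs(out_dir, exist_ok=True)
        np.save(os.path.join(out_dir, "mean_data.npy"), self.mean_data)
        np.save(os.path.join(out_dir, "eigen_vectors.npy"), self.eigen_vectors)
        np.save(os.path.join(out_dir, "eigen_values.npy"), self.eigen_values)
        if save_scores:  # per-subject scores are optional
            np.save(os.path.join(out_dir, "scores.npy"), self.scores)

    @classmethod
    def from_directory(cls, in_dir):
        obj = cls(data_matrix=None)
        obj.mean_data = np.load(os.path.join(in_dir, "mean_data.npy"))
        obj.eigen_vectors = np.load(os.path.join(in_dir, "eigen_vectors.npy"))
        obj.eigen_values = np.load(os.path.join(in_dir, "eigen_values.npy"))
        scores_path = os.path.join(in_dir, "scores.npy")
        if os.path.exists(scores_path):
            obj.scores = np.load(scores_path)
        return obj


# Round trip: fit, save without scores, reload, and reconstruct one sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4, 3))  # 6 subjects, 4 points, 3 coordinates
original = SketchPCAEmbedder(data)
with tempfile.TemporaryDirectory() as tmp:
    original.write(tmp, save_scores=False)
    restored = SketchPCAEmbedder.from_directory(tmp)
reconstructed = restored.project(original.scores[0])
```

Because the reloaded object carries the mean and eigenvectors, `reconstructed` should match the first flattened training sample to within floating-point tolerance, even though neither the raw data nor the subject scores were written to disk.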

@akenmorris
Contributor

I'm getting this error with the deep_ssm use case (automated test):

2024-03-29T00:18:36.3395229Z 8:   File "/__w/ShapeWorks/ShapeWorks/Examples/Python/RunUseCase.py", line 97, in <module>
2024-03-29T00:18:36.3396061Z 8:     module.Run_Pipeline(args)
2024-03-29T00:18:36.3397041Z 8:   File "/__w/ShapeWorks/ShapeWorks/Examples/Python/deep_ssm.py", line 257, in Run_Pipeline
2024-03-29T00:18:36.3398394Z 8:     embedded_dim = DeepSSMUtils.run_data_augmentation(project, num_samples, num_dim, percent_variability, sampler,
2024-03-29T00:18:36.3399786Z 8:   File "/__w/ShapeWorks/ShapeWorks/Python/DeepSSMUtilsPackage/DeepSSMUtils/run_utils.py", line 289, in run_data_augmentation
2024-03-29T00:18:36.3401098Z 8:     embedded_dim = DataAugmentationUtils.runDataAugmentation(aug_dir, train_image_filenames,
2024-03-29T00:18:36.3402499Z 8:   File "/__w/ShapeWorks/ShapeWorks/Python/DataAugmentationUtilsPackage/DataAugmentationUtils/__init__.py", line 22, in runDataAugmentation
2024-03-29T00:18:36.3404177Z 8:     num_dim = DataAugmentation.point_based_aug(out_dir, img_list, world_point_list, num_samples, num_dim, percent_variability, sampler_type, mixture_num, processes)
2024-03-29T00:18:36.3405906Z 8:   File "/__w/ShapeWorks/ShapeWorks/Python/DataAugmentationUtilsPackage/DataAugmentationUtils/DataAugmentation.py", line 37, in point_based_aug
2024-03-29T00:18:36.3407071Z 8:     num_dim = PointEmbedder.num_dim
2024-03-29T00:18:36.3407902Z 8: AttributeError: 'PCA_Embbeder' object has no attribute 'num_dim'
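
The traceback comes from a caller that still reads the removed `num_dim` attribute. One way a caller could obtain the same number without that attribute is to infer it from the shape of the scores array, as the PR description suggests for the project function. The helper below is a hypothetical sketch, not necessarily the fix that was applied; it assumes scores are stored with one row per sample and one column per retained mode.

```python
import numpy as np


def infer_num_dim(pca_scores):
    """Return the number of PCA modes implied by a scores array (hypothetical helper).

    Accepts a single score vector or a (num_samples, num_modes) matrix.
    """
    scores = np.atleast_2d(np.asarray(pca_scores))
    return int(scores.shape[-1])


num_dim = infer_num_dim(np.zeros((10, 4)))  # matrix of 10 samples, 4 modes
single = infer_num_dim([0.5, -1.2, 0.3])    # one sample with 3 modes
```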

@akenmorris
Contributor

@acreegan, I've fixed those errors, but the new PCA embedder test fails on Mac and Windows, I assume due to a precision/rounding difference. I'll take another look at it when I have a chance.
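
If the cross-platform failures do come from precision differences, a tolerance-based comparison is one common way to make such a test more robust. The sketch below is an assumption about a possible approach, not the test in this PR: `modes_match` is a hypothetical helper, and the per-column sign alignment is an extra precaution added here because eigenvector signs are arbitrary and can differ between linear algebra backends.

```python
import numpy as np


def modes_match(a, b, rtol=1e-5, atol=1e-8):
    """Compare two (features, modes) matrices column by column, ignoring sign."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    # Flip each column of b so that it points the same way as the column in a.
    signs = np.sign(np.sum(a * b, axis=0))
    signs[signs == 0] = 1.0
    return bool(np.allclose(a, b * signs, rtol=rtol, atol=atol))


# Demo: an orthonormal basis versus a sign-flipped, slightly perturbed copy.
rng = np.random.default_rng(1)
reference, _ = np.linalg.qr(rng.normal(size=(4, 3)))
flipped_noisy = -reference + 1e-10 * rng.normal(size=reference.shape)

same_up_to_sign = modes_match(reference, flipped_noisy)
exactly_equal = np.array_equal(reference, flipped_noisy)
```

Here an exact comparison would report a mismatch, while the tolerance-based check treats the two bases as equivalent.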

2 participants