Add ensembling methods for tiling to Anomalib #1131
-
This section will be regularly updated with the plan, ideas, and progress updates.

Approach(es)

Datasets

Since the tiling mechanism is intended for large images, special datasets will be needed for proper evaluation, but for testing the basic mechanism, already existing datasets, such as MVTec, will be used.
- Mechanism testing:
- Proper evaluation:

Ensemble mechanism plan

In the following subsections I present the idea and plan for the entire tiling ensemble mechanism, summarizing what is stated above as well as what was discussed in weekly meetings. There is also a section describing the way the approach is designed.
-
Ensemble design

In the diagram below, we can see the flow in the case where every tile has a separate model. This will be the initial implementation. It opens up many things for discussion, the main three being: how exactly to implement the ensemble so it supports all wanted functions, how to support different combining mechanisms, and, very importantly, how to make training and execution of multiple models memory efficient. This section deals with the tiling implementation; the other two problems have their own sections below.

Implementation

The diagram below shows a high-level flow of training and predicting using an ensemble of models.

Tiling approach

For our purpose, a wrapper for the existing Tiler was created, called EnsembleTiler. Its main purpose is to call the existing tiler and then transform the tiled images into the shape required for the ensemble. The tiling of images is then done inside the Dataloader.
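To make the tile grid explicit, here is a minimal pure-PyTorch sketch of the reshaping idea. The actual EnsembleTiler wraps Anomalib's existing Tiler; the function name and shape convention below are assumptions for illustration only.

```python
import torch


def tile_batch(images: torch.Tensor, tile_size: int) -> torch.Tensor:
    """Split [B, C, H, W] images into a grid of non-overlapping tiles.

    Returns shape [num_tiles_h, num_tiles_w, B, C, tile, tile], so that
    tiles[i, j] is the full batch for tile location (i, j): the unit a
    single ensemble model trains on.
    """
    _, _, h, w = images.shape
    assert h % tile_size == 0 and w % tile_size == 0, "image must divide evenly"
    # unfold height, then width: [B, C, nH, nW, tile, tile]
    tiles = images.unfold(2, tile_size, tile_size).unfold(3, tile_size, tile_size)
    # move the grid dimensions to the front: [nH, nW, B, C, tile, tile]
    return tiles.permute(2, 3, 0, 1, 4, 5).contiguous()


batch = torch.rand(8, 3, 256, 256)
tiles = tile_batch(batch, 128)  # shape: [2, 2, 8, 3, 128, 128]
```

Indexing the grid dimensions first makes it cheap to hand each tile location's batches to its own model.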
Training

The ensemble of models is trained by a separate training script. It is very similar to the already existing training script, with some existing, as well as upcoming, modifications. It trains a separate model on each tile, then runs prediction. Once everything is predicted, the post-processing pipeline takes care of post-processing, visualization, and metric calculation.

Prediction

To obtain all predictions, …

Storing predictions

Since we are predicting on image data that can become quite large due to the nature of the ensemble approach, we need to handle the storage of predictions. The current approach offers three ways of storing data:
- MemoryEnsemblePredictions stores all the data in a dictionary in main memory. It is the best in terms of speed but uses the most memory. Still, for most cases where the dataset is not too big, it is the best choice.
- DownscaledEnsemblePredictions also stores all the data in a dictionary in main memory, but the data is downscaled when stored and upscaled on fetch. This is a middle ground between the basic and the file-system-based approach.
- With FileSystemEnsemblePredictions, the predictions for each tile location are saved to the file system. This enables processing of very large datasets, at the price of speed due to saving to and loading from disk.

In my experiments with VisA pcb1 (1024x1024 images split into 16 256x256 tiles), both the basic and the downscaled approach ran out of memory (main memory and paged memory were filled), while the file-system approach managed to train successfully. For memory-efficiency and speed discussions, check the section "Memory efficiency and speed of ensemble" below.
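As a rough illustration of the trade-off between the in-memory and the file-system variants, here is a sketch of the two extremes. Class and method names mirror the ones above, but their interfaces are my assumption, not the actual code.

```python
import tempfile
from pathlib import Path

import torch


class MemoryPredictionStore:
    """Keep per-tile-location prediction batches in RAM: fastest, most memory."""

    def __init__(self):
        self._data = {}  # (row, col) -> list of prediction batches

    def add(self, tile_index, batch):
        self._data.setdefault(tile_index, []).append(batch)

    def get(self, tile_index):
        return self._data[tile_index]


class FileSystemPredictionStore:
    """Spill each batch to disk: slower, but scales to very large datasets."""

    def __init__(self, root=None):
        self.root = Path(root) if root else Path(tempfile.mkdtemp())
        self._counts = {}  # (row, col) -> number of batches saved so far

    def _path(self, tile_index, i):
        return self.root / f"tile_{tile_index[0]}_{tile_index[1]}_{i}.pt"

    def add(self, tile_index, batch):
        i = self._counts.get(tile_index, 0)
        torch.save(batch, self._path(tile_index, i))
        self._counts[tile_index] = i + 1

    def get(self, tile_index):
        n = self._counts.get(tile_index, 0)
        return [torch.load(self._path(tile_index, i)) for i in range(n)]
```

The downscaled variant would sit between these two: same interface, but tensors are interpolated down in add and back up in get.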
Joining predictions

Joining of predictions is done in the EnsemblePredictionJoiner. Predictions are of three shapes: …

The implementation does the following: …

Post-processing pipeline

Once all data is predicted, the post-processing pipelines are executed.

SmoothJoins

To reduce the effect of tiling on anomaly maps, we apply smoothing to the tile joins. This is optional, and we can specify the region that will be smoothed (as a fraction of the tile width, in [0, 1]) as well as the sigma of the Gaussian filter.
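A rough sketch of the join-smoothing idea under the parameters described above (band width as a fraction of the tile size, Gaussian sigma); this is illustrative, not the actual SmoothJoins code.

```python
import torch
from torchvision.transforms.functional import gaussian_blur


def smooth_joins(anomaly_map: torch.Tensor, tile_size: int,
                 width: float = 0.1, sigma: float = 2.0) -> torch.Tensor:
    """Blend a Gaussian-blurred copy of the map into bands around tile joins.

    anomaly_map: [B, 1, H, W] joined anomaly map.
    width: half-width of the smoothed band, as a fraction of the tile size.
    """
    _, _, h, w = anomaly_map.shape
    band = max(1, int(width * tile_size))
    mask = torch.zeros(h, w, dtype=torch.bool)
    # mark horizontal and vertical bands centred on interior tile boundaries
    for y in range(tile_size, h, tile_size):
        mask[max(0, y - band):y + band, :] = True
    for x in range(tile_size, w, tile_size):
        mask[:, max(0, x - band):x + band] = True
    kernel = 2 * int(4 * sigma + 0.5) + 1  # odd kernel covering ~4 sigma
    blurred = gaussian_blur(anomaly_map, kernel_size=kernel, sigma=sigma)
    # keep the original map outside the bands, the blurred copy inside them
    return torch.where(mask, blurred, anomaly_map)
```

Only pixels near the joins are touched, so anomaly scores in tile interiors stay exactly as the per-tile models predicted them.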
Normalization

Normalization can be done either at tile level or at the end, when predictions are joined. Tile-level normalization is done with a callback, while normalization at the end is done as part of the pipeline. The latter also requires execution of a statistics pipeline that gets the min and max from the validation data.

Thresholding

Thresholding can be done for each tile separately as well as at the end. It is automatically performed at tile level, as it is needed for metrics that might be part of training. If we want to threshold at the end (for example, when we also normalize at the end), it can be done as part of the pipeline. This also requires execution of a statistics pipeline that calculates the image and pixel thresholds.

Visualization

Once the data is processed, normalized, and thresholded, the results are visualized. The result we then get is the following:

The above result is obtained using PaDiM with images resized to 256x256 and then tiled into 128x128 tiles.

Metrics

With tiling, metrics are calculated at the end, once all tile predictions are joined and processed. This is done inside the EnsembleMetrics class. Update is called for every batch, and once all batches are processed, compute is called. This way the scores are produced in the same format as without ensembling.
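The update/compute pattern on joined predictions can be sketched with torchmetrics directly; the batch keys and random data below are placeholders, and the real EnsembleMetrics wraps this pattern for the configured set of metrics.

```python
import torch
from torchmetrics import AUROC

# stand-in for the stream of joined, post-processed batches; in the real
# pipeline these come from the prediction joiner, not from random data
joined_batches = [
    {"pred_scores": torch.rand(8), "label": torch.randint(0, 2, (8,))}
    for _ in range(4)
]

image_auroc = AUROC(task="binary")
for batch in joined_batches:  # update is called for every batch
    image_auroc.update(batch["pred_scores"], batch["label"])
print(image_auroc.compute())  # compute once all batches are processed
```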
-
Joining mechanism

This section is used for discussion of the joining mechanism; ideas about joining will be collected here. One option that will probably come in useful is smoothing of results, in addition to the currently supported averaging. Joining of predictions is done inside a class named EnsemblePredictionJoiner. It takes care of tile joining, box joining, and label & score joining.
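For tile joining specifically, the currently supported averaging can be sketched like this. The function is a stand-in for what EnsemblePredictionJoiner does for anomaly maps, and handling overlap via a count map is my assumption.

```python
import torch


def join_tile_maps(tile_maps: torch.Tensor, stride: int) -> torch.Tensor:
    """Join per-location anomaly maps [nH, nW, B, 1, t, t] into [B, 1, H, W].

    Overlapping pixels (stride < tile size) are averaged; smoothing of the
    joins could then be layered on top of this result.
    """
    nh, nw, b, c, t, _ = tile_maps.shape
    h, w = (nh - 1) * stride + t, (nw - 1) * stride + t
    joined = torch.zeros(b, c, h, w)
    counts = torch.zeros(1, 1, h, w)
    for i in range(nh):
        for j in range(nw):
            y, x = i * stride, j * stride
            joined[:, :, y:y + t, x:x + t] += tile_maps[i, j]
            counts[:, :, y:y + t, x:x + t] += 1
    return joined / counts  # average wherever tiles overlapped


maps = torch.rand(2, 2, 8, 1, 128, 128)
full = join_tile_maps(maps, stride=128)  # shape: [8, 1, 256, 256]
```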
-
Memory efficiency and speed of ensemble

This section is used for discussion about memory-efficient approaches to the ensemble of models. Splitting the image into tiles enables us to process very high-resolution images that we otherwise wouldn't be able to fit into memory. When we use an ensemble of models, however, we no longer have only one model that we can train in smaller batches. There are many possibilities for handling training and inference in this case, but at the moment we don't know which would be the best in terms of speed and memory efficiency. On one hand, we need to consider the additional time needed for training; on the other, more models require more memory, which implies that all models potentially couldn't be in memory at the same time. One option would be to save them to the file system and keep only one in memory at a time, but this raises the question of how that would affect the speed of execution. Any ideas and advice are greatly appreciated in this regard.
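To make that save-to-disk option concrete, here is one possible shape of it: train tile models one at a time, checkpoint each, and reload on demand for prediction. The helper names and the factory/train callables are hypothetical.

```python
from pathlib import Path

import torch
from torch import nn


def train_tile_models(tile_indices, make_model, train_one, ckpt_dir: Path):
    """Train one model per tile location, keeping only one in memory.

    make_model: factory returning a fresh model (nn.Module).
    train_one: callable that trains the model on a given tile location.
    After each model is trained, its weights are saved and the model is
    released, so peak memory stays roughly that of a single model.
    """
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for index in tile_indices:
        model = make_model()
        train_one(model, index)
        torch.save(model.state_dict(), ckpt_dir / f"tile_{index[0]}_{index[1]}.pt")
        del model  # release before moving to the next tile location
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


def load_tile_model(make_model, index, ckpt_dir: Path) -> nn.Module:
    """Re-create a single tile model from its checkpoint for prediction."""
    model = make_model()
    model.load_state_dict(torch.load(ckpt_dir / f"tile_{index[0]}_{index[1]}.pt"))
    return model.eval()
```

The open question is exactly the one raised above: the repeated save/load round-trips trade memory for wall-clock time, and the size of that trade-off still needs measuring.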
-
Question about defect-free tiles of an abnormal image

Abnormal images contain defects at the whole-image level. However, when they are split into several tiles, some of the tiles are defect free. For example, in your image of the screw, only the bottom-left tile contains defects. Do we need to move the defect-free tiles to the "normal" image folder?
-
Project abstract
When detecting defects in high-resolution images, we encounter many challenges. One of those is that models don't work well at such a large scale, and by downsampling we would lose information. This issue can be solved by using a tiling mechanism, where we split the image into smaller parts and process those. This way we keep all the information and the models can still fit into memory.
Anomalib already has a tiling mechanism, but the problem is that a single model is trained on all tiles combined, which reduces the advantages of locally aware models that rely on fixed position and orientation. For cases like this, an ensemble approach will be developed.
This involves splitting the data into sections using the already existing tiling mechanism. A separate model will then be trained for each section. Finally, predictions will be merged in the post-processing stage. This approach will include evaluation and comparison of performance, while also taking efficiency into account, to clearly show the advantages and gains over non-ensemble methods.
The outcome of the project will be the mechanism described above, working with all existing model architectures as well as any new ones that will be added.
Original proposal idea
Purpose of GSOC discussions thread
This project is a part of OpenVINO GSoC. GSoC is all about open-source software and promoting community collaboration on various projects. That is why this discussion thread will be used for active updates on progress, as well as for the community to have insight and provide suggestions.
So if you have any suggestions or questions, feel very welcome to put them below :)