Add ensembling methods for tiling to Anomalib #1131
-
This section will be regularly updated with the plan, ideas, and progress updates.

Approach(es)

Datasets

Since the tiling mechanism is intended for large images, special datasets will be needed for proper evaluation, but for testing the basic mechanism, already existing datasets, such as MVTec, will be used.
- Mechanism testing:
- Proper evaluation:

Ensemble mechanism plan

In the following subsections I present the idea and plan for the entire tiling ensemble mechanism, summarizing what is stated above as well as what was discussed in weekly meetings. There is also a section describing the way the approach is designed.
-
Ensemble design

In the diagram below, we can see the flow in the case where every tile has a separate model. This will be the initial implementation. It opens up many things for discussion, the main three being: how exactly to implement the ensemble so it supports all wanted functions, how to support different combining mechanisms, and, very importantly, how to make training and execution of multiple models memory efficient. This section deals with the tiling implementation; the other two problems have their own sections below.

Implementation

The diagram below shows a high-level flow of training and predicting using an ensemble of models.

Tiling approach

For our purpose, a wrapper for the existing Tiler was created, called EnsembleTiler. Its main purpose is to call the existing tiler and then transform the tiled images into the shape required for the ensemble. The tiling of images is then done inside the Dataloader.
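To make the tile grid explicit, here is a minimal pure-PyTorch sketch of the reshaping idea. The actual EnsembleTiler wraps Anomalib's existing Tiler; the function name and shape convention below are assumptions for illustration only.

```python
import torch


def tile_batch(images: torch.Tensor, tile_size: int) -> torch.Tensor:
    """Split [B, C, H, W] images into a grid of non-overlapping tiles.

    Returns shape [num_tiles_h, num_tiles_w, B, C, tile, tile], so that
    tiles[i, j] is the full batch for tile location (i, j): the unit a
    single ensemble model trains on.
    """
    _, _, h, w = images.shape
    assert h % tile_size == 0 and w % tile_size == 0, "image must divide evenly"
    # unfold height, then width: [B, C, nH, nW, tile, tile]
    tiles = images.unfold(2, tile_size, tile_size).unfold(3, tile_size, tile_size)
    # move the grid dimensions to the front: [nH, nW, B, C, tile, tile]
    return tiles.permute(2, 3, 0, 1, 4, 5).contiguous()


batch = torch.rand(8, 3, 256, 256)
tiles = tile_batch(batch, 128)  # shape: [2, 2, 8, 3, 128, 128]
```

Indexing the grid dimensions first makes it cheap to hand each tile location's batches to its own model.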
Training

The ensemble of models is trained by a separate training script. It is very similar to the already existing training script, with some existing, as well as upcoming, modifications. It trains a separate model on each tile, then runs prediction. Once everything is predicted, the post-processing pipeline takes care of post-processing, visualization, and metric calculation.

Prediction

To obtain all predictions, …

Storing predictions

Since we are predicting on image data that can become quite large due to the nature of the ensemble approach, we need to handle the storage of predictions. The current approach offers three ways of storing data:
- MemoryEnsemblePredictions stores all the data in a dictionary in main memory. It is the best in terms of speed but uses the most memory. Still, for most cases where the dataset is not too big, it is the best choice.
- DownscaledEnsemblePredictions also stores all the data in a dictionary in main memory, but the data is downscaled when stored and upscaled on fetch. This is a middle ground between the basic and the file-system-based approach.
- With FileSystemEnsemblePredictions, the predictions for each tile location are saved to the file system. This enables processing of very large datasets, at the price of speed due to saving to and loading from disk.

In my experiments with VisA pcb1 (1024x1024 images split into 16 256x256 tiles), both the basic and the downscaled approach ran out of memory (main memory and paged memory were filled), while the file-system approach managed to train successfully. For memory-efficiency and speed discussions, check the section "Memory efficiency and speed of ensemble" below.
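As a rough illustration of the trade-off between the in-memory and the file-system variants, here is a sketch of the two extremes. Class and method names mirror the ones above, but their interfaces are my assumption, not the actual code.

```python
import tempfile
from pathlib import Path

import torch


class MemoryPredictionStore:
    """Keep per-tile-location prediction batches in RAM: fastest, most memory."""

    def __init__(self):
        self._data = {}  # (row, col) -> list of prediction batches

    def add(self, tile_index, batch):
        self._data.setdefault(tile_index, []).append(batch)

    def get(self, tile_index):
        return self._data[tile_index]


class FileSystemPredictionStore:
    """Spill each batch to disk: slower, but scales to very large datasets."""

    def __init__(self, root=None):
        self.root = Path(root) if root else Path(tempfile.mkdtemp())
        self._counts = {}  # (row, col) -> number of batches saved so far

    def _path(self, tile_index, i):
        return self.root / f"tile_{tile_index[0]}_{tile_index[1]}_{i}.pt"

    def add(self, tile_index, batch):
        i = self._counts.get(tile_index, 0)
        torch.save(batch, self._path(tile_index, i))
        self._counts[tile_index] = i + 1

    def get(self, tile_index):
        n = self._counts.get(tile_index, 0)
        return [torch.load(self._path(tile_index, i)) for i in range(n)]
```

The downscaled variant would sit between these two: same interface, but tensors are interpolated down in add and back up in get.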
Joining predictions

Joining of predictions is done in the EnsemblePredictionJoiner. Predictions are of three shapes: …

The implementation does the following: …

Post-processing pipeline

Once all data is predicted, the post-processing pipelines are executed.

SmoothJoins

To reduce the effect of tiling on anomaly maps, we apply smoothing to the tile joins. This is optional, and we can specify the region that will be smoothed (as a fraction of the tile width, in [0, 1]) as well as the sigma of the Gaussian filter.
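A rough sketch of the join-smoothing idea under the parameters described above (band width as a fraction of the tile size, Gaussian sigma); this is illustrative, not the actual SmoothJoins code.

```python
import torch
from torchvision.transforms.functional import gaussian_blur


def smooth_joins(anomaly_map: torch.Tensor, tile_size: int,
                 width: float = 0.1, sigma: float = 2.0) -> torch.Tensor:
    """Blend a Gaussian-blurred copy of the map into bands around tile joins.

    anomaly_map: [B, 1, H, W] joined anomaly map.
    width: half-width of the smoothed band, as a fraction of the tile size.
    """
    _, _, h, w = anomaly_map.shape
    band = max(1, int(width * tile_size))
    mask = torch.zeros(h, w, dtype=torch.bool)
    # mark horizontal and vertical bands centred on interior tile boundaries
    for y in range(tile_size, h, tile_size):
        mask[max(0, y - band):y + band, :] = True
    for x in range(tile_size, w, tile_size):
        mask[:, max(0, x - band):x + band] = True
    kernel = 2 * int(4 * sigma + 0.5) + 1  # odd kernel covering ~4 sigma
    blurred = gaussian_blur(anomaly_map, kernel_size=kernel, sigma=sigma)
    # keep the original map outside the bands, the blurred copy inside them
    return torch.where(mask, blurred, anomaly_map)
```

Only pixels near the joins are touched, so anomaly scores in tile interiors stay exactly as the per-tile models predicted them.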
Normalization

Normalization can be done either at tile level or at the end, when predictions are joined. Tile-level normalization is done with a callback, while normalization at the end is done as part of the pipeline. The latter also requires execution of a statistics pipeline that gets the min and max from the validation data.

Thresholding

Thresholding can be done for each tile separately as well as at the end. It is automatically performed at tile level, as it is needed for metrics that might be part of training. If we want to threshold at the end (for example, when we also normalize at the end), it can be done as part of the pipeline. This also requires execution of a statistics pipeline that calculates the image and pixel thresholds.

Visualization

Once the data is processed, normalized, and thresholded, the results are visualized. The result we then get is the following:

The above result is obtained using PaDiM with images resized to 256x256 and then tiled into 128x128 tiles.

Metrics

With tiling, metrics are calculated at the end, once all tile predictions are joined and processed. This is done inside the EnsembleMetrics class. Update is called for every batch, and once all batches are processed, compute is called. This way the scores are produced in the same format as without ensembling.
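The update/compute pattern on joined predictions can be sketched with torchmetrics directly; the batch keys and random data below are placeholders, and the real EnsembleMetrics wraps this pattern for the configured set of metrics.

```python
import torch
from torchmetrics import AUROC

# stand-in for the stream of joined, post-processed batches; in the real
# pipeline these come from the prediction joiner, not from random data
joined_batches = [
    {"pred_scores": torch.rand(8), "label": torch.randint(0, 2, (8,))}
    for _ in range(4)
]

image_auroc = AUROC(task="binary")
for batch in joined_batches:  # update is called for every batch
    image_auroc.update(batch["pred_scores"], batch["label"])
print(image_auroc.compute())  # compute once all batches are processed
```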
-
Joining mechanism

This section is used for discussion of the joining mechanism; ideas about joining will be collected here. One option that will probably come in useful is smoothing of results, in addition to the currently supported averaging. Joining of predictions is done inside a class named EnsemblePredictionJoiner. It takes care of tile joining, box joining, and label & score joining.
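For tile joining specifically, the currently supported averaging can be sketched like this. The function is a stand-in for what EnsemblePredictionJoiner does for anomaly maps, and handling overlap via a count map is my assumption.

```python
import torch


def join_tile_maps(tile_maps: torch.Tensor, stride: int) -> torch.Tensor:
    """Join per-location anomaly maps [nH, nW, B, 1, t, t] into [B, 1, H, W].

    Overlapping pixels (stride < tile size) are averaged; smoothing of the
    joins could then be layered on top of this result.
    """
    nh, nw, b, c, t, _ = tile_maps.shape
    h, w = (nh - 1) * stride + t, (nw - 1) * stride + t
    joined = torch.zeros(b, c, h, w)
    counts = torch.zeros(1, 1, h, w)
    for i in range(nh):
        for j in range(nw):
            y, x = i * stride, j * stride
            joined[:, :, y:y + t, x:x + t] += tile_maps[i, j]
            counts[:, :, y:y + t, x:x + t] += 1
    return joined / counts  # average wherever tiles overlapped


maps = torch.rand(2, 2, 8, 1, 128, 128)
full = join_tile_maps(maps, stride=128)  # shape: [8, 1, 256, 256]
```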
-
Memory efficiency and speed of ensemble

This section is used for discussion about memory-efficient approaches to the ensemble of models. Splitting the image into tiles enables us to process very high-resolution images that we otherwise wouldn't be able to fit into memory. When we use an ensemble of models, however, we no longer have only one model that we can train in smaller batches. There are many possibilities for handling training and inference in this case, but at the moment we don't know which would be the best in terms of speed and memory efficiency. On one hand, we need to consider the additional time needed for training; on the other, more models require more memory, which implies that all models potentially couldn't be in memory at the same time. One option would be to save them to the file system and keep only one in memory at a time, but this raises the question of how that would affect the speed of execution. Any ideas and advice are greatly appreciated in this regard.
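To make that save-to-disk option concrete, here is one possible shape of it: train tile models one at a time, checkpoint each, and reload on demand for prediction. The helper names and the factory/train callables are hypothetical.

```python
from pathlib import Path

import torch
from torch import nn


def train_tile_models(tile_indices, make_model, train_one, ckpt_dir: Path):
    """Train one model per tile location, keeping only one in memory.

    make_model: factory returning a fresh model (nn.Module).
    train_one: callable that trains the model on a given tile location.
    After each model is trained, its weights are saved and the model is
    released, so peak memory stays roughly that of a single model.
    """
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for index in tile_indices:
        model = make_model()
        train_one(model, index)
        torch.save(model.state_dict(), ckpt_dir / f"tile_{index[0]}_{index[1]}.pt")
        del model  # release before moving to the next tile location
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


def load_tile_model(make_model, index, ckpt_dir: Path) -> nn.Module:
    """Re-create a single tile model from its checkpoint for prediction."""
    model = make_model()
    model.load_state_dict(torch.load(ckpt_dir / f"tile_{index[0]}_{index[1]}.pt"))
    return model.eval()
```

The open question is exactly the one raised above: the repeated save/load round-trips trade memory for wall-clock time, and the size of that trade-off still needs measuring.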
-
Question about defect-free tiles of an abnormal image

Abnormal images contain defects at the whole-image level. However, when they are split into several tiles, some of the tiles are defect free. For example, in your image of the screw, only the bottom-left tile contains defects. Do we need to move the defect-free tiles to the "normal" image folder?
-
Project abstract
When detecting defects in high-resolution images, we encounter many challenges. One of those is that models don't work well at such a large scale, and by downsampling we would lose information. This issue can be solved by using a tiling mechanism, where we split the image into smaller parts and process those. This way we keep all the information and the models can still fit into memory.
Anomalib already has a tiling mechanism, but the problem is that a single model is trained on all tiles combined, which reduces the advantages of locally aware models that rely on fixed position and orientation. For cases like this, an ensemble approach will be developed.
This involves splitting the data into sections using the already existing tiling mechanism. A separate model will then be trained for each section. Finally, predictions will be merged in the post-processing stage. This approach will include evaluation and comparison of performance, while also taking efficiency into account, to clearly show the advantages and gains over non-ensemble methods.
The outcome of the project will be the mechanism described above, working with all existing model architectures as well as any new ones that will be added.
Original proposal idea
Purpose of GSOC discussions thread
This project is a part of OpenVINO GSoC. GSoC is all about open-source software and promoting community collaboration on various projects. That is why this discussion thread will be used for active updates on progress, as well as for the community to have insight and provide suggestions.
So if you have any suggestions or questions, feel very welcome to put them below :)