Frequently Asked Questions
We collected user feedback and sorted it into the FAQ sections below:
The general guide for installing PyTorch can be summarized as follows:
- Check your NVIDIA GPU Compute Capability @ https://developer.nvidia.com/cuda-gpus
- Download CUDA Toolkit @ https://developer.nvidia.com/cuda-downloads
- Install PyTorch command can be found @ https://pytorch.org/get-started/locally/
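After installing, a quick way to confirm that PyTorch can see the GPU is the snippet below (a minimal check of our own; it degrades gracefully if PyTorch is not yet installed):

```python
def cuda_status():
    """Return a short human-readable summary of the local PyTorch/CUDA setup."""
    try:
        import torch  # only available after the install steps above
    except ImportError:
        return "PyTorch is not installed; see https://pytorch.org/get-started/locally/"
    if torch.cuda.is_available():
        return f"PyTorch {torch.__version__}, CUDA device: {torch.cuda.get_device_name(0)}"
    return f"PyTorch {torch.__version__}, but no CUDA device is visible"

print(cuda_status())
```

If the last line reports no CUDA device despite a supported GPU, re-check the Compute Capability and CUDA Toolkit steps above.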
- For a given project, the database stores the latest model id in the "iteration" column.
- A model_id of -1 means no AI model has been trained yet.
- A model_id of 0 represents the initial autoencoder.
- A model_id of >=1 represents a deep learning iteration.
- The get_latest_model_id function pulls this information from the database.
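The lookup can be sketched as follows (a simplified illustration using an in-memory SQLite table; QA's actual schema and get_latest_model_id implementation may differ):

```python
import sqlite3

# Toy stand-in for QA's project table: one row per project, with the
# latest model id stored in the "iteration" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (name TEXT PRIMARY KEY, iteration INTEGER)")
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [("untrained", -1), ("ae_only", 0), ("trained", 3)])

def get_latest_model_id(project_name):
    """Return the latest model id: -1 = no model, 0 = autoencoder, >=1 = DL iteration."""
    row = conn.execute("SELECT iteration FROM project WHERE name = ?",
                       (project_name,)).fetchone()
    return row[0] if row else None

print(get_latest_model_id("trained"))  # 3: third deep-learning iteration
```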
Since Quick Annotator currently does not support WSIs, users need to divide each WSI into smaller image tiles. QA provides a script, cli\extract_tiles_from_wsi_openslide.py, which imports OpenSlide.
However, we received feedback that many Windows users had difficulty installing OpenSlide. Therefore, we provide a detailed tutorial for installing and importing OpenSlide on Windows.
- Find and install OpenSlide Python for your Python version @ https://pypi.org/project/openslide-python/
- Find and install the OpenSlide Windows binaries @ https://openslide.org/download/
- Add openslide\bin to the system environment path. (Many of our testers forgot to add OpenSlide to the path, so they could not import OpenSlide properly.)
Control Panel -> System Properties -> Advanced -> Environment Variables
-> System variables -> *Path*
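Note that on Python 3.8 and newer under Windows, adding openslide\bin to PATH is not sufficient by itself: the DLL directory must also be registered from Python before `import openslide`. A minimal sketch (the helper name and install path below are our assumptions, not QA defaults):

```python
import os
import sys

def register_openslide(bin_dir):
    """Make the OpenSlide DLLs findable before `import openslide`.

    On Python 3.8+ under Windows, directories on PATH are no longer searched
    for DLL dependencies; they must be registered with os.add_dll_directory().
    On non-Windows platforms this is a no-op and the function returns False.
    """
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        os.add_dll_directory(bin_dir)
        return True
    return False

# Hypothetical install location -- adjust to wherever the OpenSlide
# Windows binaries were unpacked.
registered = register_openslide(r"C:\openslide-win64\bin")
print("DLL directory registered:", registered)
```

Call the helper before importing openslide in any script, including cli\extract_tiles_from_wsi_openslide.py.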
The efficiency improvement afforded by QA varies with the histologic structures being annotated. In a recent paper, we presented efficiency plots for pancreatic cell nuclei, colon tubules, and breast cancer epithelium, which correspond to small-, medium-, and large-scale tissue structures.
Efficiency metric over time demonstrating the improvement in speed afforded by QA in annotating (A) nuclei, (B) tubules, and (C) epithelium. The x-axis is the human annotation time in minutes, and the y-axis is the annotation speed in terms of annotated histologic structures per minute. The trend of performance improvements varies per use case, with (A) the nuclei showing a consistent improvement in time, (B) the tubule performance plateauing after annotating a few structures, and (C) the epithelium requiring several additional iterations before reaching its plateau. These plateaus indicate the DL model is sufficiently trained to produce suggestions agreeable to the user.
The autoencoder (model 0) serves as the starting point of the training process, which affects the overall trained performance. In the Image List page, QA provides a (Re)train Model 0 function so that the user can update the autoencoder when needed. For example, suppose the user employs QA to finish annotating 30 image tiles and then uploads another 30 image tiles. The user could update the autoencoder if they believe the newly uploaded images are dissimilar enough.
Retraining the autoencoder can also be used for troubleshooting. The autoencoder is designed to be an optimized starting point for deep learning training, but it is still possible (though unlikely) that the autoencoder was not well generated. For example, when working on a verified dataset, the user uploads many well-delineated annotations, but the prediction suggestions are worse than expected. This is probably because the autoencoder was not well generated, which affects the training process.
QA currently does not support importing existing annotations from the user interface. However, QA does provide a Python script that imports existing annotations into a QA project. When running this script, the existing annotations will be imported into a QA server currently running at a specified server address. Advanced users therefore need to specify the --server parameter if they are running multiple Quick Annotator applications simultaneously.
Open the cli folder and use import_annotations_cli.py to import existing annotations into a QA project. Here is a basic usage tutorial.
E:\Study\Research\QA\GithubQA\QuickAnnotator\cli>python import_annotations_cli.py --help
usage: import_annotations_cli.py [-h] [-s SERVER] -n PROJNAME [-p PATCHSIZE]
[-r STRIDE] [-t TRAINPERCENT] [-b]
[input_pattern [input_pattern ...]]
Import existing images with annotations into quick annotator. Note: Masks are
expected to be in a subdirectory called "masks"
positional arguments:
input_pattern Input filename pattern (try: *.png)
optional arguments:
-h, --help show this help message and exit
-s SERVER, --server SERVER
host with port, default http://localhost:5555
-n PROJNAME, --projname PROJNAME
project to create/add to
-p PATCHSIZE, --patchsize PATCHSIZE
Patchsize, default 256
-r STRIDE, --stride STRIDE
stride between ROIs, default 256
-t TRAINPERCENT, --trainpercent TRAINPERCENT
Percent of ROIs to use for training, default .8
-b, --bgremove Don't create ROI if patch is mostly white
Note: import_annotations_cli.py imports existing annotations and extracts them into smaller patches (PATCHSIZE) ready for training DL models. These patches are assigned to the training or testing set according to a percentage, TRAINPERCENT.
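The train/test assignment can be illustrated as follows (a toy sketch of the idea only, not QA's actual code; the function name is ours):

```python
import random

def split_patches(patch_names, train_percent=0.8, seed=0):
    """Randomly assign patches to the training or testing set.

    Mirrors the idea behind TRAINPERCENT: roughly `train_percent` of the
    extracted patches end up in the training set, the rest in testing.
    """
    rng = random.Random(seed)
    train, test = [], []
    for name in patch_names:
        (train if rng.random() < train_percent else test).append(name)
    return train, test

patches = [f"patch_{i:03d}.png" for i in range(100)]
train, test = split_patches(patches, train_percent=0.8)
print(len(train), len(test))
```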
There is another script, rest_workflow_example_cli.py, in the cli folder that is useful if someone wants to, e.g., quickly set up a project or test QA. It is similar to import_annotations_cli.py but covers more of the workflow, e.g., train_AE, train_DL, etc.
What should the user check when {server_ip}:{port_number} keeps giving HTTPStatus.BAD_REQUEST error message?
Recently, some users reported that they could not connect to QA's server at {server_ip}:{port_number} (e.g., localhost:5555) and that the terminal kept giving HTTPStatus.BAD_REQUEST error messages. If these error messages happen to you, please make sure you are using the HTTP protocol instead of HTTPS. Nowadays, many browsers and browser plugins default to HTTPS. However, the Flask server only understands HTTP, not HTTPS, since the latter would require setting up certificates and keys, which are out of scope for basic servers. Note that the UserManual uses HTTP instead of HTTPS.
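A simple guard is to always build the server URL with an explicit http:// scheme (an illustrative helper of our own, not part of QA):

```python
def qa_url(server_ip, port_number):
    """Build the QA server URL with an explicit http:// scheme.

    QA's Flask development server speaks plain HTTP only; an https://
    request reaches it as unreadable TLS bytes and is rejected with
    HTTPStatus.BAD_REQUEST.
    """
    return f"http://{server_ip}:{port_number}"

print(qa_url("localhost", 5555))  # http://localhost:5555
```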
When the user presses Retrain DL, all annotations in the project are always used. The user can adjust the training results by setting different hyper-parameters (e.g., edgeweight).
The general guide for installing Docker Desktop can be summarized as follows:
- Download Docker Desktop @ https://hub.docker.com/editions/community/docker-ce-desktop-windows/
- Sign up for Docker Hub @ https://hub.docker.com/
- When installing Docker on a Windows machine, the user should check that:
a. Hyper-V is installed and working
b. Virtualization is enabled in the BIOS
c. The hypervisor is enabled at Windows startup
See details for troubleshooting virtualization on a Windows machine here. See a video tutorial for the installation here.
This part serves as supplementary material for the Docker Tutorial in the Readme.md file. Users should note the difference between docker run and docker start:
- Run: creates a new container from an image and executes it. You can create N clones of the same image. The command is docker run IMAGE_ID, not docker run CONTAINER_ID.
- Start: launches a container that was previously stopped. For example, if you had stopped a database with docker stop CONTAINER_ID, you can relaunch the same container with docker start CONTAINER_ID, and the data and settings will be the same.
We received messages that some users "lost" their data after re-running the QA image. This is because they were actually creating another container (copy) of the QA image, which was assigned a different CONTAINER_ID.
Users can check all available images with docker images, list all running containers with docker ps, and look up an exited container's ID with docker ps -a.
Most users will use QA's Docker image on a server, so it is important to clean up unused images, because insufficient quota is one of the most common errors.
Docker images consist of multiple layers. Dangling images are layers that have no relationship to any tagged images. They no longer serve a purpose and consume disk space. They can be located by adding the filter flag, -f with a value of dangling=true to the docker images command. When you’re sure you want to delete them, you can use the docker image prune command:
Users can list dangling images with
docker images -f dangling=true
and remove them with
docker image prune
To make downloading images for build and pull faster and less redundant, Singularity uses a caching strategy. By default, Singularity will create a set of folders in your $HOME directory for docker layers, Cloud library images, and metadata.
$HOME/.singularity/cache/library
$HOME/.singularity/cache/oci
$HOME/.singularity/cache/oci-tmp
One of the most common errors during Singularity builds is Disk Quota Exceeded. The user should first check the cache with
singularity cache list
Note that running the following command with no flags
singularity cache clean
will by default only clean the blob cache, whereas
singularity cache clean --all
will clean the entire cache.
TCGA is a very useful dataset for studying digital pathology. To download WSI slides from the TCGA data portal, we recommend users use the GDC Data Transfer Tool. The download link can be found here. Here is the basic usage tutorial:
PS C:\Users\Rexy> gdc-client download -h
usage: gdc-client.exe download [-h] [--debug] [--log-file LOG_FILE]
[--color_off] [-t TOKEN_FILE] [-d DIR]
[-s server] [--no-segment-md5sums]
[--no-file-md5sum] [-n N_PROCESSES]
[--http-chunk-size HTTP_CHUNK_SIZE]
[--save-interval SAVE_INTERVAL] [-k]
[--no-related-files] [--no-annotations]
[--no-auto-retry] [--retry-amount RETRY_AMOUNT]
[--wait-time WAIT_TIME] [--latest]
[--config FILE] [-m MANIFEST]
[file_id [file_id ...]]
positional arguments:
file_id The GDC UUID of the file(s) to download
optional arguments:
-h, --help show this help message and exit
--debug Enable debug logging. If a failure occurs, the program
will stop.
--log-file LOG_FILE Save logs to file. Amount logged affected by --debug
--color_off Disable colored output
-t TOKEN_FILE, --token-file TOKEN_FILE
GDC API auth token file
-d DIR, --dir DIR Directory to download files to. Defaults to current
directory
-s server, --server server
The TCP server address server[:port]
--no-segment-md5sums Do not calculate inbound segment md5sums and/or do not
verify md5sums on restart
--no-file-md5sum Do not verify file md5sum after download
-n N_PROCESSES, --n-processes N_PROCESSES
Number of client connections.
--http-chunk-size HTTP_CHUNK_SIZE, -c HTTP_CHUNK_SIZE
Size in bytes of standard HTTP block size.
--save-interval SAVE_INTERVAL
The number of chunks after which to flush state file.
A lower save interval will result in more frequent
printout but lower performance.
-k, --no-verify Perform insecure SSL connection and transfer
--no-related-files Do not download related files.
--no-annotations Do not download annotations.
--no-auto-retry Ask before retrying to download a file
--retry-amount RETRY_AMOUNT
Number of times to retry a download
--wait-time WAIT_TIME
Amount of seconds to wait before retrying
--latest Download latest version of a file if it exists
--config FILE Path to INI-type config file
-m MANIFEST, --manifest MANIFEST
GDC download manifest file
Note: The gdc-client folder should be added to the PATH environment variable. Then the user can download WSI slides from the command line. There are two approaches in general:
- The user can initiate the download using the GDC Data Transfer Tool by supplying the -m or --manifest option with a manifest file:
gdc-client download -m /Users/JohnDoe/Downloads/gdc_manifest_6746fe840d924cf623b4634b5ec6c630bd4c06b5.txt
- The user can also use the GDC Data Transfer Tool to download one or more individual files by supplying UUID(s) instead of a manifest file:
gdc-client download e5976406-473a-4fbd-8c97-e95187cdc1bd fb3e261b-92ac-4027-b4d9-eb971a92a4c3
Usually the user wants to set the -d DIR option when downloading to choose the download directory.
Here are some test UUIDs for the user to try.
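Both invocation styles can also be composed from a script, e.g., as an argument list for Python's subprocess module (a small sketch; the helper is ours, while the -m and -d flags are gdc-client's own, shown in the help above):

```python
def gdc_download_cmd(uuids=(), manifest=None, out_dir=None):
    """Build a gdc-client download command as an argument list for subprocess.run."""
    cmd = ["gdc-client", "download"]
    if manifest is not None:
        cmd += ["-m", manifest]   # download everything listed in a manifest file
    if out_dir is not None:
        cmd += ["-d", out_dir]    # target download directory
    cmd += list(uuids)            # or download individual files by UUID
    return cmd

print(gdc_download_cmd(uuids=["e5976406-473a-4fbd-8c97-e95187cdc1bd"], out_dir="wsi"))
```

Pass the returned list to subprocess.run() on a machine where gdc-client is on PATH.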
42839ef5-37cc-468b-94a2-93e69a974f1d
544845fb-680d-45e5-996e-72f6b784dc66
e390e2b2-06de-452c-b8d3-431d4ce6b7f8
c2dc25aa-1fa8-40a3-bf16-e9b12ae942c3
d86dfeff-4bac-4a06-8966-96b4139f3860
The user can download TCGA-COAD WSI slides @ https://portal.gdc.cancer.gov/projects/TCGA-COAD. Note: we recommend users start with diagnostic slides for better training purposes.
Although our experiments focused on H&E, given the deep learning-based backend, QA is fundamentally agnostic to stain type and can thus be used with any type of image, including non-digital pathology images.
That said, there are a few places which can likely be more optimally configured for non-H&E images. In particular, one should examine the real-time training augmentations of both the auto-encoder and the transfer-learning components, available here respectively:
When transitioning between different stain types, appropriate levels of HueSaturationValue should be selected such that the augmented images appear sufficiently "reasonable" to be a good example of an image from the cohort. That is, augmented images should look similar to the desired target images. Intuitively, this implies that if your cohort doesn't have the color green present, then training-time augmentation should not be geared to create green exemplars for training.
Similarly, if employing QA on fluorescence images, tuning of the brightness, contrast, and gamma parameters is also likely needed, under the same principle that augmented training images should reasonably resemble the test images.
Can I import an existing deep learning model (U-Net) without training/re-training the baseline model?
Instructions:
- Change QA_db.py:
iteration = db.Column(db.Integer, default=1)
i.e., set the default iteration value from -1 to 1
- Open a command terminal and from the QA directory, start QA
python QA.py
- Create a project and import images
- Put the pre-trained deep learning model in the models folder within the created project. Example:
QuickAnnotator/projects/{project_name}/models/1/
Note: the pre-trained model should be renamed best_model.pth
- Get the prediction and revise the prediction result
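The copy-and-rename step can be sketched as below (a convenience helper of our own, assuming QA's projects/{project_name}/models/{iteration}/best_model.pth layout described above):

```python
import os
import shutil
import tempfile

def install_pretrained_model(qa_root, project_name, model_path, iteration=1):
    """Copy a pre-trained checkpoint to where QA looks for model `iteration`.

    QA loads projects/{project_name}/models/{iteration}/best_model.pth, so the
    file is renamed to best_model.pth on copy.
    """
    dest_dir = os.path.join(qa_root, "projects", project_name, "models", str(iteration))
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "best_model.pth")
    shutil.copy(model_path, dest)
    return dest

# Demo against a throwaway directory standing in for the QA checkout.
with tempfile.TemporaryDirectory() as qa_root:
    src = os.path.join(qa_root, "my_unet.pth")
    open(src, "wb").close()
    dest = install_pretrained_model(qa_root, "demo_project", src)
    print(os.path.relpath(dest, qa_root))
```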
QA's wiki is the complete documentation explaining to users how to use this tool and the reasoning behind it. Here is the catalogue for QA's wiki pages:
Home: