
Frequently Asked Questions


We have collected user feedback and organized it into the FAQ below.

How to Install PyTorch?


The general steps for installing PyTorch can be summarized as follows (a quick verification snippet follows the list):

  1. Check your NVIDIA GPU Compute Capability @ https://developer.nvidia.com/cuda-gpus
  2. Download CUDA Toolkit @ https://developer.nvidia.com/cuda-downloads
  3. Install PyTorch command can be found @ https://pytorch.org/get-started/locally/
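
After installation, a minimal check from a Python prompt confirms that PyTorch was installed with working CUDA support:

    import torch

    print(torch.__version__)                 # installed PyTorch version
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0)) # name of the detected GPU
    else:
        print("CUDA not available - check the driver and CUDA toolkit versions")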

How to keep track of model IDs? What do they mean?


  • For a given project, the database contains the latest model id under the "iteration" column.
  • A model_id of -1 means no AI model has been trained at all.
  • A model_id of 0 represents the initial autoencoder.
  • A model_id of >=1 represents a deep learning iteration.
  • The get_latest_model_id function will pull this information from the database; a sketch of querying it directly is shown below.
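
For users who want to inspect this value outside of QA, here is a minimal sketch that reads the iteration column directly from QA's SQLite database. The database path and table name below are assumptions and may differ in your installation; the supported route is the get_latest_model_id function.

    import sqlite3

    DB_PATH = "instance/quickannotator.db"    # hypothetical path to QA's SQLite file -- adjust to your setup

    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT name, iteration FROM project"  # the 'project' table name is an assumption
    ).fetchall()
    for name, iteration in rows:
        print(f"{name}: latest model id = {iteration}")
    conn.close()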

How to install OpenSlide?


Since Quick Annotator does not currently support whole slide images (WSIs), users need to divide each WSI into smaller image tiles. QA provides a script, cli\extract_tiles_from_wsi_openslide.py, which imports OpenSlide.

However, we have received feedback that many Windows users had difficulty installing OpenSlide, so here is a detailed tutorial for installing and importing OpenSlide on Windows.

  1. Install OpenSlide Python for your Python version @ https://pypi.org/project/openslide-python/
  2. Install the OpenSlide Windows binaries @ https://openslide.org/download/
  3. Add openslide\bin to the system environment PATH. (Many of our testers forgot to add OpenSlide to the PATH and therefore could not import OpenSlide; see the Python sketch after this list for an alternative.)
    Control Panel -> System Properties -> Advanced -> Environment Variables 
    -> System variables -> *Path* 
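
As an alternative to editing the system PATH, newer Python versions on Windows can register the OpenSlide DLL directory at import time. A minimal sketch (the install path below is only an example and will differ on your machine):

    import os

    OPENSLIDE_BIN = r"C:\openslide-win64\bin"    # example path -- point this at your unzipped binaries

    if hasattr(os, "add_dll_directory"):         # Python 3.8+ on Windows
        with os.add_dll_directory(OPENSLIDE_BIN):
            import openslide
    else:
        os.environ["PATH"] = OPENSLIDE_BIN + ";" + os.environ["PATH"]
        import openslide

    print(openslide.__library_version__)         # prints the OpenSlide version if the import worked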

What kind of efficiency growth should a user expect when using Quick Annotator?


The efficiency improvement afforded by QA depends on the histologic structure being annotated. In a recent paper, we presented efficiency plots for pancreatic cell nuclei, colon tubules, and breast cancer epithelium, which correspond to small-, medium-, and large-scale tissue structures.


Figure: Efficiency metric over time demonstrating the improvement in speed afforded by QA in annotating (A) nuclei, (B) tubules, and (C) epithelium. The x-axis is the human annotation time in minutes, and the y-axis is the annotation speed in annotated histologic structures per minute. The trend of improvement varies per use case: (A) the nuclei show a consistent improvement over time, (B) the tubule performance plateaus after annotating a few structures, and (C) the epithelium requires several additional iterations before reaching its plateau. These plateaus indicate the DL model is sufficiently trained to produce suggestions agreeable to the user.

When should the user retrain the autoencoder?


The autoencoder (model 0) serves as the starting point of the training process and therefore affects overall trained performance. On the Image List page, QA provides a (Re)train Model 0 function so the user can update the autoencoder when needed. For example, suppose the user has finished annotating 30 image tiles with QA and then uploads another 30 tiles; the user should retrain the autoencoder if the newly uploaded images are sufficiently dissimilar from the originals.

Retraining the autoencoder can also be used for troubleshooting. The autoencoder is designed to be an optimized starting point for deep learning training, but it is still possible (though unlikely) that it was not well generated. For example, if the user uploads many well-delineated annotations on a verified dataset but the prediction suggestions are worse than expected, a poorly generated autoencoder may be affecting the training process.

How to import existing annotations?


QA currently does not support importing existing annotations from the user interface. However, QA does provide a Python script that imports existing annotations into a QA project. When this script runs, the annotations are uploaded to a currently running QA server at the specified address, so advanced users running multiple Quick Annotator instances simultaneously need to specify the --server parameter.

Open the cli folder and use import_annotations_cli.py to import existing annotations into a QA project. Here is its help output:

E:\Study\Research\QA\GithubQA\QuickAnnotator\cli>python import_annotations_cli.py --help
usage: rest_workflow_example_cli.py [-h] [-s SERVER] -n PROJNAME [-p PATCHSIZE]
                                 [-r STRIDE] [-t TRAINPERCENT] [-b]
                                 [input_pattern [input_pattern ...]]

Import existing images with annotations into quick annotator. Note: Masks are
expected to be in a subdirectory called "masks"

positional arguments:
  input_pattern         Input filename pattern (try: *.png)

optional arguments:
  -h, --help            show this help message and exit
  -s SERVER, --server SERVER
                        host with port, default http://localhost:5555
  -n PROJNAME, --projname PROJNAME
                        project to create/add to
  -p PATCHSIZE, --patchsize PATCHSIZE
                        Patchsize, default 256
  -r STRIDE, --stride STRIDE
                        stride between ROIs, default 256
  -t TRAINPERCENT, --trainpercent TRAINPERCENT
                        Percet of ROIs to use for training, default .8
  -b, --bgremove        Don't create ROI if patch is mostly white

Note: import_annotations_cli.py imports existing annotations and extracts them into smaller patches (PATCHSIZE) ready for training DL models. These patches are assigned to the training or testing set according to the TRAINPERCENT percentage. An example invocation is shown below.
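
For example, an illustrative invocation (the project name and file pattern are placeholders; the mask images are expected in a masks subdirectory next to the input images):

    cd cli
    python import_annotations_cli.py -s http://localhost:5555 -n my_project -p 256 -t 0.8 *.png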

There is another script in the cli folder, rest_workflow_example_cli.py, that is useful if someone wants to, e.g., quickly set up a project or test QA. It is similar to import_annotations_cli.py but includes more of the workflow, e.g., train_AE, train_DL, etc.

What should the user check when {server_ip}:{port_number} keeps giving HTTPStatus.BAD_REQUEST error message?


Some users have reported that they could not connect to QA's server at {server_ip}:{port_number} (e.g., localhost:5555) and that the terminal kept printing HTTPStatus.BAD_REQUEST error messages. If this happens to you, please make sure you are using the HTTP protocol rather than HTTPS. Many browsers and browser plugins now default to HTTPS, but the Flask server only understands HTTP; HTTPS would require setting up certificates and keys, which is out of scope for a basic server. As noted in the User Manual, use HTTP instead of HTTPS.
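
A quick way to confirm that the server answers over plain HTTP is a small check with the Python standard library (adjust the host and port to your setup):

    from urllib.request import urlopen

    # Note the explicit http:// scheme; https:// will not work with the basic Flask server.
    with urlopen("http://localhost:5555") as resp:
        print(resp.getcode())   # 200 means the QA server is reachable over HTTP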

What annotations are included when the user presses Retrain DL to train a classifier?


When the user presses Retrain DL, all annotations in the project are used. The user can influence the training results by setting different hyper-parameters (e.g., edgeweight).

How to install Docker Desktop?


The general steps for installing Docker Desktop can be summarized as follows:

  1. Download Docker Desktop @ https://hub.docker.com/editions/community/docker-ce-desktop-windows/
  2. Sign up for Docker Hub @ https://hub.docker.com/
  3. When installing Docker on a Windows machine, the user should check that:
   a. Hyper-V installed and working
   b. Virtualization enabled in the BIOS
   c. Hypervisor enabled at Windows startup

See details on troubleshooting virtualization on a Windows machine here, and a video tutorial for installation here.
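
Once installation finishes, a standard way to verify that Docker is working is to run the hello-world image from a terminal:

    docker --version
    docker run hello-world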

How to manage Docker Containers?


This part serves as supplementary material for the Docker tutorial in the Readme.md file. Users should note the difference between docker run and docker start:

  1. Run: creates a new container from an image and executes it. You can create N clones of the same image. The command is docker run IMAGE_ID, not docker run CONTAINER_ID.
  2. Start: launches a previously stopped container. For example, if you stopped a database with docker stop CONTAINER_ID, you can relaunch the same container with docker start CONTAINER_ID, and the data and settings will be preserved.

We have received messages from users who 'lost' their data after re-running the QA image. This happens because docker run actually creates another container (copy) of the QA image with a different CONTAINER_ID.

Users can list all available images with docker images and all running containers with docker ps. The ID of an exited container can be looked up with docker ps -a. A typical stop/restart sequence is shown below.
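
For example, a typical sequence that preserves data across restarts (the placeholders stand for the IDs shown by docker ps -a and docker images):

    docker ps -a                      # find the CONTAINER_ID of the existing QA container
    docker stop CONTAINER_ID          # stop it
    docker start CONTAINER_ID         # restart the SAME container; data and settings are kept
    # docker run IMAGE_ID             # by contrast, this creates a brand-new (empty) container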

How to remove dangling Docker Images?


Most users will use QA's Docker image on a server, so it is important to clean up unused images; insufficient disk quota is one of the most common errors.

Docker images consist of multiple layers. Dangling images are layers that have no relationship to any tagged images. They no longer serve a purpose and consume disk space. They can be located by adding the filter flag, -f with a value of dangling=true to the docker images command. When you’re sure you want to delete them, you can use the docker image prune command:

Users can list these images with:

    docker images -f dangling=true

and remove them with:

    docker image prune

How to manage Singularity Cache?


To make downloading images for build and pull faster and less redundant, Singularity uses a caching strategy. By default, Singularity will create a set of folders in your $HOME directory for docker layers, Cloud library images, and metadata.

    $HOME/.singularity/cache/library
    $HOME/.singularity/cache/oci
    $HOME/.singularity/cache/oci-tmp

One of the most common errors during Singularity builds is Disk Quota Exceeded. The user should first check the cache with:

    singularity cache list

Note that running the following command with no flags:

    singularity cache clean

will by default clean only the blob cache, whereas running:

    singularity cache clean --all

will clean the entire cache.

How to use GDC Data Transfer Tool?


TCGA is a very useful dataset for studying digital pathology. To download WSI slides from the TCGA data portal, we recommend using the GDC Data Transfer Tool. The download link can be found here. Here is its help output:

PS C:\Users\Rexy> gdc-client download -h
usage: gdc-client.exe download [-h] [--debug] [--log-file LOG_FILE]
                               [--color_off] [-t TOKEN_FILE] [-d DIR]
                               [-s server] [--no-segment-md5sums]
                               [--no-file-md5sum] [-n N_PROCESSES]
                               [--http-chunk-size HTTP_CHUNK_SIZE]
                               [--save-interval SAVE_INTERVAL] [-k]
                               [--no-related-files] [--no-annotations]
                               [--no-auto-retry] [--retry-amount RETRY_AMOUNT]
                               [--wait-time WAIT_TIME] [--latest]
                               [--config FILE] [-m MANIFEST]
                               [file_id [file_id ...]]

positional arguments:
  file_id               The GDC UUID of the file(s) to download

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging. If a failure occurs, the program
                        will stop.
  --log-file LOG_FILE   Save logs to file. Amount logged affected by --debug
  --color_off           Disable colored output
  -t TOKEN_FILE, --token-file TOKEN_FILE
                        GDC API auth token file
  -d DIR, --dir DIR     Directory to download files to. Defaults to current
                        directory
  -s server, --server server
                        The TCP server address server[:port]
  --no-segment-md5sums  Do not calculate inbound segment md5sums and/or do not
                        verify md5sums on restart
  --no-file-md5sum      Do not verify file md5sum after download
  -n N_PROCESSES, --n-processes N_PROCESSES
                        Number of client connections.
  --http-chunk-size HTTP_CHUNK_SIZE, -c HTTP_CHUNK_SIZE
                        Size in bytes of standard HTTP block size.
  --save-interval SAVE_INTERVAL
                        The number of chunks after which to flush state file.
                        A lower save interval will result in more frequent
                        printout but lower performance.
  -k, --no-verify       Perform insecure SSL connection and transfer
  --no-related-files    Do not download related files.
  --no-annotations      Do not download annotations.
  --no-auto-retry       Ask before retrying to download a file
  --retry-amount RETRY_AMOUNT
                        Number of times to retry a download
  --wait-time WAIT_TIME
                        Amount of seconds to wait before retrying
  --latest              Download latest version of a file if it exists
  --config FILE         Path to INI-type config file
  -m MANIFEST, --manifest MANIFEST
                        GDC download manifest file

Note: The gdc-client folder should be added to the PATH environment variable. The user can then download WSI slides from the command line. There are two general approaches:

  • The user could initiate the download using the GDC Data Transfer Tool by supplying the -m or --manifest option with a manifest file:
    gdc-client download -m  /Users/JohnDoe/Downloads/gdc_manifest_6746fe840d924cf623b4634b5ec6c630bd4c06b5.txt
  • The user can also use the GDC Data Transfer Tool to download one or more individual files by UUID instead of using a manifest file:
    gdc-client download e5976406-473a-4fbd-8c97-e95187cdc1bd fb3e261b-92ac-4027-b4d9-eb971a92a4c3

Usually the user will want to set the -d DIR option to choose the download directory.

Here are some test UUIDs to try; an example command follows the list.

42839ef5-37cc-468b-94a2-93e69a974f1d
544845fb-680d-45e5-996e-72f6b784dc66
e390e2b2-06de-452c-b8d3-431d4ce6b7f8
c2dc25aa-1fa8-40a3-bf16-e9b12ae942c3
d86dfeff-4bac-4a06-8966-96b4139f3860
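
For example, the following command downloads the first test file above into a local folder (the folder name is illustrative):

    gdc-client download -d ./tcga_slides 42839ef5-37cc-468b-94a2-93e69a974f1d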

The user can browse TCGA-COAD WSI slides @ https://portal.gdc.cancer.gov/projects/TCGA-COAD. Note: we recommend starting with the diagnostic slides for better training purposes.

Is QA limited to H&E, or can it be used for other stain types, e.g., DAB or fluorescent?


Although our experiments focused on H&E, given the deep learning-based backend, QA is fundamentally agnostic to stain type and can thus be used with any type of image, including non-digital pathology images.

That said, there are a few places that can likely be configured more optimally for non-H&E images. In particular, one should examine the real-time training augmentations of the auto-encoder and the transfer-learning components, available here, respectively:

https://github.com/choosehappy/QuickAnnotator/blob/16b1a8faae818f250c65974ea510438d40e63019/train_ae.py#L165

https://github.com/choosehappy/QuickAnnotator/blob/16b1a8faae818f250c65974ea510438d40e63019/train_model.py#L200

When transitioning between different stain types, appropriate levels of HueSaturationValue should be selected such that the augmented images appear sufficiently "reasonable" to be good examples of images from the cohort. That is, augmented images should 'look similar' to the desired target images. Intuitively, this implies that if your cohort does not contain the color green, then training-time augmentation should not be geared to create green exemplars for training.

Similarly, if employing QA on fluorescence images, tuning of the brightness, contrast, and gamma parameters is also likely needed, under the same principle that augmented training images should appear reasonably similar to the test images. A hedged example of such tuning is sketched below.
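
As a sketch of what such tuning might look like, assuming the augmentation pipeline is built with the albumentations library (the limits below are illustrative starting points, not recommended values):

    import albumentations as A

    # Illustrative augmentation pipeline for a non-H&E (e.g., fluorescence) cohort.
    # Pick limits so that augmented images still "look like" images from your cohort.
    augs = A.Compose([
        A.HueSaturationValue(hue_shift_limit=5, sat_shift_limit=10, val_shift_limit=10, p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.1, contrast_limit=0.1, p=0.5),
        A.RandomGamma(gamma_limit=(90, 110), p=0.5),
    ])

    # augmented = augs(image=img)["image"]   # img is an HxWxC uint8 numpy array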

Can I import an existing deep learning model (U-Net) without training/re-training the baseline model?


Instructions:

  1. In QA_db.py, change the default iteration value from -1 to 1:

     iteration = db.Column(db.Integer, default=1)

  2. Open a command terminal and, from the QA directory, start QA:

     python QA.py

  3. Create a project and import the images.
  4. Put the pre-trained deep learning model into the models folder within the created project, for example:

     QuickAnnotator/projects/{project_name}/models/1/

     Note: the pre-trained model must be renamed to best_model.pth.
  5. Get the predictions and revise the prediction results.

Note: if you want to retrain the pre-trained model with the revised results, you need to adjust QA's U-Net parameters according to your network settings. A checkpoint-inspection sketch follows.
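
Before importing, it can help to confirm that the checkpoint loads and that its layer shapes match QA's U-Net configuration. A minimal sketch, assuming the checkpoint is a PyTorch state dict or a dict wrapping one (QA's exact checkpoint format may differ):

    import torch

    checkpoint = torch.load("best_model.pth", map_location="cpu")   # inspect on CPU

    # Some training scripts save {"model_dict": ...}; others save the raw state dict.
    # The "model_dict" key here is an assumption.
    state_dict = checkpoint.get("model_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint

    for name, value in list(state_dict.items())[:5]:                # peek at the first few entries
        print(name, getattr(value, "shape", value))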
