CXR Foundation

CXR Foundation is a tool to generate custom embeddings from chest x-ray (CXR) images. These embeddings can be used to develop custom machine learning models for CXR with less data and compute. You can read more about the research behind CXR Foundation in our recent publication: Simplified Transfer Learning for Chest Radiography Models Using Less Data.

How to use CXR Foundation

  1. Fill out the API access form. Once your request is approved for non-clinical use, the Gmail account you provided will be used for access.

  2. Once access is granted, you’ll receive an email and can use the CXR Foundation API with your own images.

  3. If you want to get started in a no-code environment, please run our GUI-based demo. Running in Colab, this app provides a web interface to:

    • Import your own images in DICOM format and view them with windowing options
    • Label them
    • Retrieve embeddings
    • Split data into train and eval sets
    • Train a linear probe
    • Evaluate performance on the eval set and pick an operating point

    We’ve also linked it directly to CXR-14 data, so you can try it out on public data as well.
  4. You also have access to this GitHub repository containing Python source code to:

    1. Convert DICOM images into PNGs suitable for calling CXR Foundation
    2. Call the API to generate embeddings from the PNG
  5. Install the gcloud CLI and log in:

    gcloud auth application-default login
    
  6. Clone the repository into a local directory:

    git clone https://github.com/Google-Health/imaging-research.git
    cd imaging-research/cxr-foundation
    
  7. Install the CXR Foundation package:

    pip install .
    
  8. Run the CXR Foundation code:

    1. Upload your chest x-ray DICOMs or PNGs to a cloud bucket or use a local directory.
    2. Generate and save embeddings.
    3. Read them and use them to train your model.

See the notebooks for examples of how to use the embeddings service and this package.
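
The split, train, and evaluate steps that the GUI demo performs can be sketched offline in a few lines. This is a minimal illustration rather than the demo's or the notebooks' actual code: the embeddings here are random stand-ins, the 32-dimension size and the 80/20 split are arbitrary assumptions, and `train_linear_probe` and `pick_operating_point` are hypothetical helper names that are not part of the cxr_foundation package.

```python
import numpy as np

# Synthetic stand-ins for embeddings and binary labels (NOT real CXR data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 32))        # assumed: 200 images, 32-dim vectors
labels = (embeddings[:, 0] > 0).astype(float)  # toy label tied to one dimension

# Split into train and eval sets (80/20).
order = rng.permutation(len(labels))
train_idx, eval_idx = order[:160], order[160:]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(x, y, lr=0.5, steps=500):
    """Fit logistic regression on frozen embeddings with plain gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y  # gradient of the log loss w.r.t. the logits
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def pick_operating_point(scores, y, min_sensitivity=0.9):
    """Return the highest score threshold whose sensitivity meets the target."""
    positives = max(int((y == 1).sum()), 1)
    for t in np.sort(np.unique(scores))[::-1]:
        sensitivity = ((scores >= t) & (y == 1)).sum() / positives
        if sensitivity >= min_sensitivity:
            return float(t)
    return float(scores.min())


w, b = train_linear_probe(embeddings[train_idx], labels[train_idx])
eval_scores = sigmoid(embeddings[eval_idx] @ w + b)
threshold = pick_operating_point(eval_scores, labels[eval_idx])
eval_accuracy = float(((eval_scores >= threshold) == (labels[eval_idx] == 1)).mean())
print(f"threshold={threshold:.3f} eval_accuracy={eval_accuracy:.3f}")
```

With real data, the random arrays would be replaced by embeddings read from your saved files and by your own labels. The sensitivity target is only one possible criterion for choosing an operating point.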

Have questions? Email [email protected].

Third Party Apps

CXR Foundation is also available on Superbio.ai as an online app. After agreeing to Google’s Terms for the CXR Foundation tool, you can access and utilize the app.

Package APIs - Generating and Using Embeddings

The following code block highlights the pertinent functions. See the notebooks for demo usage.

from cxr_foundation.inference import generate_embeddings
from cxr_foundation.embeddings_data import read_tfrecord_values, read_npz_values, get_dataset


help(generate_embeddings)
help(read_tfrecord_values)
help(read_npz_values)
help(get_dataset)

Note: The .npz embeddings files generated by this package and by the Foundation API can be read without this package. If you want to use generated embeddings files in a Python environment without installing this package and its dependencies there, copy the embeddings_data.read_npz_values function into one of your modules; it requires only numpy.
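
As a rough illustration of the note above, an .npz file is a standard NumPy archive, so `np.load` alone can open it. The sketch below does not reproduce `read_npz_values`; it only lists whatever arrays a file contains. The helper name `load_npz_arrays`, the demo file name, and the array key `embedding` are made up for the example, since the keys used by real embeddings files are not stated here.

```python
import os
import tempfile

import numpy as np


def load_npz_arrays(path):
    """Return a dict mapping each array name stored in the .npz file to its array."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Write a synthetic file so the example is self-contained.
# The key "embedding" is a placeholder, not necessarily the key real files use.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_embedding.npz")
np.savez(demo_path, embedding=np.arange(8, dtype=np.float32))

arrays = load_npz_arrays(demo_path)
for name, value in arrays.items():
    print(name, value.shape, value.dtype)
```

Running the same helper on one of your own generated files shows which array names it actually holds before you write any downstream code against them.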

General Notes

  • Google does not keep a copy of any images sent.
  • Google monitors daily query volume and aggregates on a per-user and per-organization basis. Access can be revoked if a user or organization exceeds a reasonable query volume.
  • If you consented to follow-up, Google may reach out for feedback.
  • Please use the following reference for any published work:
    • Sellergren AB, Chen C, Nabulsi Z, Li Y, Maschinot A, Sarna A, Huang J, Lau C, Kalidindi SR, Etemadi M, Garcia-Vicente F, Melnick D, Liu Y, Eswaran K, Tse D, Beladia N, Krishnan D, Shetty S. Simplified Transfer Learning for Chest Radiography Models Using Less Data. Radiology. 2022 Nov;305(2):454-465. doi: 10.1148/radiol.212482. Epub 2022 Jul 19. PMID: 35852426.

Contributing

See CONTRIBUTING.md for details.

License

See LICENSE for details.

Disclaimer

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Model Card for CXR Foundation

Model Details

Overview

This model generates embeddings for chest x-rays (CXRs). Embeddings are n-dimensional vectors of floating-point values that represent a projection of the original image into a compressed feature space capable of describing potential abnormalities in the image. These embeddings are intended for use by “downstream models” that perform final tasks, such as detecting a particular abnormality in a CXR. The model uses the EfficientNet-L2 architecture (https://arxiv.org/pdf/1911.04252v4.pdf). It was trained on 821,544 CXRs from India and the US using abnormal vs. normal labels (i.e. whether the image contained any kind of abnormality) and the Supervised Contrastive loss (https://arxiv.org/abs/2004.11362v1). The abnormal vs. normal labels were obtained from more granular labels (e.g. pneumothorax, fracture) as well as regular expressions on radiology reports (https://pubmed.ncbi.nlm.nih.gov/34471144/).

Version

name: v1.0
date: 2022-07-19

Owners

Andrew Sellergren, [email protected]

Licenses

References

Citations

  • Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest Radiography Models Using Less Data. Radiology. 2022.

Considerations

Use Cases

  • Embeddings can reduce barriers to entry for training custom models with less data, setup, and compute.
  • Embeddings can allow for quick evaluation.

Limitations

  • The model was trained using only data from the US and India and may not generalize well to data from other countries, patient populations, or manufacturers not used in training.
  • The model only generates embeddings of the user-owned dataset. It does not generate any predictions or diagnoses on its own.

Ethical Considerations

  • Risk: Although Google does not permanently store any data sent to this model, it is the data owner's responsibility to ensure that personally identifiable information (PII) and protected health information (PHI) are removed before the data is sent to the model.
  • Mitigation Strategy: Do not send data containing PII or PHI.