Mick Tarsel edited this page Jul 18, 2019 · 8 revisions

Project Summary

The Container Analysis project gathers information about Docker containers from public repositories. Starting with a Helm index chart file, this Python project will crawl, curate, and output helpful information about Applications which are composed of different containers.

Background Information

Since the inception of IBM Cloud Private (ICP), Applications have been built from one or more containers, and there has been a strong need to know which hardware architectures each container can run on. It is important to know where a container is hosted, which versions of it are available, and which architectures it supports.

Project Methodology

Each Application contains Images, and each Image contains Tags, which are referred to as containers.

Applications are objects called App, stored in objects/image.py.

Images are objects referred to as image_obj, since each Image is associated with an Application. An Application typically has many Images.
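The Application, Image, and Tag hierarchy described above can be pictured with a small sketch. The class and field names here (`Image`, `tags`, `add_image`, `architectures`) are illustrative assumptions and are not taken from objects/image.py; only the `App` name and the (name, base_url, keywords) triple come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Image:
    """One Image of an Application (hypothetical stand-in for image_obj)."""
    name: str
    repo: str
    # Maps each Tag (a container version) to the architectures it supports.
    tags: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class App:
    """An Application, which typically owns many Images."""
    name: str
    base_url: str
    keywords: List[str] = field(default_factory=list)
    images: List[Image] = field(default_factory=list)

    def add_image(self, image: Image) -> None:
        self.images.append(image)

    def architectures(self) -> Set[str]:
        # Union of every architecture seen across all Images and Tags.
        return {
            arch
            for image in self.images
            for archs in image.tags.values()
            for arch in archs
        }
```

Keeping the architectures on the Tag level reflects the idea that two versions of the same Image may support different hardware.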

To crawl Docker Hub, a Hub object is created to handle the authentication tokens and request headers needed to authenticate against hub.docker.com. See objects/hub.py for more information.
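As a rough illustration of what such a Hub object might manage, the sketch below builds a token request and the headers for a later authenticated call. It assumes Docker's registry token endpoint at auth.docker.io and anonymous pull tokens; the real objects/hub.py may use a different endpoint, and it reads credentials from user.yaml, which this sketch omits. The method names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


class Hub:
    """Hypothetical sketch of a token-handling helper, not the project's class."""

    AUTH_URL = "https://auth.docker.io/token"  # assumed token endpoint
    SERVICE = "registry.docker.io"

    def token_url(self, repo: str) -> str:
        # Request a token scoped to pulling a single repository.
        query = urllib.parse.urlencode(
            {"service": self.SERVICE, "scope": f"repository:{repo}:pull"}
        )
        return f"{self.AUTH_URL}?{query}"

    def fetch_token(self, repo: str) -> str:
        # Performs a network call; the response is JSON with a "token" field.
        with urllib.request.urlopen(self.token_url(repo)) as response:
            return json.load(response)["token"]

    def auth_headers(self, token: str) -> dict:
        # Headers for a follow-up request that asks for a multi-arch manifest list.
        return {
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.docker.distribution.manifest.list.v2+json",
        }
```

Separating `token_url` and `auth_headers` from the network call keeps the request-building logic easy to check without contacting the registry.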

Execution of get-image-info.py

Running python3 get-image-info.py user.yaml performs the following steps:

  1. Helm index chart is downloaded, unless --index is given
  2. Get hub list and creds from user.yaml
  3. Initialize Application object with (name, base_url, keywords) from index.yaml
  4. Set up file output
  5. Get Product name for App from first line of App's README.md
  6. Get the keywords for the App. If --debug is given, download Chart.yaml
  7. Download values.yaml and save in Applications/{app_name}/values.yaml
  8. Parse repos, images, tags from values.yaml. Image object is initialized.
  9. Crawl Docker Hub
  10. Output to CSV in archives/ with the date.
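Steps 8 and 10 above could look roughly like the following sketch. It assumes values.yaml has already been loaded into a Python dict (for example with a YAML parser) and that images follow the common Helm convention of a mapping with `repository` and `tag` keys; the actual layout in a given chart, the function names, and the CSV columns are all assumptions.

```python
import csv
import datetime
from pathlib import Path


def find_images(node):
    """Recursively collect (repository, tag) pairs from loaded values.yaml data."""
    found = []
    if isinstance(node, dict):
        if "repository" in node and "tag" in node:
            found.append((str(node["repository"]), str(node["tag"])))
        for value in node.values():
            found.extend(find_images(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_images(item))
    return found


def write_archive(app_name, images, out_dir="archives", today=None):
    """Write one CSV row per image into a dated file under out_dir."""
    today = today or datetime.date.today()
    out_path = Path(out_dir) / f"{app_name}-{today.isoformat()}.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["app", "repository", "tag"])
        for repository, tag in images:
            writer.writerow([app_name, repository, tag])
    return out_path
```

Passing `today` explicitly makes the dated file name reproducible, which is convenient when comparing archives from different runs.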

More info about command line options

If debug mode is specified, a file called generated_input.yaml will be created. This file contains details about each Application and its respective Images. In addition to this file, Chart.yaml will also be downloaded and saved under Applications/ for each app. If debug mode is not specified, keywords are taken from index.yaml and Chart.yaml is not downloaded at all.

The fastest way to run the program is: python3 get-image-info.py user.yaml -k -i index.yaml. However, this reuses information that was previously downloaded (kept), so the results may not be accurate.
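The options mentioned on this page could be declared with argparse along the following lines. This is a sketch rather than the script's actual parser: the `--keep` long name for -k, the help strings, and the attribute names are assumptions; only user.yaml, -k, -i/--index, and --debug appear in the text above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="get-image-info.py")
    parser.add_argument("user_yaml", help="file with the hub list and credentials")
    parser.add_argument(
        "-i", "--index", metavar="FILE", default=None,
        help="use a local Helm index file instead of downloading it",
    )
    parser.add_argument(
        "-k", "--keep", action="store_true",  # long name is an assumption
        help="reuse previously downloaded files",
    )
    parser.add_argument(
        "--debug", action="store_true",
        help="write generated_input.yaml and download Chart.yaml",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```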
