LADI Dataset Documentation
The LADI dataset is stored in Amazon S3. You can access or download its images, metadata, and labels either with an AWS account or without one.
To use Amazon S3 directly, you need an AWS account. If you do not have one yet, visit the Amazon Web Services homepage and follow the Create and Activate an AWS Account tutorial.
After you have created and activated your account, you can either download LADI from S3 to your local machine using the AWS Command Line Interface (AWS CLI), or transfer the LADI dataset into your own S3 bucket.
Download LADI to Local Machine with AWS Command Line Interface
Step 1. Follow the AWS Command Line Interface User Guide to install the AWS CLI on your system. The guide covers installation on Linux, macOS, Windows, and in a virtual environment.
Step 2. Verify that the AWS CLI is installed correctly:
$ aws --version
aws-cli/1.17.12 Python/3.7.3 Darwin/19.3.0 botocore/1.14.12
The version numbers and platform in the output depend on your installation.
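To run the same check from a Python script, for example before automating downloads, the sketch below looks for the aws executable on the PATH with shutil.which and splits the version string into its name/version parts. The function names aws_cli_version and parse_aws_version are hypothetical, and the parser assumes the space-separated name/version format shown above.

```python
import shutil
import subprocess


def parse_aws_version(output):
    """Split an `aws --version` string such as
    'aws-cli/1.17.12 Python/3.7.3 Darwin/19.3.0 botocore/1.14.12'
    into a {component: version} dictionary."""
    components = {}
    for token in output.split():
        # Each token is assumed to look like "name/version"; others are skipped
        name, sep, version = token.partition("/")
        if sep:
            components[name] = version
    return components


def aws_cli_version():
    """Return the parsed version info, or None if `aws` is not on the PATH."""
    if shutil.which("aws") is None:
        return None
    result = subprocess.run(["aws", "--version"], capture_output=True, text=True)
    # Some older installations print the version to stderr instead of stdout
    return parse_aws_version(result.stdout + result.stderr)


if __name__ == "__main__":
    info = aws_cli_version()
    if info is None:
        print("AWS CLI not found on PATH")
    else:
        print("aws-cli version:", info.get("aws-cli"))
```

Passing the command as a list to subprocess.run avoids going through a shell, so no quoting is needed.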
Step 3. Create a new administrator IAM user by following Creating Your First IAM Admin User and Group. After you have created the user, choose Users in the IAM console navigation pane to verify that it exists.
Step 4. Configure the AWS CLI:
$ aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]:
Default output format [None]:
Fill in each prompt with your own values. To get an AWS Access Key ID and AWS Secret Access Key, choose Users in the IAM console navigation pane, select the IAM user you just created (e.g. Administrator), open the Security credentials tab, and click the Create access key button.
For more details, see Configuring the AWS CLI.
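aws configure saves the access keys to ~/.aws/credentials and the region and output format to ~/.aws/config, both in INI format. As a rough way to confirm that a profile was saved without printing any secrets, the following sketch reads those two files with configparser. check_aws_profile is a hypothetical helper, and it assumes the default file locations (the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables, if set, are not taken into account).

```python
import configparser
from pathlib import Path


def check_aws_profile(profile="default", aws_dir=Path.home() / ".aws"):
    """Report which settings `aws configure` saved for a profile,
    without returning the secret values themselves."""
    credentials = configparser.ConfigParser()
    credentials.read(Path(aws_dir) / "credentials")  # missing files are ignored
    config = configparser.ConfigParser()
    config.read(Path(aws_dir) / "config")

    # In ~/.aws/config, non-default profiles use a "profile <name>" section header
    config_section = profile if profile == "default" else f"profile {profile}"

    creds = credentials[profile] if credentials.has_section(profile) else {}
    settings = config[config_section] if config.has_section(config_section) else {}
    return {
        "has_access_key_id": bool(creds.get("aws_access_key_id")),
        "has_secret_access_key": bool(creds.get("aws_secret_access_key")),
        "region": settings.get("region"),
        "output": settings.get("output"),
    }


if __name__ == "__main__":
    print(check_aws_profile())
```

If both has_ flags are True and a region is listed, the values entered at the aws configure prompts were saved.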
Step 5. Run the following command to download the LADI dataset:
$ aws s3 cp s3://ladi/path/to/remote path/to/local --recursive
Replace path/to/remote with the path of the data within the LADI S3 bucket, and path/to/local with the local path where the files will be written. The --recursive flag copies every file under the remote path, including those in subdirectories.
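To script the download, for example to fetch several LADI paths in a loop, you can call the same CLI command from Python with subprocess. This sketch assumes the AWS CLI is installed and configured as in Steps 1 to 4; build_s3_cp_command and download_from_ladi are hypothetical helper names, and path/to/remote remains a placeholder you must fill in.

```python
import subprocess


def build_s3_cp_command(source, destination, recursive=True):
    """Build the argument list for `aws s3 cp`."""
    command = ["aws", "s3", "cp", source, destination]
    if recursive:
        # Copy every object under the source path, including subdirectories
        command.append("--recursive")
    return command


def download_from_ladi(remote_path, local_path):
    """Download s3://ladi/<remote_path> into local_path using the AWS CLI."""
    source = f"s3://ladi/{remote_path.lstrip('/')}"
    command = build_s3_cp_command(source, local_path)
    # check=True raises CalledProcessError if the CLI exits with an error
    subprocess.run(command, check=True)
```

For example, download_from_ladi("path/to/remote", "path/to/local") runs the same command as in Step 5.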
Step 6. Go to the local path specified in the previous step and verify that the requested files from LADI have been downloaded.
Transfer LADI to Your Own S3 Bucket
Step 1. Follow Steps 1 to 4 in the "Download LADI to Local Machine with AWS Command Line Interface" section to install and configure the AWS CLI.
Step 2. Sign in to the AWS Management Console and open the Amazon S3 console.
Step 3. Follow the Creating an S3 Bucket tutorial to create a bucket.
Step 4. Go to the Amazon S3 console to verify that your new bucket has been created.
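If you prefer to create the bucket from Python instead of the console, a minimal Boto 3 sketch might look like the following. ensure_bucket is a hypothetical helper; it expects an S3 client such as boto3.client("s3", region_name="us-west-2"), where the region is your own choice. Bucket names are globally unique, so create_bucket can still fail if another account already owns the name.

```python
def ensure_bucket(s3_client, bucket_name, region):
    """Create bucket_name in region unless your account already owns it.

    s3_client is expected to be a Boto 3 S3 client.
    Returns True if a bucket was created, False if it already existed.
    """
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    if bucket_name in existing:
        return False
    if region == "us-east-1":
        # us-east-1 is the default location and is not passed as a LocationConstraint
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return True
```

Passing the client in as a parameter keeps the function easy to try out against a stand-in object before touching a real account.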
Step 5. Run the following command to copy data from the original LADI S3 bucket to your own bucket:
$ aws s3 cp s3://ladi/path/to/remote s3://yourbucketname/yourpath --recursive
Replace path/to/remote with the path of the data within the LADI S3 bucket, yourbucketname with the name of your new bucket, and yourpath with the path within your bucket where the files will be written.
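The same bucket-to-bucket copy can be sketched with Boto 3. The function below lists every object under a source prefix with the list_objects_v2 paginator and copies each one server-side with copy_object, so the data does not pass through your machine. copy_prefix is a hypothetical name, and the sketch assumes your credentials can read from the LADI bucket and write to your own bucket.

```python
def copy_prefix(s3_client, src_bucket, src_prefix, dst_bucket, dst_prefix):
    """Copy every object under s3://src_bucket/src_prefix to
    s3://dst_bucket/dst_prefix, keeping the relative key layout.
    Returns the list of destination keys that were written."""
    # Normalize prefixes so "Images" and "Images/" behave the same way
    src_prefix = src_prefix.rstrip("/") + "/" if src_prefix else ""
    dst_prefix = dst_prefix.rstrip("/") + "/" if dst_prefix else ""
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        # Pages with no matching objects have no "Contents" entry
        for obj in page.get("Contents", []):
            relative_key = obj["Key"][len(src_prefix):]
            dst_key = dst_prefix + relative_key
            s3_client.copy_object(
                Bucket=dst_bucket,
                Key=dst_key,
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
            )
            copied.append(dst_key)
    return copied
```

For example: copy_prefix(boto3.client("s3"), "ladi", "path/to/remote", "yourbucketname", "yourpath"). copy_object handles objects up to 5 GB; for larger objects, the managed client.copy method performs a multipart copy instead.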
Step 6. Go to the Amazon S3 console to verify that the requested files from LADI have been transferred.
Users without an AWS account can also browse and download LADI files with a web browser at http://ladi.s3-us-west-2.amazonaws.com/index.html. However, because downloading through the browser is much less efficient, we strongly recommend using the AWS CLI.
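For scripted access without an AWS account, individual files can likely be fetched over plain HTTP from the same host. The sketch below assumes, without confirmation from this guide, that an object with key path/to/remote in the LADI bucket is served at http://ladi.s3-us-west-2.amazonaws.com/path/to/remote; ladi_object_url and download_ladi_object are hypothetical helpers. It fetches one file per request, which is part of why the AWS CLI is recommended for larger downloads.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed public endpoint, taken from the browser URL above
LADI_BASE_URL = "http://ladi.s3-us-west-2.amazonaws.com/"


def ladi_object_url(key):
    """Build the HTTP URL for a LADI object key (its path inside the bucket)."""
    # Percent-encode special characters but keep "/" separators in the key
    return LADI_BASE_URL + urllib.parse.quote(key.lstrip("/"), safe="/")


def download_ladi_object(key, destination):
    """Download a single LADI object over HTTP (no AWS account needed)."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(ladi_object_url(key), destination)
    return destination
```

For example, download_ladi_object("path/to/remote", "path/to/local") would save one file, with both arguments replaced by real paths.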
If the dataset is stored in an S3 bucket, you can load LADI files in Python 3 with Boto 3, the AWS SDK for Python. LADI data can also be loaded and processed with other packages such as Pandas, NumPy, and PyTorch. Use the following commands to install these packages:
$ pip install boto3
For more information on installing Boto 3, see the Boto 3 Quickstart.
$ pip install pandas
For information on installing Pandas with Anaconda or Miniconda, see Pandas Installation.
$ pip install numpy
For more installation options, see SciPy Installation.
$ pip install torch torchvision
For more details on installing PyTorch, see PyTorch Start Locally.
Note: If you are a Mac user, or if pip on your system is not linked to Python 3, replace pip with pip3 in the commands above. If you install these packages with Anaconda, refer to the links above for the corresponding installation commands.
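After installation, a quick way to confirm that every package is visible to the Python interpreter you plan to use is to look each one up with importlib.util.find_spec, which checks availability without importing the package. missing_modules is a hypothetical helper; the module names listed match the packages installed above.

```python
import importlib.util

# Import names of the packages installed above
REQUIRED_MODULES = ("boto3", "pandas", "numpy", "torch", "torchvision")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules from `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages are installed.")
```

Running this with the same interpreter you use for LADI (for example python3 check_packages.py, a hypothetical file name) shows whether pip installed the packages into that environment.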
If you transferred LADI into your own S3 bucket and prefer not to store the files on your local machine, you can use Boto 3, the AWS SDK for Python, to access and read files directly from the bucket.
Example: Use Boto 3 and Pandas to read a .csv file from LADI stored in an S3 bucket.
Step 1. Import the Boto 3 package to access the S3 bucket.
Step 2. Create a Boto 3 client. Clients are low-level interfaces to AWS services and are useful for loading single files.
import pandas as pd
import boto3
# Replace 'bucket_name' with the name of your S3 bucket
bucket_name = 'bucket_name'
# Replace 'ladi_images_metadata.csv' with the path (key) of the file you want to read
file_1_path = 'ladi_images_metadata.csv'
client = boto3.client('s3')
obj_1 = client.get_object(Bucket=bucket_name, Key=file_1_path)
image_metadata = pd.read_csv(obj_1['Body'])
# If you are loading a .tsv file instead:
# ladi_response = pd.read_csv(obj_1['Body'], sep='\t')
obj_1 is a dictionary that contains the file's metadata, and its Body entry contains the actual data as a StreamingBody object. To inspect the result, display the first 10 rows of the table with image_metadata.head(10).
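Because pd.read_csv accepts any binary file-like object, the same call works for the StreamingBody returned by get_object and for a file opened locally. The sketch below wraps it in a hypothetical helper, read_ladi_table, that uses a tab separator for .tsv files and a comma otherwise; choosing the separator from the file extension is an assumption about how the LADI files are named.

```python
from pathlib import PurePosixPath

import pandas as pd


def read_ladi_table(body, key):
    """Read a CSV or TSV file into a DataFrame.

    body: any binary file-like object, e.g. obj_1['Body'] from
          client.get_object(...) or a file opened with open(path, 'rb').
    key:  the object key or file name, used only to pick the separator.
    """
    suffix = PurePosixPath(key).suffix.lower()
    # Tab-separated files need sep='\t'; everything else is treated as CSV
    sep = "\t" if suffix == ".tsv" else ","
    return pd.read_csv(body, sep=sep)
```

With the variables from the example above, image_metadata = read_ladi_table(obj_1['Body'], file_1_path) reads the same table, and image_metadata.head(10) displays its first 10 rows.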
If you downloaded LADI to your local machine, you can read the files directly with Pandas and other packages, without Boto 3.
Example: Read and display an image from the LADI dataset on a local machine.
There are several ways to read and display an image file in Python. This example uses the Python Imaging Library (PIL) and matplotlib to read and show an image.
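from PIL import Image
import matplotlib.pyplot as plt
# Path to an image in your local copy of LADI
image_path = 'Images/FEMA_CAP/1012/20118/VIRB0002_fa5065eb-773a-4b41-8f2c-80a734f3770d.jpg'
im = Image.open(image_path)
# cmap only affects single-channel (grayscale) images; RGB images are shown in color
plt.imshow(im, cmap='Greys_r')
plt.show()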
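The package list above also includes NumPy and PyTorch. One way they might fit together is a small map-style dataset that loads local LADI images as NumPy arrays; an object with __len__ and __getitem__ like this can be passed to torch.utils.data.DataLoader. LadiImageFolder is a hypothetical class, and it assumes the images are .jpg files somewhere under a local directory such as Images/.

```python
from pathlib import Path

import numpy as np
from PIL import Image


class LadiImageFolder:
    """Map-style dataset over the image files under a local LADI directory.

    Each item is (image array of shape (height, width, 3), file path).
    """

    def __init__(self, root, pattern="*.jpg"):
        # rglob searches subdirectories recursively; sorting gives a stable order
        self.paths = sorted(Path(root).rglob(pattern))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        with Image.open(path) as im:
            # Convert to 3-channel RGB so grayscale images get the same layout
            array = np.asarray(im.convert("RGB"))
        return array, str(path)
```

For example, torch.utils.data.DataLoader(LadiImageFolder("Images"), batch_size=1) yields one image tensor and its path per step. Because LADI images are not guaranteed to share one size, larger batch sizes would first require resizing each image in __getitem__.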