AzurePublicDatasetV2

VM Trace

The trace contains a representative subset of the first-party Azure VM workload in one geographical region.
This Jupyter notebook directly compares the main characteristics of this trace with those of the complete Azure workload in 2019, showing that they are qualitatively very similar (except for VM deployment sizes).

The main trace characteristics and schema are:

Main characteristics:

Dataset size: 235GB
Compressed dataset size: 156GB
Number of files: 198 files
Duration: 30 consecutive days
Total number of VMs: 2,695,548
Total number of Azure subscriptions: 6,687
Trace contents: 5-minute VM CPU utilization readings (time series), a VM information table, and a subscription table (with main fields encrypted)
Total VM hours: 104,371,713
Total number of VM CPU utilization readings: 1,942,780,023
Total virtual core hours: >380,000,000
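The headline figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only. It derives a few ratios directly from the listed totals: VM-hours per VM, a lower bound on virtual cores per VM-hour, the compression ratio, and the number of 5-minute intervals in 30 days. The function and constant names are hypothetical. The virtual-core ratio is only a lower bound because the source gives that total as ">380,000,000".

```python
# Back-of-the-envelope ratios derived from the published headline figures.
# All constants are copied from the "Main characteristics" list; the derived
# quantities are illustrative and are not fields of the dataset itself.

TOTAL_VMS = 2_695_548
TOTAL_VM_HOURS = 104_371_713
MIN_VCORE_HOURS = 380_000_000  # published as ">380,000,000", so a lower bound
DATASET_GB = 235
COMPRESSED_GB = 156
DURATION_DAYS = 30
READING_INTERVAL_S = 5 * 60  # one reading every 5 minutes


def derived_stats() -> dict[str, float]:
    """Return a few ratios implied by the headline numbers."""
    return {
        # Average VM-hours recorded per VM.
        "vm_hours_per_vm": TOTAL_VM_HOURS / TOTAL_VMS,
        # Lower bound on the average number of virtual cores per VM-hour.
        "min_vcores_per_vm_hour": MIN_VCORE_HOURS / TOTAL_VM_HOURS,
        # Compressed size as a fraction of the uncompressed size.
        "compression_ratio": COMPRESSED_GB / DATASET_GB,
        # Number of 5-minute intervals in the trace (roughly the most
        # readings a single VM alive for the whole trace can have).
        "intervals_in_trace": DURATION_DAYS * 24 * 3600 // READING_INTERVAL_S,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.2f}")
```

With the published totals this gives about 38.7 VM-hours per VM, at least about 3.6 virtual cores per VM-hour, a compression ratio of about 0.66, and 8,640 five-minute intervals over 30 days. These are averages and bounds implied by the totals, not statistics computed from the trace itself.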

Schema:

Encrypted subscription id
Encrypted deployment id
Timestamp in seconds (starting from 0) when first VM created
Count VMs created
Deployment size (we define a “deployment” differently than Azure in our paper)
Encrypted VM id
Timestamp VM created
Timestamp VM deleted
Max CPU utilization
Avg CPU utilization
P95 of Max CPU utilization
VM category
VM virtual core count bucket
VM memory (GBs) bucket
Timestamp in seconds (every 5 minutes)
Min CPU utilization during the 5 minutes
Max CPU utilization during the 5 minutes
Avg CPU utilization during the 5 minutes
VM virtual core count bucket definition
VM memory (GBs) bucket definition
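The schema above mixes per-VM summary fields with per-interval readings. The sketch below shows how the two relate. It computes each VM's lifetime from the created/deleted timestamps and a 95th percentile of its 5-minute maximum readings. For illustration only, it assumes headerless CSV input with columns in the order listed above. It also assumes that each reading row carries the encrypted VM id, which the schema above does not list. The column names, the sample rows, and the nearest-rank percentile method are all hypothetical choices, so results need not match the dataset's own "P95 of Max CPU utilization" field.

```python
import csv
import io
import math
from collections import defaultdict
from typing import TextIO

# Hypothetical column names, assumed to follow the order of the schema list.
# The real files may order or name columns differently.
VM_COLUMNS = [
    "vm_id", "created_s", "deleted_s", "max_cpu", "avg_cpu",
    "p95_max_cpu", "category", "core_bucket", "memory_bucket",
]
# Assumption: every 5-minute reading row also carries the encrypted VM id.
READING_COLUMNS = ["timestamp_s", "vm_id", "min_cpu", "max_cpu", "avg_cpu"]


def vm_lifetimes_hours(vm_csv: TextIO) -> dict[str, float]:
    """Map each VM id to its lifetime (deleted minus created) in hours."""
    lifetimes: dict[str, float] = {}
    for row in csv.DictReader(vm_csv, fieldnames=VM_COLUMNS):
        seconds = float(row["deleted_s"]) - float(row["created_s"])
        lifetimes[row["vm_id"]] = seconds / 3600
    return lifetimes


def p95_of_max(readings_csv: TextIO) -> dict[str, float]:
    """Nearest-rank 95th percentile of each VM's 5-minute max CPU readings."""
    maxima: defaultdict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(readings_csv, fieldnames=READING_COLUMNS):
        maxima[row["vm_id"]].append(float(row["max_cpu"]))
    result: dict[str, float] = {}
    for vm_id, values in maxima.items():
        values.sort()
        rank = math.ceil(0.95 * len(values))  # 1-based nearest rank
        result[vm_id] = values[rank - 1]
    return result


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    vms = io.StringIO("vmA,0,7200,30.0,8.0,30.0,some_category,2,4\n")
    readings = io.StringIO("0,vmA,1.0,10.0,5.0\n300,vmA,2.0,30.0,6.0\n")
    print(vm_lifetimes_hours(vms))
    print(p95_of_max(readings))
```

Nearest-rank is used here because it always returns an observed reading. Interpolating percentile definitions, such as NumPy's default, can give slightly different values.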

Downloading instructions

You can download the dataset from Azure Blob Storage using the links available here.
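The compressed dataset (156GB) is notably smaller than the uncompressed one (235GB), so it can be convenient to read downloaded files without first decompressing them to disk. The minimal sketch below assumes each downloaded file is a gzip-compressed, headerless CSV. It assumes no particular download URL or file name, and the helper names are hypothetical.

```python
import csv
import gzip
import sys
from pathlib import Path
from typing import Iterator


def iter_rows(path: Path) -> Iterator[list[str]]:
    """Yield CSV rows from a gzip-compressed file, streaming it from disk."""
    # "rt" opens the gzip stream in text mode so csv.reader can consume it.
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.reader(handle)


def count_rows(path: Path) -> int:
    """Count rows in one compressed file in a single streaming pass."""
    return sum(1 for _ in iter_rows(path))


if __name__ == "__main__":
    # Usage (hypothetical file names): python count_rows.py part1.csv.gz part2.csv.gz
    total = sum(count_rows(Path(p)) for p in sys.argv[1:])
    print(f"{total:,} rows")
```

Summing the row counts of all CPU-reading files in this way is one way to check a download against the total number of VM CPU utilization readings listed above.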
