The trace contains a representative subset of the first-party Azure VM workload in one geographical region.
This Jupyter notebook directly compares the main characteristics of this trace with those of the complete Azure workload in 2019, showing that they are qualitatively very similar (except for VM deployment sizes).
The main trace characteristics and schema are:
- Dataset size: 235GB
- Compressed dataset size: 156GB
- Number of files: 198
- Duration: 30 consecutive days
- Total number of VMs: 2,695,548
- Total number of Azure subscriptions: 6,687
- Timeseries data: 5-minute VM CPU utilization readings, VM information table and subscription table (with main fields encrypted)
- Total VM hours: 104,371,713
- Total number of VM CPU utilization readings: 1,942,780,023
- Total virtual core hours: >380,000,000
- Encrypted subscription id
- Encrypted deployment id
- Timestamp in seconds (starting from 0) when the first VM was created
- Count of VMs created
- Deployment size (we define a “deployment” differently than Azure in our paper)
- Encrypted VM id
- Timestamp VM created
- Timestamp VM deleted
- Max CPU utilization
- Avg CPU utilization
- P95 of Max CPU utilization
- VM category
- VM virtual core count bucket
- VM memory (GBs) bucket
- Timestamp in seconds (every 5 minutes)
- Min CPU utilization during the 5 minutes
- Max CPU utilization during the 5 minutes
- Avg CPU utilization during the 5 minutes
- VM virtual core count bucket definition
- VM memory (GBs) bucket definition
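
As a rough illustration of how the per-VM fields listed above might be consumed, here is a minimal Python sketch. It assumes headerless CSV rows whose columns follow the order of the per-VM fields in the list (VM id, creation and deletion timestamps, CPU statistics, category, and the two buckets). The column names, the column order, and the sample rows are all hypothetical, so check the actual files before relying on any of it.

```python
import csv
import io

# ASSUMPTION: hypothetical column names and order, modeled on the per-VM
# fields in the schema list; the actual trace files may differ.
VM_FIELDS = [
    "vm_id", "ts_created", "ts_deleted",
    "max_cpu", "avg_cpu", "p95_max_cpu",
    "vm_category", "core_bucket", "memory_bucket",
]

READING_INTERVAL_S = 300  # CPU readings are reported every 5 minutes


def parse_vm_rows(text):
    """Parse headerless CSV text into a list of dicts with typed fields."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        row = dict(zip(VM_FIELDS, raw))
        for key in ("ts_created", "ts_deleted"):
            row[key] = int(row[key])  # timestamps are in seconds
        for key in ("max_cpu", "avg_cpu", "p95_max_cpu"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


def lifetime_hours(row):
    """VM lifetime in hours, from the creation and deletion timestamps."""
    return (row["ts_deleted"] - row["ts_created"]) / 3600


def expected_readings(row):
    """Number of full 5-minute intervals that fit in the VM's lifetime."""
    return (row["ts_deleted"] - row["ts_created"]) // READING_INTERVAL_S


# Synthetic sample rows for illustration only; not taken from the trace.
SAMPLE = (
    "vmA,0,7200,95.5,20.1,80.0,category-x,2,4\n"
    "vmB,300,3900,50.0,10.0,45.0,category-y,4,8\n"
)

vms = parse_vm_rows(SAMPLE)
for vm in vms:
    print(vm["vm_id"], lifetime_hours(vm), expected_readings(vm))
```

The bucket fields are kept as strings here because the schema describes them as buckets rather than exact values; the bucket definitions listed above determine how to interpret them.
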
You can download the dataset from Azure Blob Storage using the links available here.