Titan was the flagship supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). Deployed in late 2012, it became the fastest supercomputer in the world at the time and was retired on August 2, 2019.
During its production lifetime, Titan provided more than 26 billion core hours of computing time to scientists. Throughout this period, an extensive operational dataset, the Resource Utilization Report (RUR), was collected from the Titan system. On one hand, the RUR data is extremely coarse-grained: to avoid any noticeable disturbance to production jobs, only a small amount of data could be collected and stored. On the other hand, it is extremely comprehensive, in that it provides a log record for every job submitted to Titan from April 2015 to July 2019. These records offer a unique window into operational resource usage at extreme scale over a multi-year period.
Year | Raw Data Size | # of Job Submissions | # of Failed Jobs |
---|---|---|---|
2015 | 1.5 GB | 1,529,972 | 292,699 |
2016 | 4.3 GB | 4,745,305 | 691,094 |
2017 | 2.8 GB | 2,814,838 | 328,187 |
2018 | 2.2 GB | 2,370,860 | 250,773 |
2019 | 1.3 GB | 1,520,971 | 104,173 |
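
As a rough illustration of how the table can be read, the sketch below computes the share of submitted jobs that failed in each year and over the whole period. The counts are copied from the table; the names `JOBS_BY_YEAR` and `failure_rate` are hypothetical and are not part of the RUR data or any tooling shipped with it. Note that 2015 and 2019 are partial years (records run from April 2015 to July 2019), so yearly totals are not directly comparable.

```python
# Yearly job counts copied from the table above: (submissions, failed jobs).
# The dictionary and function names are illustrative assumptions.
JOBS_BY_YEAR = {
    2015: (1_529_972, 292_699),
    2016: (4_745_305, 691_094),
    2017: (2_814_838, 328_187),
    2018: (2_370_860, 250_773),
    2019: (1_520_971, 104_173),
}


def failure_rate(submitted: int, failed: int) -> float:
    """Return the fraction of submitted jobs that are recorded as failed."""
    return failed / submitted


# Per-year failure rates, keyed by year.
rates = {year: failure_rate(s, f) for year, (s, f) in JOBS_BY_YEAR.items()}

# Totals across the whole collection period.
total_submitted = sum(s for s, _ in JOBS_BY_YEAR.values())
total_failed = sum(f for _, f in JOBS_BY_YEAR.values())

for year, rate in rates.items():
    print(f"{year}: {rate:.1%} of submitted jobs failed")
print(f"overall: {failure_rate(total_submitted, total_failed):.1%}")
```

This treats every row of the table as-is; what counts as a "failed" job is determined by the RUR records themselves and is not defined here.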
@INPROCEEDINGS{fwang2:2019b,
author={F. Wang and S. Oral and S. Sen and N. Imam},
booktitle={2019 IEEE International Conference on Cluster Computing (CLUSTER)},
title={Learning from Five-year Resource-Utilization Data of Titan System},
year={2019},
volume={},
number={},
pages={1-6},
doi={10.1109/CLUSTER.2019.8891001},
ISSN={1552-5244},
month={9},
}