Prefetcher
The objective of the Prefetcher is to promote content that is expected to be used in the near future or frequently, and to demote content that is not. The Prefetcher only applies to data that is already staged within Hermes. To activate prefetching, attach a Prefetcher Trait to a Tag (or Bucket); the Trait indicates that prefetching should be enabled and which kind of prefetching should be applied.
Enabling prefetching
To enable prefetching, attach the Prefetcher Trait to a Tag (or Bucket). In this example, we attach a DeterministicPrefetcherTrait to the SimulationBucket, which represents the data for the simulation workload.
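// Look up the bucket that holds the simulation data.
TagId bkt_id = HERMES->GetBucketId("SimulationBucket");
// Look up the trait, then attach it to the bucket to enable prefetching.
TraitId trait_id = HERMES->GetTraitId("DeterministicPrefetcherTrait");
HERMES->AttachTrait(trait_id, bkt_id);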
To support prefetching, we implement a tracing system within Hermes. The tracer is called for every Put and Get operation. It records information about each operation in a multiple-producer single-consumer (MPSC) shared-memory queue, which the prefetcher digests asynchronously.
The tracer collects the following information:
- Operation (Put or Get)
- Blob Id
- Bucket Id
- Blob Size
- Timestamp (from program start)
- Rank (if MPI)
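The tracer fields above, and the producer/consumer hand-off to the prefetcher, can be sketched roughly as follows. This is a minimal illustration, not the Hermes implementation: `IoOp`, `TraceRecord`, and `TraceQueue` are hypothetical names, the IDs are simplified to plain 64-bit integers, and a mutex-guarded in-process queue stands in for the shared-memory MPSC queue.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>

// All names below are hypothetical and chosen for illustration only.
enum class IoOp : uint8_t { kPut, kGet };

// One in-memory trace entry, mirroring the fields the tracer collects.
struct TraceRecord {
  IoOp op;
  uint64_t blob_id;      // simplified; assumed to fit in 64 bits
  uint64_t bucket_id;    // simplified; assumed to fit in 64 bits
  uint64_t blob_size;    // bytes
  double timestamp_sec;  // seconds since program start
  int rank;              // MPI rank, or 0 when MPI is not used
};

// Mutex-guarded stand-in for the MPSC queue: any thread may call Push,
// while a single consumer (the prefetcher) calls Pop.
class TraceQueue {
 public:
  void Push(const TraceRecord &rec) {
    std::lock_guard<std::mutex> lock(mu_);
    q_.push_back(rec);
  }

  // Copies the oldest record into `out` and returns true,
  // or returns false if the queue is empty.
  bool Pop(TraceRecord &out) {
    std::lock_guard<std::mutex> lock(mu_);
    if (q_.empty()) return false;
    out = q_.front();
    q_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::deque<TraceRecord> q_;
};
```

In this sketch, each Put or Get would call `Push` with a filled-in `TraceRecord`, and the prefetcher thread would call `Pop` in a loop to drain records in arrival order.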
In the binary log file, we store the following information:
- Operation (Put or Get)
- Blob Name (64-bit Hash)
- Bucket Name (64-bit Hash)
- Blob Size (64-bit)
- Timestamp (from program start)
- Rank (if MPI)
Note that we store hashes of the Blob Name and Bucket Name to reduce the size of the binary log file. There is no need to know the full name, and the hashes are likely to be unique enough in practice. We also use the Blob Name and Bucket Name in the binary log file instead of the Blob Id and Bucket Id, because IDs can change between application runs, whereas names remain consistent.
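As a rough illustration of the hashed log record described above, the sketch below hashes names to 64 bits and serializes one record field by field. The source does not say which hash function or byte layout Hermes uses, so FNV-1a, the `LogRecord` struct, the field widths for timestamp, rank, and operation, and the `AppendRecord` helper are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a is used here only as an example 64-bit hash (an assumption).
inline uint64_t HashName(const std::string &name) {
  uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : name) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV 64-bit prime
  }
  return h;
}

// Hypothetical log record with the fields stored in the binary file.
struct LogRecord {
  uint8_t op;  // 0 = Put, 1 = Get (encoding is an assumption)
  uint64_t blob_name_hash;
  uint64_t bucket_name_hash;
  uint64_t blob_size;
  double timestamp_sec;  // seconds since program start
  int32_t rank;          // MPI rank, or 0 when MPI is not used
};

// Bytes per serialized record: 1 + 8 + 8 + 8 + 8 + 4.
constexpr std::size_t kRecordBytes = 37;

// Appends the record to `buf` one field at a time, in host byte order,
// so that compiler-inserted struct padding is never written.
inline void AppendRecord(std::vector<char> &buf, const LogRecord &r) {
  auto put = [&buf](const void *p, std::size_t n) {
    const char *bytes = static_cast<const char *>(p);
    buf.insert(buf.end(), bytes, bytes + n);
  };
  put(&r.op, sizeof(r.op));
  put(&r.blob_name_hash, sizeof(r.blob_name_hash));
  put(&r.bucket_name_hash, sizeof(r.bucket_name_hash));
  put(&r.blob_size, sizeof(r.blob_size));
  put(&r.timestamp_sec, sizeof(r.timestamp_sec));
  put(&r.rank, sizeof(r.rank));
}
```

Because the hash depends only on the name, the same blob in the same bucket produces the same pair of hashes in every run, which is what makes a log from one run usable in the next.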
Currently, we implement the deterministic prefetcher. Many applications exhibit completely deterministic I/O patterns. Deep learning applications, for example, will have the same I/O pattern when the random seed is fixed and all other parameters remain the same. Many HPC workloads are executed repeatedly with the same parameters for reasons such as reproducibility. This prefetcher assumes that the user will supply an I/O trace log.
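One way a supplied trace could drive prefetching is a time-window lookahead: given the current time, find the Get operations that the trace says will happen soon and treat their blobs as promotion candidates. The source does not describe how Hermes consumes the trace, so the sketch below is only an assumption; `TraceEntry` and `UpcomingGets` are hypothetical names, and the trace is assumed to be sorted by timestamp.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one entry in the supplied trace log.
struct TraceEntry {
  double timestamp_sec;     // seconds since program start
  uint64_t blob_name_hash;  // 64-bit hash of the blob name
  bool is_get;              // true for Get, false for Put
};

// Returns the blob hashes of all Get operations whose timestamp lies in
// the interval (now_sec, now_sec + window_sec].
// Precondition: `trace` is sorted by ascending timestamp.
inline std::vector<uint64_t> UpcomingGets(const std::vector<TraceEntry> &trace,
                                          double now_sec, double window_sec) {
  auto later_than = [](double t, const TraceEntry &e) {
    return t < e.timestamp_sec;
  };
  // First entry strictly after now_sec.
  auto first = std::upper_bound(trace.begin(), trace.end(), now_sec, later_than);
  // First entry strictly after the end of the window.
  auto last = std::upper_bound(trace.begin(), trace.end(),
                               now_sec + window_sec, later_than);
  std::vector<uint64_t> candidates;
  for (auto it = first; it != last; ++it) {
    if (it->is_get) candidates.push_back(it->blob_name_hash);
  }
  return candidates;
}
```

The window length trades off how early data is promoted against how much faster-tier capacity is held for data that is not yet needed.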
- Live Prefetcher: Use some sort of short-term memory models to ensure that data to a bucket