Overall Flow of Dynamic-Shaped Model Execution

flowchart TD
    A["primitive_inst::execute()"] --> B{"is dynamic?"}
    B --> |No  | H["Execute impl"]
    B --> |Yes | C["runtime in_place_concat"]
    C --> D["update_shape()"]
    D --> E{"shape changed from <br/> previous inference?"}
    E --> |No | H["Execute impl"]
    E --> |Yes| G{"Valid fusion?"}
    G --> |No | I["Create unfused subgraph"]
    I --> II["Execute subgraph"]
    G --> |Yes | J["update_impl()"]
    J --> JJ{"Impl changed?"}
    JJ --> |No | L["Set arguments"]
    L --> H
    JJ --> |Yes | KK{"preferred weight format <br/> changed?"}
    KK --> |Yes | M["update_weights()"]
    KK --> |No  | O{"Is current memory enough <br/> for the new shape?"}
    M --> O
    O --> |No |P["reallocate output memory"]
    O --> |Yes | L
    P --> L

Figure 1 presents the basic flow of a primitive's execution when its program_node has a dynamic shape. Each step is briefly explained below; more detailed explanations can be found in the implementation details section.

  • update_shape
    • Checks whether the input shape of the primitive has changed since the previous inference. If it has, shape inference is performed for the primitive.
      • If the new output shape has a byte size of zero (i.e., the output is empty), execution of this primitive is skipped for the current inference.
      • Note that shape inference for some primitives is performed during the inference-time optimization stage of another primitive's execution (e.g., in do_runtime_in_place_concat). In such cases, the update_shape_done_by_other flag is set to true. More detailed descriptions of these optimization stages will be provided in the near future.
  • Unfusion
    • If the primitive has fused operations but the kernel does not support fusion for the current output shape, unfusion is performed, i.e., a subgraph is created that decomposes the current primitive and its fused primitives.
  • If either the input or the output shape has changed, the following steps are performed:
    • update_impl

      • Checks whether the expected primitive_impl can be obtained from the in-memory cache. If not, it checks whether a dynamic_impl (i.e., a shape-agnostic impl) is available for the primitive. If one is available, it is selected and used. Otherwise, a new static_impl is built for the primitive and added to the in-memory cache.
      • When a dynamic_impl is selected and the primitive is critical (e.g., fully_connected, gemm, convolution, deconvolution), a build task for the static kernel is enqueued to the async compilation context.
    • update_weights

      • If the impl has changed and the expected weight format has changed, the weight data is reordered to the corresponding format.
    • realloc_if_needed

      • If the current output memory is smaller than the memory required for the new shape, new memory is allocated.
  • If any kernel arguments have changed (e.g., memory address, work group size, etc.), set_argument() is performed.
  • Finally, the selected impl is executed as usual.
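
The impl selection and reallocation rules above can be illustrated with a small, self-contained C++ sketch. It is a toy model only: `ToyPrimitive`, `ImplSource`, `byte_size`, and the fields used here are hypothetical names invented for this example and are not the plugin's actual classes or signatures. It assumes the cache is keyed by shape alone and that memory only ever grows.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <vector>

// Hypothetical toy types; they only mirror the steps described in the text.
using Shape = std::vector<std::size_t>;

enum class ImplSource { CacheHit, DynamicImpl, BuiltStatic };

struct ToyPrimitive {
    bool has_dynamic_impl = false;     // is a shape-agnostic impl available?
    bool is_critical = false;          // e.g. fully_connected, gemm, convolution
    std::map<Shape, int> impl_cache;   // stand-in for the in-memory impl cache
    std::vector<Shape> async_queue;    // stand-in for the async compilation context
    std::size_t allocated_bytes = 0;   // size of the current output buffer
    int next_impl_id = 0;
};

// Byte size of a tensor; any zero dimension yields zero, which is the
// condition under which execution would be skipped.
std::size_t byte_size(const Shape& shape, std::size_t element_bytes) {
    return std::accumulate(shape.begin(), shape.end(), element_bytes,
                           std::multiplies<std::size_t>());
}

// Selection order from the text: cache first, then the dynamic impl,
// and only then a freshly built static impl.
ImplSource update_impl(ToyPrimitive& prim, const Shape& shape) {
    if (prim.impl_cache.count(shape) != 0) {
        return ImplSource::CacheHit;
    }
    if (prim.has_dynamic_impl) {
        if (prim.is_critical) {
            // Request a static kernel for this shape in the background.
            prim.async_queue.push_back(shape);
        }
        return ImplSource::DynamicImpl;
    }
    prim.impl_cache[shape] = prim.next_impl_id++;  // "build" and cache a static impl
    return ImplSource::BuiltStatic;
}

// Allocates only when the current buffer is too small; returns true if it did.
bool realloc_if_needed(ToyPrimitive& prim, std::size_t required_bytes) {
    if (prim.allocated_bytes >= required_bytes) {
        return false;
    }
    prim.allocated_bytes = required_bytes;
    return true;
}
```

In this model, a primitive without a dynamic impl builds a static impl on the first call for a given shape and hits the cache on the second, while a critical primitive with a dynamic impl returns `ImplSource::DynamicImpl` and leaves one entry in `async_queue`. The sketch does not model what happens when the asynchronous build completes.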