Add timings report if -ftime-report is enabled #779

Merged (1 commit) on Feb 29, 2024

@DeadSpheroid DeadSpheroid commented Feb 20, 2024

Fixes #769
Uses the llvm::TimerGroup and llvm::Timer classes to record how long the differentiation of each function takes.

SourceFile.cpp

#include "clad/Differentiator/Differentiator.h"
#include <iostream>
double fnr(double c){
  return c*3*c;
}

double fns(double z){
  return 4*z;
}

double fn(double x, double y) {
  return x*x*x*x + 2*y*fnr(y) * 3 * x * fns(x);
}

double fn2(double a, double b) {
  return 3*a*a + b * fns(b);
}

double grad_func(double p, double q){
  return p*q;
}

int main() {
  // differentiate 'fn' w.r.t 'x'.
  auto d_fn_1 = clad::differentiate(fn, "x");
  auto d_fn_2 = clad::differentiate(fn2, "a");
  double dp = -1, dq = -1;
  auto f_grad = clad::gradient(grad_func);
  f_grad.execute(3, 4, &dp, &dq);
  // computes derivative of 'fn' w.r.t 'x' when (x, y) = (3, 4).
  std::cout<<d_fn_1.execute(3, 4)<<"\n";
  std::cout<<d_fn_2.execute(2, 4)<<"\n";
  std::cout<<"dp="<<dp<<" dq="<<dq<<"\n";
  return 0;
}

Output with -ftime-report

❯ clang++ -std=c++11 -I /home/warrenjacinto/Projects/clad/include/ -fplugin=/home/warrenjacinto/Projects/inst/lib/clad.so SourceFile.cpp -lstdc++ -lm -ftime-report
===-------------------------------------------------------------------------===
                             Timers for clad funcs
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0730 seconds (0.0754 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0311 ( 44.6%)   0.0000 (  0.0%)   0.0311 ( 42.7%)   0.0329 ( 43.7%)  fn
   0.0274 ( 39.3%)   0.0000 (  0.0%)   0.0274 ( 37.6%)   0.0280 ( 37.2%)  grad_func
   0.0044 (  6.4%)   0.0032 (100.0%)   0.0077 ( 10.5%)   0.0077 ( 10.1%)  fn2
   0.0045 (  6.5%)   0.0000 (  0.0%)   0.0045 (  6.2%)   0.0045 (  6.0%)  fnr
   0.0023 (  3.2%)   0.0000 (  0.0%)   0.0023 (  3.1%)   0.0023 (  3.0%)  fns
   0.0698 (100.0%)   0.0032 (100.0%)   0.0730 (100.0%)   0.0754 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0012 seconds (0.0012 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0007 ( 61.7%)   0.0000 ( 66.7%)   0.0007 ( 61.7%)   0.0007 ( 61.6%)  AnnotationRemarksPass
   0.0004 ( 31.6%)   0.0000 ( 33.3%)   0.0004 ( 31.6%)   0.0004 ( 31.7%)  AlwaysInlinerPass
   0.0001 (  6.7%)   0.0000 (  0.0%)   0.0001 (  6.7%)   0.0001 (  6.7%)  CoroConditionalWrapper
   0.0012 (100.0%)   0.0000 (100.0%)   0.0012 (100.0%)   0.0012 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Analysis execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0003 seconds (0.0003 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0002 ( 82.0%)   0.0000 (100.0%)   0.0002 ( 82.1%)   0.0002 ( 82.1%)  TargetLibraryAnalysis
   0.0000 ( 13.1%)   0.0000 (  0.0%)   0.0000 ( 13.1%)   0.0000 ( 13.1%)  ProfileSummaryAnalysis
   0.0000 (  4.8%)   0.0000 (  0.0%)   0.0000 (  4.8%)   0.0000 (  4.8%)  InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::Module>
   0.0003 (100.0%)   0.0000 (100.0%)   0.0003 (100.0%)   0.0003 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.4330 ( 69.6%)   0.0021 ( 14.9%)   0.4351 ( 68.3%)   0.4520 ( 66.9%)  LLVM IR Generation Time
   0.1896 ( 30.4%)   0.0121 ( 85.1%)   0.2016 ( 31.7%)   0.2238 ( 33.1%)  Code Generation Time
   0.6226 (100.0%)   0.0142 (100.0%)   0.6368 (100.0%)   0.6758 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0124 seconds (0.0174 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0028 ( 22.5%)   0.0000 ( 16.7%)   0.0028 ( 22.5%)   0.0051 ( 29.3%)  DAG Combining 1
   0.0026 ( 21.4%)   0.0000 ( 41.7%)   0.0027 ( 21.4%)   0.0038 ( 22.1%)  Instruction Selection
   0.0010 (  8.1%)   0.0000 ( 16.7%)   0.0010 (  8.1%)   0.0022 ( 12.7%)  DAG Legalization
   0.0018 ( 14.7%)   0.0000 (  0.0%)   0.0018 ( 14.7%)   0.0018 ( 10.4%)  DAG Combining 2
   0.0018 ( 14.4%)   0.0000 (  8.3%)   0.0018 ( 14.4%)   0.0018 ( 10.2%)  Instruction Scheduling
   0.0016 ( 12.6%)   0.0000 (  8.3%)   0.0016 ( 12.6%)   0.0016 (  8.9%)  Instruction Creation
   0.0005 (  4.3%)   0.0000 (  0.0%)   0.0005 (  4.3%)   0.0008 (  4.8%)  Type Legalization
   0.0001 (  1.2%)   0.0000 (  0.0%)   0.0001 (  1.2%)   0.0002 (  0.9%)  Vector Legalization
   0.0001 (  0.9%)   0.0000 (  8.3%)   0.0001 (  0.9%)   0.0001 (  0.6%)  Instruction Scheduling Cleanup
   0.0124 (100.0%)   0.0000 (100.0%)   0.0124 (100.0%)   0.0174 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1363 seconds (0.1487 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0504 ( 38.5%)   0.0000 (  0.6%)   0.0504 ( 37.0%)   0.0605 ( 40.7%)  X86 DAG->DAG Instruction Selection
   0.0203 ( 15.5%)   0.0018 ( 33.1%)   0.0221 ( 16.2%)   0.0224 ( 15.1%)  X86 Assembly Printer
   0.0100 (  7.6%)   0.0009 ( 16.5%)   0.0109 (  8.0%)   0.0109 (  7.3%)  Fast Register Allocator
   0.0078 (  5.9%)   0.0006 ( 10.9%)   0.0084 (  6.1%)   0.0088 (  5.9%)  Module Verifier
   0.0074 (  5.6%)   0.0000 (  0.8%)   0.0074 (  5.4%)   0.0074 (  5.0%)  Module Verifier #2
   0.0063 (  4.8%)   0.0005 (  9.6%)   0.0068 (  5.0%)   0.0071 (  4.8%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0046 (  3.5%)   0.0000 (  0.0%)   0.0046 (  3.4%)   0.0055 (  3.7%)  Pre-ISel Intrinsic Lowering
   0.0034 (  2.6%)   0.0002 (  4.2%)   0.0036 (  2.6%)   0.0036 (  2.4%)  Two-Address instruction pass
   0.0014 (  1.1%)   0.0001 (  2.1%)   0.0015 (  1.1%)   0.0015 (  1.0%)  Lower AMX type for load/store
   0.0013 (  1.0%)   0.0000 (  0.9%)   0.0013 (  1.0%)   0.0013 (  0.9%)  Finalize ISel and expand pseudo-instructions
   0.0012 (  0.9%)   0.0001 (  1.4%)   0.0013 (  0.9%)   0.0013 (  0.9%)  Machine Module Information
   0.0011 (  0.8%)   0.0001 (  1.6%)   0.0012 (  0.9%)   0.0012 (  0.8%)  Lower constant intrinsics
   0.0010 (  0.8%)   0.0001 (  1.3%)   0.0011 (  0.8%)   0.0011 (  0.7%)  Expand vector predication intrinsics
   0.0009 (  0.7%)   0.0000 (  0.7%)   0.0010 (  0.7%)   0.0010 (  0.7%)  MachineDominator Tree Construction
   0.0008 (  0.6%)   0.0001 (  1.2%)   0.0009 (  0.7%)   0.0009 (  0.6%)  Free MachineFunction
   0.0008 (  0.6%)   0.0000 (  0.7%)   0.0008 (  0.6%)   0.0008 (  0.6%)  X86 EFLAGS copy lowering
   0.0005 (  0.4%)   0.0000 (  0.8%)   0.0006 (  0.4%)   0.0008 (  0.5%)  Post-RA pseudo instruction expansion pass
   0.0007 (  0.5%)   0.0001 (  1.4%)   0.0007 (  0.5%)   0.0007 (  0.5%)  Check CFA info and insert CFI instructions if needed
   0.0005 (  0.4%)   0.0000 (  0.3%)   0.0005 (  0.4%)   0.0007 (  0.5%)  Exception handling preparation
   0.0007 (  0.5%)   0.0000 (  0.4%)   0.0007 (  0.5%)   0.0007 (  0.5%)  Expand reduction intrinsics
   0.0006 (  0.5%)   0.0000 (  0.4%)   0.0007 (  0.5%)   0.0007 (  0.4%)  Scalarize Masked Memory Intrinsics
   0.0005 (  0.4%)   0.0001 (  1.2%)   0.0006 (  0.4%)   0.0006 (  0.4%)  Remove unreachable blocks from the CFG
   0.0005 (  0.4%)   0.0000 (  0.7%)   0.0005 (  0.4%)   0.0005 (  0.4%)  Expand large div/rem
   0.0005 (  0.3%)   0.0000 (  0.2%)   0.0005 (  0.3%)   0.0005 (  0.3%)  Eliminate PHI nodes for register allocation
   0.0004 (  0.3%)   0.0000 (  0.6%)   0.0004 (  0.3%)   0.0004 (  0.3%)  Expand Atomic instructions
   0.0004 (  0.3%)   0.0000 (  0.0%)   0.0004 (  0.3%)   0.0004 (  0.3%)  Assignment Tracking Analysis
   0.0004 (  0.3%)   0.0000 (  0.2%)   0.0004 (  0.3%)   0.0004 (  0.3%)  Insert stack protectors
   0.0004 (  0.3%)   0.0000 (  0.7%)   0.0004 (  0.3%)   0.0004 (  0.3%)  Fast Tile Register Configure
   0.0004 (  0.3%)   0.0000 (  0.5%)   0.0004 (  0.3%)   0.0004 (  0.3%)  X86 pseudo instruction expansion pass
   0.0003 (  0.3%)   0.0000 (  0.5%)   0.0004 (  0.3%)   0.0004 (  0.2%)  Insert KCFI indirect call checks
   0.0003 (  0.3%)   0.0000 (  0.5%)   0.0004 (  0.3%)   0.0004 (  0.2%)  Expand large fp convert
   0.0003 (  0.2%)   0.0000 (  0.4%)   0.0003 (  0.2%)   0.0003 (  0.2%)  Unpack machine instruction bundles
   0.0003 (  0.2%)   0.0000 (  0.2%)   0.0003 (  0.2%)   0.0003 (  0.2%)  Fast Tile Register Preconfigure
   0.0003 (  0.2%)   0.0000 (  0.4%)   0.0003 (  0.2%)   0.0003 (  0.2%)  X86 Lower Tile Copy
   0.0002 (  0.2%)   0.0000 (  0.3%)   0.0003 (  0.2%)   0.0003 (  0.2%)  X86 Indirect Branch Tracking
   0.0003 (  0.2%)   0.0000 (  0.2%)   0.0003 (  0.2%)   0.0003 (  0.2%)  Expand indirectbr instructions
   0.0002 (  0.1%)   0.0000 (  0.4%)   0.0002 (  0.2%)   0.0002 (  0.1%)  Bundle Machine CFG Edges
   0.0002 (  0.1%)   0.0000 (  0.1%)   0.0002 (  0.1%)   0.0002 (  0.1%)  Argument Stack Rebase
   0.0002 (  0.1%)   0.0000 (  0.2%)   0.0002 (  0.1%)   0.0002 (  0.1%)  Insert fentry calls
   0.0001 (  0.1%)   0.0000 (  0.2%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Insert XRay ops
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Optimization Remark Emitter
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Live DEBUG_VALUE analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Stack Frame Layout Analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Indirect Thunks
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 FP Stackifier
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Implement the 'patchable-function' attribute
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Lazy Machine Block Frequency Analysis
   0.0001 (  0.1%)   0.0000 (  0.2%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Optimization Remark Emitter #2
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Fixup Statepoint Caller Saved
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Contiguously Lay Out Funclets
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Optimization Remark Emitter #3
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Prepare callbr
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Local Stack Slot Allocation
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Remove Redundant DEBUG_VALUE analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  StackMap Liveness Analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Lazy Machine Block Frequency Analysis #3
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 DynAlloca Expander
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Insert Cache Prefetches
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Speculative Execution Side Effect Suppression
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 vzeroupper inserter
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Return Thunks
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Lazy Machine Block Frequency Analysis #2
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 insert wait instruction
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0001 (  0.1%)   0.0000 (  0.0%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 speculative load hardening
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Sanitizer Binary Metadata
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Compressing EVEX instrs to VEX encoding when possible
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.0%)  Pseudo Probe Inserter
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.0%)  X86 Discriminate Memory Operands
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Safe Stack instrumentation pass
   0.0001 (  0.0%)   0.0000 (  0.1%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Lower Garbage Collection Instructions
   0.0001 (  0.0%)   0.0000 (  0.1%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Shadow Stack GC Lowering
   0.0001 (  0.0%)   0.0000 (  0.1%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Lower AMX intrinsics
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.1309 (100.0%)   0.0054 (100.0%)   0.1363 (100.0%)   0.1487 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0027 seconds (0.0028 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0025 (100.0%)   0.0002 (100.0%)   0.0027 (100.0%)   0.0028 (100.0%)  DWARF Exception Writer
   0.0025 (100.0%)   0.0002 (100.0%)   0.0027 (100.0%)   0.0028 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 19.4630 seconds (19.6675 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  19.3916 (100.0%)   0.0715 (100.0%)  19.4630 (100.0%)  19.6675 (100.0%)  Clang front-end timer
  19.3916 (100.0%)   0.0715 (100.0%)  19.4630 (100.0%)  19.6675 (100.0%)  Total


~/Projects took 20s 

The dummy timer here

https://github.com/DeadSpheroid/clad/blob/ec56e48eab5bbd167628c8e12f2515e898f4d099/tools/ClangPlugin.cpp#L169

is needed because the llvm::TimerGroup documentation does not mention that a TimerGroup prints its report as soon as all Timers in it have been destroyed. The dummy timer prevents this early printing and ensures a single report.
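
The behaviour described above can be illustrated with a small self-contained model. This is only a sketch: ToyTimerGroup, ToyTimer and runDemo are hypothetical stand-ins written for this illustration, not the LLVM classes and not the code in this PR. The model assumes only what the comment above states, namely that a group prints once its last registered timer goes away.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a timer group (hypothetical, NOT llvm::TimerGroup).
// It queues the names of timers that ran and prints one report each time
// its last registered timer is removed.
class ToyTimerGroup {
public:
  explicit ToyTimerGroup(std::ostream &os) : out_(os) {}

  void add() { ++live_; }

  void remove(const std::string &name, bool ran) {
    assert(live_ > 0 && "remove() without a matching add()");
    if (ran)
      queued_.push_back(name);
    if (--live_ == 0 && !queued_.empty())
      flush();
  }

  int reportsPrinted() const { return reports_; }

private:
  void flush() {
    out_ << "=== report " << ++reports_ << " ===\n";
    for (const std::string &name : queued_)
      out_ << "  " << name << "\n";
    queued_.clear();
  }

  std::ostream &out_;
  std::vector<std::string> queued_;
  int live_ = 0;
  int reports_ = 0;
};

// RAII timer: registered with the group for exactly its own lifetime.
class ToyTimer {
public:
  ToyTimer(std::string name, ToyTimerGroup &group, bool ran = true)
      : name_(std::move(name)), group_(group), ran_(ran) {
    group_.add();
  }
  ~ToyTimer() { group_.remove(name_, ran_); }
  ToyTimer(const ToyTimer &) = delete;
  ToyTimer &operator=(const ToyTimer &) = delete;

private:
  std::string name_;
  ToyTimerGroup &group_;
  bool ran_;
};

struct DemoResult {
  int reports;      // number of reports the group printed
  std::string text; // everything the group wrote
};

// Times two "differentiations" one after the other, optionally while a
// placeholder timer that never ran stays registered in the group.
DemoResult runDemo(bool usePlaceholder) {
  std::ostringstream os;
  int reports = 0;
  {
    ToyTimerGroup group(os);
    {
      std::unique_ptr<ToyTimer> placeholder;
      if (usePlaceholder)
        placeholder.reset(new ToyTimer("placeholder", group, /*ran=*/false));
      { ToyTimer t("fn", group); }  // group would become empty here
      { ToyTimer t("fn2", group); } // ... and here
    } // the placeholder (if any) is destroyed last
    reports = group.reportsPrinted();
  }
  return {reports, os.str()};
}
```

In this model, runDemo(false) produces two separate reports (one per function), while runDemo(true) produces a single report listing both fn and fn2, which mirrors the reason given for the dummy timer.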

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@vgvassilev vgvassilev left a comment

We will need a test for this new feature.

@@ -159,7 +159,9 @@ namespace clad {
S.PerformPendingInstantiations();
m_PendingInstantiationsInFlight = false;
}

// Necessary to prevent separate timing reports due to expired timers
@vgvassilev (Owner)

We should probably take inspiration from clang::CompilerInstance::createFrontendTimer().

@DeadSpheroid (Contributor Author)

Could you elaborate on this? I'm not sure what you mean.

@vgvassilev (Owner)

Clang does not use dummy timers to trigger such effects. My suggestion was to look at the interface I pointed out, understand how it was done there, and take a similar approach.

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.87%. Comparing base (97e4092) to head (ef9bfec).
Report is 14 commits behind head on master.

@@            Coverage Diff             @@
##           master     #779      +/-   ##
==========================================
+ Coverage   94.71%   94.87%   +0.16%     
==========================================
  Files          49       49              
  Lines        7342     7478     +136     
==========================================
+ Hits         6954     7095     +141     
+ Misses        388      383       -5     
Files Coverage Δ
tools/ClangPlugin.cpp 93.84% <100.00%> (+3.58%) ⬆️
tools/ClangPlugin.h 88.67% <100.00%> (+0.21%) ⬆️

... and 6 files with indirect coverage changes

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@DeadSpheroid (Contributor Author)

Sorry for the delay. I'd appreciate feedback on this implementation.

@DeadSpheroid (Contributor Author)

Additionally, would a test for this new feature be simply checking if the timing report is printed when including the -ftime-report flag?

@vgvassilev (Owner)

> Additionally, would a test for this new feature be simply checking if the timing report is printed when including the -ftime-report flag?

Yes.
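
As a rough illustration of what such a check amounts to, the sketch below scans captured compiler output for a report header. The helper name hasTimingGroup is hypothetical, and the title "Timers for clad funcs" is taken from the first output posted above; the project's own test harness may express the same check differently.

```cpp
#include <cassert>
#include <string>

// Returns true if `output` (text captured from a compiler run) contains a
// timing group with the given title, followed later by a "Total" line.
// Hypothetical helper for illustration only.
bool hasTimingGroup(const std::string &output, const std::string &title) {
  assert(!title.empty() && "title must not be empty");
  const std::string::size_type header = output.find(title);
  if (header == std::string::npos)
    return false;
  return output.find("Total", header + title.size()) != std::string::npos;
}
```

A test along these lines would run the compiler with -ftime-report, capture what it prints, and require hasTimingGroup(captured, "Timers for clad funcs") to hold.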

@vgvassilev (Owner)

> Sorry for the delay. I'd appreciate feedback on this implementation.

Can you update the output that this PR produces now? I also see that we still have the SimpleTimer. That needs to go in favor of the new implementation.

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@DeadSpheroid (Contributor Author)

Updated output. The difference is that the report is now printed at the end:

~/Projects 
❯ clad++ issue769.cpp -o issue769 -ftime-report
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0008 seconds (0.0008 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0005 ( 59.9%)   0.0005 ( 59.9%)   0.0005 ( 59.6%)  AnnotationRemarksPass
   0.0003 ( 33.5%)   0.0003 ( 33.5%)   0.0003 ( 33.6%)  AlwaysInlinerPass
   0.0000 (  6.6%)   0.0000 (  6.6%)   0.0001 (  6.7%)  CoroConditionalWrapper
   0.0008 (100.0%)   0.0008 (100.0%)   0.0008 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Analysis execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0002 seconds (0.0002 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0001 ( 79.9%)   0.0001 ( 79.9%)   0.0001 ( 79.8%)  TargetLibraryAnalysis
   0.0000 ( 14.7%)   0.0000 ( 14.7%)   0.0000 ( 14.6%)  ProfileSummaryAnalysis
   0.0000 (  5.4%)   0.0000 (  5.4%)   0.0000 (  5.6%)  InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::Module>
   0.0002 (100.0%)   0.0002 (100.0%)   0.0002 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2580 ( 69.9%)   0.0034 ( 22.3%)   0.2615 ( 67.9%)   0.2778 ( 65.8%)  LLVM IR Generation Time
   0.1114 ( 30.1%)   0.0120 ( 77.7%)   0.1234 ( 32.1%)   0.1447 ( 34.2%)  Code Generation Time
   0.3694 (100.0%)   0.0155 (100.0%)   0.3849 (100.0%)   0.4225 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0076 seconds (0.0122 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0016 ( 23.9%)   0.0002 ( 18.6%)   0.0018 ( 23.3%)   0.0038 ( 31.5%)  DAG Combining 1
   0.0014 ( 20.3%)   0.0002 ( 23.5%)   0.0016 ( 20.6%)   0.0027 ( 21.9%)  Instruction Selection
   0.0007 ( 10.1%)   0.0000 (  4.6%)   0.0007 (  9.5%)   0.0019 ( 15.4%)  DAG Legalization
   0.0009 ( 13.8%)   0.0002 ( 18.5%)   0.0011 ( 14.3%)   0.0011 (  8.9%)  Instruction Scheduling
   0.0009 ( 13.8%)   0.0001 ( 15.0%)   0.0011 ( 13.9%)   0.0011 (  8.7%)  DAG Combining 2
   0.0008 ( 12.0%)   0.0001 ( 13.9%)   0.0009 ( 12.2%)   0.0009 (  7.6%)  Instruction Creation
   0.0003 (  4.2%)   0.0000 (  3.8%)   0.0003 (  4.1%)   0.0006 (  4.8%)  Type Legalization
   0.0001 (  1.3%)   0.0000 (  1.3%)   0.0001 (  1.3%)   0.0001 (  0.8%)  Vector Legalization
   0.0001 (  0.9%)   0.0000 (  0.7%)   0.0001 (  0.8%)   0.0001 (  0.5%)  Instruction Scheduling Cleanup
   0.0068 (100.0%)   0.0009 (100.0%)   0.0076 (100.0%)   0.0122 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0842 seconds (0.0963 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0266 ( 35.6%)   0.0040 ( 43.1%)   0.0307 ( 36.4%)   0.0404 ( 41.9%)  X86 DAG->DAG Instruction Selection
   0.0131 ( 17.5%)   0.0009 ( 10.1%)   0.0140 ( 16.7%)   0.0144 ( 14.9%)  X86 Assembly Printer
   0.0057 (  7.7%)   0.0010 ( 11.2%)   0.0068 (  8.1%)   0.0068 (  7.0%)  Fast Register Allocator
   0.0047 (  6.3%)   0.0005 (  5.4%)   0.0052 (  6.2%)   0.0056 (  5.8%)  Module Verifier
   0.0039 (  5.2%)   0.0005 (  5.0%)   0.0043 (  5.1%)   0.0043 (  4.5%)  Module Verifier #2
   0.0037 (  5.0%)   0.0005 (  5.7%)   0.0043 (  5.1%)   0.0043 (  4.4%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0030 (  4.0%)   0.0000 (  0.1%)   0.0030 (  3.6%)   0.0039 (  4.0%)  Pre-ISel Intrinsic Lowering
   0.0018 (  2.3%)   0.0004 (  4.3%)   0.0022 (  2.6%)   0.0022 (  2.2%)  Two-Address instruction pass
   0.0009 (  1.2%)   0.0001 (  0.8%)   0.0009 (  1.1%)   0.0010 (  1.0%)  Lower AMX type for load/store
   0.0003 (  0.4%)   0.0000 (  0.3%)   0.0003 (  0.4%)   0.0009 (  0.9%)  Exception handling preparation
   0.0007 (  1.0%)   0.0001 (  1.0%)   0.0008 (  1.0%)   0.0008 (  0.8%)  Finalize ISel and expand pseudo-instructions
   0.0007 (  0.9%)   0.0001 (  0.8%)   0.0008 (  0.9%)   0.0007 (  0.8%)  Machine Module Information
   0.0007 (  0.9%)   0.0001 (  0.6%)   0.0007 (  0.8%)   0.0007 (  0.7%)  Lower constant intrinsics
   0.0006 (  0.8%)   0.0000 (  0.5%)   0.0007 (  0.8%)   0.0006 (  0.7%)  Expand vector predication intrinsics
   0.0006 (  0.8%)   0.0001 (  0.7%)   0.0006 (  0.8%)   0.0006 (  0.7%)  MachineDominator Tree Construction
   0.0003 (  0.4%)   0.0001 (  0.6%)   0.0004 (  0.5%)   0.0006 (  0.6%)  Post-RA pseudo instruction expansion pass
   0.0005 (  0.7%)   0.0000 (  0.4%)   0.0006 (  0.7%)   0.0006 (  0.6%)  Free MachineFunction
   0.0004 (  0.6%)   0.0001 (  0.8%)   0.0005 (  0.6%)   0.0005 (  0.5%)  X86 EFLAGS copy lowering
   0.0004 (  0.5%)   0.0001 (  0.7%)   0.0005 (  0.6%)   0.0005 (  0.5%)  Check CFA info and insert CFI instructions if needed
   0.0004 (  0.5%)   0.0000 (  0.3%)   0.0004 (  0.5%)   0.0004 (  0.4%)  Expand reduction intrinsics
   0.0004 (  0.5%)   0.0000 (  0.3%)   0.0004 (  0.5%)   0.0004 (  0.4%)  Scalarize Masked Memory Intrinsics
   0.0004 (  0.5%)   0.0000 (  0.4%)   0.0004 (  0.5%)   0.0004 (  0.4%)  Remove unreachable blocks from the CFG
   0.0003 (  0.4%)   0.0000 (  0.3%)   0.0003 (  0.4%)   0.0003 (  0.4%)  Expand large div/rem
   0.0002 (  0.3%)   0.0001 (  1.3%)   0.0003 (  0.4%)   0.0003 (  0.3%)  Eliminate PHI nodes for register allocation
   0.0003 (  0.3%)   0.0000 (  0.2%)   0.0003 (  0.3%)   0.0003 (  0.3%)  Insert stack protectors
   0.0002 (  0.3%)   0.0000 (  0.4%)   0.0002 (  0.3%)   0.0003 (  0.3%)  X86 pseudo instruction expansion pass
   0.0002 (  0.3%)   0.0000 (  0.2%)   0.0003 (  0.3%)   0.0003 (  0.3%)  Expand Atomic instructions
   0.0002 (  0.3%)   0.0000 (  0.2%)   0.0003 (  0.3%)   0.0002 (  0.3%)  Assignment Tracking Analysis
   0.0002 (  0.3%)   0.0000 (  0.4%)   0.0003 (  0.3%)   0.0002 (  0.3%)  Fast Tile Register Configure
   0.0002 (  0.3%)   0.0000 (  0.3%)   0.0002 (  0.3%)   0.0002 (  0.2%)  Insert KCFI indirect call checks
   0.0002 (  0.3%)   0.0000 (  0.2%)   0.0002 (  0.3%)   0.0002 (  0.2%)  Expand large fp convert
   0.0002 (  0.2%)   0.0000 (  0.2%)   0.0002 (  0.2%)   0.0002 (  0.2%)  Unpack machine instruction bundles
   0.0002 (  0.2%)   0.0000 (  0.1%)   0.0002 (  0.2%)   0.0002 (  0.2%)  Expand indirectbr instructions
   0.0001 (  0.2%)   0.0000 (  0.3%)   0.0002 (  0.2%)   0.0002 (  0.2%)  X86 Lower Tile Copy
   0.0001 (  0.2%)   0.0000 (  0.2%)   0.0002 (  0.2%)   0.0002 (  0.2%)  Fast Tile Register Preconfigure
   0.0001 (  0.2%)   0.0000 (  0.2%)   0.0002 (  0.2%)   0.0002 (  0.2%)  X86 Indirect Branch Tracking
   0.0001 (  0.2%)   0.0000 (  0.2%)   0.0001 (  0.2%)   0.0001 (  0.2%)  Bundle Machine CFG Edges
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Argument Stack Rebase
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Insert fentry calls
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Insert XRay ops
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Live DEBUG_VALUE analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Optimization Remark Emitter
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Stack Frame Layout Analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Indirect Thunks
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 FP Stackifier
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Contiguously Lay Out Funclets
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Lazy Machine Block Frequency Analysis
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Implement the 'patchable-function' attribute
   0.0001 (  0.1%)   0.0000 (  0.0%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Fixup Statepoint Caller Saved
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Prepare callbr
   0.0001 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Optimization Remark Emitter #2
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Machine Optimization Remark Emitter #3
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Speculative Execution Side Effect Suppression
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  StackMap Liveness Analysis
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Remove Redundant DEBUG_VALUE analysis
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Insert Cache Prefetches
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0001 (  0.1%)  Lazy Machine Block Frequency Analysis #3
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0001 (  0.1%)  X86 insert wait instruction
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 vzeroupper inserter
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  X86 DynAlloca Expander
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0001 (  0.1%)   0.0001 (  0.1%)  Local Stack Slot Allocation
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.1%)  Lazy Machine Block Frequency Analysis #2
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  X86 Discriminate Memory Operands
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  X86 Return Thunks
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Machine Sanitizer Binary Metadata
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Compressing EVEX instrs to VEX encoding when possible
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Pseudo Probe Inserter
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  X86 speculative load hardening
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Safe Stack instrumentation pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower AMX intrinsics
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0749 (100.0%)   0.0094 (100.0%)   0.0842 (100.0%)   0.0963 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0016 seconds (0.0017 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0016 (100.0%)   0.0001 (100.0%)   0.0016 (100.0%)   0.0017 (100.0%)  DWARF Exception Writer
   0.0016 (100.0%)   0.0001 (100.0%)   0.0016 (100.0%)   0.0017 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 11.4778 seconds (11.6399 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  11.4218 (100.0%)   0.0560 (100.0%)  11.4778 (100.0%)  11.6399 (100.0%)  Clang front-end timer
  11.4218 (100.0%)   0.0560 (100.0%)  11.4778 (100.0%)  11.6399 (100.0%)  Total

===-------------------------------------------------------------------------===
                             Timers for Clad Funcs
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0431 seconds (0.0431 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0182 ( 42.1%)   0.0182 ( 42.1%)   0.0182 ( 42.1%)  fn
   0.0162 ( 37.5%)   0.0162 ( 37.5%)   0.0162 ( 37.5%)  grad_func
   0.0045 ( 10.5%)   0.0045 ( 10.5%)   0.0045 ( 10.5%)  fn2
   0.0028 (  6.4%)   0.0028 (  6.4%)   0.0028 (  6.4%)  fnr
   0.0015 (  3.4%)   0.0015 (  3.4%)   0.0015 (  3.4%)  fns
   0.0431 (100.0%)   0.0431 (100.0%)   0.0431 (100.0%)  Total

@DeadSpheroid
Contributor Author

DeadSpheroid commented Feb 24, 2024

I've implemented it as a stack of pointers to only the active llvm::Timers, keeping at least one llvm::Timer (running or not) alive at all times.
The output is the same.
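
For readers following along, here is a rough standalone sketch of the stack idea described above. It is not the code from this PR and does not use llvm::Timer: `ToyTimer` and `ToyTimerStack` are hypothetical stand-ins built on std::chrono, and the only point is the push/pop discipline in which the last remaining timer is never popped.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Timer: measures wall-clock time only.
class ToyTimer {
public:
  explicit ToyTimer(std::string N) : Name(std::move(N)) {}

  void start() {
    Running = true;
    Begin = Clock::now();
  }
  void stop() {
    assert(Running && "stop() called on a timer that is not running");
    Elapsed += Clock::now() - Begin;
    Running = false;
  }
  const std::string& name() const { return Name; }
  double seconds() const {
    return std::chrono::duration<double>(Elapsed).count();
  }

private:
  using Clock = std::chrono::steady_clock;
  std::string Name;
  Clock::time_point Begin{};
  Clock::duration Elapsed{};
  bool Running = false;
};

// Hypothetical stack of active timers. The bottom entry is never popped, so
// at least one timer (running or not) stays alive until the stack is gone.
class ToyTimerStack {
public:
  void startTimer(const std::string& Name) {
    Timers.push_back(std::make_unique<ToyTimer>(Name));
    Timers.back()->start();
  }

  void stopTimer() {
    assert(!Timers.empty() && "stopTimer() without a matching startTimer()");
    Timers.back()->stop();
    Finished.emplace_back(Timers.back()->name(), Timers.back()->seconds());
    if (Timers.size() != 1) // never pop the last timer
      Timers.pop_back();
  }

  std::size_t liveTimers() const { return Timers.size(); }
  const std::vector<std::pair<std::string, double>>& finished() const {
    return Finished;
  }

private:
  std::vector<std::unique_ptr<ToyTimer>> Timers;
  std::vector<std::pair<std::string, double>> Finished;
};
```

Nested requests map naturally onto the stack: starting `fn` and then `fns` leaves two live timers, stopping `fns` pops it, and stopping `fn` leaves it in place as the one timer that is kept alive.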

@DeadSpheroid
Contributor Author

DeadSpheroid commented Feb 24, 2024

I've preserved the LIBCLAD_TIMING flag that was used by the SimpleTimer, since I noticed it is referenced in the performance tests. Should this be changed as well?
LIBCLAD_TIMING prints only the timings of clad funcs, while -ftime-report prints all the compilation timings.

tools/ClangPlugin.cpp
}
void clad::CladTimerGroup::StopTimer() {
  Timers.back()->stopTimer();
  if (Timers.size() != 1)
Collaborator

Why don't we call pop_back when size is 1?

Contributor Author

@DeadSpheroid DeadSpheroid Feb 25, 2024

It's because of the llvm::TimerGroup implementation, as given here.

The description says the report is printed when the llvm::TimerGroup is destroyed, but the implementation here (line 339) prints it as soon as the TimerGroup has no alive timers, even before the TimerGroup itself is destroyed.

So by keeping at least one Timer (running or not) alive, we get a single report at the very end rather than split reports.

As far as I can tell, LLVM also keeps some of its Timers alive in its own usage of TimerGroup, as seen here (line 56), in the StringMap.
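
To make this reasoning concrete, here is a toy model. It is not LLVM's code: `ToyGroup` and the two helper functions are hypothetical, and the model assumes only the behaviour described in this comment, namely that a report is flushed whenever the group is left with no live timers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (assumption): a group that reports as soon as it has no live
// timers, and once more when the group itself goes away.
class ToyGroup {
public:
  void addTimer(const std::string& Name) { Live.push_back(Name); }

  // Models destroying one timer: its data is queued, and the queue is
  // reported immediately if no other timer is still alive.
  void removeTimer(const std::string& Name) {
    auto It = std::find(Live.begin(), Live.end(), Name);
    assert(It != Live.end() && "removeTimer() for an unknown timer");
    Live.erase(It);
    Pending.push_back(Name);
    if (Live.empty())
      flush();
  }

  // Models destroying the group: every remaining timer is queued and one
  // final report is produced.
  void finish() {
    Pending.insert(Pending.end(), Live.begin(), Live.end());
    Live.clear();
    flush();
  }

  const std::vector<std::vector<std::string>>& reports() const {
    return Reports;
  }

private:
  void flush() {
    if (Pending.empty())
      return;
    Reports.push_back(Pending);
    Pending.clear();
  }

  std::vector<std::string> Live, Pending;
  std::vector<std::vector<std::string>> Reports;
};

// Every timer is destroyed as soon as its function has been differentiated.
inline std::size_t reportsWhenDestroyingEagerly() {
  ToyGroup G;
  G.addTimer("fn");
  G.removeTimer("fn");  // group is empty: first report
  G.addTimer("fn2");
  G.removeTimer("fn2"); // group is empty again: second report
  G.finish();
  return G.reports().size();
}

// One timer is kept alive for the whole run.
inline std::size_t reportsWhenKeepingOneAlive() {
  ToyGroup G;
  G.addTimer("fn");     // stays alive until the end
  G.addTimer("fn2");
  G.removeTimer("fn2"); // "fn" is still alive: nothing is reported yet
  G.finish();           // single report containing both entries
  return G.reports().size();
}
```

In this model the eager variant yields two separate reports, while keeping one timer alive yields a single combined report, which is the effect the `Timers.size() != 1` check is after.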

Contributor Author

The alternative, I believe, would be to leave all the Timers alive, which seems like a waste of memory.

Contributor Author

@DeadSpheroid DeadSpheroid Feb 25, 2024

Or we could use separate TimerGroups for each function differentiation request, so the output would look something like this

===-------------------------------------------------------------------------===
                             Timing Report for fn(forward mode)
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0431 seconds (0.0431 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0182 ( 42.1%)   0.0182 ( 42.1%)   0.0182 ( 42.1%)  fn
   0.0162 ( 37.5%)   0.0162 ( 37.5%)   0.0162 ( 37.5%)  nested1_fn
   0.0045 ( 10.5%)   0.0045 ( 10.5%)   0.0045 ( 10.5%)  nested2_fn
   0.0431 (100.0%)   0.0431 (100.0%)   0.0431 (100.0%)  Total

===-------------------------------------------------------------------------===
                             Timing Report for fnr(reverse mode)
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0431 seconds (0.0431 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0028 (  6.4%)   0.0028 (  6.4%)   0.0028 (  6.4%)  fnr
   0.0015 (  3.4%)   0.0015 (  3.4%)   0.0015 (  3.4%)  nested_fnr
   0.0431 (100.0%)   0.0431 (100.0%)   0.0431 (100.0%)  Total

Owner

Or we could use separate TimerGroups for each function differentiation request, so the output would look something like this

Can you elaborate what the grouping would be? It would be for all calls to clad::differentiate and all calls to clad::gradient?

Contributor Author

@DeadSpheroid DeadSpheroid Feb 27, 2024

Each differentiate or gradient call in the source file would receive its own report, and this report would include all the differentiation calls that were made for that particular function in the source file.
So yes, your understanding is correct.
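
As an illustration of the per-request grouping discussed here, the following is a minimal sketch only. `ToyRequest` and `renderReport` are hypothetical names, the layout loosely imitates the mock-up in this thread rather than LLVM's real report format, and nothing here reflects what was eventually merged.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one differentiation request and the functions timed
// while serving it (the requested function first, then nested callees).
struct ToyRequest {
  std::string Function;
  std::string Mode; // e.g. "forward mode" or "reverse mode"
  std::vector<std::pair<std::string, double>> Timings; // name, seconds
};

// Renders one report for one request, so every request gets its own header
// and its own total.
inline std::string renderReport(const ToyRequest& R) {
  assert(!R.Function.empty() && "a request must name a function");

  double Total = 0;
  for (const auto& T : R.Timings)
    Total += T.second;

  std::string Out = "Timing Report for " + R.Function + "(" + R.Mode + ")\n";
  char Line[128]; // overly long names are truncated by snprintf
  for (const auto& T : R.Timings) {
    double Pct = Total > 0 ? 100.0 * T.second / Total : 0.0;
    std::snprintf(Line, sizeof(Line), "  %7.4f (%5.1f%%)  %s\n", T.second,
                  Pct, T.first.c_str());
    Out += Line;
  }
  std::snprintf(Line, sizeof(Line), "  %7.4f (100.0%%)  Total\n", Total);
  Out += Line;
  return Out;
}
```

Because each request carries its own total, the percentages in a report are relative to that request alone rather than to all clad work in the translation unit.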

Owner

Maybe that's better, @parth-07 what do you think?

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@DeadSpheroid
Contributor Author

DeadSpheroid commented Feb 25, 2024

The test is failing because the times taken by the funcs can vary, and therefore the order of the rows can vary. I'll remove that part and just check that the header of the timings report ("Timers for clad funcs") is printed.

@DeadSpheroid
Contributor Author

Most of the CI should pass now, but I couldn't figure out why certain tests seem to be failing with the same error (e.g. this run). I would appreciate some help figuring this error out if it still persists after the most recent commits.

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@DeadSpheroid
Contributor Author

Any idea what could be causing some of the tests to fail? Outside of the clang-format one @vgvassilev @parth-07

@vgvassilev
Owner

"To ssh into the GitHub runner on which tests are failing, click on re-run actions and select the debug logging checkbox. If necessary increase the value of timeout-minutes key in .github/workflows/ci.yml to a suitable value for debugging – 30 - 60 minutes should generally be enough." You can find more here.

@DeadSpheroid
Contributor Author

Found the issue: older clang versions (<11) never used TimePasses despite it being defined, and used ShowTimers instead. Clad should use whichever of the two is right for the clang version it is built against.
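
A version switch of this kind could look roughly like the sketch below. This is an assumption-laden illustration, not the merged change: `ToyOptions` is a hypothetical mirror of the two options named above (the thread does not show where the real fields live), and the fallback definition of `CLANG_VERSION_MAJOR` exists only so the snippet compiles on its own; in a real build that macro comes from the clang headers.

```cpp
#include <cassert>

// In a real build the clang headers define CLANG_VERSION_MAJOR; this fallback
// is an assumption so the sketch compiles standalone.
#ifndef CLANG_VERSION_MAJOR
#define CLANG_VERSION_MAJOR 11
#endif

// Hypothetical mirror of the two options discussed above.
struct ToyOptions {
  bool TimePasses = false;
  bool ShowTimers = false;
};

// Returns true when a timing report was requested, reading whichever option
// the targeted clang version actually sets.
inline bool timeReportRequested(const ToyOptions& Opts) {
#if CLANG_VERSION_MAJOR < 11
  return Opts.ShowTimers;
#else
  return Opts.TimePasses;
#endif
}

// Tiny usage example: with neither option set, no report is requested.
inline void exampleUsage() {
  ToyOptions Opts;
  assert(!timeReportRequested(Opts));
}
```

Keeping the `#if` inside one small helper means the rest of the plugin can ask a single question and stay free of version checks.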

@vgvassilev
Owner

@DeadSpheroid, can you squash all commits and add a good commit message?

@vgvassilev
Owner

Can you apply git-clang-format to your changes so that the bots are happy?

…vassilev#769

Clad will now print a timings report for all clad function calls
Adds CladTimerGroup class to time the clad functions
Repurposes LIBCLAD_TIMING flag to print only the timings report for clad

Owner

@vgvassilev vgvassilev left a comment

LGTM!

@vgvassilev vgvassilev merged commit f242077 into vgvassilev:master Feb 29, 2024
83 checks passed
Successfully merging this pull request may close these issues.

Implement differentiation time statistics upon setting clang -ftime-report
3 participants