-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CoreML] more performace flag #22975
base: main
Are you sure you want to change the base?
Conversation
218fe66
to
bf095dd
Compare
d64d79c
to
5249880
Compare
This reverts commit 5249880.
@@ -151,7 +151,7 @@ bool BatchNormalizationOpBuilder::IsOpSupportedImpl(const Node& node, const OpBu | |||
return false; | |||
} | |||
|
|||
#if defined(TARGET_OS_IOS) && defined(TARGET_CPU_X86_64) | |||
#if defined(TARGET_OS_IOS) && defined(TARGET_CPU_X86_64) && TARGET_OS_IOS && TARGET_CPU_X86_64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What values do these #defines have that means we need to check if they're defined and the value is non-zero?
If it's iOS and x86_64 that's only the simulator right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. TARGET_CPU_X86_64
is defiend in #include <TargetConditionals.h>
for either 0 or 1
// it's impossible to have empty exes input for unsqueeze | ||
if (!axes.empty()) { | ||
// coreml squeeze op does support negative axes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we clarify these commands? this code is now hit for squeeze and unsqueeze so I assume both handle negative axes. and don't both squeeze and unsqueeze require axes to be meaningful otherwise they're no-ops (and we should drop in level 1 if we don't already)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to onnx spec; squeeze can have empty axes(optional input), which means to drop all dimens which equal to 1.
But for unsqueeze, axes
is required.
@@ -300,14 +301,56 @@ Status GetMLMultiArrayCopyInfo(const MLMultiArray* _Nonnull array, | |||
return Status::OK(); | |||
} | |||
|
|||
void ProfileComputePlan(NSURL* compileUrl, MLModelConfiguration* config) { | |||
#if defined(__APPLE__) && defined(__clang__) && __clang_major__ >= 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the #if defined...
if we have the @available
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to pass this CI https://github.com/microsoft/onnxruntime/actions/runs/12118598969/job/33783428976?pr=22975
But it didn't work
// Set the specialization strategy to FastPrediction for macOS 10.15+ | ||
#if defined(__APPLE__) && defined(__clang__) && __clang_major__ >= 15 | ||
if (HAS_COREML8_OR_LATER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to check the clang version?
is this just for macOS or does it set it for iOS as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have suggestions on passing the clang-tidy https://github.com/microsoft/onnxruntime/actions/runs/12118598969/job/33783428976?pr=22975?
static const char* const kCoremlProviderOption_SpecializationStrategy = "SpecializationStrategy"; | ||
static const char* const kCoremlProviderOption_ProfileComputePlan = "ProfileComputePlan"; | ||
static const char* const kCoremlProviderOption_AllowLowPrecisionAccumulationOnGPU = "AllowLowPrecisionAccumulationOnGPU"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there documentation on how/when/why these options should be used and what is valid input for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some notes from https://developer.apple.com/documentation.
But it's not that precise so far to indicate how much gains it will take.
Description
refactor unsquzee's implementation
add more flags to boost peformance.
add profile flag
Motivation and Context