[CoreML] more performance flag #22975

Open
wants to merge 8 commits into
base: main

wejoncy
Contributor

@wejoncy wejoncy commented Nov 29, 2024

Description

refactor unsqueeze's implementation
add more flags to boost performance
add profile flag

Motivation and Context

@wejoncy wejoncy requested review from skottmckay and edgchen1 and removed request for skottmckay November 29, 2024 12:05
@wejoncy wejoncy marked this pull request as ready for review November 29, 2024 12:05
@@ -151,7 +151,7 @@ bool BatchNormalizationOpBuilder::IsOpSupportedImpl(const Node& node, const OpBu
return false;
}

-#if defined(TARGET_OS_IOS) && defined(TARGET_CPU_X86_64)
+#if defined(TARGET_OS_IOS) && defined(TARGET_CPU_X86_64) && TARGET_OS_IOS && TARGET_CPU_X86_64
Contributor

What values do these #defines have that means we need to check if they're defined and the value is non-zero?

If it's iOS and x86_64, that's only the simulator, right?

Contributor Author

Yes. TARGET_CPU_X86_64 is defined in <TargetConditionals.h> as either 0 or 1, so checking defined() alone is not enough.
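
To make the point above concrete, here is a minimal sketch of why a macro that is always defined as 0 or 1 needs a value check as well. The `DEMO_*` macros and the `k*Guard` constants are hypothetical stand-ins chosen for illustration; they are not part of <TargetConditionals.h> or of this PR, and the values model an assumed arm64 iOS build.

```cpp
// Hypothetical stand-ins for TARGET_* macros that are always defined as 0 or 1.
// Assumed scenario: building for iOS on a non-x86_64 CPU.
#define DEMO_TARGET_OS_IOS 1
#define DEMO_TARGET_CPU_X86_64 0

// defined() only asks whether the name exists, so this branch is taken
// even though DEMO_TARGET_CPU_X86_64 has the value 0.
#if defined(DEMO_TARGET_OS_IOS) && defined(DEMO_TARGET_CPU_X86_64)
constexpr bool kDefinedOnlyGuard = true;
#else
constexpr bool kDefinedOnlyGuard = false;
#endif

// Adding the values to the condition makes the guard depend on the actual target.
#if defined(DEMO_TARGET_OS_IOS) && defined(DEMO_TARGET_CPU_X86_64) && \
    DEMO_TARGET_OS_IOS && DEMO_TARGET_CPU_X86_64
constexpr bool kDefinedAndNonZeroGuard = true;
#else
constexpr bool kDefinedAndNonZeroGuard = false;
#endif

// The first guard fires on every target; the second only when both values are 1.
static_assert(kDefinedOnlyGuard, "defined() is true for a macro defined as 0");
static_assert(!kDefinedAndNonZeroGuard, "the value check rejects a macro defined as 0");
```

With both values set to 1 (the assumed x86_64 simulator case), both constants would be true, which is the only configuration the stricter guard is meant to match.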

Comment on lines +102 to +104
// it's impossible to have empty axes input for unsqueeze
if (!axes.empty()) {
// coreml squeeze op does support negative axes
Contributor

nit: can we clarify these comments? This code is now hit for both squeeze and unsqueeze, so I assume both handle negative axes. And don't both squeeze and unsqueeze require axes to be meaningful, since otherwise they're no-ops (and we should drop them in level 1 if we don't already)?

Contributor Author

According to the ONNX spec, squeeze can have empty axes (it is an optional input), which means dropping all dimensions equal to 1.
But for unsqueeze, axes is required.
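
The difference described in this reply can be sketched as plain shape arithmetic. This is an illustrative sketch of the ONNX semantics only, not the CoreML builder code from this PR; `Shape`, `SqueezeShape`, and `UnsqueezeShape` are hypothetical names, and treating empty axes as an error for unsqueeze is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Squeeze: empty axes means "drop every dimension equal to 1".
// Negative axes count from the end of the input shape.
Shape SqueezeShape(const Shape& input, const std::vector<int64_t>& axes) {
  const int64_t rank = static_cast<int64_t>(input.size());
  std::vector<bool> drop(input.size(), false);
  if (axes.empty()) {
    for (std::size_t i = 0; i < input.size(); ++i) drop[i] = (input[i] == 1);
  } else {
    for (int64_t axis : axes) {
      if (axis < 0) axis += rank;  // normalize against the input rank
      if (axis < 0 || axis >= rank || input[static_cast<std::size_t>(axis)] != 1) {
        throw std::invalid_argument("squeeze axis out of range or dimension is not 1");
      }
      drop[static_cast<std::size_t>(axis)] = true;
    }
  }
  Shape out;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!drop[i]) out.push_back(input[i]);
  }
  return out;
}

// Unsqueeze: axes is required. Negative axes count from the end of the
// OUTPUT shape, whose rank is input rank + number of axes.
Shape UnsqueezeShape(const Shape& input, const std::vector<int64_t>& axes) {
  if (axes.empty()) {
    throw std::invalid_argument("unsqueeze requires axes");  // sketch-level choice
  }
  const int64_t out_rank = static_cast<int64_t>(input.size() + axes.size());
  std::vector<bool> inserted(static_cast<std::size_t>(out_rank), false);
  for (int64_t axis : axes) {
    if (axis < 0) axis += out_rank;  // normalize against the output rank
    if (axis < 0 || axis >= out_rank || inserted[static_cast<std::size_t>(axis)]) {
      throw std::invalid_argument("unsqueeze axis out of range or duplicated");
    }
    inserted[static_cast<std::size_t>(axis)] = true;
  }
  Shape out;
  std::size_t next = 0;  // next input dimension to copy
  for (int64_t i = 0; i < out_rank; ++i) {
    out.push_back(inserted[static_cast<std::size_t>(i)] ? 1 : input[next++]);
  }
  return out;
}
```

For example, `SqueezeShape({1, 3, 1, 5}, {})` yields `{3, 5}`, while `UnsqueezeShape({3, 5}, {0, -1})` yields `{1, 3, 5, 1}` because `-1` is resolved against the output rank of 4.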

@@ -300,14 +301,56 @@ Status GetMLMultiArrayCopyInfo(const MLMultiArray* _Nonnull array,
return Status::OK();
}

void ProfileComputePlan(NSURL* compileUrl, MLModelConfiguration* config) {
#if defined(__APPLE__) && defined(__clang__) && __clang_major__ >= 15
Contributor

Do we need the #if defined... if we have the @available?

Comment on lines +456 to +458
// Set the specialization strategy to FastPrediction for macOS 10.15+
#if defined(__APPLE__) && defined(__clang__) && __clang_major__ >= 15
if (HAS_COREML8_OR_LATER) {
Contributor

why do we need to check the clang version?

is this just for macOS or does it set it for iOS as well?

Comment on lines 52 to 54
static const char* const kCoremlProviderOption_SpecializationStrategy = "SpecializationStrategy";
static const char* const kCoremlProviderOption_ProfileComputePlan = "ProfileComputePlan";
static const char* const kCoremlProviderOption_AllowLowPrecisionAccumulationOnGPU = "AllowLowPrecisionAccumulationOnGPU";
Contributor

Is there documentation on how/when/why these options should be used and what is valid input for them?

Contributor Author

Added some notes from https://developer.apple.com/documentation.
But the documentation is not precise enough so far to indicate how much gain to expect.
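
As a rough illustration of how the three option keys from the diff might be collected on the caller's side, here is a minimal sketch. The key names are copied from the lines under review and `"FastPrediction"` comes from the code comment in this PR; the `"1"`/`"0"` values, the `ProviderOptions` alias, and the `MakeCoreMLPerfOptions` helper are assumptions for illustration, and how the map is handed to a session is deliberately left out.

```cpp
#include <string>
#include <unordered_map>

// Key names copied from the diff under review.
static const char* const kCoremlProviderOption_SpecializationStrategy = "SpecializationStrategy";
static const char* const kCoremlProviderOption_ProfileComputePlan = "ProfileComputePlan";
static const char* const kCoremlProviderOption_AllowLowPrecisionAccumulationOnGPU =
    "AllowLowPrecisionAccumulationOnGPU";

// Hypothetical alias: provider options as string key/value pairs.
using ProviderOptions = std::unordered_map<std::string, std::string>;

// Hypothetical helper that gathers the performance-related flags in one place.
// The "1"/"0" encodings for the boolean flags are assumed, not confirmed by the thread.
ProviderOptions MakeCoreMLPerfOptions(bool profile_compute_plan) {
  ProviderOptions options;
  options[kCoremlProviderOption_SpecializationStrategy] = "FastPrediction";
  options[kCoremlProviderOption_ProfileComputePlan] = profile_compute_plan ? "1" : "0";
  options[kCoremlProviderOption_AllowLowPrecisionAccumulationOnGPU] = "1";
  return options;
}
```

Keeping the flags in a single helper makes it easy to switch profiling on only for diagnostic runs, since the thread notes that the expected gains from the other flags are not yet well documented.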
