Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ansor][AutoTVM v2.0] Phase 1: Access Analyzer #6103

Merged
merged 8 commits into from
Jul 25, 2020

Conversation

merrymercy
Copy link
Member

@merrymercy merrymercy commented Jul 21, 2020

For the full upstream plan, see Ansor RFC.

This pr

  • Introduces the access analyzer which analyzes the read-write relations in a compute declaration.
    The search policy will use the analysis results to make decisions such as doing multi-level tiling for an op or strictly inlining an op.
  • Moves ansor related header files to include/tvm/auto_scheduler and polishes some comments.

@merrymercy
Copy link
Member Author

@junrushao
Copy link
Member

junrushao commented Jul 21, 2020

It is one of the core part of the system. Will take a look later tomorrow night.

@merrymercy
Copy link
Member Author

This PR is ready for review. But It has some dependency on #6073. I will make it pass the ci test after #6073 is merged

@jcf94
Copy link
Contributor

jcf94 commented Jul 21, 2020

This PR is ready for review. But It has some dependency on #6073. I will make it pass the ci test after #6073 is merged

#6073 's macOS build failure seems caused by some infrastructure error, I think it's ready for merge? I'll continue to upstream the left steps after #6073 .

@merrymercy merrymercy force-pushed the pr-access-analyzer branch 2 times, most recently from ea153e2 to 56b0187 Compare July 22, 2020 05:25
@merrymercy merrymercy force-pushed the pr-access-analyzer branch from 95ef46c to 966c3cc Compare July 22, 2020 05:54
include/tvm/auto_scheduler/auto_schedule.h Show resolved Hide resolved
include/tvm/auto_scheduler/loop_state.h Outdated Show resolved Hide resolved
include/tvm/auto_scheduler/loop_state.h Outdated Show resolved Hide resolved
include/tvm/auto_scheduler/compute_dag.h Outdated Show resolved Hide resolved
include/tvm/auto_scheduler/compute_dag.h Show resolved Hide resolved
include/tvm/auto_scheduler/compute_dag.h Outdated Show resolved Hide resolved
/*! \brief Store whether the operation is an output operation */
OperationMap<bool> is_output;
/*! \brief Store the topological order of operations */
Array<te::Operation> ops_topo_order;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship between this array and ComputeDAG::ops?

Copy link
Member Author

@merrymercy merrymercy Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the same. I store it in AccessAnalyzer because it is used first here.
In the constructor of ComputeDAG, it copies ops_topo_order as its ops

include/tvm/auto_scheduler/compute_dag.h Outdated Show resolved Hide resolved
include/tvm/auto_scheduler/compute_dag.h Outdated Show resolved Hide resolved
include/tvm/auto_scheduler/compute_dag.h Outdated Show resolved Hide resolved
namespace auto_scheduler {

/*! \brief Static analysis result for a ComputeDAG */
class AccessAnalyzerNode : public Object {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like AccessAnalyzer itself can be a much more principled and extensible component of the system, so shall we put it in a separate file instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Maybe have an analysis.h to expect more analyzers in the future. On the other hand, another direction might be renaming AccessAnalyzer to ComputeDAGAnalyzer, because it provides some APIs for the ops in a compute DAG, such as NeedMultiLevelTiling, IsOutput, etc.

@tqchen
Copy link
Member

tqchen commented Jul 22, 2020

cc @jwfromm @mbrookhart @jroesch

src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
@@ -126,6 +555,7 @@ class FlopEstimator : public ExprFunctor<double(const PrimExpr& n)> {
fail_ = true;
break;
}
cur_type_code_ = pop->output_dtype(0).code();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you elaborate why we need cur_type_code_? how do we deal with the case that computation is mixed with int8 and fp32?

Copy link
Member Author

@merrymercy merrymercy Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FLOP information is only used for printing and debugging. It is okay to just give a rough estimation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merrymercy I agree that it is totally okay if we can just use rough information (in fact it is highly non-trivial to get accurate info without backend info). My point is that cur_type_code_ comes from the dtype of output, but it is totally possible that a compute dag contains computation of different type code (int8, fp16)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not common at all. The common case is either int8->int16/32 or fp16/fp16->fp32

Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Agree with @junrushao1994 on most comments.

namespace auto_scheduler {

/*! \brief Static analysis result for a ComputeDAG */
class AccessAnalyzerNode : public Object {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Maybe have an analysis.h to expect more analyzers in the future. On the other hand, another direction might be renaming AccessAnalyzer to ComputeDAGAnalyzer, because it provides some APIs for the ops in a compute DAG, such as NeedMultiLevelTiling, IsOutput, etc.

src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
node->read_from[op] = OperationMap<std::vector<std::vector<PrimExpr>>>();
} else if (auto cop = op.as<te::ComputeOpNode>()) {
TensorAccessExtractor extractor;
for (const auto& exp : cop->body) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious: did we test with ComputeOp with multiple bodies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It will be addressed later.

src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally done this round of review... Thanks for the contribution!

Comment on lines +270 to +271
const Array<PrimExpr>& output_shape = op->output_shape(0);
const Array<PrimExpr>& producer_shape = producer->output_shape(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do it only work for te::Operation with a single output? Do we have a fallback solution for operators with multiple outputs like argmax?

Copy link
Member Author

@merrymercy merrymercy Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with multiple outputs, the shape will be the same

src/auto_scheduler/compute_dag.cc Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Outdated Show resolved Hide resolved
src/auto_scheduler/compute_dag.cc Show resolved Hide resolved
@merrymercy merrymercy force-pushed the pr-access-analyzer branch 2 times, most recently from 99e9ab3 to b01fcf8 Compare July 23, 2020 23:08
@merrymercy
Copy link
Member Author

@junrushao1994 @jcf94 @comaniac Most of the comments are addressed. I added more doc and make the name convention more consistent and meaningful. Please take another look

@merrymercy merrymercy force-pushed the pr-access-analyzer branch from b01fcf8 to f63929b Compare July 23, 2020 23:15
Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@merrymercy merrymercy force-pushed the pr-access-analyzer branch from f63929b to c690c32 Compare July 25, 2020 00:03
Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@merrymercy merrymercy force-pushed the pr-access-analyzer branch from 4cfe0cf to e9933c6 Compare July 25, 2020 00:14
@merrymercy merrymercy force-pushed the pr-access-analyzer branch from e9933c6 to e032686 Compare July 25, 2020 00:18
@merrymercy merrymercy merged commit c9cbd04 into apache:master Jul 25, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* add access analyzer

* add test cases

* move header files and polish comments

* fix lint

* update

* fix lint

* address comments

* fix lint
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* add access analyzer

* add test cases

* move header files and polish comments

* fix lint

* update

* fix lint

* address comments

* fix lint
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* add access analyzer

* add test cases

* move header files and polish comments

* fix lint

* update

* fix lint

* address comments

* fix lint
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
* add access analyzer

* add test cases

* move header files and polish comments

* fix lint

* update

* fix lint

* address comments

* fix lint
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
* add access analyzer

* add test cases

* move header files and polish comments

* fix lint

* update

* fix lint

* address comments

* fix lint
@merrymercy merrymercy deleted the pr-access-analyzer branch September 27, 2020 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants