Skip to content

Commit

Permalink
[Ansor][AutoTVM v2.0] Part 0: Ansor minimum system for auto schedule …
Browse files Browse the repository at this point in the history
…generating (apache#5962)

* Code migration Start (#1)

* Init commit: Code migration Start

* Add loop_state.cc/h

* Add ComputeDAG basic test

* Split transform_step out & Update more UTs (#3)

* Split transform_step out

* Update GetProducers & GetConsumers

* Update UTs

* Add UT for CacheReadWrite & Some bug fix

* Add search_task, measure and serialization (#4)

* Add FollowSplit & FollowFusedSplit tests

* Update dag.InferBound & its UT

* Add search_task, measure and serialization

* Update Serialization UT

* Add MetaTileRewritePolicy (apache#5)

* Add feature

* Add cost_model, meta_tile_rewrite_policy

* Add MetaTileRewritePolicy basic UT

* Basic Python API for State (apache#6)

* Add Basic Python API for State

* Add UTs for State

* Add Python API: Measure & Task (apache#7)

* Update the return value of state operation

* Add task

* Copy measure.py & utils.py

* Fix LocalBuilder

* Fix LocalRunner

* Add ansor.auto_schedule() API; First AutoSchedule working version(apache#8)

* Add basic Python support for ansor.auto_schedule

* Update AutoSchedule API

* Bug fix for get the attach point of a fused iter

* Update UT after infer bug fix

* Bug fix & Add python serialization API (apache#10)

* Delete C++ UT hack since Python is ready

* Add ndarray.non_empty

* Update Serialization python API

* Improve code style, python wrapper and test cases (apache#11)

* Update c++ code style and unit test

* Update python State wrapper and test cases

* fix unit tests

* Add RPCRunner & OpenCL/CUDA test (apache#12)

* Add RPCRunner & OpenCL search test

* Add CUDA search test

* Add RPCRunner test

* rebase to upstream/master

* Add Ansor basic tutorial (apache#13)

* Add basic tutorial

* migrate feature extraction (apache#14)

* Add XGBModel & RPCRunnerWarpper (apache#15)

* Add XGBModel & RPCRunnerWarpper

* Revert "Add Parallel Granularity Mutation"

* Migrate workload_registry.py (apache#16)

* add workload registry

* update

* update

* add task scheduler (apache#17)

* Add conv2d cuda tutorial with workload registry (apache#18)

* add tune_test.py (the old tune_wkl.py) (apache#19)

* add tune_test.py (the old tune_wkl.py)

* update

* fix measure

* fix for gpu

* Code refine for tune_test.py & Add a pre load callback (apache#20)

* Bug fix for tutorials

* Add PreLoadMeasuredStates

* Add search_callback support for task tuner

* Code refine for tune_test.py

* Update

* Update

* Update

* Update

* Bug fix

* Add python custom sketch rule (apache#21)

* Add custom sketch rule

* Bug fix

* Ansor Relay Integration (without layout rewrite) (apache#22)

* relay integration

* Add tune_op_subgraph.py & Some code clean for tune_network.py (apache#23)

* Add single op tune scripts

* Add tune subgraph support

* Merge all op & all subgraph to one file

* Rename file

* add explicit_unroll_max_extent (apache#25)

* Add Index simplification & API update (apache#26)

* Add vectorized cooperative_fetching test

* Update math simplify for vectorized CF

* File rename

* Update tune_network

* API update

* Update PreLoadMeasuredStates & Some bug fix (apache#27)

* Add a threading wrapper to fix the test bug

* Set default TVM_USE_AUTO_SCHEDULER to false

* Update PreLoadMeasuredStates callback

* Add tensorize step for loop_state (apache#31)

* Add tensorize step

* State python api update (apache#33)

* Start to update api

* Add compute_dag to state

* API update

* kernel layout rewrite (apache#28)

* kernel layout rewrite

* remove some hacks

* add defuse_ops pass and move kernel_layout_rewrite pass after fuse_ops pass

* set TVM_RELAY_DISABLE_BUILD_CACHE for task extraction and prepare_layout_rewrite

* [cache flush] port cache flush to ansor (apache#32)

* Improve relay integration (apache#34)

* tmp checkpoint

* Improve relay integration

* Improve relay integration

* Fix xgb error & Simplify dispatcher (apache#35)

* Rename "MetaTileRewritePolicy" to "SketchPolicy". (apache#36)

* Rename "MetaTileRewritePolicy" to "SketchPolicy".

* Add a new class for auto_unroll_max_step, storage_offset in StageNode

* fix tune_op_subgraph.py

* rebase

* Migrate all node::make to noderef's construct function (apache#37)

* Start to move xxxnode::make to noderef()

* Update

* Update

* Finish transform_step

* Finish comute dag & auto schedule

* Update

* Update

* Update

* Update

* Update

* Code refine

* Code refine

* Code refine

* Update

* Update

* Some lint fix & Recover the double constructor of tvm::PrimExpr (apache#39)

* lint fix

* clang-format-fix

* pylint fix

* Update

* Recover the double constructor of tvm::PrimExpr

* Fix pylint

* pylint fix

* pylint fix

* Add MutateComputeLocation and MutateParallel in evolutionary search (apache#40)

* Add MutateComputeLocation and MutateParallel in evolutionary search

* fix lint

* Improve loop state python API (stage_tensors -> stage_ops) (apache#41)

* improve loop state python API (stage_tensors -> stage_ops)

* fix

* ComputeDAG bug fix & Add Custom TensorCore Matmul Example (apache#42)

* Bug Fix

* Sample example of Custom TensorCore Matmul

* Rever Commits, Start to build minimum Ansor system

* Code clean for minimum Ansor system

* Bug fix & Delete AccessAnalyzer

* Delete attachmap & Code clean

* Doc update

Update statenode::stages from vector to Array

* Headfile update & Python doc update

* clang-format fix

* pylint fix

* Update

* Doc update

* Update

* Bug fix after code merge to the new master

* clang-format fix

* Update

* Update

* Update std::vector to Array; Update verbosity setting; Some commemts
addressed

* std::vector->Array & std::string->String

* Add init_state to ComputeDAG

* Update

* Update some unordered_map to Map

* clang-format fix

* Comments addressed
Delete ReplayAndInferBound
Delete ReplaySteps & InferBoundCommon

* Lint fix

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Rename ansor namespace to auto_schedule

* Update

* Rename ThreadPool to ParallelFor

* Add parallel_for

* Remove ThreadPool

* Update python/tvm/auto_schedule/auto_schedule.py

* trigger CI

Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: Minmin Sun (孙敏敏) <[email protected]>
Co-authored-by: Zhao Wu <[email protected]>
  • Loading branch information
4 people authored Jul 15, 2020
1 parent a23592c commit 456c58d
Show file tree
Hide file tree
Showing 35 changed files with 6,266 additions and 0 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,7 @@ assign_source_group("Include" ${GROUP_INCLUDE})

# Source file lists
file(GLOB_RECURSE COMPILER_SRCS
src/auto_schedule/*.cc
src/node/*.cc
src/ir/*.cc
src/arith/*.cc
Expand Down
34 changes: 34 additions & 0 deletions python/tvm/auto_schedule/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# pylint: disable=unused-import, redefined-builtin
""" Namespace for TVM Auto-scheduler. """

from . import compute_dag
from . import measure
from . import measure_record
from . import loop_state
from . import utils
from . import workload_registry

# Shortcut
from .compute_dag import ComputeDAG
from .auto_schedule import SearchTask, TuningOptions, HardwareParams, \
auto_schedule, EmptyPolicy
from .measure import MeasureInput, LocalBuilder, LocalRunner
from .measure_record import RecordToFile, RecordReader, load_best, \
load_records, save_records
from .workload_registry import register_workload, make_workload_key
22 changes: 22 additions & 0 deletions python/tvm/auto_schedule/_ffi_api.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

""" Register FFI APIs from C++ for the namespace tvm.auto_schedule. """
import tvm._ffi


tvm._ffi._init_api("auto_schedule", __name__)
194 changes: 194 additions & 0 deletions python/tvm/auto_schedule/auto_schedule.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
User interface for TVM Auto-scheduler.
The basic schedule search process for TVM Auto-scheduler is designed to be:
`Program sampling` -> `Performance Tuning`.
In `Program sampling`, we use some predefined precise or heuristic rules to generate several
initial schedules. Based on these initial starting points, we perform `Performance Tuning` which
uses cost model based evolutionary search to select schedules with the best performance.
Candidate schedules are measured against the specific hardware target.
"""

import tvm._ffi
from tvm.runtime import Object
from .measure import LocalBuilder, LocalRunner
from . import _ffi_api


@tvm._ffi.register_object("auto_schedule.HardwareParams")
class HardwareParams(Object):
""" The parameters of target hardware used to guide the search policy
TODO(jcf94): This is considered to be merged with the new Target specification:
https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844
Parameters
----------
num_cores : int
The number of device cores.
vector_unit_bytes : int
The width of vector units in bytes.
cache_line_bytes : int
The size of cache line in bytes.
"""
def __init__(self, num_cores, vector_unit_bytes, cache_line_bytes):
self.__init_handle_by_constructor__(_ffi_api.HardwareParams, num_cores,
vector_unit_bytes, cache_line_bytes)


@tvm._ffi.register_object("auto_schedule.SearchTask")
class SearchTask(Object):
""" The computation information and hardware parameters for a specific schedule search task.
Parameters
----------
dag : ComputeDAG
The ComputeDAG for the corresponding compute declaration.
workload_key : str
The workload key for the corresponding compute declaration.
target : tvm.target.Target
The target device of this search task.
target_host : Optional[tvm.target.Target]
The target host device of this search task.
hardware_params : Optional[HardwareParams]
Hardware parameters used in this search task.
"""
def __init__(self, dag, workload_key, target, target_host=None,
hardware_params=None):
self.__init_handle_by_constructor__(_ffi_api.SearchTask, dag,
workload_key, target, target_host,
hardware_params)


@tvm._ffi.register_object("auto_schedule.SearchPolicy")
class SearchPolicy(Object):
""" The base class of search policies. """


@tvm._ffi.register_object("auto_schedule.EmptyPolicy")
class EmptyPolicy(SearchPolicy):
""" This is an example empty search policy which will always generate
the init state of ComputeDAG.
"""
def __init__(self):
self.__init_handle_by_constructor__(_ffi_api.EmptyPolicy)


@tvm._ffi.register_object("auto_schedule.TuningOptions")
class TuningOptions(Object):
""" This controls the options of performance tuning.
Parameters
----------
num_measure_trials: int = 0
The number of measurement trials.
The search policy measures `num_measure_trials` schedules in total and returns the best one
among them.
With `num_measure_trials` == 0, the policy will do the schedule search but won't involve
measurement. This can be used to get a runnable schedule quickly without auto-tuning.
early_stopping: Optional[int]
Stop the tuning early if getting no improvement after n measurements.
num_measures_per_round: int = 64
The number of schedules to be measured at each search round.
The whole schedule search process will try a total number of `num_measure_trials` in several
rounds.
verbose: int = 1
Verbosity level. 0 for silent, 1 to output information during schedule search.
builder: Union[ProgramBuilder, str] = 'local'
ProgramBuilder which builds the program.
runner: Union[ProgramRunner, str] = 'local'
ProgramRunner which runs the program and measures time costs.
measure_callbacks: Optional[List[MeasureCallback]]
Callback functions called after each measurement.
Candidates:
- auto_schedule.RecordToFile
pre_search_callbacks: Optional[List[SearchCallback]]
Callback functions called before the search process.
Candidates:
- auto_schedule.PreloadMeasuredStates
- auto_schedule.PreloadCustomSketchRule
TODO(jcf94): Add these implementation in later PRs.
"""
def __init__(self, num_measure_trials=0, early_stopping=None, num_measures_per_round=64,
verbose=1, builder='local', runner='local', measure_callbacks=None,
pre_search_callbacks=None):
if isinstance(builder, str):
if builder == 'local':
builder = LocalBuilder()
else:
raise ValueError("Invalid builder: " + builder)
elif not isinstance(builder, tvm.auto_schedule.measure.ProgramBuilder):
raise ValueError("Invalid builder: " + builder +
" . TuningOptions expects a ProgramBuilder or string.")

if isinstance(runner, str):
if runner == 'local':
runner = LocalRunner()
else:
raise ValueError("Invalid runner: " + runner)
elif not isinstance(runner, tvm.auto_schedule.measure.ProgramRunner):
raise ValueError("Invalid runner: " + runner +
" . TuningOptions expects a ProgramRunner or string.")

self.__init_handle_by_constructor__(
_ffi_api.TuningOptions, num_measure_trials, early_stopping if early_stopping else -1,
num_measures_per_round, verbose, builder, runner, measure_callbacks,
pre_search_callbacks)


def auto_schedule(task, search_policy='default', tuning_options=None):
""" Do auto scheduling for a computation declaration.
The task parameter can be a `string` as workload_key, or directly
passing a `SearchTask` as input.
Parameters
----------
task : SearchTask
The SearchTask for the computation declaration.
search_policy : Union[SearchPolicy, str] = 'default'
The search policy to be used for schedule search.
tuning_options : Optional[TuningOptions]
Tuning and measurement options.
Returns
-------
A `te.schedule` and the a list of `te.Tensor` to be used in `tvm.lower` or `tvm.build`.
"""
if not isinstance(task, SearchTask):
raise ValueError("Invalid task: " + task +
" . `auto_schedule.auto_schedule` expects a SearchTask.")

if isinstance(search_policy, str):
if search_policy == 'default':
# TODO(jcf94): This is an example policy for minimum system, will be upgrated to
# formal search policy later.
search_policy = EmptyPolicy()
else:
raise ValueError("Invalid search policy: " + search_policy)
elif not isinstance(search_policy, SearchPolicy):
raise ValueError("Invalid search policy: " + search_policy +
" . `auto_schedule.auto_schedule` expects a SearchPolicy or a string.")

sch, tensors = _ffi_api.AutoSchedule(task, search_policy,
tuning_options if tuning_options else TuningOptions())
return sch, tensors
Loading

0 comments on commit 456c58d

Please sign in to comment.