
Add Adreno GPU target and topi supporting textures with dynamically allocated textures #11161

Merged · 9 commits merged into apache:main on May 13, 2022

Conversation

Contributor

@elvin-n elvin-n commented Apr 28, 2022

  • There are 5 compute/schedules: conv2d for NCHW/NHWC, depthwise_conv2d
    for NCHW/NHWC, average/max pooling
  • Fix caching of dynamically allocated textures
  • Add texture-nhwc scope
  • Fix issue with codegen of vars having unacceptable symbols

@elvin-n elvin-n force-pushed the scout/adreno branch 2 times, most recently from ddfa320 to fb29643 on May 2, 2022 16:24
Contributor Author

elvin-n commented May 2, 2022

@csullivan Could you please take a look?

@csullivan csullivan self-requested a review May 2, 2022 20:36
Contributor

@csullivan csullivan left a comment


Looks great @elvin-n! I've reviewed everything except the schedules, which I will review in a follow-up pass.

Note: as this is a squash, I would suggest using Co-authored-by in the commit message to reflect the co-authorship.

@@ -345,6 +345,7 @@ struct BufferDescriptor {
* e.g. image2d[height=O, width=IHW]
*/
kImage2DWeight,
kTexture2DNHWC,
Contributor

@csullivan csullivan May 2, 2022


Note: We can now support arbitrary layouts with transform_layout, which I will suggest we move to. It will require some rework of the TIR lowering. I don't suggest this block these schedules from being upstreamed now, but we should circle back on this soon.

Contributor Author


Should we add any AR/TODO into the code?

Contributor


I like that idea. Something like,

TODO(tvm-team): Uncouple use of storage scope and data layout by using the transform_layout schedule primitive to express the desired texture layout. This will require supporting Nd indices in BufferLoad and BufferStore in CodegenOpenCL, and ensuring Nd allocations for texture are correctly routed to the AllocateTexture packed function in the OpenCL DeviceAPI.

Contributor Author


Done

elif data_layout == "NHWC4c":
ic = data.shape[3] * data.shape[4]
else:
# TODO(amalyshe) add proper error raising
Contributor


Address the TODOs

Contributor Author


Done
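
For reference, a minimal sketch of how the TODO above could be resolved; the exact exception type and message in the final commit may differ:

    elif data_layout == "NHWC4c":
        ic = data.shape[3] * data.shape[4]
    else:
        # hedged sketch: the merged code may raise a different error type/message
        raise RuntimeError("Unsupported data layout: " + str(data_layout))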

# specific language governing permissions and limitations
# under the License.
# pylint: disable=invalid-name,unused-variable,unused-argument,no-member
"""Conv2D alter op and legalize functions for x86"""
Contributor


Not x86

Contributor Author


done

from ..utils import get_const_tuple


def getDiv(value, start):
Contributor


snake_case to match the rest of the file

Contributor Author


done
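
For illustration, a sketch of the renamed helper, under the assumption that getDiv returns the largest divisor of value that does not exceed start (the actual body in the PR may differ):

    def get_div(value, start):
        """Return the largest divisor of value that does not exceed start."""
        div = 1
        for d in range(start, 0, -1):
            if value % d == 0:
                div = d
                break
        return div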

----------
out: tuple of the (chunks, block, tail)
"""
tail = trip_count % 4
Contributor


Use block throughout

Contributor Author


done
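
A sketch of the splitting logic around this line, assuming tail counts the valid elements in the last chunk (the helper name split_to_chunks is illustrative):

    def split_to_chunks(trip_count, block=4):
        """Split trip_count into (chunks, block, tail).

        tail is the number of valid elements in the last chunk; it equals
        block when trip_count divides evenly.
        """
        tail = trip_count % block
        chunks = trip_count // block
        if tail == 0:
            tail = block
        else:
            chunks += 1
        return chunks, block, tail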

in_channel_tail: int
Tail in the latest chunk diffing original number of channels vs blocked one
If in_channel_tail != in_channel_block:
original_channels = in_channel_chunks * in_channel_block - in_channel_tail
Contributor


nit: consider referring to this as padding_tail so that it's clear this isn't the remainder of a floordiv. Anything to make this a little clearer upfront would help; it took me a bit to understand given the current naming convention. Same comment for the filter API below.

Contributor Author


tried to do my best

Comment on lines 110 to 133
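# In the last channel chunk (index in_channel_chunks - 1), lanes with
# index >= in_channel_tail are padding: they yield pad_value instead of
# reading past the original channel extent.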
def _reorder_data_nchw(*indices):
condition = []
condition.append(indices[1] == in_channel_chunks - 1)
condition.append(indices[4] >= in_channel_tail)
condition = tvm.tir.all(*condition)
return tvm.tir.if_then_else(
condition,
pad_value,
Input[indices[0], indices[1] * in_channel_block + indices[4], indices[2], indices[3]],
)

def _reorder_data_nhwc(*indices):
condition = []
condition.append(indices[3] == in_channel_chunks - 1)
condition.append(indices[4] >= in_channel_tail)
condition = tvm.tir.all(*condition)
return tvm.tir.if_then_else(
condition,
pad_value,
Input[indices[0], indices[1], indices[2], indices[3] * in_channel_block + indices[4]],
)
Contributor


Note: Explicit buffer layout padding as part of transform_layout is on the roadmap and will appear in an RFC soon. Noting here that explicit layout transformations like this should become unnecessary in the future.

Contributor Author


added comment and reference to rfc

in_height, in_width, kernel_h, kernel_w, dilation_h, dilation_w, padding, stride_h, stride_w
):
"""
Expands spatial dimensions to be dividable by factor 4. This will allow us to do extrimely
Contributor


Typos

Suggested change
Expands spatial dimensions to be dividable by factor 4. This will allow us to do extrimely
Expands spatial dimensions to be dividable by factor 4. This will allow us

Contributor Author


Could you please point out where the typos are?

Height of the feature map

in_width: int
Width of the featrue map
Contributor


Suggested change
Width of the featrue map
Width of the feature map

Contributor Author


done

# certain limitation of the Qualcomm devices. Subject to be determined for certain device
# individually, but until we have access to remote device during compilation, we have to
# define it uniformly for all target devices
limit = 16384
Contributor


Let us use the Target attributes for this, and specifically use the attribute preprocessor as is done for cuda here. Add image extent to the attribute list for the device api and use it when calling DetectDeviceFlag to query the size limits of the opencl image on the remote device.

Contributor Author


I added a new texture_spatial_limit attribute to the opencl target and added it to DeviceAttrKind and runtime_ctypes in Python, but I am not sure that was required, since I don't know how and when to use DetectDeviceFlag. I already have access to texture_spatial_limit in the Python part through tvm.target.Target.current().attrs["texture_spatial_limit"].
I would consider this "addressed", but I need to understand whether my solution is applicable and whether we need the parts related to DeviceAttrKind.
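
A minimal sketch of the Python access path described above, assuming an opencl target is in scope; 16384 is the registered default, not a value detected from a remote device:

    import tvm

    with tvm.target.Target("opencl"):
        limit = tvm.target.Target.current().attrs["texture_spatial_limit"]
        print(int(limit))  # 16384 by default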

- There are 5 compute/schedules: conv2d for NCHW/NHWC, depthwise_conv2d
  for NCHW/NHWC, average pooling
- Fix of dynamically allocated textures caching
- Add texture-nhwc scope
- Fix issue with codegen of vars having non acceptable symbols

Co-authored-by: Chris Sullivan <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>
Contributor Author

elvin-n commented May 4, 2022

Note: as this is a squash, I would suggest using Co-authored-by in the commit message to reflect the co-authorship.

Done


pad_data, kernel = s[conv].op.input_tensors

s[pad_data].compute_inline()
Contributor


Are you meaning to inline padding here? Your comment above implies that you intend to do otherwise.

Contributor Author


It is inlined into the next stage, the cache read for textures:

    AT = s.cache_read(pad_data, "global.texture", [conv])
    bind_data_copy(s[AT])

If I do not add s[pad_data].compute_inline(), the schedule is not complete and complains about missing bindings.
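
Condensing the two fragments above, the pattern under discussion looks roughly like this (bind_data_copy is the helper introduced in this PR):

    pad_data, kernel = s[conv].op.input_tensors
    s[pad_data].compute_inline()  # fold padding into the cache-read stage below
    AT = s.cache_read(pad_data, "global.texture", [conv])  # stage input into texture memory
    bind_data_copy(s[AT])  # bind thread axes so the copy stage is fully scheduled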

from tvm.contrib import graph_runtime


def get_reference(mod, params1, input_shape, inputs):
Contributor


Common utility shared in other test files, consider adding to the utils subdir.

Contributor Author


moved shared functions into utils/adreno_utils.py



# build module run with opencl and cpu, compare results
def build_run_compare(
Contributor


Common utility

Contributor Author


moved shared functions into utils/adreno_utils.py



@tvm.testing.requires_opencl
def test_conv2d_yolov3_v2_nchw_3c():
Contributor


Do these tests pass on a local OpenCL device (e.g. with an NVIDIA GPU)? If they require a remote device, it would be good to skip the tests that depend on the RPC tracker env vars when those vars are not set.

Contributor Author


I have not verified with an NVIDIA GPU, but they pass successfully on Intel integrated graphics with OpenCL enabled in the platform and in TVM. I need to verify whether the tests run in CI, but cannot do this yet due to issues with the GPU build in CI.

Contributor Author


@csullivan I looked into the CI test results and got the impression that all OpenCL tests are disabled. It seems we need to enable them in CI, but in a separate PR.

Contributor


That's accurate, and I agree we can consider enabling them in CI in a separate PR. If you see that these tests pass when running locally and without an RPC tracker, that is sufficient.
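
A hedged sketch of the suggested skip condition; the TVM_TRACKER_HOST/TVM_TRACKER_PORT names follow common TVM RPC conventions and are assumptions here, not taken from this PR:

    import os
    import pytest
    import tvm.testing

    @tvm.testing.requires_opencl
    @pytest.mark.skipif(
        "TVM_TRACKER_HOST" not in os.environ or "TVM_TRACKER_PORT" not in os.environ,
        reason="requires an RPC tracker pointing at a remote Adreno device",
    )
    def test_conv2d_remote_only():
        ...  # hypothetical test body running through the RPC tracker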

@@ -324,6 +324,7 @@ TVM_REGISTER_TARGET_KIND("opencl", kDLOpenCL)
.add_attr_option<Bool>("system-lib")
.add_attr_option<Integer>("max_num_threads", Integer(256))
.add_attr_option<Integer>("thread_warp_size", Integer(1))
.add_attr_option<Integer>("texture_spatial_limit", Integer(16384))
Contributor


Thanks for adding this. An improvement would be to query the remote device using a call to the device api GetAttr using the target attr preprocessor.

Contributor Author


I still do not fully understand the usage model. For now I left only the definition of texture_spatial_limit in the opencl target and the access in Python, because adding kTextureSpatialLimit to DeviceAttrKind caused a failure during compilation for cuda, and since I do not fully understand the usage model, I don't know how to fix this properly. It is unclear whether I need to extend cuda for this constant as well, or just ignore it, and if I ignore it, where kTextureSpatialLimit should be used.

Contributor

@csullivan csullivan left a comment


LGTM with a few final nits

@csullivan csullivan merged commit c2d1905 into apache:main May 13, 2022
Contributor

Many thanks for the great work @elvin-n, @echuraev, @lhez. This is merged.

mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request May 16, 2022
…llocated textures (apache#11161)

* Add Adreno GPU target and topi supporting textures

- There are 5 compute/schedules: conv2d for NCHW/NHWC, depthwise_conv2d
  for NCHW/NHWC, average pooling
- Fix of dynamically allocated textures caching
- Add texture-nhwc scope
- Fix issue with codegen of vars having non acceptable symbols

Co-authored-by: Chris Sullivan <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>

* Address comments

* Add vectorization into some adreno pool flow

Co-authored-by: Li <[email protected]>

* Fix adreno tests for running on the opencl host platform

* remove unnecessary kDriverVersion in DeviceAttrKind

* Move utils adreno functinos to separate shared file

* fix black hits

Co-authored-by: Chris Sullivan <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>
Co-authored-by: Li <[email protected]>
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
shingjan pushed a commit to shingjan/tvm that referenced this pull request May 17, 2022