Add triton server #3

yuekaizhang · 2022-05-25T03:49:33Z

I added this Triton server and client for offline ASR, mostly following the wenet Triton server: https://github.com/wenet-e2e/wenet/tree/main/runtime/server/x86_gpu

TODO (Maybe after this PR?)

Benchmark test
Readme Improvement

csukuangfj · 2022-05-25T03:53:43Z

Thanks! I am looking at it.

yuekaizhang · 2022-05-25T04:24:06Z

triton/scripts/conformer_triton.py

+)
+from torch import Tensor, nn
+
+#from icefall.utils import make_pad_mask


This conformer_triton.py exists mainly to remove the k2 dependency. Currently only greedy_search_batch is supported, so I didn't install k2. The file can be removed once the Dockerfile is improved to install k2 and lhotse.

yuekaizhang · 2022-05-25T04:29:40Z

triton/scripts/export_jit.py

+                bias=False,
+            )
+
+    def forward(self, y: torch.Tensor) -> torch.Tensor:


I removed the need_pad parameter: it is always false during inference, so we don't need to prepare an input_need_pad input for Triton.

yuekaizhang · 2022-05-25T04:31:01Z

triton/scripts/export_jit.py

+        self.decoder_proj = ScaledLinear(decoder_dim, joiner_dim)
+        self.output_linear = ScaledLinear(joiner_dim, vocab_size)
+
+    def forward(


I removed projected_input here, since I don't keep ForwardEncProj and DecProj outside the joiner. Otherwise, it would add two more submodules under Triton and make things complicated.

yuekaizhang · 2022-05-25T04:32:58Z

triton/scripts/export_jit.py

+
+        return logit
+
+class AttributeDict(dict):


The duplicate functions below are also there to avoid the k2 dependency. Once we add more decoding algorithms such as fast_beam_search, we will no longer need these duplicates.

yuekaizhang · 2022-05-25T04:33:52Z

triton/model_repo/greedy_search/1/model.py

+        self.vocab_size = sp.get_piece_size()
+        self.sp = sp
+
+    def execute(self, requests):


This is the start of greedy_search_batch.

csukuangfj · 2022-05-27T08:06:07Z

triton/model_repo/feature_extractor/1/model.py

+            value = value["string_value"]
+            if key == "num_mel_bins":
+                opts.mel_opts.num_bins = int(value)
+            # elif key == "frame_shift_in_ms":


I would suggest either removing frame_shift and frame_length from the config file, or reading them if the config file provides them.

Sure, I will uncomment them.
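The second option from the comment above (read the values when the config provides them, otherwise fall back to defaults) could look roughly like the sketch below. It is not the code from this PR: the helper name `read_feature_params`, the key `frame_length_in_ms`, and the default values are assumptions for illustration. Only `num_mel_bins`, `frame_shift_in_ms`, and the `string_value` wrapper come from the diff.

```python
import json

# Hypothetical defaults, used only when the config omits a key.
DEFAULTS = {
    "num_mel_bins": 80,
    "frame_shift_in_ms": 10.0,
    "frame_length_in_ms": 25.0,  # assumed key name, not shown in the diff
}


def read_feature_params(model_config_json: str) -> dict:
    """Return feature options, overriding defaults with config values."""
    parameters = json.loads(model_config_json).get("parameters", {})
    opts = dict(DEFAULTS)
    for key, value in parameters.items():
        if key not in opts:
            continue  # ignore parameters this sketch does not know about
        # Each parameter arrives wrapped as {"string_value": "..."},
        # so convert it to the type of the corresponding default.
        opts[key] = type(DEFAULTS[key])(value["string_value"])
    return opts


example_config = json.dumps(
    {
        "parameters": {
            "num_mel_bins": {"string_value": "40"},
            "frame_shift_in_ms": {"string_value": "20"},
        }
    }
)
print(read_feature_params(example_config))
```

With this shape, leaving a key out of config.pbtxt is harmless, and a key that is present is never silently ignored.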

csukuangfj · 2022-05-27T08:56:35Z

triton/model_repo/feature_extractor/1/model.py

@@ -0,0 +1,144 @@
+import triton_python_backend_utils as pb_utils


Do you need to install https://github.com/triton-inference-server/python_backend?
Its readme.md says you have to install it in the docker image.

Correct me if I am wrong.

I think we don't need to install it. Its readme appears to describe installing a python_backend example. The official tritonserver Docker image already provides all the Triton essentials we need.

csukuangfj

Thanks! I left some minor comments.

csukuangfj · 2022-05-27T12:26:28Z

triton/model_repo/feature_extractor/1/model.py

+                speech[i, 0: f_l, :] = f.to(self.output0_dtype)
+                speech_lengths[i][0] = f_l
+            # put speech feature on device will cause empty output
+            # we will follow this issue and now temporarily put it on cpu


What does this mean?

Sorry, it was a Triton issue. I have deleted them and tested.

csukuangfj · 2022-05-27T12:30:01Z

triton/model_repo/greedy_search/config.pbtxt

+  },
+  {
+    key: "bpe_model",
+    value: { string_value: "/ws/yuekaiz/icefall-asr-librispeech-pruned-transducer-stateless3-2022-04-29/data/lang_bpe_500/bpe.model"}


Is this path available in the docker container?

Sorry, I will update it.

csukuangfj · 2022-05-27T12:30:59Z

triton/model_repo/greedy_search/config.pbtxt

+  {
+    name: "encoder_out__0"
+    data_type: TYPE_FP32
+    dims: [-1, 512] # [-1, feature_size]


Suggested change

dims: [-1, 512] # [-1, feature_size]

dims: [-1, 512] # [-1, encoder_out_dim]

csukuangfj · 2022-05-27T12:47:09Z

triton/client/utils.py

@@ -0,0 +1,59 @@
+import numpy as np


This file has no copyright info and it is not clear where it is from.

Could you use https://github.com/pzelasko/kaldialign
to calculate the WER instead of reinventing the wheel?

We are already using it in icefall.

csukuangfj · 2022-05-27T12:51:04Z

triton/client/client.py

@@ -0,0 +1,127 @@
+# Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.


Comment for the file triton/client/1089-134686-0001.json

Where is it used? Can we remove it?

It's generated by "python3 generate_perf_input.py test_wavs/1089-134686-0001.wav <input.json>" and is used by perf_analyzer. I will remove it and update the readme.md.
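For context, perf_analyzer can take real input data as a JSON file with a top-level "data" list, where each entry maps an input name to its "content" and "shape". The sketch below shows that layout only; it is not the generate_perf_input.py from this PR. The function name and the input names `WAV` and `WAV_LENS` are hypothetical, and synthetic samples stand in for a decoded wav file.

```python
import json


def build_perf_input(samples, wav_name="WAV", len_name="WAV_LENS"):
    """Wrap one utterance in a perf_analyzer-style input-data layout.

    wav_name and len_name are hypothetical input names; they would have
    to match the inputs declared in the model's config.pbtxt.
    """
    samples = [float(s) for s in samples]
    return {
        "data": [
            {
                wav_name: {"content": samples, "shape": [len(samples)]},
                len_name: {"content": [len(samples)], "shape": [1]},
            }
        ]
    }


# Synthetic samples stand in for the decoded contents of a wav file.
payload = build_perf_input([0.0, 0.5, -0.5, 0.25])
print(json.dumps(payload, indent=2))
```

Since the file can be regenerated from a test wav at any time, it does not need to be checked into the repository.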

yuekaizhang · 2022-05-29T13:21:39Z

@csukuangfj Thanks for your review. I have committed changes according to your comments. Would you mind checking them?

csukuangfj · 2022-05-29T13:39:00Z

Thanks! Looks good to me. Is it ready to merge?

yuekaizhang · 2022-05-29T14:19:21Z

Thanks! Looks good to me. Is it ready to merge?

I think so. I would like to add benchmark results in the next PR, together with features like the fast beam search algorithm, so that we can also compare WER and throughput. I will also update the Dockerfile to support k2 for fast beam search.

Add triton offline ASR server

fa0bbbf

Dockerfile update, replace wer, hardcode path change etc.

c58e147

csukuangfj merged commit 259d2b9 into k2-fsa:master May 29, 2022

yuekaizhang deleted the offline_triton branch May 30, 2022 00:24

uni-manjunath-ke mentioned this pull request Mar 13, 2023

Sherpa support for Nemo ctc models via torchscript #303

Open
