Error during build of Horizon BPU (cross-compilation) runtime #2639

Open
kanpapa opened this issue Oct 16, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@kanpapa

kanpapa commented Oct 16, 2024

Describe the bug
Error during build of Horizon BPU (cross-compilation) runtime.

To Reproduce
The build procedure follows https://github.com/wenet-e2e/wenet/blob/main/runtime/horizonbpu/README.md.

Steps to reproduce the behavior:

  1. In Step 1 (installing the Horizon packages), I ran the following command. No errors occurred up to this point.
pip install wheels/* -i https://mirrors.aliyun.com/pypi/simple
  2. The following error message was displayed.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
onnx 1.17.0 requires protobuf>=3.20.2, but you have protobuf 3.19.4 which is incompatible.
tensorboard 2.14.0 requires protobuf>=3.19.6, but you have protobuf 3.19.4 which is incompatible.
tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.19.4 which is incompatible.
tiktoken 0.7.0 requires requests>=2.26.0, but you have requests 2.22.0 which is incompatible.
  3. To resolve the version conflict, I ran the following command.
pip install protobuf==3.20.2
  4. The following new error occurred.
horizon-tc-ui 1.11.2 requires protobuf<=3.19.4,>=3.8.0, but you have protobuf 3.20.2 which is incompatible.
  5. Although Step 1 ended with this error, I went on to build decoder_main in Step 2 with the following commands.
cmake -B build -DBPU=ON -DONNX=OFF -DTORCH=OFF -DWEBSOCKET=OFF -DGRPC=OFF -DCMAKE_TOOLCHAIN_FILE=toolchains/aarch64-linux-gnu.toolchain.cmake
cmake --build build
  6. The following error message was displayed and the build stopped.
[ 63%] Building CXX object post_processor/CMakeFiles/post_processor.dir/post_processor.cc.o
In file included from /home/ocha/wenet/runtime/horizonbpu/post_processor/post_processor.cc:16:
/home/ocha/wenet/runtime/horizonbpu/post_processor/post_processor.h:22:10: fatal error: processor/wetext_processor.h: No such file or directory
   22 | #include "processor/wetext_processor.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
gmake[2]: *** [post_processor/CMakeFiles/post_processor.dir/build.make:76: post_processor/CMakeFiles/post_processor.dir/post_processor.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1906: post_processor/CMakeFiles/post_processor.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2

I would like your advice on how to deal with this problem.

Expected behavior
The build should complete without problems.

Screenshots
none.

Desktop (please complete the following information):

  • OS: Ubuntu 22.04 LTS
  • Conda: 24.7.1 (latest miniconda)

Additional context
none.

@cdliang11
Collaborator

You may need a lower version of onnx:

protobuf                 3.19.4
onnx                     1.12.0

The TensorBoard conflict can be ignored.
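
To see why no single protobuf pin clears every warning, here is a minimal sketch that replays the requirements quoted in the pip messages above. The helper names (parse, satisfies, conflicts) are made up for this illustration, and the parser only handles the >= and <= clauses that appear in those messages; it is not a substitute for pip's resolver.

```python
# Sketch: which of the quoted requirements does a given protobuf version break?

def parse(version):
    """Turn '3.19.4' into (3, 19, 4, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad so '3.20' equals '3.20.0'
    return tuple(parts)


def satisfies(version, spec):
    """Return True if version meets a comma-separated spec like '<=3.19.4,>=3.8.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        op, bound = clause[:2], clause[2:]
        if op == ">=":
            ok = parse(version) >= parse(bound)
        elif op == "<=":
            ok = parse(version) <= parse(bound)
        else:
            raise ValueError(f"unsupported clause: {clause}")
        if not ok:
            return False
    return True


# Requirements copied from the pip error messages in this thread.
REQUIREMENTS = {
    "onnx 1.17.0": ">=3.20.2",
    "tensorboard 2.14.0": ">=3.19.6",
    "tensorboardx 2.6.2.2": ">=3.20",
    "horizon-tc-ui 1.11.2": "<=3.19.4,>=3.8.0",
}


def conflicts(version, requirements=REQUIREMENTS):
    """List the packages whose protobuf requirement the given version violates."""
    return sorted(
        name for name, spec in requirements.items() if not satisfies(version, spec)
    )


if __name__ == "__main__":
    for candidate in ("3.19.4", "3.20.2"):
        print(candidate, "conflicts with:", conflicts(candidate))
```

With these inputs, 3.19.4 conflicts with onnx 1.17.0, tensorboard and tensorboardx, while 3.20.2 conflicts only with horizon-tc-ui, which matches the two pip outputs. That is why the suggestion is to change the onnx version rather than the protobuf version.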

@kanpapa
Author

kanpapa commented Oct 16, 2024

Thanks, the protobuf and onnx versions are fixed.

The error with cmake build seems to be similar to #2032, but the situation is different.

@kanpapa kanpapa changed the title Error during build of Horizon BPU (cross-compilation) Error during build of Horizon BPU (cross-compilation) runtime Oct 16, 2024
@cdliang11
Collaborator

Add include(wetextprocessing) to horizonbpu/CMakeLists.txt

@kanpapa
Author

kanpapa commented Oct 16, 2024

I made the following change, and Step 2 then completed successfully. Thanks.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ git diff CMakeLists.txt
diff --git a/runtime/horizonbpu/CMakeLists.txt b/runtime/horizonbpu/CMakeLists.txt
index 9e179006..3d3ff629 100644
--- a/runtime/horizonbpu/CMakeLists.txt
+++ b/runtime/horizonbpu/CMakeLists.txt
@@ -37,6 +37,8 @@ include_directories(
   ${CMAKE_CURRENT_SOURCE_DIR}/kaldi
 )
 
+include(wetextprocessing)
+
 # Build all libraries
 add_subdirectory(utils)
 add_subdirectory(frontend)
(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ 

@kanpapa
Author

kanpapa commented Oct 16, 2024

The following error occurred in step 3.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ tar -xzf model_subsample8_parameter110M.tar.gz
(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 49, in <module>
    from wenet.utils.common import remove_duplicates_and_blank
ImportError: cannot import name 'remove_duplicates_and_blank' from 'wenet.utils.common' (/home/ocha/wenet/wenet/utils/common.py)

I will investigate.

@cdliang11
Collaborator

Fix: change "from wenet.utils.common import remove_duplicates_and_blank" to "from wenet.utils.ctc_utils import remove_duplicates_and_blank".
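
If the script has to run against checkouts from both before and after the function moved, a fallback import is one option. The sketch below is only an illustration: import_first_available is a hypothetical helper, not part of WeNet, and the two module paths are the ones named in this thread.

```python
import importlib


def import_first_available(attr, module_names):
    """Return `attr` from the first module in `module_names` that provides it."""
    errors = []
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}: {exc}")
    raise ImportError(f"could not find {attr!r}; tried " + "; ".join(errors))


# Possible use in tools/onnx2horizonbin.py (newer location first):
# remove_duplicates_and_blank = import_first_available(
#     "remove_duplicates_and_blank",
#     ["wenet.utils.ctc_utils", "wenet.utils.common"],
# )
```

For a one-off build, editing the import line directly as described above is simpler.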

@cdliang11 cdliang11 added the bug Something isn't working label Oct 16, 2024
@kanpapa
Author

kanpapa commented Oct 16, 2024

The following fixes have resolved this issue.

diff --git a/tools/onnx2horizonbin.py b/tools/onnx2horizonbin.py
index 96bc4061..0d9b7272 100755
--- a/tools/onnx2horizonbin.py
+++ b/tools/onnx2horizonbin.py
@@ -46,7 +46,8 @@ import numpy as np
 
 from torch.utils.data import DataLoader
 
-from wenet.utils.common import remove_duplicates_and_blank
+#from wenet.utils.common import remove_duplicates_and_blank
+from wenet.utils.ctc_utils import remove_duplicates_and_blank
 from wenet.dataset.dataset import Dataset
 from wenet.utils.checkpoint import load_checkpoint
 from wenet.utils.init_model import init_model

However, when I ran it again, I got the following error message.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 51, in <module>
    from wenet.dataset.dataset import Dataset
  File "/home/ocha/wenet/wenet/dataset/dataset.py", line 20, in <module>
    from wenet.dataset.datapipes import (WenetRawDatasetSource,
  File "/home/ocha/wenet/wenet/dataset/datapipes.py", line 27, in <module>
    from torch.utils.data.datapipes.iter.sharding import (
ModuleNotFoundError: No module named 'torch.utils.data.datapipes.iter.sharding'

Is the module missing because my PyTorch is out of date?
I am investigating this as well.

@kanpapa
Author

kanpapa commented Oct 17, 2024

I checked the status of changes in pytorch and wenet.

In WeNet, the import of torch.utils.data.datapipes.iter.sharding was added to datapipes.py in #2316.

PyTorch added the torch.utils.data.datapipes.iter.sharding module in a later refactoring:
pytorch/pytorch#94095

As a result, this module is not present in pytorch 1.13.0, which is targeted by the Horizon BPU runtime.
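
A quick way to confirm this in a given environment is to probe for the module before running the conversion script. This is a minimal sketch using only the standard library; has_module is a name chosen for this example.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if `dotted_name` can be located in the current environment.

    find_spec imports the parent packages but not the module itself, so a
    missing parent raises ModuleNotFoundError; that is reported as False.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    name = "torch.utils.data.datapipes.iter.sharding"
    print(name, "available:", has_module(name))
```

If this prints False in the environment used for the Horizon BPU runtime, wenet/dataset/datapipes.py from the current main branch cannot be imported there.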

@kanpapa
Author

kanpapa commented Oct 17, 2024

I tried the source at release tag v2.2.0, the release in which WeNet first supported horizonbpu.

git clone -b v2.2.0 https://github.com/wenet-e2e/wenet.git

I pinned onnx to version 1.12.0.

pip install torch==1.13.0 torchaudio==0.13.0 torchvision==0.14.0 onnx==1.12.0 onnxruntime -i https://mirrors.aliyun.com/pypi/simple

The installed package versions are as follows:

# Name                    Version                   Build  Channel
protobuf                  3.19.4                   pypi_0    pypi
onnx                      1.12.0                   pypi_0    pypi
onnxruntime               1.19.2                   pypi_0    pypi
torch                     1.13.0                   pypi_0    pypi
torchaudio                0.13.0                   pypi_0    pypi
torchvision               0.14.0                   pypi_0    pypi

There was no problem until Step 2, but the following error occurred in Step 3.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 53, in <module>
    from wenet.utils.init_model import init_model
  File "/home/ocha/wenet/wenet/utils/init_model.py", line 16, in <module>
    from wenet.transducer.joint import TransducerJoint
  File "/home/ocha/wenet/wenet/transducer/joint.py", line 5, in <module>
    from typeguard import check_argument_types
ImportError: cannot import name 'check_argument_types' from 'typeguard' (/home/ocha/miniconda3/envs/horizonbpu/lib/python3.8/site-packages/typeguard/__init__.py)

check_argument_types is a function available in the 2.x series of typeguard. It was removed in version 3.0.0.
Therefore, I downgraded typeguard.

pip install typeguard==2.13.3

I ran the script again with this setup, but it failed with a different message.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Failed to import k2 and icefall.         Notice that they are necessary for hlg_onebest and hlg_rescore
Please install onnx and onnxruntime!

I have onnx and onnxruntime installed.
It is possible that k2 and icefall are not installed correctly.
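
One possibility, which is an assumption and not something confirmed in this thread, is that the script prints this message whenever an import inside a try/except block fails, so the real cause is hidden. The sketch below imports each module separately and reports the actual exception; diagnose_imports is a hypothetical helper written for this illustration.

```python
import importlib


def diagnose_imports(module_names):
    """Try to import each module and map its name to None (OK) or an error string."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            report[name] = None
        except Exception as exc:  # keep going so every module gets reported
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


# Example use inside the horizonbpu environment:
# for name, error in diagnose_imports(["onnx", "onnxruntime"]).items():
#     print(name, "OK" if error is None else error)
```

If one of the two modules reports an error here, that error is more likely to point at the real problem than the generic message printed by the script.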

@xingchensong
Member

Do a git reset to PR #1597 and try again.

@kanpapa
Author

kanpapa commented Nov 9, 2024

I tried again.
First I did a git reset to PR #1597.

git clone https://github.com/wenet-e2e/wenet.git
cd wenet
git fetch origin pull/1597/head:pr-1597
git checkout pr-1597
git reset --hard pr-1597

Step 1 and Step 2 were completed without problems.

However, the same error occurred in step 3.

(horizonbpu2) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ tar -xzf model_subsample8_parameter110M.tar.gz
(horizonbpu2) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 53, in <module>
    from wenet.utils.init_model import init_model
  File "/home/ocha/wenet/wenet/utils/init_model.py", line 16, in <module>
    from wenet.transducer.joint import TransducerJoint
  File "/home/ocha/wenet/wenet/transducer/joint.py", line 5, in <module>
    from typeguard import check_argument_types
ImportError: cannot import name 'check_argument_types' from 'typeguard' (/home/ocha/miniconda3/envs/horizonbpu2/lib/python3.8/site-packages/typeguard/__init__.py)
(horizonbpu2) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ 

There was no improvement for Step 3.

@xingchensong
Member

Well, there might be an environment conflict; I have no idea.

@cdliang11
Collaborator

Downgrade to typeguard==2.13.3.
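
Because each new conda environment starts without this pin, it can help to check the installed typeguard version before running Step 3. The following is a small sketch using importlib.metadata from the standard library (Python 3.8+). The function names are illustrative, and the rule "major version below 3" is taken from the observation earlier in this thread that check_argument_types was removed in 3.0.0.

```python
from importlib import metadata


def provides_check_argument_types(version):
    """Return True for typeguard versions before 3.0.0, which still export the function."""
    major = int(version.split(".")[0])
    return major < 3


def installed_typeguard_version():
    """Return the installed typeguard version string, or None if it is not installed."""
    try:
        return metadata.version("typeguard")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_typeguard_version()
    if version is None:
        print("typeguard is not installed")
    elif provides_check_argument_types(version):
        print(f"typeguard {version} still provides check_argument_types")
    else:
        print(f"typeguard {version} is too new; try: pip install typeguard==2.13.3")
```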
