SenseVoice in FunASR 1.2.0: timestamps do not align with character probabilities #2324

Open
psk-github opened this issue Dec 20, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@psk-github

🐛 Bug

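When using the character-level timestamp feature that SenseVoice supports in FunASR 1.2.0, the number of characters does not match the number of timestamps.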

To Reproduce

Run the official SenseVoice timestamp demo in a Notebook and recognize the audio file directly.

The audio file is attached:
temp.zip

Code sample

import torch  # only needed if the thread settings below are enabled
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

# Load SenseVoiceSmall with the FSMN VAD front end, on CPU.
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",
    ncpu=8,
    disable_update=True,
    disable_pbar=True,
)
print("Model loaded")

# torch.set_num_threads(8)
# torch.set_num_interop_threads(8)

# Recognize the file and request per-character timestamps.
res = model.generate(
    input="sd_pr_right.wav",
    cache={},
    language="zh",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=False,
    merge_length_s=15,
    output_timestamp=True,
    return_raw_text=True,
)
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Expected behavior

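Expected: the number of characters in the recognition result equals the number of timestamps.
Observed: the counts differ. There are 49 characters but only 47 timestamps, so the recognized text and the timestamps are not aligned. Recognition quality on the English portion is also poor.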
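
A quick way to confirm the mismatch is to compare the two lengths directly and see which characters are left without a timestamp. The sketch below is a minimal, library-independent check on synthetic data. `compare_counts` is a hypothetical helper, and the assumption that the timestamps arrive as a list with one entry per character has not been verified against FunASR 1.2.0.

```python
from itertools import zip_longest


def compare_counts(chars, timestamps):
    """Pair each character with its timestamp and report any length mismatch.

    chars:      sequence of recognized characters/tokens
    timestamps: sequence of timestamp entries, expected one per character
    Returns (is_aligned, pairs); unmatched items are paired with None.
    """
    pairs = list(zip_longest(chars, timestamps, fillvalue=None))
    return len(chars) == len(timestamps), pairs


# Synthetic data for illustration only (not taken from the attached audio).
chars = list("你好ok")
timestamps = [[0.0, 0.2], [0.2, 0.4], [0.4, 0.6]]

aligned, pairs = compare_counts(chars, timestamps)
print(f"{len(chars)} characters, {len(timestamps)} timestamps, aligned={aligned}")
for ch, ts in pairs:
    print(ch, ts)
```

With real output, `chars` and `timestamps` would be taken from the result of `model.generate`; which keys hold them (for example a `"timestamp"` entry in `res[0]`) is an assumption and should be checked against the printed `res`.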

Environment

Notebook, CPU only, 8 cores, 32 GB RAM
FunASR 1.2.0

Additional context

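The problem appeared this morning on one Notebook instance. After the 1-hour idle timeout was triggered, I switched to a different instance, and there the character count and the timestamp count matched. However, in a local Python 3.8 Docker image with FunASR 1.2.0 installed manually, recognizing the same audio file reproduces the problem consistently.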
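
Because the result differs between environments that nominally run the same FunASR version, it may help to record the installed package versions in each one and compare them. This is a minimal sketch using only the standard library. `collect_versions` is a hypothetical helper, and the package list (`funasr`, `torch`, `torchaudio`, `modelscope`) is a guess at what is relevant, not something stated in this report.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return the Python version plus installed versions of the given packages."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


# Package list is an assumption; adjust it to what the environment actually uses.
versions = collect_versions(["funasr", "torch", "torchaudio", "modelscope"])
for name, version in versions.items():
    print(f"{name}: {version}")
```

Running this in both the Notebook instance and the Docker image gives two listings that can be compared line by line.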

psk-github added the bug label on Dec 20, 2024