SenseVoice in FunASR 1.2.0: timestamps and character probabilities are misaligned #2324

psk-github · 2024-12-20T08:41:18Z

🐛 Bug

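With the character-level timestamp feature that SenseVoice supports in FunASR 1.2.0, the number of characters does not match the number of timestamps.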

To Reproduce

Run the official SenseVoice timestamp demo in a Notebook and recognize the audio file directly.

The audio file is attached:
temp.zip

Code sample

import torch
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",
    ncpu=8,
    disable_update=True,
    disable_pbar=True,
)
print("Model loaded")

# torch.set_num_threads(8)
# torch.set_num_interop_threads(8)

res = model.generate(
    input="sd_pr_right.wav",
    cache={},
    language="zh",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=False,
    merge_length_s=15,
    output_timestamp=True,
    return_raw_text=True,
)
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Expected behavior

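The number of characters in the recognition result should equal the number of timestamps, but it does not: the result contains 49 characters and only 47 timestamps. Recognition quality on the English portion is also poor.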
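The mismatch described here can be checked mechanically. The following is a minimal sketch, not part of FunASR: `compare_counts` is a hypothetical helper that takes a plain list of characters and a plain list of timestamp entries. How those two lists are extracted from `res[0]` is left out, because the key names in the result dictionary are not shown in this report. The sample data is illustrative only and does not come from the attached audio.

```python
from typing import Any, Dict, Sequence


def compare_counts(chars: Sequence[str], timestamps: Sequence[Any]) -> Dict[str, Any]:
    """Compare the number of characters with the number of timestamp entries.

    Hypothetical helper for illustration; it does not call FunASR.
    """
    n_chars = len(chars)
    n_timestamps = len(timestamps)
    return {
        "n_chars": n_chars,
        "n_timestamps": n_timestamps,
        "aligned": n_chars == n_timestamps,
        # Positive when there are more characters than timestamps.
        "difference": n_chars - n_timestamps,
    }


# Illustrative data only: three characters, two [start, end] entries.
report = compare_counts(list("你好吗"), [[0, 120], [120, 260]])
print(report)
```

For the case reported above, the same check would show 49 characters against 47 timestamps, a difference of 2.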

Environment

Notebook (CPU, 8 cores, 32 GB RAM)
FunASR 1.2.0

Additional context

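The problem appeared this morning on one Notebook instance. After that instance hit the 1-hour idle timeout, I switched to a different instance, and there the character count and the timestamp count matched. However, in a local python3.8 Docker image with FunASR 1.2.0 installed manually, recognizing the same audio file reproduces the problem consistently.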
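Since the behavior differs between environments that nominally run the same FunASR version, it may help to record the installed package versions in each one and compare them. The sketch below uses only the Python standard library (`importlib.metadata`, available from Python 3.8). `environment_snapshot` is a hypothetical helper, and the default package list is an assumption about which dependencies could be relevant, not a confirmed cause.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def environment_snapshot(
    packages: Iterable[str] = ("funasr", "torch", "torchaudio", "modelscope"),
) -> Dict[str, str]:
    """Collect the Python version, platform, and installed package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            info[name] = "not installed"
    return info


for key, value in environment_snapshot().items():
    print(f"{key}: {value}")
```

Running this in both the Notebook instance and the local Docker image and diffing the output would show whether the two environments differ in anything besides the FunASR version.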

psk-github added the bug Something isn't working label Dec 20, 2024
