🐛 Bug
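With the character-level timestamp feature that FunASR 1.2.0 supports for SenseVoice, the number of characters in the result does not match the number of timestamps.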
To Reproduce
Run the official SenseVoice timestamp demo in a Notebook to recognize the audio file directly.
Audio file:
temp.zip
Code sample
import torch
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

# Load SenseVoiceSmall with the FSMN VAD front end, on CPU.
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",
    ncpu=8,
    disable_update=True,
    disable_pbar=True,
)
print("Model loaded")

# torch.set_num_threads(8)
# torch.set_num_interop_threads(8)

# Recognize the audio file and request timestamps.
res = model.generate(
    input="sd_pr_right.wav",
    cache={},
    language="zh",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=False,
    merge_length_s=15,
    output_timestamp=True,
    return_raw_text=True,
)
print(res)

text = rich_transcription_postprocess(res[0]["text"])
print(text)
Expected behavior
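Expected: the number of timestamps equals the number of characters in the recognition result.
Actual: the counts differ. The result contains 49 characters but only 47 timestamps. The English portion of the audio is also recognized poorly.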
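A small helper along the following lines could make the mismatch easy to check on each run. This is only a sketch under stated assumptions: it assumes each result dict carries the recognized text under a `"text"` key and a list of timestamps under a `"timestamp"` key, and it counts every non-whitespace character after stripping `<|...|>` tags. The function name `count_mismatch` and the counting rule are hypothetical, so adjust them to match how the 49 characters above were counted.

```python
import re

# Matches SenseVoice-style rich tags such as <|zh|> or <|NEUTRAL|>.
TAG_PATTERN = re.compile(r"<\|[^|]*\|>")


def count_mismatch(result, text_key="text", ts_key="timestamp"):
    """Return (num_chars, num_timestamps, difference) for one result dict.

    Assumption: result[text_key] is a string and result[ts_key] is a list
    with one entry per timestamped unit. Both key names are assumptions.
    """
    text = TAG_PATTERN.sub("", result[text_key])
    # Count every non-whitespace character (punctuation included).
    chars = [c for c in text if not c.isspace()]
    timestamps = result.get(ts_key, [])
    return len(chars), len(timestamps), len(chars) - len(timestamps)


if __name__ == "__main__":
    # Synthetic example, not real model output.
    fake = {
        "text": "<|zh|><|NEUTRAL|>你好 世界",
        "timestamp": [[0, 100], [100, 200], [200, 300]],
    }
    n_chars, n_ts, diff = count_mismatch(fake)
    print(f"chars={n_chars} timestamps={n_ts} diff={diff}")
```

With the reproduction script above, this could be called as `count_mismatch(res[0])`, provided the result actually has that shape.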
Environment
Notebook instance, CPU only, 8 cores, 32 GB memory
FunASR 1.2.0
Additional context
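The problem first appeared this morning on one Notebook instance. After the 1-hour idle timeout was triggered, I ran the same code on a different instance, and there the character count and timestamp count matched. However, in a local python3.8 Docker image with FunASR 1.2.0 installed manually, recognizing the same audio file reproduces the problem consistently.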
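Since the behavior differs between environments, it may help to record the installed package versions in each one and compare them. The snippet below is a minimal sketch using only the standard library. The helper name `collect_versions` and the choice of packages to inspect are assumptions, not something taken from FunASR itself.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return the Python version plus the installed version of each package.

    Packages that are not installed map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": platform.python_version(), "packages": versions}


if __name__ == "__main__":
    # The package list is an illustrative guess at what is relevant here.
    print(collect_versions(["funasr", "torch", "torchaudio", "modelscope"]))
```

Running this in both the Notebook instance and the local Docker image would show whether any dependency versions differ between them.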