Next WeNet Roadmap #1683

Closed
robin1001 opened this issue Feb 8, 2023 · 4 comments

Labels
documentation Improvements or additions to documentation Stale

Comments

@robin1001
Collaborator

robin1001 commented Feb 8, 2023

We will mainly focus on the following two problems in Next WeNet.

  1. NN-based contextual biasing and LM solution. On the one hand, a pure end-to-end model, including contextual biasing and LM, is our final goal. On the other hand, our current contextual biasing and LM have many problems, such as poor rare-word performance in contextual biasing and a complicated LM solution caused by introducing FST and token-passing beam search. We are also looking for new paradigms, such as joint text/audio learning and prompt learning.
  2. Exploration of open-source big models, pretrained models, and multimodal models. We see increasing capability, influence, and interest in these models, and we believe they may lead to a final solution for general AI. It is hard for us to build such models directly because we lack research and computation resources. However, we can explore how to use them in speech recognition applications, since open-source big models + task/private data may be the new paradigm for the next AI.

We are open to other proposals. WeNet is a community-driven project, and we love your feedback and proposals on where we should be heading. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).

@robin1001 robin1001 pinned this issue Feb 8, 2023
@xingchensong xingchensong added the documentation Improvements or additions to documentation label Feb 21, 2023
@Mddct
Collaborator

Mddct commented Mar 12, 2023

Google's recent USM paper highlights the following three points:

1. Injecting text

2. Simpler pre-training

3. Text to speech intermediate representation

I think these three are the ultimate weapons for speech recognition, whether at the signal level or the text level.

The community is also a good way to cooperate on building the big model or paving the road to the new pipeline.

@Mddct
Collaborator

Mddct commented Mar 14, 2023

For point 2 (simpler pre-training), BEST-RQ may be a good start: https://github.com/wenet-e2e/wenet/tree/Mddct-bestrq/wenet/ssl/bestrq

@robin1001
Collaborator Author

@Mddct shares his insight on the general speech recognition task; it's great.

@robin1001 robin1001 unpinned this issue Nov 3, 2023

This issue has been automatically closed due to inactivity.
