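minimind-v is a nice "toy" for learning VLM architecture and the training pipeline.
In terms of results, minimind-v is held back by the capability of its underlying LLM and is still far from being a usable productivity tool. It can coarsely recognize scenes and objects in an image, but its understanding still shows clear errors and hallucinations.
With that in mind, I hope minimind-v lives up to the idea of "casting a brick to attract jade": a useful "brick" that paves the way for deeper research by everyone.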
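If the number of layers were increased to 80 and the hidden dimension to 1024 or 2048, and the training data scaled up (say, to several trillion tokens) with everything else unchanged, the results should improve noticeably, right? Of course, minimind itself would need matching changes too.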
That would be a language model of roughly 5B parameters, and at that size the quality should be solid.
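As a rough sanity check on the parameter count, here is a minimal sketch using the common rule of thumb for a decoder-only transformer: about 12·L·d² weights in the blocks (4·d² for the Q/K/V/O attention projections plus 8·d² for an MLP with hidden size 4·d), plus the embedding table. The vocabulary size of 6400 and tied input/output embeddings are assumptions for illustration; norms, biases, grouped-query attention and SwiGLU-specific hidden sizes are ignored, so minimind's real configuration will give somewhat different numbers.

```python
def approx_params(n_layers: int, d_model: int, vocab_size: int,
                  tie_embeddings: bool = True) -> int:
    """Rough parameter count for a decoder-only transformer.

    Assumes the 12 * L * d^2 rule of thumb: 4*d^2 for attention
    (Q, K, V, O projections) plus 8*d^2 for an MLP with hidden size 4*d.
    Ignores norms, biases, GQA and SwiGLU-specific sizing.
    """
    per_layer = 12 * d_model ** 2
    embed = vocab_size * d_model          # token embedding table
    if not tie_embeddings:
        embed *= 2                        # separate output (LM head) matrix
    return n_layers * per_layer + embed


if __name__ == "__main__":
    # vocab_size=6400 is an assumed value for illustration only.
    for d in (1024, 2048):
        n = approx_params(n_layers=80, d_model=d, vocab_size=6400)
        print(f"L=80, d={d}: ~{n / 1e9:.2f}B parameters")
```

Under these assumptions, 80 layers at d=1024 comes to roughly 1.0B parameters and d=2048 to roughly 4.0B, so a figure in the ~5B range corresponds to the wider configuration.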