Question about 51B model ckpt conversion: can the iter_0000000 model be used directly? #82

Open
yuki772049 opened this issue Jan 3, 2024 · 3 comments

Comments

@yuki772049

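The question is as stated in the title.
Following the provided download link, the 51B model file is iter_0000000, and running with it fails with an "error parsing metadata file" assertion: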

  File "/Yuan-2.0/megatron/checkpointing.py", line 163, in read_metadata
    assert iteration > 0 or release, 'error parsing metadata file {}'.format(
AssertionError: error parsing metadata file /Yuan2.0-51B/51B/latest_checkpointed_iteration.txt
0       /tmp/yuan2.0/ckpt-51B-mid

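Looking at megatron/checkpointing.py, it seems that iteration > 0 is required, and otherwise the assertion fails?
After changing the iteration number of the 51B iter_00 directory and the contents of latest_checkpointed_iteration to 1, everything runs normally.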
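The workaround described above can be sketched as follows. This is only an illustration under stated assumptions: `bump_iteration_zero` is a hypothetical helper, not part of Yuan-2.0 or Megatron, and it assumes the checkpoint root contains an `iter_0000000` directory next to `latest_checkpointed_iteration.txt`. It changes names on disk only and does not touch anything stored inside the checkpoint files.

```python
import os
import tempfile


def bump_iteration_zero(ckpt_root):
    """Hypothetical helper: rename iter_0000000 to iter_0000001 and
    write 1 into the tracker file so that iteration > 0 holds."""
    old_dir = os.path.join(ckpt_root, 'iter_{:07d}'.format(0))
    new_dir = os.path.join(ckpt_root, 'iter_{:07d}'.format(1))
    tracker = os.path.join(ckpt_root, 'latest_checkpointed_iteration.txt')
    if not os.path.isdir(old_dir):
        raise FileNotFoundError(old_dir)
    if os.path.exists(new_dir):
        raise FileExistsError(new_dir)
    os.rename(old_dir, new_dir)
    with open(tracker, 'w') as f:
        f.write('1')
    return new_dir


# Demonstration on a throwaway directory, not a real checkpoint.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'iter_0000000'))
    with open(os.path.join(root, 'latest_checkpointed_iteration.txt'), 'w') as f:
        f.write('0')
    bump_iteration_zero(root)
    print(sorted(os.listdir(root)))
```

On a real checkpoint it would be safer to work on a copy first, since the rename is done in place.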

def read_metadata(tracker_filename):
    # Read the tracker file and either set the iteration or
    # mark it as a release checkpoint.
    iteration = 0
    release = False
    with open(tracker_filename, 'r') as f:
        metastring = f.read().strip()
        try:
            iteration = int(metastring)
        except ValueError:
            release = metastring == 'release'
            if not release:
                print_rank_0('ERROR: Invalid metadata file {}. Exiting'.format(
                    tracker_filename))
                sys.exit()
    assert iteration > 0 or release, 'error parsing metadata file {}'.format(
        tracker_filename)
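
The effect of the assertion can be reproduced in isolation. The following is a minimal sketch, not the Megatron code itself: `parse_tracker` is a hypothetical stand-in that takes the tracker file's contents as a string, applies the same `iteration > 0 or release` condition, and raises `ValueError` instead of calling `sys.exit()`.

```python
def parse_tracker(metastring):
    """Hypothetical stand-in for the check quoted from read_metadata."""
    iteration = 0
    release = False
    text = metastring.strip()
    try:
        iteration = int(text)
    except ValueError:
        release = text == 'release'
        if not release:
            raise ValueError('invalid metadata: {!r}'.format(text))
    # Same condition as the quoted assert: iteration 0 is rejected
    # unless the checkpoint is marked as a release.
    assert iteration > 0 or release, 'error parsing metadata'
    return iteration, release


# A tracker file containing "0" trips the assertion; "1" and "release" pass.
for content in ('0', '1', 'release'):
    try:
        print(content, '->', parse_tracker(content))
    except AssertionError as err:
        print(content, '-> AssertionError:', err)
```

Under this check, a tracker file containing 0 can never load, which matches the traceback above.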

Could you please help clarify this?

@zhaoxudong01
Collaborator

#40

@Shawn-IEITSystems
Collaborator

@yuki772049 Has this been resolved?

@yuki772049
Author

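@Shawn-IEITSystems It is not a bug, just a question. The model files provided in December were iter_0000000, but the script requires iteration > 0, which is contradictory. However, I see that the newly uploaded model is iter_0000001, so the question above no longer applies.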
