Add WMT16 into dataset. #7661
python/paddle/v2/dataset/wmt16.py
Outdated
UNK_MARK = "<unk>"


def __build_dict__(tar_file, dict_size, save_path, lang):
Be careful with the naming style: in Python, names of the form `__XXX__` are conventionally reserved for the language's own special methods and attributes.
Yes, I agree. Do you think it is necessary to rename the functions named `__xx__` to `__xx` in the other datasets (such as wmt14) as well?
It seems that naming a function `_xx` is enough to mark it as private. I am not sure whether `__xx` is a better naming style. It would be better to unify the naming style across the datasets, but that would be tedious work.
Thank you. According to the Google Python Style Guide (https://google.github.io/styleguide/pyguide.html#Naming), I think `__xx` is OK. I will give it a try.
Done.
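For reference, here is a minimal sketch of how the three naming styles discussed in this thread behave in Python 3. The module name `demo_dataset`, the `Loader` class, and the function bodies are hypothetical and are not taken from this PR. The sketch only illustrates two language rules: a star import skips every name that starts with an underscore (when `__all__` is not defined), and a `__xx` name used inside a class body is name-mangled.

```python
import sys
import types

# Hypothetical module source; none of these definitions come from wmt16.py.
SOURCE = '''
def _build_dict():        # single underscore: private by convention
    return "single underscore"

def __build_dict():       # double leading underscore at module level
    return "double underscore"

def __build_dict__():     # dunder style, conventionally reserved for Python
    return "dunder"

def train():              # public name
    return "public reader"

class Loader(object):
    def load(self):
        # Inside a class body this name is mangled to _Loader__build_dict,
        # so the module-level __build_dict is not found.
        return __build_dict()
'''

# Build the module in memory so the example is self-contained.
demo = types.ModuleType("demo_dataset")
exec(SOURCE, demo.__dict__)
sys.modules["demo_dataset"] = demo

# A star import only brings in names without a leading underscore.
namespace = {}
exec("from demo_dataset import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))

# Calling the module-level __build_dict from a method raises NameError.
try:
    demo.Loader().load()
    mangled_error = None
except NameError as err:
    mangled_error = str(err)

print(exported)
print(mangled_error)
```

All three underscore styles are hidden from a star import, so `_xx` already gives the "private" signal. One practical difference is that a module-level `__xx` helper cannot be called by that bare name from inside a class body, which may matter if the dataset modules ever grow classes.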
LGTM
You can use this dataset like this: