
input template to T5 in V2 #72

Open
yixuan-qiao opened this issue Jun 1, 2022 · 0 comments


Hi,

I'm curious about the input template you use when generating queries in V2.
In V1, I found it in convert_msmarco_doc_to_t5_format.py:

segment = doc_title + ' ' + ' '.join(sentences[i:i + args.max_length])

For V2, I guess it looks something like the following:

segment = doc_title + '\n' + doc_headings + '\n' + ' '.join(sentences[i:i + args.max_length])

When training doc2query-T5, we only use the qrels, where each passage has no additional information such as doc_title or doc_headings. In the query generation stage, however, we concatenate all of this information for each passage. Is there a distribution mismatch between training and generation that affects the final performance? Or is it actually better to include this additional information?
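
To make the comparison concrete, here is a minimal sketch of the three input variants I have in mind. Only the "v1" line is taken from convert_msmarco_doc_to_t5_format.py; the "v2_guess" template is my guess from above, and the function name build_segments, the stride parameter, and the "body_only" variant (meant to mimic the passage-only training input) are hypothetical names I made up for illustration.

```python
from typing import List


def build_segments(sentences: List[str], max_length: int, stride: int,
                   doc_title: str = "", doc_headings: str = "",
                   template: str = "v1") -> List[str]:
    """Slide a window over the sentences and format each window as one T5 input.

    Hypothetical helper: only the "v1" template comes from the V1 script;
    "v2_guess" and "body_only" are assumptions made for this question.
    """
    segments = []
    for i in range(0, len(sentences), stride):
        body = ' '.join(sentences[i:i + max_length])
        if template == "v1":
            # Template found in the V1 conversion script.
            segment = doc_title + ' ' + body
        elif template == "v2_guess":
            # Guessed V2 template (not confirmed by the maintainers).
            segment = doc_title + '\n' + doc_headings + '\n' + body
        elif template == "body_only":
            # Passage text only, as in the qrels-based training data.
            segment = body
        else:
            raise ValueError(f"unknown template: {template}")
        segments.append(segment)
        # Stop once the current window already reaches the last sentence.
        if i + max_length >= len(sentences):
            break
    return segments


if __name__ == "__main__":
    sents = ["First sentence.", "Second sentence.", "Third sentence."]
    for name in ("v1", "v2_guess", "body_only"):
        print(name, build_segments(sents, max_length=2, stride=1,
                                   doc_title="Title", doc_headings="Heading",
                                   template=name))
```

If the model is trained on inputs like "body_only" but queries are generated from inputs like "v2_guess", that is the mismatch I am asking about.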
