input template to T5 in V2 #72

yixuan-qiao · 2022-06-01T03:09:22Z

Hi,

I'm curious about the input template you use when generating queries in V2.
In V1, I found it in convert_msmarco_doc_to_t5_format.py:

segment = doc_title + ' ' + ' '.join(sentences[i:i + args.max_length])

For V2, my guess is that it looks something like the following:

segment = doc_title + '\n' + doc_headings + '\n' + ' '.join(sentences[i:i + args.max_length])

When training doc2query-T5, we only use the qrels, where each passage comes without extra information such as doc_title or doc_headings. At query generation time, however, we concatenate all of this information for each passage. Does this distribution mismatch between training and generation affect the final performance? Or is it actually better to include the additional information?
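To make the question concrete, here is a minimal sketch of the two templates side by side. The V1 line follows convert_msmarco_doc_to_t5_format.py; the V2 line is only my guess and is not confirmed by the repo. The function names, the `start` parameter, and the sample inputs are hypothetical and used purely for illustration.

```python
def build_segment_v1(doc_title, sentences, start, max_length):
    """V1 template: title, a space, then a window of sentences."""
    return doc_title + ' ' + ' '.join(sentences[start:start + max_length])


def build_segment_v2_guess(doc_title, doc_headings, sentences, start, max_length):
    """Guessed V2 template (an assumption): title, headings, and the
    sentence window, separated by newlines."""
    window = ' '.join(sentences[start:start + max_length])
    return doc_title + '\n' + doc_headings + '\n' + window


# Made-up sample document, only to show how the two inputs differ.
title = 'Example title'
headings = 'Heading A'
sents = ['First sentence.', 'Second sentence.', 'Third sentence.']

v1 = build_segment_v1(title, sents, 0, 2)
v2 = build_segment_v2_guess(title, headings, sents, 0, 2)

print(repr(v1))
print(repr(v2))
```

If the guess is right, the generation-time input carries the title and headings as separate lines, while the training passages taken from the qrels have neither. That gap is the mismatch I am asking about.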
