- Visualize ColBERT token scores (see the heatmap sketch below)
- Remove query augmentation and re-train
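For the token-score visualization, a minimal sketch that plots the late-interaction similarity matrix and marks each query token's MaxSim match; the token lists and L2-normalized embeddings are assumed to come from the existing ColBERT encoder, and the function/file names are placeholders.

```python
# Minimal sketch: plot the ColBERT query-token x doc-token similarity matrix
# and mark each query token's MaxSim match. Inputs are assumed to be
# L2-normalized embeddings from the existing ColBERT encoder.
import torch
import matplotlib.pyplot as plt

def plot_token_scores(q_tokens, d_tokens, q_emb, d_emb, path="colbert_scores.png"):
    """q_emb: (len(q_tokens), dim), d_emb: (len(d_tokens), dim)."""
    scores = (q_emb @ d_emb.T).cpu().numpy()  # cosine similarity per token pair
    fig, ax = plt.subplots(figsize=(0.4 * len(d_tokens) + 2, 0.4 * len(q_tokens) + 2))
    im = ax.imshow(scores, cmap="viridis")
    ax.set_xticks(range(len(d_tokens)))
    ax.set_xticklabels(d_tokens, rotation=90)
    ax.set_yticks(range(len(q_tokens)))
    ax.set_yticklabels(q_tokens)
    fig.colorbar(im, ax=ax)
    for i, j in enumerate(scores.argmax(axis=1)):  # MaxSim match per query token
        ax.scatter(j, i, marker="x", color="red")
    fig.tight_layout()
    fig.savefig(path)
```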
A few explorations:
- Better further pretraining:
  - compare the ELECTRA objective (it combines more training objectives and has been shown effective for downstream retrieval [1])
  - compare the CoT-MAE autoencoder (compared to Condenser, its CLS is used only to unmask remote context words, and its decoder does not depend on the encoder's lower layers [2]); see the decoder sketch after this list
  - scheduled warmup
- ColBERT contextual embeddings and CLS compression (alternatively, convert the dense vectors to lexical/sparse vectors and compress those, e.g., with an aggregator?):
  - reduce dimensionality using robust PCA [4] (better at handling noise) and compare it against linear pooling and plain PCA (see the robust-PCA sketch after this list)
  - try to detect dimension importance and prune dimensions from the dot product via mutual information (see the pruning sketch after this list)
  - transform representations by rate reduction [3] (see the coding-rate sketch after this list)
- better hybrid retrieval with structure search:
  - train click-log models to guide/gate the three passes, i.e., math structure search, text-word BM25, and the dense retriever; expected to be better than linear fusion (see the gating sketch after this list)
- [optional] train a cross-encoder and distill from it? (see the distillation sketch after this list)
- previous models trained on existing labels do not show much of an advantage; I can verify this if time permits
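On the CoT-MAE comparison, a rough sketch of the asymmetric head described in the parenthetical above: the encoder CLS of one span is the only bridge to a shallow decoder that unmasks a neighboring context span, with no skip connections from encoder lower layers (unlike Condenser). Layer sizes, vocabulary, and the span-sampling pipeline are all assumptions.

```python
# Rough sketch of a CoT-MAE-style decoder head: the only signal from the
# encoder is the CLS vector of span A; the shallow decoder must unmask a
# neighboring span B from that CLS plus span B's own masked tokens.
import torch
import torch.nn as nn

class ContextDecoderHead(nn.Module):
    def __init__(self, hidden=768, vocab_size=30522, n_layers=2, n_heads=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.tok_embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, cls_a, masked_ids_b):
        """cls_a: (B, H) encoder CLS of span A; masked_ids_b: (B, L) span B ids."""
        x = self.tok_embed(masked_ids_b)
        x = torch.cat([cls_a.unsqueeze(1), x], dim=1)  # prepend CLS as sole context
        h = self.decoder(x)
        return self.lm_head(h[:, 1:])                  # MLM logits over span B
```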
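For the robust-PCA comparison, a minimal sketch assuming a sample of token/CLS embeddings is available as a matrix: it splits the sample into low-rank plus sparse parts via principal component pursuit and fits the projection on the denoised low-rank part, to be compared against plain PCA and the linear pooling layer. The sample path and target dimension are placeholders.

```python
# Minimal sketch: robust PCA (principal component pursuit via inexact ALM)
# to denoise an embedding sample before fitting the low-dim projection.
import numpy as np
from sklearn.decomposition import PCA

def robust_pca(X, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Split X into low-rank L and sparse S, minimizing ||L||_* + lam*||S||_1."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else m * n / (4.0 * np.abs(X).sum())
    S = np.zeros_like(X)
    Y = np.zeros_like(X)
    norm_x = np.linalg.norm(X)
    for _ in range(n_iter):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # sparse update: elementwise soft thresholding
        R = X - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (X - L - S)
        if np.linalg.norm(X - L - S) / norm_x < tol:
            break
    return L, S

# Usage sketch: fit both projections on the same sample, compare retrieval quality.
# X = np.load("embedding_sample.npy")        # (n_vectors, 128), placeholder path
# pca_plain = PCA(n_components=32).fit(X)
# L, _ = robust_pca(X)
# pca_robust = PCA(n_components=32).fit(L)   # projection from the denoised part
```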
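For the mutual-information pruning idea, a hedged sketch assuming a labeled set of (query, passage) vector pairs exists: each dimension's per-pair product is scored by its mutual information with the relevance label, and the lowest-scoring dimensions are dropped from the dot product.

```python
# Hedged sketch: score each embedding dimension's dot-product contribution by
# mutual information with relevance labels, then keep only the top dimensions.
# The input arrays are placeholders for labeled (query, passage) pairs.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def dimension_mi(q_vecs, p_vecs, labels):
    """q_vecs, p_vecs: (n_pairs, dim); labels: (n_pairs,) relevance in {0, 1}."""
    contrib = q_vecs * p_vecs                    # per-dimension dot-product terms
    return mutual_info_classif(contrib, labels)  # MI of each dimension with relevance

def keep_mask(mi_scores, keep_ratio=0.75):
    k = int(len(mi_scores) * keep_ratio)
    mask = np.zeros(len(mi_scores), dtype=bool)
    mask[np.argsort(mi_scores)[-k:]] = True
    return mask

# Pruned score: (q[mask] * p[mask]).sum() instead of the full dot product.
```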
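For the rate-reduction transform, assuming [3] refers to a maximal-coding-rate-reduction (MCR²) style objective, a small sketch of the coding-rate terms such a transform would optimize; the grouping of representations (e.g., passages relevant to the same query) is an assumption.

```python
# Small sketch of an MCR^2-style rate-reduction objective: expand the coding
# rate of all representations while compressing the rate within each group.
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z), for Z of shape (n, d)."""
    n, d = Z.shape
    I = torch.eye(d, device=Z.device, dtype=Z.dtype)
    return 0.5 * torch.logdet(I + (d / (n * eps ** 2)) * (Z.T @ Z))

def rate_reduction(Z: torch.Tensor, groups: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Delta R = R(Z) - sum_j (n_j / n) * R(Z_j); maximize to spread groups apart."""
    n = Z.shape[0]
    within = sum((mask.sum().item() / n) * coding_rate(Z[mask], eps)
                 for mask in (groups == j for j in groups.unique()))
    return coding_rate(Z, eps) - within
```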
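For the click-log gating idea, a minimal PyTorch sketch of a per-query gate over the three passes; the query-feature extraction, candidate scoring, and click labels are assumed to exist elsewhere, and the layer sizes are arbitrary.

```python
# Minimal sketch: a click-log-trained gate that produces per-query weights for
# the three passes (math structure, text BM25, dense) instead of fixed
# linear-fusion weights. Feature extraction and data loading are placeholders.
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    def __init__(self, n_query_features: int, n_passes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_query_features, 32),
            nn.ReLU(),
            nn.Linear(32, n_passes),
        )

    def forward(self, query_feats, pass_scores):
        """query_feats: (B, F); pass_scores: (B, n_candidates, 3) per-pass scores."""
        w = torch.softmax(self.net(query_feats), dim=-1)   # (B, 3) gate weights
        return (pass_scores * w.unsqueeze(1)).sum(dim=-1)  # fused score per candidate

# Training idea on click logs (pairwise, clicked vs. skipped candidate):
# loss = -torch.log(torch.sigmoid(fused_clicked - fused_skipped)).mean()
```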
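For the optional cross-encoder distillation, a hedged sketch of a MarginMSE-style loss, one common way to distill a cross-encoder into a dense retriever; the score inputs are assumptions, not existing code here.

```python
# Hedged sketch: MarginMSE-style distillation, regressing the dense retriever's
# positive-negative margin onto the cross-encoder teacher's margin.
import torch
import torch.nn.functional as F

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """All inputs: (B,) scores for the same (query, positive, negative) triples."""
    return F.mse_loss(student_pos - student_neg, teacher_pos - teacher_neg)
```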
[1] https://arxiv.org/pdf/2207.02578.pdf
[2] https://arxiv.org/pdf/2208.07670.pdf