This repository contains the 1st-place solution (private score: 0.9911) for the DACON competition.
git clone https://github.com/GNOEYHEAT/CodeSim_cpp.git
cd CodeSim_cpp
pip install -r requirements.txt
├── Dataset
│ ├── train_code
│ │ ├── problem001
│ │ ├── ...
│ │ └── problem500
│ ├── sample_submission.csv
│ ├── sample_train.csv
│ └── test.csv
├── Utils
│ ├── CodeLM_utils.py
│ └── Preprocessing_utils.py
├── Preprocess.py
└── CodeLM.py
python Preprocess.py
python CodeLM.py
The final submission combines GraphCodeBERT and UniXcoder (exp_31+32).
- The preprocessing hyperparameters are as follows:
  - truncation_side='left', bm25='bm25plus'
- The training hyperparameters are as follows:
  - truncation_side='left', optimizer='adamw', learning_rate=0.00003
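
The two settings `truncation_side='left'` and `bm25='bm25plus'` can be illustrated with a minimal, standard-library-only sketch. It is not the repository's implementation: the helper names `truncate_left` and `bm25plus_scores`, the toy token lists, and the defaults `k1=1.5`, `b=0.75`, `delta=1.0` are all assumptions made for illustration. Left truncation keeps the end of a token sequence when it exceeds the length limit, and BM25+ adds a constant `delta` to the term-frequency component of every matching query term.

```python
import math
from collections import Counter


def truncate_left(tokens, max_len):
    """Keep the last max_len tokens, dropping tokens from the start."""
    return tokens[-max_len:] if max_len > 0 else []


def bm25plus_scores(query, corpus, k1=1.5, b=0.75, delta=1.0):
    """Score each tokenized document in corpus against a tokenized query (BM25+)."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        # Length normalization shared by all terms of this document.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # only query terms present in the document contribute
            idf = math.log((n_docs + 1) / df[term])
            score += idf * ((k1 + 1) * tf[term] / (norm + tf[term]) + delta)
        scores.append(score)
    return scores


# Toy token lists standing in for tokenized C++ submissions (made up for illustration).
corpus = [
    "int main ( ) { int a , b ; cin >> a >> b ; cout << a + b ; }".split(),
    "int main ( ) { string s ; cin >> s ; cout << s . size ( ) ; }".split(),
    "int main ( ) { int a , b ; cin >> a >> b ; cout << a * b ; }".split(),
]
query = truncate_left(corpus[0], 12)  # keep only the last 12 tokens
scores = bm25plus_scores(query, corpus)
```

One plausible use of such scores, again an assumption rather than something stated here, is ranking candidate code pairs by lexical similarity when building training pairs.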
| index | CodeBERT Model | frac | text_len | Private Acc | Public Acc | Val Acc |
|---|---|---|---|---|---|---|
| exp_30 | GraphCodeBERT | 0.01 | 512 | 0.98859 | 0.98831 | 0.99641 |
| exp_31 | GraphCodeBERT | 0.02 | 512 | 0.98909 | 0.98892 | 0.99794 |
| exp_32 | UniXcoder | 0.01 | 1024 | 0.98942 | 0.98911 | 0.99606 |
| exp_31+32 | GraphCodeBERT+UniXcoder | - | - | 0.99111 | 0.99084 | - |
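
The combined exp_31+32 row could be produced in several ways, and the combination rule is not specified here. As a hedged sketch, the code below assumes simple weighted averaging of the two models' predicted probabilities followed by a 0.5 threshold. The function name `ensemble_predictions`, the weight, the threshold, and the example probabilities are all hypothetical.

```python
def ensemble_predictions(probs_a, probs_b, weight_a=0.5, threshold=0.5):
    """Blend two lists of positive-class probabilities and return 0/1 labels."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same number of pairs")
    blended = [
        weight_a * pa + (1.0 - weight_a) * pb
        for pa, pb in zip(probs_a, probs_b)
    ]
    # A pair is labeled similar (1) when the blended probability reaches the threshold.
    return [int(p >= threshold) for p in blended]


# Made-up probabilities for three code pairs from two hypothetical models.
graphcodebert_probs = [0.9, 0.4, 0.2]
unixcoder_probs = [0.8, 0.7, 0.1]
labels = ensemble_predictions(graphcodebert_probs, unixcoder_probs)
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one on the same pair, as in the second example pair above.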