Fine-tuning on custom data #3

Open
siamakzd opened this issue Mar 30, 2022 · 3 comments

@siamakzd

Thank you for sharing your great work!

If I want to fine-tune on a custom dataset, what steps should I follow? Specifically:

  • What is the input data format for training, testing, and inference?
  • Which scripts do we need to modify?

Thanks in advance!

@jpWang
Owner

jpWang commented Mar 30, 2022

Hi,
I think the main steps should be:

  • Organize your dataset into the FUNSD or XFUND format, depending on whether it is monolingual or multilingual.
  • Put YourDataset.py under LiLTfinetune/data/datasets/. You can use funsd.py or xfun.py as a reference.
  • Put run_YourDataset_YourTask.py under examples/. You can use run_funsd.py, run_xfun_re.py, or run_xfun_ser.py as a reference.

If you want to do something beyond training and evaluation, you can add your code right after the lines where the model makes predictions, for example at https://github.com/jpWang/LiLT/blob/main/examples/run_funsd.py#L345 in run_funsd.py.
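
A hedged sketch of what such post-prediction code might look like: it turns per-token scores into label strings and skips positions marked with an ignore index. The function name `decode_predictions` is hypothetical, and the `-100` ignore index is the convention used in Hugging Face token-classification examples; it is an assumption that your run script follows it.

```python
def decode_predictions(logits, label_ids, label_list, ignore_index=-100):
    """Map per-token scores to label names (hypothetical helper).

    logits:    [num_sequences][num_tokens][num_labels] nested lists of scores.
    label_ids: [num_sequences][num_tokens] gold ids; ignore_index marks
               special or sub-word tokens that should be skipped.
    """
    decoded = []
    for seq_logits, seq_labels in zip(logits, label_ids):
        seq = []
        for token_logits, gold in zip(seq_logits, seq_labels):
            if gold == ignore_index:
                continue  # skip padding / continuation tokens
            # Index of the highest score is the predicted label id.
            best = max(range(len(token_logits)), key=token_logits.__getitem__)
            seq.append(label_list[best])
        decoded.append(seq)
    return decoded


if __name__ == "__main__":
    labels = ["O", "B-QUESTION", "B-ANSWER"]
    scores = [[[0.1, 0.9, 0.0], [0.2, 0.1, 0.7], [0.9, 0.05, 0.05]]]
    gold = [[1, -100, 0]]
    print(decode_predictions(scores, gold, labels))
```

The decoded labels can then be written to a file or joined back to the words and boxes of the input page, depending on what you need at inference time.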

jpWang pinned this issue Apr 6, 2022
@NielsRogge

Hi,

See also my demo notebooks here: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LiLT

@hamzabchiri

hamzabchiri commented May 14, 2023

Hello,

If I have a custom dataset, could you explain how to organize it into the FUNSD/XFUND format?

Also, do you recommend any tutorial for this step?

Thank you in advance.
