WizardCoder-Python-34B-V1.0 only support for python? #223

Open
yiyepiaoling0715 opened this issue Dec 5, 2023 · 2 comments

Comments

@yiyepiaoling0715

As the title says: can the base LLM also be used for other languages, such as Java?

@ChiYeungLaw
Contributor

Other languages can be used too.

@CrisRodriguez

CrisRodriguez commented Apr 9, 2024

Hi @zheng5yu9,
I'm posting this so that anyone with the same question can easily find an answer :)

WizardCoder-Python-34B-V1.0 is based on Code Llama 34B Python

  • It is a non-instruct model
  • Data that the model saw:
    • The same data as the Llama 2 foundation model
    • 500B tokens from a code-heavy dataset
    • 100B tokens from a Python-heavy dataset
      (Figure: image taken from the Code Llama paper)
  • It therefore covers a diverse set of programming languages, among them Python, C++, Java, PHP, TypeScript, C#, and
    Bash (with results reported in the paper)

WizardCoder-Python-34B-V1.0 is fine-tuned

  • Complex instruction fine-tuning, done by adapting the Evol-Instruct method to the domain of code
  • Base dataset for Evol-Instruct: Code Alpaca
  • Code Alpaca also covers a diverse set of programming languages

So, even though WizardCoder-Python-34B-V1.0 is specialized in Python, it still covers plenty of other programming languages!
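
To make this concrete, here is a minimal sketch of how a request for Java code could be phrased for the model. It only builds the prompt string and does not load or run the model. The Alpaca-style template below is an assumption on my part (it is what I recall the WizardCoder README using), and `build_prompt` is a hypothetical helper name, so please check the model card before relying on it.

```python
# Assumed Alpaca-style instruction template; verify against the model card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Hypothetical helper: wrap a plain-language request in the template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


# The request names the target language explicitly, here Java instead of Python.
java_prompt = build_prompt("Write a Java method that reverses a string.")
print(java_prompt)
```

The resulting string is what you would pass to whatever inference setup you use; nothing in it is Python-specific except the model's training emphasis.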
