gen #775

Open
wants to merge 9 commits into
base: main

Conversation

abhishekkrthakur
Member

No description provided.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions bot added the stale label on Oct 25, 2024
@abhishekkrthakur
Member Author

not stale

@Copilot Copilot AI left a comment

Copilot reviewed 6 out of 12 changed files in this pull request and generated 5 suggestions.

Files not reviewed (6)
  • src/autotrain/app/templates/index.html: Language not supported
  • src/autotrain/cli/autotrain.py: Evaluated as low risk
  • src/autotrain/datagen/clients.py: Evaluated as low risk
  • src/autotrain/app/static/scripts/listeners.js: Evaluated as low risk
  • src/autotrain/trainers/text_classification/params.py: Evaluated as low risk
  • src/autotrain/trainers/text_classification/main.py: Evaluated as low risk

return
cmd = f"autotrain --config {params.training_config}"
logger.info(f"Running AutoTrain: {cmd}")
cmd = [str(c) for c in cmd]

Copilot AI Nov 11, 2024

cmd is a string here, so iterating over it produces a list of single characters rather than a list of arguments. Use cmd = cmd.split() instead.

Suggested change
cmd = [str(c) for c in cmd]
cmd = cmd.split()
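
A minimal sketch of the difference, assuming a made-up config path (`config.yml`). The `shlex.split` variant is not part of the review; it is shown only as an option when a path may contain spaces.

```python
import shlex

# Hypothetical config path, used only for illustration.
training_config = "config.yml"
cmd = f"autotrain --config {training_config}"

# Iterating over a string yields one element per character.
as_chars = [str(c) for c in cmd]

# str.split() breaks on whitespace and yields one element per argument.
as_args = cmd.split()

# shlex.split() also honors shell-style quoting, e.g. a quoted path with a space.
quoted_args = shlex.split('autotrain --config "my config.yml"')

print(as_chars[:4])
print(as_args)
print(quoted_args)
```

Plain `split()` is enough when no argument contains whitespace; building the argument list directly avoids the question entirely.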

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

path = os.path.join(output_dir, "gen_params.json")
# save formatted json
with open(path, "w", encoding="utf-8") as f:
    f.write(self.model_dump_json(indent=4))

Copilot AI Nov 11, 2024

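The method 'model_dump_json' exists only in Pydantic v2's BaseModel. If the project runs on Pydantic v1, the equivalent is 'self.json(indent=4)'; on v2 the original line is correct and '.json()' is deprecated.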

Suggested change
f.write(self.model_dump_json(indent=4))
f.write(self.json(indent=4))
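
If the code has to run on both Pydantic major versions, one option is a small helper that prefers `model_dump_json` (Pydantic v2) and falls back to `.json()` (Pydantic v1). The sketch below is an assumption, not code from this PR: `dump_json` is a hypothetical helper, and the two stand-in classes only mimic the relevant method names so the example runs without Pydantic installed.

```python
import json

PAYLOAD = {"task": "text-classification"}


class V2Style:
    """Stand-in that exposes the Pydantic v2 method name."""

    def model_dump_json(self, indent=None):
        return json.dumps(PAYLOAD, indent=indent)


class V1Style:
    """Stand-in that exposes the Pydantic v1 method name."""

    def json(self, indent=None):
        return json.dumps(PAYLOAD, indent=indent)


def dump_json(model, indent=4):
    """Serialize a model to JSON using whichever method the object provides."""
    if hasattr(model, "model_dump_json"):
        return model.model_dump_json(indent=indent)
    return model.json(indent=indent)


v2_text = dump_json(V2Style())
v1_text = dump_json(V1Style())
```

Pinning a single Pydantic major version in the project requirements would make the helper unnecessary.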

)

if message is None:
    logger.warning("Failed to generate data. Retrying...")

Copilot AI Nov 11, 2024

The code retries indefinitely if the message is None, which could lead to an infinite loop. Consider adding a maximum retry limit.

Suggested change
logger.warning("Failed to generate data. Retrying...")
if message is None and counter < self.params.max_retries:
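
A bounded retry could look roughly like the sketch below. It is not the PR's code: `generate_with_retries` and the `generate` callable are hypothetical names, and `max_retries` is passed as a plain argument rather than read from `self.params`.

```python
import logging

logger = logging.getLogger(__name__)


def generate_with_retries(generate, max_retries=3):
    """Call generate() until it returns a non-None message or attempts run out."""
    for attempt in range(1, max_retries + 1):
        message = generate()
        if message is not None:
            return message
        logger.warning("Failed to generate data (attempt %d of %d)", attempt, max_retries)
    raise RuntimeError(f"No data generated after {max_retries} attempts")


# Usage: a fake generator that fails twice and then succeeds.
responses = iter([None, None, '{"text": "sample", "target": "label"}'])
result = generate_with_retries(lambda: next(responses), max_retries=3)
```

Raising once the limit is reached makes the failure visible to the caller instead of looping silently.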

TEXT_CLASSIFICATION_DATA_PROMPT = """
The dataset for text classification is in JSON format.
Each line should be a JSON object with the following keys: text and target.
Make sure each text sample has atleast {min_words} words.

Copilot AI Nov 11, 2024

The word 'atleast' should be 'at least'.

Suggested change
Make sure each text sample has atleast {min_words} words.
Make sure each text sample has at least {min_words} words.
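
Since the prompt only asks the model to respect the format, the generated lines may still need checking. The following is a minimal sketch of such a check, assuming one JSON object per line as the prompt describes; `valid_samples` is a hypothetical helper and is not part of this PR.

```python
import json


def valid_samples(raw_output, min_words):
    """Return parsed samples that have both keys and enough words in `text`."""
    samples = []
    for line in raw_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(item, dict) or not {"text", "target"} <= item.keys():
            continue  # skip objects missing a required key
        if len(str(item["text"]).split()) >= min_words:
            samples.append(item)
    return samples


raw = "\n".join(
    [
        '{"text": "the battery lasts all day long", "target": "positive"}',
        '{"text": "bad", "target": "negative"}',
        "not json",
        '{"text": "missing the target key here"}',
    ]
)
kept = valid_samples(raw, min_words=5)
```

Only the first line passes: the second is too short, the third is not JSON, and the fourth has no `target` key.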

SEQ2SEQ_DATA_PROMPT = """
The dataset for sequence-to-sequence is in JSON format.
Each line should be a JSON object with the following keys: text and target.
Make sure each text sample has atleast {min_words} words.

Copilot AI Nov 11, 2024

The word 'atleast' should be 'at least'.

Suggested change
Make sure each text sample has atleast {min_words} words.
Make sure each text sample has at least {min_words} words.

4 participants