
v1.0 merge #910

Merged
merged 120 commits into release
Sep 2, 2024

Conversation

@bghira bghira commented Sep 1, 2024

Here's a summary of the major changes in version 1.0. These are sweeping changes, so it would be helpful if others could pull the main branch, run some old configurations, and confirm things still look as expected.

Refactoring and Enhancements:

  1. Refactor train.py into a Trainer Class:

    • The core logic of train.py has been restructured into a Trainer class, improving modularity and maintainability.
    • Exposes an SDK for reuse elsewhere.
  2. Model Family Unification:

    • References to specific model types (--sd3, --flux, etc.) have been replaced with a unified --model_family argument, streamlining model specification and reducing clutter in configurations.
  3. Configuration System Overhaul:

    • Switched from .env configuration files to JSON (config.json), with multiple backends supporting JSON configuration loading. This allows more flexible and readable configuration management.
    • Updated the configuration loader to auto-detect the best backend when launching.
  4. Enhanced Argument Handling:

    • Deprecated old argument references and moved argument parsing to helpers/configuration/cmd_args.py for better organization.
    • Introduced support for new arguments such as --model_card_safe_for_work, --flux_schedule_shift, and --disable_bucket_pruning.
  5. Improved Hugging Face Integration:

    • Modified configure.py to avoid asking for Hugging Face model name details unless required.
    • Added the ability to pass the SFW (safe-for-work) argument into the training script.
  6. Optimizations and Bug Fixes:

    • Fixed several references to learning rate (lr) initialization and corrected --optimizer usage.
    • Addressed issues with attention mask swapping and fixed text encoders persisting in RAM after the refactor.
  7. Training and Validation Enhancements:

    • Added better dataset examples with support for multiple resolutions and mixed configurations.
    • Configured training scripts to disable gradient accumulation steps by default and provided better control over training options via the updated config files.
  8. Enhanced Logging and Monitoring:

    • Improved the handling of Weights & Biases (wandb) logs and updated tracker argument references.
  9. Documentation Updates:

    • Revised documentation to reflect changes in model family handling, argument updates, and configuration management.
    • Added guidance on setting up the new configuration files and examples for multi-resolution datasets.
  10. Miscellaneous Improvements:

    • Support for NSFW tags in model cards, now enabled by default.
    • Updated train.sh to minimal requirements, reducing complexity and streamlining the training process.
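To make the Trainer refactor (item 1) and the unified `--model_family` / JSON configuration (items 2 and 3) concrete, here is a minimal sketch of what driving the trainer from the SDK might look like. The class shape, method names, and config keys other than those mentioned in this PR are illustrative assumptions, not the exact SimpleTuner API.

```python
# Illustrative sketch only: class and method names are hypothetical,
# not the exact SimpleTuner v1.0 SDK surface.
class Trainer:
    def __init__(self, config: dict):
        # In v1.0, this configuration would typically be loaded from
        # config.json rather than the old .env files.
        self.config = config

    def run(self) -> str:
        # The unified "model_family" key replaces the old per-model
        # flags such as --sd3 and --flux.
        return f"training model family: {self.config['model_family']}"


config = {
    "model_family": "flux",            # replaces --flux / --sd3 etc.
    "model_card_safe_for_work": False, # new argument from this PR
    "disable_bucket_pruning": True,    # new argument from this PR
}
print(Trainer(config).run())
```

The point of the refactor, per the summary above, is exactly this kind of reuse: the training loop becomes an importable object rather than logic locked inside `train.py`.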

bghira and others added 30 commits August 28, 2024 17:53
…ushing it there; allow user to configure multidatabackend using multi-res and crop by default
The docs show `instance_prompt` when the value should be `instanceprompt`.

```
2024-08-29 15:28:24,677 [ERROR] (__main__) Unsupported caption strategy: instance_prompt. Supported: 'filename', 'textfile', 'parquet', 'instanceprompt', traceback: Traceback (most recent call last):
```
Dataloader Docs - Correct caption strategy for instance prompt
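For reference, a dataset entry using the corrected strategy might look like the fragment below. Only `caption_strategy` is confirmed by the error message above; the other field names are assumptions based on a typical multidatabackend entry.

```json
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/path/to/images",
    "caption_strategy": "instanceprompt"
  }
]
```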
bghira and others added 28 commits September 1, 2024 14:07
fix constant_with_warmup not being so constant or warming up
fix multigpu schedule issue with LR on resume
multiply the resume state step by the number of GPUs in an attempt to overcome accelerate v0.33 issue
default to json/toml before the env file in case multigpu is configured
bypass some "helpful" diffusers logic that makes random decisions to run on CPU
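The scheduler-related commits above can be illustrated with a small sketch. The `constant_with_warmup` schedule ramps the learning-rate multiplier linearly during warmup and then holds it constant; the resume-step scaling mirrors the described workaround for accelerate v0.33, with the function names here being illustrative rather than the actual code.

```python
def constant_with_warmup(step: int, warmup_steps: int) -> float:
    """LR multiplier: linear ramp 0 -> 1 over warmup_steps, then flat 1.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return 1.0


def scaled_resume_step(saved_step: int, num_gpus: int) -> int:
    # Per the commit above, the saved step is multiplied by the GPU
    # count to work around an accelerate v0.33 resume issue
    # (illustrative, not the exact implementation).
    return saved_step * num_gpus
```

A broken implementation of the first function is what the "not being so constant or warming up" commit refers to fixing: the multiplier should never dip after warmup completes.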
@bghira bghira marked this pull request as ready for review September 2, 2024 20:53
@bghira bghira merged commit bf4951d into release Sep 2, 2024
1 check passed