Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy load subapps/subcommands #276

Open
mrjsj opened this issue Dec 29, 2024 · 7 comments
Open

Lazy load subapps/subcommands #276

mrjsj opened this issue Dec 29, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@mrjsj
Copy link

mrjsj commented Dec 29, 2024

Hi - I love the ergonomics of this. It's really a breeze to work with.

I've built a CLI with a lot of nested subapps/subcommands, and since it goes through and registers everything on every invocation, it is a bit slow.

Is there any way to lazy load subapps and its subcommands? Here's an example.

When I run mycli --help only data-pipeline and environment subapps should be loaded, but not their subapps/commands.
When I run mycli environment then only children of the environment subapp should be loaded.

├── data-pipeline [app]
│   ├── create [command]
│   ├── delete [command]
│   ├── get [command]
│   ├── list [command]
│   └── update [command]
├── environment [app]
│   ├── create [command]
│   ├── delete [command]
│   ├── get [command]
│   ├── list [command]
│   ├── update [command]
│   ├── spark-compute [app]
│   │   ├── get-published-settings [command]
│   │   ├── get-staging-settings [command]
│   │   └── update-staging-settings [command]
│   └── spark-libraries [app]
│       ├── cancel-publish [command]
│       ├── delete-staging-library [command]
│       ├── get-published-libraries [command]
│       ├── get-staging-libraries [command]
│       ├── publish-environment [command]
│       └── upload-staging-library [command]

And this is how I do it currently:

from cyclopts import App

from .create import create_command
from .delete import delete_command
from .get import get_command
from .list import list_command
from .spark_compute import spark_compute_app
from .spark_libraries import spark_libraries_app
from .update import update_command

environment_app = App(name="environment", help="[yellow]Commands for environment[/yellow]")
environment_app.command(create_command, name="create")
environment_app.command(delete_command, name="delete")
environment_app.command(get_command, name="get")
environment_app.command(list_command, name="list")
environment_app.command(update_command, name="update")
environment_app.command(spark_compute_app)
environment_app.command(spark_libraries_app)
@mrjsj mrjsj changed the title Lazy load subcommands Lazy load subapps/subcommands Dec 29, 2024
@BrianPugh
Copy link
Owner

Is your project publicly available? I'd like to run a profiler on it. Lazy loading is one solution, but I wonder if something else is slow (and lazy loading would be a premature optimization that wouldn't help too much).

@mrjsj
Copy link
Author

mrjsj commented Dec 29, 2024

Yes.
https://github.com/mrjsj/msfabricutils/tree/feat/new-cli/cli

Would be lovely. Thanks!

@mrjsj
Copy link
Author

mrjsj commented Dec 30, 2024

@BrianPugh Thanks for profiling! It seems cyclopts itself is very efficient with loading subapps/commands, so the issue is mostly loading the 3rd party libs in those.

I can move some of those imports into the commands themselves which helps.

Maybe lazy loading was the wrong term. I looked into your documentation regarding Meta Apps - maybe that's what I'm looking for? I.e. launching the admin_app only when the admin command is run. I couldn't figure out how to implement it in my solution.
Or maybe something similar typers callback?

@BrianPugh
Copy link
Owner

Hmmm sorry I still don't quite follow 😅. Can you explain in more detail/example about what you are trying to achieve?

And I'll be merging in these optimizations and cutting a release later today 👍.

@mrjsj
Copy link
Author

mrjsj commented Dec 30, 2024

I apologize. I might just be oblivious because I'm not that experienced with CLIs.

The entrypoint application can have commands registered, and it can have applications registered as commands.

If we look at the example from the post, I have a data-pipeline app, and an environment app. These could in theory be run as individual apps, i.e. python cli/commands/core/data-pipeline/__init__.py.

So I'm thinking that the main app only need to know about the two commands: data-pipeline and environment. The main app doesn't need to know the commands within those sub-apps.
But when I then run python cli/main.py data-pipeline then it essentially runs python cli/commands/core/data-pipeline/__init__.py. So at that time we have only registered the commands: data-pipeline, environment, and the subcommands of data-pipeline.

If we instead run python cli/main.py environment, the we would run the environment application.

So if I for example run python cli/main.py environment spark-compute --help I would expect the following to happen:

  1. The main entry point is run
  2. The commands data-pipeline and environment are registered
  3. The environment command is executed which starts the environment app.
  4. The create, delete, list update, spark-compute and spark-libraries are registered as commands
  5. The spark-compute command is executed which starts the spark-compute app.
  6. The --help command is executed showing the help-page for the spark-compute app.

In this example, we skipped over registering the following commands (marked with asterisk). If we could skip registing those when the main app is launched, I would imagine a great improvement which only increases, the more complex/nested command structure is. In the project I linked, I have over 200 commands, but they're spread across sub-applications, so if we could only register a subset of nested commands, I imagine it a great performance boost.

  • data-pipeline
    • create*
    • delete*
    • get*
    • list*
    • update*
  • environment
    • spark-libraries
      • cancel-publish*
      • delete-staging-library*
      • get-published-libraries*
      • get-staging-libraries*
      • publish-environment*
      • upload-staging-library*

@BrianPugh
Copy link
Owner

I see what you mean; currently I think Cyclopts is fast enough that it probably doesn't matter to much; I think we could only save a few dozens of milliseconds in very large applications from lazily importing/parsing subapplications. A potentially attractive option would be to register subapplications as a reference string (sort of how scripts work in python-packaging), where Cyclopts would dynamically import it if necessary:

from cyclopts import App
app = App()
app.command("msfabricutils.cli.core.kql_database:create", name="create")  # something like this

The real benefit here would be if specific subapp's import modules that are relatively expensive to import, that it wouldn't be come additive over the entire app. I'll need to think about this a little more.

Some minor clarification of how Cyclopts works internally: ideally, it would work as you describe (subcommands directly invoking subapps who subsequently invoke their own subapps; it would be beautifully recursive). Unfortunately, due to implementation details of how "help" works (e.g. you need the entire list of commands that lead up to the current subcommand), the root app actually directly invokes a subapp, no matter how far down the invocation chain it is.


Second aside; I see you reaching in to the private app._commands attribute; perhaps I should add a public App.update method (similar to dict.update) that takes in an App and copies over all it's commands.

@mrjsj
Copy link
Author

mrjsj commented Dec 30, 2024

I see what you mean; currently I think Cyclopts is fast enough that it probably doesn't matter to much;

Absolutely. That is evident from the profling you did.

The real benefit here would be if specific subapp's import modules that are relatively expensive to import, that it wouldn't be come additive over the entire app. I'll need to think about this a little more.

Exactly. One specific subapp could have a really expensive import which would slow the entire app, so being able to optionally lazy load it would be helpful (I think). Either directly exposed like app.command(my_expensive_app, lazy=True) or maybe just restructuring the code somehow

import time

from cyclopts import App

app = App()

def expensive_command():
    time.sleep(2)

@app.command
def admin_command():
    admin_app = App()
    admin_app.command(expensive_command, name="expensive")

    admin_app()


if __name__ == "__main__":
    app()

Second aside; I see you reaching in to the private app._commands attribute; perhaps I should add a public App.update method (similar to dict.update) that takes in an App and copies over all it's commands.

Yes, a little hack to skip over the "core" command while maintaining the code-generated folder/file structure. Eventually, I would just not create the "core" app in the first place and register its commands directly on the main app.

@BrianPugh BrianPugh added the enhancement New feature or request label Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants