Restructure-cli #488

Merged · 5 commits · Oct 6, 2023

Changes from 4 commits
2 changes: 1 addition & 1 deletion README.md
@@ -281,7 +281,7 @@ For more advanced use cases, you can use the `DaskTransformComponent` instead.
Once you have a pipeline, you can easily run (and compile) it using the built-in CLI:

```bash
-fondant run pipeline.py --local
+fondant run local pipeline.py
```

To see all available arguments you can check the fondant CLI help pages
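For instance, the help pages can be printed with the standard `--help` flag (a minimal illustration; the exact output depends on the installed fondant version):

```bash
fondant --help            # top-level commands
fondant run --help        # available runner modes
fondant run local --help  # arguments accepted by the local runner
```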
2 changes: 1 addition & 1 deletion docs/getting_started.md
@@ -25,7 +25,7 @@ git clone https://github.com/ml6team/fondant.git
```
Make sure that Docker Compose is running, navigate to `fondant/examples/pipelines/filter-cc-25m`, and initiate the pipeline by executing:
```
-fondant run pipeline --local
+fondant run local pipeline.py
```
Note: For local testing purposes, the pipeline will only download the first 10,000 images. If you want to download the full dataset, you will need to modify the component arguments in the pipeline.py file, specifically the following part:
```python
2 changes: 1 addition & 1 deletion docs/guides/build_a_simple_pipeline.md
@@ -129,7 +129,7 @@ Two key actions are taking place here:

To test the pipeline, you can execute the following command within the pipeline directory:
```
-fondant run pipeline --local
+fondant run local pipeline.py
```

The pipeline execution will start, initiating the download of the dataset from HuggingFace.
6 changes: 3 additions & 3 deletions docs/pipeline.md
@@ -139,7 +139,7 @@ The Kubeflow compiler will take your pipeline and compile it to a Kubeflow pipeline

- Using the CLI:
```bash
-fondant compile <pipeline_ref> --kubeflow --output <path_to_output>
+fondant compile kubeflow --output <path_to_output> <pipeline_ref>
```

- Using the compiler directly:
@@ -163,7 +163,7 @@ There are 2 ways to run a Kubeflow compiled pipeline:

- Using the CLI:
```bash
-fondant run <pipeline_ref> --kubeflow --host <kubeflow_host>
+fondant run kubeflow --host <kubeflow_host> <pipeline_ref>
```
NOTE: the pipeline ref is either the path to the compiled pipeline spec OR a reference to a fondant pipeline, in which case the compiler will compile the pipeline first before running.
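For illustration, both forms of the ref would look like this (a sketch; the file names are hypothetical):

```bash
# run an already-compiled Kubeflow spec
fondant run kubeflow --host <kubeflow_host> pipeline.yaml

# reference a pipeline module, which is compiled first and then run
fondant run kubeflow --host <kubeflow_host> pipeline.py
```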

@@ -251,7 +251,7 @@ docker compose up

Or you can use the fondant CLI to run the pipeline:
```bash
-fondant run <pipeline_ref> --local
+fondant run local <pipeline_ref>
```

NOTE: the pipeline ref is either the path to the compiled pipeline spec OR a reference to a fondant pipeline, in which case the compiler will compile the pipeline first before running.
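For illustration (a sketch; the file names are hypothetical):

```bash
# run an already-compiled docker-compose spec
fondant run local docker-compose.yml

# reference a pipeline module, which is compiled first and then run
fondant run local pipeline.py
```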
4 changes: 2 additions & 2 deletions examples/pipelines/filter-cc-25m/README.md
@@ -57,7 +57,7 @@ According to the getting started documentation, we can run the pipeline by using
as follows:

```bash
-fondant run pipeline --local
+fondant run local pipeline.py
```

> Note: The 'load_from_hub' component accepts an argument that defines the dataset size.
@@ -68,7 +68,7 @@ fondant run pipeline --local
If you wish to run the entire pipeline, including the filtering step, use the following command:

```bash
-fondant run filter_pipeline --local
+fondant run local filter_pipeline
```

After the pipeline has succeeded, you can explore the data using the fondant data explorer:
235 changes: 127 additions & 108 deletions src/fondant/cli.py
@@ -80,20 +80,6 @@ def entrypoint():
    args.func(args)


-def set_default_output(args: argparse.Namespace):
-    """Set the default output path depending on the runner type."""
-    if args.output_path is None:
-        if args.local:
-            args.output_path = "docker-compose.yml"
-        elif args.kubeflow:
-            args.output_path = "pipeline.yaml"
-        else:
-            msg = "One of the arguments --local --kubeflow is required"
-            raise ValueError(msg)
-
-    return args


def register_explore(parent_parser):
    parser = parent_parser.add_parser(
        "explore",

@@ -184,72 +170,85 @@ def register_compile(parent_parser):
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent(
            """
-        Compile a fondant pipeline into either a docker-compose.yml(local) or kubeflow spec file.
+        Compile a fondant pipeline into a pipeline specification file.

        The pipeline argument is a formatstring. The compiler will try to import the pipeline from the module specified in the formatstring.
        (NOTE: path is patched to include the current working directory so you can do relative imports)

-        The --local or --kubeflow flag specifies the mode in which the pipeline will be compiled.
-        You can use the --extra-volumes flag to specify extra volumes to mount in the containers this can be used:
-
-        - to mount data directories to be used by the pipeline (note that if your pipeline's base_path is local it will already be mounted for you).
-        - to mount cloud credentials (see examples))
+        You can use different modes for fondant runners. Current existing modes are local and kubeflow.

-        Example:
-        fondant compile my_project.my_pipeline.py --local --extra-volumes $HOME/.aws/credentials:/root/.aws/credentials
+        Examples of compiling a pipeline:
+        fondant compile local --extra-volumes $HOME/.aws/credentials:/root/.aws/credentials my_project.my_pipeline.py

-        fondant compile my_project.my_pipeline.py --kubeflow --extra-volumes $HOME/.config/gcloud/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json
+        fondant compile kubeflow --extra-volumes $HOME/.config/gcloud/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json my_project.my_pipeline.py
        """,
        ),
    )

+    compiler_subparser = parser.add_subparsers()

    parser.add_argument(
        "ref",
        help="""Reference to the pipeline to run, can be a path to a spec file or
            a module containing the pipeline instance that will be compiled first (e.g. pipeline.py)
            """,
        action="store",
    )
-    # add a mutually exclusive group for the mode
-    mode_group = parser.add_mutually_exclusive_group(required=True)
-    mode_group.add_argument("--local", action="store_true")
-    mode_group.add_argument("--kubeflow", action="store_true")

-    parser.add_argument(
+    local_parser = compiler_subparser.add_parser(name="local", help="Local compiler")
+    kubeflow_parser = compiler_subparser.add_parser(
+        name="kubeflow",
+        help="Kubeflow compiler",
+    )
+
+    # Local runner parser
+    local_parser.add_argument(
        "--output-path",
        "-o",
-        help="Output directory",
-        default=None,
+        help="Output path of compiled pipeline",
+        default="docker_compose.yml",
Review comment (Contributor):
    Is there a reason for going away from the default docker-compose naming?

    Suggested change:
    -        default="docker_compose.yml",
    +        default="docker-compose.yml",

Reply (Author):
    Good catch, forgot that it was referenced like this.
    )
-    parser.add_argument(
+    local_parser.add_argument(
        "--extra-volumes",
-        help="Extra volumes to mount in containers",
+        help="""Extra volumes to mount in containers. This can be used:
+        - to mount data directories to be used by the pipeline (note that if your pipeline's base_path is local it will already be mounted for you).
+        - to mount cloud credentials""",
        nargs="+",
    )
-    parser.add_argument(
+    local_parser.add_argument(
        "--build-arg",
        action="append",
        help="Build arguments to pass to `docker build`. Format {key}={value}.",
        default=[],
    )

-    parser.set_defaults(func=compile)
+    # Kubeflow parser
+    kubeflow_parser.add_argument(
+        "--output-path",
+        "-o",
+        help="Output path of compiled pipeline",
+        default="pipeline.yaml",
+    )
+
+    local_parser.set_defaults(func=compile_local)
+    kubeflow_parser.set_defaults(func=compile_kfp)

-def compile(args):
-    args = set_default_output(args)

+def compile_local(args):
    pipeline = pipeline_from_module(args.ref)
+    compiler = DockerCompiler()
+    compiler.compile(
+        pipeline=pipeline,
+        extra_volumes=args.extra_volumes,
+        output_path=args.output_path,
+        build_args=args.build_arg,
+    )

-    if args.local:
-        compiler = DockerCompiler()
-        compiler.compile(
-            pipeline=pipeline,
-            extra_volumes=args.extra_volumes,
-            output_path=args.output_path,
-            build_args=args.build_arg,
-        )
-    elif args.kubeflow:
-        compiler = KubeFlowCompiler()
-        compiler.compile(pipeline=pipeline, output_path=args.output_path)
+
+def compile_kfp(args):
+    pipeline = pipeline_from_module(args.ref)
+    compiler = KubeFlowCompiler()
+    compiler.compile(pipeline=pipeline, output_path=args.output_path)

def register_run(parent_parser):
@@ -262,88 +261,108 @@ def register_run(parent_parser):
        pipeline (see fondant compile --help for more info)
        OR a path to a spec file in which case it will compile the pipeline first and then run it.

-        The --local or --kubeflow flag specifies the mode in which the pipeline will be ran.
-        You can use the --extra-volumes flag to specify extra volumes to mount in the containers this can be used:
+        You can use different modes for fondant runners. Current existing modes are `local` and `kubeflow`.
+        You can run `fondant <mode> --help` to find out more about the specific arguments for each mode.

-        Example:
-        fondant run my_project.my_pipeline.py --local --extra-volumes $HOME/.aws/credentials:/root/.aws/credentials
-        fondant run ./my_compiled_kubeflow_pipeline.tgz --kubeflow
+        Examples of running a pipeline:
+        fondant run local --extra-volumes $HOME/.aws/credentials:/root/.aws/credentials my_project.my_pipeline.py
+        fondant run kubeflow ./my_compiled_kubeflow_pipeline.tgz
        """,
        ),
    )

+    runner_subparser = parser.add_subparsers()
+    # Define the "ref" argument once
    parser.add_argument(
        "ref",
        help="""Reference to the pipeline to run, can be a path to a spec file or
            a module containing the pipeline instance that will be compiled first (e.g. pipeline.py)
            """,
        action="store",
    )
-    # add a mutually exclusive group for the mode
-    mode_group = parser.add_mutually_exclusive_group(required=True)
-    mode_group.add_argument("--local", action="store_true")
-    mode_group.add_argument("--kubeflow", action="store_true")

-    parser.add_argument(
+    local_parser = runner_subparser.add_parser(name="local", help="Local runner")
+    kubeflow_parser = runner_subparser.add_parser(
+        name="kubeflow",
+        help="Kubeflow runner",
+    )
+
+    # Local runner parser
+    local_parser.add_argument(
        "--output-path",
        "-o",
-        help="Output directory",
-        default=None,
+        help="Output path of compiled pipeline",
+        default="docker_compose.yml",
Review comment (Contributor):
    See my comment above.

    Suggested change:
    -        default="docker_compose.yml",
    +        default="docker-compose.yml",
    )
-    parser.add_argument(
+    local_parser.add_argument(
        "--extra-volumes",
-        help="Extra volumes to mount in containers",
        nargs="+",
+        help="""Extra volumes to mount in containers. This can be used:
+        - to mount data directories to be used by the pipeline (note that if your pipeline's base_path is local it will already be mounted for you).
+        - to mount cloud credentials""",
    )
-    parser.add_argument(
+    local_parser.add_argument(
        "--build-arg",
        action="append",
-        help="Build arguments to pass to `docker build`. Format {key}={value}.",
+        help="Build arguments for `docker build`",
    )
-    parser.add_argument("--host", help="KubeFlow pipeline host url", required=False)
-    parser.set_defaults(func=run)


-def run(args):
-    args = set_default_output(args)
-
-    if args.local:
-        try:
-            pipeline = pipeline_from_module(args.ref)
-        except ModuleNotFoundError:
-            spec_ref = args.ref
-        else:
-            spec_ref = args.output_path
-            logging.info(
-                "Found reference to un-compiled pipeline... compiling to {spec_ref}",
-            )
-            compiler = DockerCompiler()
-            compiler.compile(
-                pipeline=pipeline,
-                extra_volumes=args.extra_volumes,
-                output_path=spec_ref,
-                build_args=args.build_arg,
-            )
-        finally:
-            DockerRunner().run(spec_ref)
-
-    elif args.kubeflow:
-        if not args.host:
-            msg = "--host argument is required for running on Kubeflow"
-            raise ValueError(msg)
-        try:
-            pipeline = pipeline_from_module(args.ref)
-        except ModuleNotFoundError:
-            spec_ref = args.ref
-        else:
-            spec_ref = args.output_path
-            logging.info(
-                f"Found reference to un-compiled pipeline... compiling to {spec_ref}",
-            )
-            compiler = KubeFlowCompiler()
-            compiler.compile(pipeline=pipeline, output_path=spec_ref)
-        finally:
-            runner = KubeflowRunner(host=args.host)
-            runner.run(input_spec=spec_ref)
+    local_parser.set_defaults(func=run_local)
+
+    # Kubeflow runner parser
+    kubeflow_parser.add_argument(
+        "--output-path",
+        "-o",
+        help="Output path of compiled pipeline",
+        default="pipeline.yaml",
+    )
+    kubeflow_parser.add_argument(
+        "--host",
+        help="KubeFlow pipeline host url",
+        required=True,
+    )
+
+    kubeflow_parser.set_defaults(func=run_kfp)


+def run_local(args):
+    try:
+        pipeline = pipeline_from_module(args.ref)
+    except ModuleNotFoundError:
+        spec_ref = args.ref
+    else:
+        spec_ref = args.output_path
+        logging.info(
+            f"Found reference to un-compiled pipeline... compiling to {spec_ref}",
+        )
+        compiler = DockerCompiler()
+        compiler.compile(
+            pipeline=pipeline,
+            extra_volumes=args.extra_volumes,
+            output_path=spec_ref,
+            build_args=args.build_arg,
+        )
+    finally:
+        DockerRunner().run(spec_ref)


+def run_kfp(args):
+    if not args.host:
+        msg = "--host argument is required for running on Kubeflow"
+        raise ValueError(msg)
+    try:
+        pipeline = pipeline_from_module(args.ref)
+    except ModuleNotFoundError:
+        spec_ref = args.ref
+    else:
+        spec_ref = args.output_path
+        logging.info(
+            f"Found reference to un-compiled pipeline... compiling to {spec_ref}",
+        )
+        compiler = KubeFlowCompiler()
+        compiler.compile(pipeline=pipeline, output_path=spec_ref)
+    finally:
+        runner = KubeflowRunner(host=args.host)
+        runner.run(input_spec=spec_ref)


def register_execute(parent_parser):
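For readers unfamiliar with the pattern adopted above: each mode becomes an argparse subparser, and `set_defaults(func=...)` binds the handler that `entrypoint()` later invokes via `args.func(args)`. Below is a minimal, self-contained sketch of that dispatch pattern (illustrative names only, not fondant's actual code; for simplicity the `ref` positional is attached to each subparser rather than defined once on the parent):

```python
import argparse


def run_local(args: argparse.Namespace) -> None:
    # Stand-in for the local runner handler.
    print(f"would run {args.ref} with the local runner")


def run_kubeflow(args: argparse.Namespace) -> None:
    # Stand-in for the Kubeflow runner handler.
    print(f"would run {args.ref} on Kubeflow at {args.host}")


parser = argparse.ArgumentParser(prog="demo")
subparsers = parser.add_subparsers(dest="mode", required=True)

local = subparsers.add_parser("local", help="Local runner")
local.add_argument("ref")
local.set_defaults(func=run_local)  # handler bound per mode

kubeflow = subparsers.add_parser("kubeflow", help="Kubeflow runner")
kubeflow.add_argument("ref")
kubeflow.add_argument("--host", required=True)
kubeflow.set_defaults(func=run_kubeflow)

# e.g. `demo local pipeline.py` or `demo kubeflow --host http://localhost:8080 pipeline.yaml`
args = parser.parse_args()
args.func(args)  # same dispatch style as the entrypoint() shown earlier
```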