Create Vertex runner #393

Closed
14 tasks done
RobbeSneyders opened this issue Aug 29, 2023 · 5 comments
Labels: Infrastructure (Infrastructure and deployment)

Comments

@GeorgesLorre
Collaborator

GeorgesLorre commented Sep 5, 2023

Vertex runs Kubeflow pipelines in a more managed way. Since we already have a Kubeflow runner (and compiler), Vertex should be a logical next step.

For Vertex we should use the "new" Kubeflow Pipelines v2, which we currently do not support.

There are 3 options to make this work:

  • we keep our current kfp v1 runner and create a new runner that uses kfp v2
    --> not possible, since we can't have two versions of kfp installed at the same time
    --> we might be able to hack it in, but it won't be nice

  • we use the v2 compatibility mode to support both v1 and v2. This means importing from kfp.v2 for v2 functionality, which works with kfp>=1.8.0.
    --> seems like the logical way forward, but kfp.v2 (inside kfp v1) and kfp v2 itself do not have the same interfaces, and the documentation is very lacking
    --> feels like a temporary fix

  • we only use v2 and port our current Kubeflow runner code to v2
    --> this means that we need to rewrite the Kubeflow compiler and create the Vertex compiler, but there is a lot of overlapping code. AFAIK the compiler can be shared and only the runner needs to be specific.
    --> we will need to create a new kfp v2 cluster and force our users to migrate, since v1 will no longer be supported
    --> there are many breaking changes
    --> v2 seems much stricter on typing, which is probably a good thing but affects a lot of the existing Fondant components, for example:

  max_aspect_ratio:
    description: Maximum aspect ratio of the images.
    type: float
    default: 'inf'

The default 'inf' is a string, not a float, so it conflicts with the declared type.
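To make the typing problem concrete, here is a minimal sketch of how such a default could be coerced to its declared type before it reaches a stricter backend. The helper `coerce_default` and its type table are hypothetical and not part of Fondant or kfp; the only fact relied on is that Python's `float("inf")` yields positive infinity.

```python
import math


def coerce_default(value, type_name):
    """Coerce a spec default to its declared type (hypothetical helper)."""
    # Assumption: only these simple scalar type names need handling here.
    casters = {"float": float, "int": int, "str": str}
    return casters[type_name](value)


raw_default = "inf"  # as written in the component spec above

# The raw value is a string, which is what a strict type check trips over.
assert not isinstance(raw_default, float)

# After coercion it is a genuine float representing infinity.
default = coerce_default(raw_default, "float")
assert isinstance(default, float) and math.isinf(default)
```

Whether the fix belongs in the component specs themselves or in the compiler is a design choice this sketch does not settle.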

@GeorgesLorre
Collaborator

GeorgesLorre commented Sep 5, 2023

Notes on generating Kubeflow component specs.

In kfpv1 a component spec looks something like this:

name: Add
description: |
    Component to add two numbers
inputs:
- name: op-1
  type: Integer
- name: op2
  type: Integer
outputs:
- name: sum
  type: Integer
implementation:
  container:
    image: google/cloud-sdk:latest
    command:
    - sh
    - -c
    - |
      set -e -x
      echo "$(($0+$1))" | gsutil cp - "$2"
    - {inputValue: op-1}
    - {inputValue: op2}
    - {outputPath: sum}

**It is not well documented how to use v2 features in the old component spec format.**

In kfp v2 the component spec has been unified with the pipeline spec into a single IR YAML format.
It looks like this (shown here in its JSON form):

{
  "components": {
    "comp-fondant-component": {
      "executorLabel": "exec-fondant-component",
      "inputDefinitions": {
        "artifacts": {
          "input_manifest_path": {
            "artifactType": {
              "schemaTitle": "system.Artifact",
              "schemaVersion": "0.0.1"
            },
            "isOptional": true
          }
        },
        "parameters": {
          "component_spec": {
            "defaultValue": {},
            "isOptional": true,
            "parameterType": "STRUCT"
          },
          "input_partition_rows": {
            "isOptional": true,
            "parameterType": "STRING"
          },
          "metadata": {
            "parameterType": "STRING"
          }
        }
      },
      "outputDefinitions": {
        "artifacts": {
          "output_manifest_path": {
            "artifactType": {
              "schemaTitle": "system.Artifact",
              "schemaVersion": "0.0.1"
            }
          }
        }
      }
    }
  },
  "deploymentSpec": {
    "executors": {
      "exec-fondant-component": {
        "container": {
          "args": [
            "--input_manifest_path",
            "{{$.inputs.artifacts['input_manifest_path'].uri}}",
            "--metadata",
            "{{$.inputs.parameters['metadata']}}",
            "--component_spec",
            "{{$.inputs.parameters['component_spec']}}",
            "--input_partition_rows",
            "{{$.inputs.parameters['input_partition_rows']}}",
            "--output_manifest_path",
            "{{$.outputs.artifacts['output_manifest_path'].uri}}"
          ],
          "command": [
            "python3",
            "main.py"
          ],
          "image": "some_image"
        }
      }
    }
  },
  "pipelineInfo": {
    "name": "fondant-component"
  },
  "root": {
    "dag": {
      "outputs": {
        "artifacts": {
          "output_manifest_path": {
            "artifactSelectors": [
              {
                "outputArtifactKey": "output_manifest_path",
                "producerSubtask": "fondant-component"
              }
            ]
          }
        }
      },
      "tasks": {
        "fondant-component": {
          "cachingOptions": {
            "enableCache": true
          },
          "componentRef": {
            "name": "comp-fondant-component"
          },
          "inputs": {
            "artifacts": {
              "input_manifest_path": {
                "componentInputArtifact": "input_manifest_path"
              }
            },
            "parameters": {
              "component_spec": {
                "componentInputParameter": "component_spec"
              },
              "input_partition_rows": {
                "componentInputParameter": "input_partition_rows"
              },
              "metadata": {
                "componentInputParameter": "metadata"
              }
            }
          },
          "taskInfo": {
            "name": "fondant-component"
          }
        }
      }
    },
    "inputDefinitions": {
      "artifacts": {
        "input_manifest_path": {
          "artifactType": {
            "schemaTitle": "system.Artifact",
            "schemaVersion": "0.0.1"
          },
          "isOptional": true
        }
      },
      "parameters": {
        "component_spec": {
          "defaultValue": {},
          "isOptional": true,
          "parameterType": "STRUCT"
        },
        "input_partition_rows": {
          "isOptional": true,
          "parameterType": "STRING"
        },
        "metadata": {
          "parameterType": "STRING"
        }
      }
    },
    "outputDefinitions": {
      "artifacts": {
        "output_manifest_path": {
          "artifactType": {
            "schemaTitle": "system.Artifact",
            "schemaVersion": "0.0.1"
          }
        }
      }
    }
  },
  "schemaVersion": "2.1.0",
  "sdkVersion": "kfp-2.0.1"
}

There is no real difference between a spec describing a pipeline and one describing a component (a component is just a one-step pipeline).
You can read this spec from a file or from text and use it in another pipeline.

**I have code to generate these new IR YAMLs for Fondant components.**
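As an illustration of how the spec above hangs together, the following sketch parses a trimmed copy of it with the standard library and resolves each component to its executor image and required parameters. The function name `summarize_ir` is hypothetical, and the embedded spec is a shortened version of the one shown above, not a complete IR document. It only handles the JSON form; a YAML-formatted spec would need a YAML parser instead.

```python
import json

# Trimmed copy of the IR spec shown above (only the fields used below).
IR_SPEC = """
{
  "components": {
    "comp-fondant-component": {
      "executorLabel": "exec-fondant-component",
      "inputDefinitions": {
        "parameters": {
          "component_spec": {"defaultValue": {}, "isOptional": true, "parameterType": "STRUCT"},
          "input_partition_rows": {"isOptional": true, "parameterType": "STRING"},
          "metadata": {"parameterType": "STRING"}
        }
      }
    }
  },
  "deploymentSpec": {
    "executors": {
      "exec-fondant-component": {
        "container": {"command": ["python3", "main.py"], "image": "some_image"}
      }
    }
  }
}
"""


def summarize_ir(spec_text):
    """Map each component to its container image and required parameters."""
    spec = json.loads(spec_text)
    executors = spec["deploymentSpec"]["executors"]
    summary = {}
    for name, component in spec["components"].items():
        # A component points at its executor through "executorLabel".
        container = executors[component["executorLabel"]]["container"]
        params = component.get("inputDefinitions", {}).get("parameters", {})
        required = sorted(
            key for key, definition in params.items()
            if not definition.get("isOptional", False)
        )
        summary[name] = {"image": container["image"], "required_parameters": required}
    return summary


summary = summarize_ir(IR_SPEC)
print(summary)
```

The same lookup works for a full pipeline spec, since components and pipelines share one format.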

@PhilippeMoussalli
Contributor

PhilippeMoussalli commented Sep 5, 2023

Thanks for the extensive description @GeorgesLorre!

Solution 1: Indeed, having two versions does not seem like the best solution. Regarding the runner, I think it's a given that we would need separate runners for Vertex and KFP regardless of the version, no?

Solution 2: Importing from v2 is what we used to do in Vertex at ML6 (the new boilerplate is now V2, which I haven't worked with yet). Although it's not well documented, we should be able to use it properly based on the experience and boilerplate we have. The downside is that we would then need to develop a different compiler for V2.

Solution 3: This does seem to be the best one, but it feels like the full-fledged v2 is still more integrated with Vertex than with KFP on GKE (at least for the moment, since the official release was not too long ago). Some features we need for selecting node pools and GPUs still seem to be missing: kubeflow/pipelines#9682

I would favor Solution 3 to avoid additional work, but we need to make sure it offers all the core features we need. In Vertex I think that's a given, but I would want to test it on the standalone kfp deployment and check whether we can select specific node pools and work with GPUs. Otherwise it will break our current workflow.

Maybe we can set up a test cluster, deploy v2 there, and do some tests?

@GeorgesLorre
Collaborator

How to submit a kfp pipeline to Vertex manually:

  1. Add this to your pipeline.py:
from fondant.compiler import VertexCompiler

compiler = VertexCompiler()
compiler.compile(pipeline=pipeline, output_path="pipeline.json")

  2. Invoke compilation: python pipeline.py

  3. Go to the Vertex UI

  4. Create a new run and select the pipeline.json file

  5. In the advanced options, select the kfp service account
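The manual UI steps could also be scripted. The sketch below uses the `google-cloud-aiplatform` SDK (`aiplatform.init`, `aiplatform.PipelineJob`, and `PipelineJob.submit`) and is not the Fondant Vertex runner itself. The function names, the display name, and every project, bucket, and service account value are hypothetical placeholders. The SDK import is deferred so the helper that assembles the job arguments can be used without the SDK installed.

```python
def build_job_kwargs(template_path, pipeline_root, display_name="fondant-pipeline"):
    """Collect the arguments for a Vertex pipeline job (hypothetical helper)."""
    return {
        "display_name": display_name,
        "template_path": template_path,  # the compiled pipeline.json
        "pipeline_root": pipeline_root,  # assumed GCS location for artifacts
    }


def submit_to_vertex(project, location, service_account, **job_kwargs):
    """Submit a compiled pipeline to Vertex AI Pipelines."""
    # Imported lazily: requires the google-cloud-aiplatform package.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    job = aiplatform.PipelineJob(**job_kwargs)
    # Same choice as the "advanced options" step in the UI.
    job.submit(service_account=service_account)
    return job


kwargs = build_job_kwargs("pipeline.json", "gs://my-bucket/pipeline-root")
# Placeholder values; uncomment with real credentials and resources:
# submit_to_vertex("my-project", "europe-west1",
#                  "kfp-sa@my-project.iam.gserviceaccount.com", **kwargs)
```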

@RobbeSneyders RobbeSneyders moved this from In Progress to Ready for development in Fondant development Oct 5, 2023
@RobbeSneyders RobbeSneyders moved this from Ready for development to Validation in Fondant development Oct 17, 2023
@RobbeSneyders
Member Author

Released in 0.6.0.

@github-project-automation github-project-automation bot moved this from Validation to Done in Fondant development Oct 19, 2023

3 participants