[Feature] Add freeform task execution metadata to task executions table #325
Comments
This has to be done in two parts.
@EngHabu we already have https://github.com/flyteorg/flyteidl/blob/da3f0f7695147f79a65f0348ba0f626f054ad28e/protos/flyteidl/event/event.proto#L147, although it is a struct rather than a map. It is accessible through the calls to get and list task executions: https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/admin/task_execution.proto#L98
Motivation: Why do you think this is important?
Allow additional data to be passed as task execution metadata. It's unclear how useful each of these pieces is at the moment but surfacing them might be a good way to figure that out.
Goal: What should the final outcome look like, ideally?
Add execution metadata to task execution events.
Examples of this task metadata could be:
- nodeId for a pod where the node was executed
- Cluster ID for the Hive cluster, etc.
- Qubole Command ID
- Resource Token that was associated
- podID
- AWS Batch Job ID
- Spark application ID
- etc.
Thoughts: A simple Map<string,string> called executionMetadata, which we can show as tabular information in the UI.
The node name is part of the pod spec: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#pod-v1-core
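To make the proposal concrete, here is a minimal sketch of what a flat string-to-string `executionMetadata` map could look like and how a UI might turn it into table rows. Everything in it is an assumption for illustration: `TaskExecutionEvent`, `metadata_from_pod`, `as_table_rows`, and the `podName`/`nodeName` keys are hypothetical names, not flyteidl types or an agreed-upon schema. The only external fact it relies on is that a Kubernetes Pod carries its name under `metadata.name` and its assigned node under `spec.nodeName`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskExecutionEvent:
    """Hypothetical stand-in for a task execution event (not the flyteidl message)."""
    task_id: str
    phase: str
    # Freeform key/value pairs that a plugin wants to surface to users.
    execution_metadata: Dict[str, str] = field(default_factory=dict)


def metadata_from_pod(pod: dict) -> Dict[str, str]:
    """Collect a few identifying fields from a Pod manifest given as a plain dict.

    Missing fields are skipped, so an unscheduled pod simply has no nodeName entry.
    """
    metadata: Dict[str, str] = {}
    pod_name = pod.get("metadata", {}).get("name")
    if pod_name:
        metadata["podName"] = pod_name
    node_name = pod.get("spec", {}).get("nodeName")
    if node_name:
        metadata["nodeName"] = node_name
    return metadata


def as_table_rows(event: TaskExecutionEvent) -> List[Tuple[str, str]]:
    """Return (key, value) rows sorted by key, ready to show as a two-column table."""
    return sorted(event.execution_metadata.items())


if __name__ == "__main__":
    pod = {"metadata": {"name": "example-pod"}, "spec": {"nodeName": "example-node"}}
    event = TaskExecutionEvent("example-task", "RUNNING", metadata_from_pod(pod))
    for key, value in as_table_rows(event):
        print(f"{key}\t{value}")
```

Other plugins could populate the same map with their own keys (for example a Qubole command ID or a Spark application ID) without any schema change, which is the main appeal of a flat map over a typed message.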
Flyte component