Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]Automatic documentation in Flyte #531

Open
2 of 13 tasks
kumare3 opened this issue Sep 30, 2020 · 20 comments
Open
2 of 13 tasks

[Feature]Automatic documentation in Flyte #531

kumare3 opened this issue Sep 30, 2020 · 20 comments
Assignees
Labels
enhancement New feature or request flyteadmin Issue for FlyteAdmin Service stale ui Admin console user interface

Comments

@kumare3
Copy link
Contributor

kumare3 commented Sep 30, 2020

Motivation: Why do you think this is important?
Flyte is a type safe orchestration platform. It understands the flow of data and quickly becomes a one stop shop for most teams pipelining usecases. As the users build a repository of workflows, tasks and launch plans, it is essential to associate documentation with these entities. The documentation would help new team members quickly ramp up on the projects and individual entities. It also lays the foundation for improved discoverability and shareability. E.g. a workflow name or a task name is not enough in describing what the intention is, but associated documentation could provide a detailed description and insight into the algorithms, business usecase etc.

This document proposes a simple and extensible way to support documentation with all Flyte entities, which is captured in Flyte Console and keeps constantly getting updated with versions.

Goal: What should the final outcome look like, ideally?

@workflow(docs=Documentation(
                     short="This pipeline is used to target the right set of drivers for incentives",
                     long_file="/path/to/file",
                     long_format=Documentation.MARKDOWN,
                     source_code=Documentation.source_code_from_config(),
                     icon="http://....",
                     tags=["planning", "campaign"]
                     )
          )
class DriverTargetingWorkflow():
  ....


@workflow(docs=Documentation(
                     short="This pipeline is used to target the right set of drivers for incentives",
                     # Defaults to source_code=Documentation.source_code_from_config(),
                     #Also long defaults to using the docstring in the format rST
                   )
          )
class DriverTargetingWorkflow():
  """
    My RST documentation
  """
  ....


Config:
[docs]
  Repo: github.com/flyteorg/xyz
  project_icon: http://...
  tags: python, java, spark, dnn

Describe alternatives you've considered
NA

Flyte component

  • Overall
  • Flyte Setup and Installation scripts
  • Flyte Documentation
  • Flyte communication (slack/email etc)
  • FlytePropeller
  • FlyteIDL (Flyte specification language)
  • Flytekit (Python SDK)
  • FlyteAdmin (Control Plane service)
  • FlytePlugins
  • DataCatalog
  • FlyteStdlib (common libraries)
  • FlyteConsole (UI)
  • Other

[Optional] Propose: Link/Inline
Specification of the protobuf in FlyteIDL that will be added to all entities -
Workflow, Tasks, LaunchPlans, Project

message SourceCode {
    // File where the code is located
    string file = 1;
    // Line number where the task definition, workflow definition, etc starts at
    int line_number = 2;
    // git repository
    string repo = 3;
    // branch of the repository
    string branch = 4;
    // link to the original repository
    string link = 5;
    // language of the code
    string langugae = 6;
}

message Documentation {
    // short description - no more than 256 characters
    string short = 1;
    // Optional information about the source code
    SourceCode info = 3;
    // Optional Tags for easy searching, categorizing etc
    repeated string tags = 4;
}

message LongDocumentation {
    // long description - no more than 4kb
    string long = 1;
    enum DescriptionFormat {
         UNKNOWN = 0;
         MARKDOWN = 1;
         HTML = 2;
         // python default documentation - comments is rst
         RST = 3;
    }
     // format of the long description
    DescriptionFormat long_format = 2;
    // Optional link to an icon for the entity
    string icon_link = 5;
}


Create*Request(
....
   Documentation docs = ...,
...
)

// We will add a special API to create and associate long form documentation
CreateDocs(
  Identifier id = 1;
  LongDocumentation docs = 2;
)

// The Long documentation will be stored in the Blob store and reference will be added to the Metastore

Additional context
The UI should be able to show this information. Also the documentation is implicitly versioned with the entities themselves

Is this a blocker for you to adopt Flyte
NA

@kumare3 kumare3 added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Sep 30, 2020
@kumare3
Copy link
Contributor Author

kumare3 commented Sep 30, 2020

@wild-endeavor
Copy link
Contributor

I like the second option - I think picking it up automatically from source code comments is best.

@kumare3
Copy link
Contributor Author

kumare3 commented Sep 30, 2020

I like the second option - I think picking it up automatically from source code comments is best.

I dont think the above are options, I think both should be supported. Python comments are by default or format rST

@katrogan
Copy link
Contributor

+1 for baking documentation into the workflow definitions.

out of curiosity can file paths (long description, optional icon) be remote? if they're not, how does the process for rendering those in the UI work?

also could we provide a means for configuring workflow statuses to their registered domains? e.g. a workflow is considered under development (by default) when registered in a 'development' domain but fully active when registered in 'production'

@schottra
Copy link
Contributor

active vs. deprecated feels like a duplication of the existing metadata field we have for Workflows/Tasks/LaunchPlans?

@honnix
Copy link
Member

honnix commented Sep 30, 2020

This looks great and I am all for attaching doc to workflow. Just a minor concern that this could potentially easily make the payload a few times bigger. Will the size be enforced at the client side or server?

@yindia
Copy link
Contributor

yindia commented Sep 30, 2020

+1 for doing this. It will open more possibility like we can redirect user from flyteconsole to the exact line number in the file. It is helpfull in debuging

@kumare3
Copy link
Contributor Author

kumare3 commented Sep 30, 2020

@honnix & @katrogan I can answer both your questions together hopefully.

  • Maybe for long form documentation & icons we could have a separate API call that takes in that information instead of making as part of the register endpoint. This will bloat the call otherwise
  • @katrogan I think we should store the data in our backing store (maybe s3 links), but it should be Admin's responsibility - so that a unified rendering experience an be provided

About
| also could we provide a means for configuring workflow statuses to their registered domains? e.g. a workflow is considered under development (by default) when registered in a 'development' domain but fully active when registered in 'production'
I agree with @schottra and I removed it from the list

@kumare3
Copy link
Contributor Author

kumare3 commented Sep 30, 2020

@schottra good point, we can derive it from the existing field. Let me remove it from the docs

@gonzedge
Copy link

gonzedge commented Oct 8, 2020

+1 for doing this too.

@schottra
Copy link
Contributor

schottra commented Oct 8, 2020

@kumare3 Any thoughts on how to map values from a version of a workflow to the overall entity? We struggled with this idea when originally adding metadata fields.
description and active are both handled as metadata at a level above the individual version, because we treat all versions of a task/workflow as one single logical entity.
The UI is also set up like this, with the Workflow and Task details page showing the NamedEntity, not an individual version.

So I'm wondering what we do if user A pushes version 123abc with a description, and user B pushes version 234def with a different description. Which one becomes the description for the workflow which is shown in the UI?

@schottra
Copy link
Contributor

schottra commented Oct 8, 2020

@kumare3 I'm also just noticing that description is another NamedEntityMetadata field that is being duplicated by this. But if I understand correctly, you're intending to allow a separate description per version of a workflow/task?

@kumare3
Copy link
Contributor Author

kumare3 commented Oct 8, 2020

@kumare3 I'm also just noticing that description is another NamedEntityMetadata field that is being duplicated by this. But if I understand correctly, you're intending to allow a separate description per version of a workflow/task?

@schottra, I did see that the NamedEntityMetadata has a description field. This does not seem to be correctly designed.

  1. it does not maintain a history of the documentation (which I think is useful).
  2. We cannot really easily tie updating the documentation with the registration process and docs are updated usually in this way

On the other hand, I do see that you use the description field, but, for example in the Tasks page, you show the interface of the task, which is not really metadata, but evolves with the version. I think we should show the docs in exactly the same way. Is it ordered by time?

@schottra
Copy link
Contributor

@kumare3
I agree that history of documentation and updating via registration are both useful.
The only concern is what criteria to use to determine which version should be shown in the Console.

For the Tasks page, we arbitrarily chose the "latest" version, whatever the first result is when hitting the Task endpoint. That doesn't actually hold any semantic value, since the most recently registered version can be any random version from some person's feature branch.
In practice, it's probably most likely correct. But I'm wary of just taking the latest version for descriptions with no actual mechanism in place to let users specify which version is the main/production/definitive version.
Thoughts?

@kumare3
Copy link
Contributor Author

kumare3 commented Mar 11, 2021

On second, thought, Maybe not everything should be versioned, but a clean flow to have registration and documentation tied would be great - @katrogan

@kumare3 kumare3 added flyteadmin Issue for FlyteAdmin Service ui Admin console user interface labels Mar 11, 2021
@katrogan
Copy link
Contributor

@kumare3 that's what the NamedEntity stuff already covers, right? +1 we should improve how we expose it to users to edit and update

@kumare3
Copy link
Contributor Author

kumare3 commented Mar 12, 2021

@kumare3 that's what the NamedEntity stuff already covers, right? +1 we should improve how we expose it to users to edit and update

@katrogan yup, but I think the git links, line no etc should be added to every taskspec?

@heidimhurst
Copy link

+1, this would be an excellent feature to roll into an upcoming release

@kumare3
Copy link
Contributor Author

kumare3 commented Mar 3, 2023

This is now infact implemented in the code and backend and not
Implemented in UI yet

eapolinario pushed a commit to eapolinario/flyte that referenced this issue Jul 24, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Aug 21, 2023
Copy link

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request flyteadmin Issue for FlyteAdmin Service stale ui Admin console user interface
Projects
None yet
Development

No branches or pull requests

8 participants