Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI: DAG Conf Parameters Form - Param Type of Boolean, Null converted to String for JSON config #36369

Closed
2 tasks done
mblakely opened this issue Dec 22, 2023 · 7 comments
Closed
2 tasks done
Labels
area:UI Related to UI/UX. For Frontend Developers. good first issue kind:bug This is a clearly a bug pending-response stale Stale PRs per the .github/workflows/stale.yml policy file

Comments

@mblakely
Copy link

mblakely commented Dec 22, 2023

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.7.3

What happened?

In the Trigger DAG end point, using the DAG conf Parameters form will incorrectly convert a boolean to a string when the parameter type is defined as:

Param(
                default=None,
                type=["boolean", "null"],
            ),

This can be seen by comparing the value in the field to what is produced in the generated JSON config
image

If this JSON configuration is submitted, then the validation will raise an error because "True" is not a valid boolean
image

Note for comparison, the "use_ndssr" field is defined as:

Param(
                default=False,
                type="boolean"

works as expected: the JSON config has a valid JSON boolean type and it passes the parameter validation.

What you think should happen instead?

When a parameter is defined as a union of boolean and null (like below), the DAG Conf Parameters form should generate a value in the Generated JSON config that is a boolean.

Param(
                default=None,
                type=["boolean", "null"],
            ),

It should be:

{
    "parameter_name": true
}

instead of

{
    "parameter_name": "True"
}

How to reproduce

  1. Create a DAG that has a parameter with these types:
Param(
                default=None,
                type=["boolean", "null"],
            ),
  1. Navigate to the Trigger DAG Run for that DAG
  2. Add a value of true or false in the DAG Conf Parameters form
  3. Note the value in the Generated JSON config has quotes
  4. Press submit, and see an error message like:
    image

Operating System

Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else?

This issue occurs every time the DAG Conf parameter form is used and there is at least one parameter defined like:

Param(
                default=None,
                type=["boolean", "null"],
            ),

This issue can be worked around by directly updating the value in Generated JSON configuration after a value has been added in the form:
image

Based off a similar issue and subsequent resolution, updates are likely needed in (or around) `airflow/www/templates/airflow/trigger.html

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@mblakely mblakely added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Dec 22, 2023
Copy link

boring-cyborg bot commented Dec 22, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@potiuk potiuk added good first issue and removed needs-triage label for new issues that we didn't triage yet labels Dec 22, 2023
@jscheffl
Copy link
Contributor

jscheffl commented Jan 5, 2024

Hi @mblakely Can you tell me a bit more details about your use case and motivation "why" you want to have a boolan and "null" value as union option? The UI can only render (today) a element which generates true/false - but a user can not select "null" - therefore during implementation this was not considered.

What I am thinking is whether this use case is really needed and UI needs to support it (==bug) or if it is not a valid use case that we rather generate a parsing warning (==missing feature).

Reason why the UI behaves like this today is that it checks for type==boolean (which it is not in your example) to decide for a on/off option, if no other specific type matches, it falls back to a string entry field and therefore generates a string.

To add more complexity to my initial question, what also is not covered might be the case of type=["boolean", "string"] which also would fall-back to string but this time it would be valid. Union types can generate a lot of UI complexity and many potential "pitfalls", that is why we need a better understanding of the use case.

@jscheffl jscheffl added pending-response area:UI Related to UI/UX. For Frontend Developers. and removed area:core labels Jan 5, 2024
Copy link

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 20, 2024
@mblakely
Copy link
Author

mblakely commented Jan 26, 2024

@jscheffl sorry for the delay.
That is a great question, at a high level, the goal is to create an optional parameter so that code which the DAG calls can define a default. Without the null option, the parameter must be specified in the DAG parameters to be able to load the DAG (due to default validation).

Our use case has a library of code that is sometimes (increasingly, usually) run from Airflow but also needs to be run in other environments.
For example imagine a file like:

data_processing.py 
def process(image_directory: Path, should_parallelize: bool = True, ...):
   ...

We created a DAG as a wrapper around that function using Airflow Params to parameterize some of the values in the function the DAG will delegate to

data_processing_dag.py
from data_processing import process
...
with DAG(
   "process_data"
   params={
      "image_directory": Param(type=string, default="/path/to/a/shared/directory"),
     "should_parallelize": Param(type=["boolean", "null"])
   }
   @task
   def do_stuff(params, **kwargs)
      process(image_directory=Path(params["image_directory"]), should_parallelize=params.get("should_parallelize"))
   do_stuff()

Our actual setup is different so please don't nitpick the code above, the goal is give a rough sense of how we use this.

This allows us to keep the default behavior defined in a centralized place (in the process function) but allow the DAG to override it.

We have other similar use cases where we have "optional" types of strings, numbers, integers, etc.

There are a few more complicated ones like:

default={"dry_run": None},
type="object",
required=["dry_run"],
properties={"dry_run": {"type": ["boolean", "null"]}},

or

default=[[50, 48], [50, 72], [70, 48], [70, 72]],
type=["null", "array"],
items={
              "type": "array",
              "prefixItems": [
                   {"type": "integer"},
                   {"type": "integer"},
                ],  # first and second element must be ints
               "items": False,  # no additional items can be added to the array

As for the other union types, I agree that would add a ton of complexity. For our use cases, I can't think of a need to allow for different types besides a type and null.

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 27, 2024
@jscheffl
Copy link
Contributor

@jscheffl sorry for the delay. That is a great question, at a high level, the goal is to create an optional parameter so that code which the DAG calls can define a default. Without the null option, the parameter must be specified in the DAG parameters to be able to load the DAG (due to default validation).

Thanks for the detailed explanation @mblakely!

So in my words if I correctly understand you want to define a parameter as you wrap a DAG around some existing function. That function receives a boolean parameter but you want to have the option to call also without these parameters.

Is your requirement only "I want to call w/o need to define all parameters" or is your wrapper required to "I need to distinguish three cases for boolean=[True,False,None]"?

If you are using type=["string, "null"] and type=["boolean", "null"] just to prevent that a caller need to pass all parameter values, then you could assign just default values like:

with DAG(
   "process_data"
   params={
      "image_directory": Param("/path/to/a/shared/directory", type="string"),
      "should_parallelize": Param(True, type="boolean")
   }
   ...

Copy link

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 11, 2024
Copy link

This issue has been closed because it has not received response from the issue author.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:UI Related to UI/UX. For Frontend Developers. good first issue kind:bug This is a clearly a bug pending-response stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

No branches or pull requests

3 participants