Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Expressions] Use table column ID instead of name when set #99724

Merged
merged 10 commits into from
Jun 1, 2021

Conversation

wylieconlon
Copy link
Contributor

@wylieconlon wylieconlon commented May 10, 2021

Lens uses the esaggs expression function which always has a different column id and name. The id of the column represents the JSON key where the data is stored in each row, for example { 'col-0-a': 0 } has an ID of col-0-a and a name of bytes.

This PR fixes the mapColumn behavior when the optional id parameter is set. In a previous iteration of this PR, the fix was to introduce a complex set of lookups. This is no longer the approach, instead the logic is simple, and can easily be copied as a fix for another set of bugs in Canvas. The new behavior is:

  • If mapColumn is called without an ID, look up the first matching column by name, and use that ID. Example:
esaggs index={indexPatternLoad id="90943e30-9a47-11e8-b64d-95841ca0b247"} 
  aggs={aggSum id="a" enabled=true schema="metric" field="bytes"} metricsAtAllLevels=false partialRows=false
| mapColumn name="Sum of bytes" expression={math "col-0-a + 1"}
| render as="debug"
{
  "datatable": {
    "type": "datatable",
    "columns": [
      {
        "id": "col-0-a",
        "name": "Sum of bytes",
        "meta": { "type": "number" }
      }
    ],
    "rows": [{ "col-0-a": 79725690 }]
  }
}
  • If mapColumn is called with id and name, look up only by id. In this example, the name is changed by the mapColumn call, but the ID is kept:
esaggs index={indexPatternLoad id="90943e30-9a47-11e8-b64d-95841ca0b247"} 
  aggs={aggSum id="a" enabled=true schema="metric" field="bytes"} metricsAtAllLevels=false partialRows=false
| mapColumn id="col-0-a" name="Total bytes" expression={math "col-0-a + 1"}
| render as="debug"
{
  "datatable": {
    "type": "datatable",
    "columns": [
      {
        "id": "col-0-a",
        "name": "Total bytes",
        "meta": { "type": "number" }
      }
    ],
    "rows": [{ "col-0-a": 79725690 }]
  }
}

Previous version of this PR

For code reviewers who previously reviewed this change, the behavior described above is not what was originally implemented. The main reason for this is that all datatable columns have an ID property, which I had missed due to the way I was testing. Also, @cqliu1 fixed the last remaining datatable that was missing an ID property in #98561. You can verify this by opening canvas and typing essql "select bytes from kibana_sample_data_logs | render as="debug", which produces a table with a matching ID and name. Because we can now guarantee that id` is set on the table, we don't need the complex fallback logic.

Checklist

@wylieconlon wylieconlon added Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) v8.0.0 Team:AppServices release_note:skip Skip the PR/issue when compiling release notes v7.14.0 labels May 10, 2021
@wylieconlon wylieconlon requested a review from a team as a code owner May 10, 2021 21:53
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

const existingColumnIndex = columns.findIndex(({ name }) => name === args.name);
const existingColumnIndex = columns.findIndex(({ id, name }) => {
// Columns that have IDs are allowed to have duplicate names, for example esaggs
if (id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also check whether args.id is set as well here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flash1293 @poffdeluxe I've just changed the logic here since your last review. I'm now checking id === args.name if the column has an ID, which makes sense to me as a fallback. What do you all think of this change?

Copy link
Contributor

@poffdeluxe poffdeluxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this across several workpads using mapcolumn and math and everything worked great still

const existingColumnIndex = columns.findIndex(({ id, name }) => {
// Columns that have IDs are allowed to have duplicate names, for example esaggs
if (args.id) {
return id === args.id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the args.id is passed but the column does not have id is it ok to return false?
I didn't see any test about this. I assume it's ok, but probably it's better to make it clear somewhere (either with a test or documentation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dej611 good catch, I thought I had a test for this but I will add one.

Copy link
Contributor

@dej611 dej611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left only a comment about some edge case clarification.
Apart from that, LGTM.

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Paraphrasing the new logic to make sure I understood correctly:

  • If id is set, id is used to find a column
  • If id is not available, name is used as fallback
  • It's not possible to reference by name if an id is present

@flash1293
Copy link
Contributor

@elasticmachine merge upstream

Copy link
Contributor

@streamich streamich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppServices changes LGTM.

@wylieconlon
Copy link
Contributor Author

Sorry to all the code reviewers, but I've changed the behavior of this PR to make it much simpler. It turns out that I was making a bad assumption about the lack of id property on SQL, because the _sql endpoint does not have id, but ID is added by Canvas. That bad assumption led to more bad decisions.

I have written a much more detailed set of examples in the PR description up at the top. I hope you will all take some time to re-review as I think this is in a much better state now.

@flash1293 @dej611 @poffdeluxe @streamich @cqliu1 @crob611

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two nits, but otherwise this looks good to me after having played around with it a bit.

As names aren't unique anymore, it's a bit weird to only reference a column by name now - first one wins which makes kind of sense, but maybe we can introduce a breaking change in 8.0 to require the id to simplify this logic. I think it's fine for now though.

@@ -114,7 +122,7 @@ export const mapColumn: ExpressionFunctionDefinition<
}

if (existingColumnIndex === -1) {
columns.push(newColumn);
columns[columnLength] = newColumn;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with using .push here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the existingColumnIndex was updated for each row, now it's updated once. So if I used .push it would add a new column for each row.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this executed once per row? It seems like only https://github.com/elastic/kibana/pull/99724/files/9f4d6474708ba8f11375fc81893cc153bb87ba7c#diff-30a63ac89404e0e4c5cebad82ffb721533d9aae11810cb2d982d1b45d6fecbbfR102-R109 is executed per row, the rest of the code is running only once per mapColumn invocation. Promise.all returns a single promise which is resolved once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I think I just misread this part. Will update.

const mathContext = isDatatable(input)
? pivotObjectArray(
input.rows,
input.columns.map((col) => col.name)
input.columns.map((col) => col.id ?? col.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? If we can be sure there is an id now, we can simply use input.columns.map((col) => col.id), right?

@@ -44,7 +44,7 @@ export const mapColumn: ExpressionFunctionDefinition<
types: ['string', 'null'],
help: i18n.translate('expressions.functions.mapColumn.args.idHelpText', {
defaultMessage:
'An optional id of the resulting column. When `null` the name/column argument is used as id.',
'An optional id of the resulting column. When no id is provided, the id will be looked up from the existing column and default to the name.',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had trouble inferring everything that happens from this description. Maybe like this? ... will be looked up from the existing column by the provided name argument. If no column with this name exists yet, a new column with this name and an identical id will be added to the table

@flash1293
Copy link
Contributor

Thanks for the changes, looks great to me!

@flash1293
Copy link
Contributor

@elasticmachine merge upstream

return name === args.name;
});
const columnId =
existingColumnIndex === -1 ? args.id ?? args.name : columns[existingColumnIndex].id;
Copy link
Member

@ppisljar ppisljar May 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ids should be unique, if we find existing column with requested new column id we should throw

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo the logic should be:

const columnId = args.id ?? args.name;
if(columns.find({id} => id === columnId)) throw('id already exists');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppisljar Column IDs are guaranteed to be unique with this logic, I am not seeing any case where you could get a duplicate ID. Also, this logic lets you rewrite a column by ID, this is shown in some of the test cases. For example mapColumn id="existing" name="New name" expression={math "existing + 2"} is a valid expression that updates the data in place.

const mathContext = isDatatable(input)
? pivotObjectArray(
input.rows,
input.columns.map((col) => col.name)
input.columns.map((col) => col.id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is meant by fallback to names ? the fact that name will be used for columnId when id was not provided (to map_column for example ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if no id is provided we find the first column with the name. This is required for backwards compatibility.

Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wylieconlon
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
expressions 198.9KB 199.2KB +333.0B
Unknown metric groups

References to deprecated APIs

id before after diff
canvas 29 25 -4
crossClusterReplication 8 6 -2
fleet 22 20 -2
globalSearch 4 2 -2
indexManagement 12 7 -5
infra 256 146 -110
lens 67 45 -22
licensing 18 15 -3
lists 239 236 -3
maps 286 208 -78
ml 121 115 -6
monitoring 109 56 -53
securitySolution 390 346 -44
stackAlerts 101 95 -6
total -340

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@wylieconlon wylieconlon merged commit 72d5b8a into elastic:master Jun 1, 2021
@wylieconlon wylieconlon deleted the map-column-id branch June 1, 2021 21:06
wylieconlon pushed a commit to wylieconlon/kibana that referenced this pull request Jun 1, 2021
…9724)

* [Expressions] Use table column ID instead of name when set

* Update ID matching to match by name sometimes

* Add an extra case to prevent insertion of duplicate column

* Simplify logic and add test for output ID

* Respond to review comments

Co-authored-by: Kibana Machine <[email protected]>
wylieconlon pushed a commit that referenced this pull request Jun 1, 2021
…101110)

* [Expressions] Use table column ID instead of name when set

* Update ID matching to match by name sometimes

* Add an extra case to prevent insertion of duplicate column

* Simplify logic and add test for output ID

* Respond to review comments

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Kibana Machine <[email protected]>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jun 2, 2021
…sens/kibana into reporting/new-png-pdf-report-type

* 'reporting/new-png-pdf-report-type' of github.com:jloleysens/kibana: (46 commits)
  [Security Solution] Add Ransomware canary advanced policy option (elastic#101068)
  [Exploratory view] Core web vitals (elastic#100320)
  [Security solution][Endpoint] Add unit tests for fleet event filters/trusted apps cards (elastic#101034)
  [Lens] Use a setter function for the dimension panel (elastic#101123)
  [Index Patterns] Fix return saved index pattern object (elastic#101051)
  [CI] For PRs, build TS refs before public api docs check (elastic#100791)
  [Maps] fix line and polygon label regression (elastic#101085)
  Migrate CCR to new ES JS client. (elastic#100131)
  [Canvas] Switch Canvas to use React Router (elastic#100579)
  [Expressions] Use table column ID instead of name when set (elastic#99724)
  [DOCS] Updates docs landing page (elastic#100749)
  [DOCS] Corrects typo in step 3 (elastic#101079)
  [DOCS] Updates runtime example in Discover (elastic#100926)
  Migrate kibana.autocomplete config to data plugin (elastic#100586)
  [Uptime] New width/delay definition for waterfall sidebar item tooltip (elastic#100147)
  [FTR] Use importExport for saved_object/basic archive (elastic#100244)
  [Fleet] Better input for multi text input in agent policy builder (elastic#101020)
  [CI] Buildkite support with Baseline pipeline (elastic#100492)
  [Reporting/Telemetry] Do not send telemetry if we are in screenshot mode (elastic#100388)
  Create API keys with metadata (elastic#100682)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) Feature:Lens release_note:skip Skip the PR/issue when compiling release notes v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants