Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Internal row structure for partial filtering / sorting and lazy-loading #4851

Closed
flaviendelangle opened this issue May 12, 2022 · 15 comments · Fixed by #4927
Closed

[RFC] Internal row structure for partial filtering / sorting and lazy-loading #4851

flaviendelangle opened this issue May 12, 2022 · 15 comments · Fixed by #4927
Assignees
Labels
breaking change component: data grid This is the name of the generic UI component, not the React module! discussion feature: Row grouping Related to the data grid Row grouping feature feature: Tree data Related to the data grid Tree data feature RFC Request For Comments v6.x

Comments

@flaviendelangle
Copy link
Member

flaviendelangle commented May 12, 2022

This is a draft to start discussing about this subject and prepare for v6

Introduction

We currently store the rows in a flat structure (in state.rows.ids) and then create a tree from it if needs be.
Any update to the rows is at first an update to the flat row list followed by a full re-generation of the tree
This has several implications:

  • When adding / removing / modifying a row, the grid always triggers a re-generation of the whole tree

  • When adding / removing / modifying a row, the grid always re-applies the sorting, filtering and aggregation on all the rows, even those in groups not impacted at all by the modification

  • When toggling a group expansion, the grid always re-applies the filtering on all the rows

  • In the tree, the group are represented by there parent row. But for the top level rows, there is no notion of "top level group" since the top level rows do not have a parent. In [data grid] Implement Aggregation (not publicly released) #4208, I had to invent a fake row ID to store aggregated data of the top level rows.

  • In the tree, the group are represented by there parent row. It is therefore impossible to do lazy loading with Row Grouping without having at least one row of each group on the initial fetch.

  • The auto generated rows (grouping rows for Tree Data or Row Grouping and aggregation footer) are regular rows in the state. They can be updated / removed by apiRef.current.updateRows which makes no sense.

Note: It is unclear to me exactly how far we can optimize to avoid re-computing sorting / filtering / aggregation on non-impacted groups. But I am sure that we can do non-negligible optimizations and that we need a better internal state structure for it.

In summary, our current row state is not suited for large datasets, especially when it comes to Row Grouping because any change in the dataset triggers computation on the whole dataset, even when it is a localized update.

Proposal

AG-Grid paragraph

In the presentation below, row means "real row". The auto generated row are no longer stored as regular row in the state. They can't be updated / removed using the same API as the regular row.

We do have a slight nomenclature issue because the word row can have two meanings.

  1. An element in the data given by the user (does not include the groups)
  2. A row in the HTML table (does include the group)

This can create confusion, but AG-Grid also calls the data given by the user "rows" so it's probably not that problematic.

New state structure

/**
 * Equivalent of the current GridRowTreeNodeConfig for real rows
 */
interface GridRowConfig {
  /**
   * The id of the row
   */
  id: GridRowId;

  /**
   * The id of the group in which this row id
   */
  groupId: string;

  /**
   * 0-based depth of the row in the tree.
   */
  depth: number;
}

/**
 * Equivalent of the current GridRowTreeNodeConfig for groups
 */
interface GridRowGroup {
  /**
   * The id of the group. It the group is created from a real row, it is the ID of the row.
   * If not, it is an auto-generated ID
   */
  id: string;

  /**
   * If the group is created from a real row (in the Tree Data), then we store its id.
   */
  rowId: GridRowId | null;

  /**
   * The id of the parent group (null for the group containing the top level rows).
   * Today, groups of depth = 0 don't have parent, but in the new structure they will have the top level group for parent.
   */
  parent: GridRowId | null;

  /**
   * The id of the groups children of the current group
   */
   childrenGroups?: string[]

  /**
   * The id of the rows children of the current group
   */
  childrenRows?: GridRowId[];

  /**
   * Current expansion status of the group.
   */
  childrenExpanded?: boolean;

  /**
   * Behavior to be clarified.
   */
  depth: number;

    /**
   * The key used to group the children of this row.
   */
  groupingKey: GridKeyValue;

  /**
   * The field used to group the children of this row.
   * Is `null` if no field has been used to group the children of this row.
   */
  groupingField: string | null;
}

type GridRowConfigLookup = Record<GridRowId, GridRowConfig>

type GridRowGroupTree = { [groupId: string]: GridRowGroup }

interface GridRowsState {
  tree: GridRowTree
}

Tree generation

  1. Only fully re-generate the tree when props.rows changes or apiRef.current.setRows is called (new dataset)

  2. When calling apiRef.current.updateRows, insert / remove / update the rows inside the current tree (without mutating of course)
    Each row tree generator is responsible for handling these changes (we currently have 3: Flat / Row Grouping / Tree Data)

    Examples:

    • When removing a row on Row Grouping, remove it from its parent group
    • When adding a row on Row Grouping, add it to its parent group
    • When updating a row on Row Grouping, check if the group changes, if so remove it from its prior parent group and add it to its new one
    • When removing a row on Tree Data, if it has children, replace its group by an auto generated one, if not remove its group from the tree
    • ...

    For the Row Grouping and Flat trees, the behavior is quite simple. For the Tree Data it becomes a lot more complicated since the groups are derived from real rows.

Sorting / Filtering

Sorting and filtering now listens to the rowGroupChange event which takes for argument the list of the group ids that were modified during the last row update

We may need to pass more information about this update (for instance "group X: row removed") to fine tune the update. For instance do not re-apply sorting on a group where we just removed rows.
This do not need to be done in the 1st version, the format must just be compatible with future improvements.

Related issues

@flaviendelangle flaviendelangle added breaking change discussion v6.x feature: Tree data Related to the data grid Tree data feature feature: Row grouping Related to the data grid Row grouping feature labels May 12, 2022
@flaviendelangle flaviendelangle self-assigned this May 12, 2022
@flaviendelangle
Copy link
Member Author

@m4theushw @cherniavskii @DanailH @alexfauquette

I think this is a major change that we need to prepare for v6 (whatever structure we decide to take, even if its very different to what I describe here) if we want to have a good data management with lazy loading and row grouping.

@DanailH
Copy link
Member

DanailH commented May 12, 2022

Great summary @flaviendelangle. I think it's good to start discussing this topic now as it will also impact the way the lazy loading will work initially.

  • In the tree, the group are represented by there parent row. It is therefore impossible to do lazy loading with Row Grouping without having at least one row of each group on the initial fetch.

Just to be sure I understand the problem correctly - we would need to know which row is a parent of a group in advance, correct? Because if it is regarding the children can't we put fake rows inside the group, for example, skeleton rows which will have to be inserted either way?

@flaviendelangle
Copy link
Member Author

flaviendelangle commented May 12, 2022

@DanailH

we would need to know which row is a parent of a group in advance, correct?

Not sure to understand, sorry if my answer misses your point.

The problem for row grouping with lazy loading is that I need to know the list of the groups without having the rows to group them.

For instance, the clients says "I have 3 release year 2022, 2021 and 2020" without giving me one movie of each group.
Here is AG Grid paragraph for that topic.

We could of course ask for the list of groups, then create fake rows that would fit into these groups, then create the Row Grouping tree, and we would have those groups.
But it would be a very convoluted way of doing it in comparison with just having a real notion of group in state that do not need real children to exist

These groups could have auto generated children (for skeleton loading for instance).
But to keep the example of the release year, I don't think the skeleton row of the group "Year 2020" should have year: 2020 in its model.
I think those rows should be added after the grouping (like I'm doing for the aggregation), not before.
And I also think those rows should not be present in state.rows.ids but just in the state.rows.tree and in the state.filter and state.sorting (because they are impacted by the sorting and filtering).


My main goal is to create a real notion of group, to be able to apply logic to a group (sorting / filtering / aggregation / lazy loading / ...)

It's a step forward in the differentiation between the data and the visible rows.

On one side you have the data where everything is by group and you can trigger logic on specific groups or on all (i.e on the top level group).

On the other side, you have the visible rows, which are flat, make no difference between a real row, a footer, a grouping row etc...

@flaviendelangle
Copy link
Member Author

Side note: my RFC does not cover the impact on the sorting and filtering state.
I will add a paragraph later but basically I think these state should be by group, something like:

filteredRows: { [groupId: string]: { [rowId: GridRowId]: boolean }  }

That way we can easily re-run the sorting / filtering on some of the groups.
And then have a selector that takes those per-group values and returns a flat look (or flat list for the sorting).

@cherniavskii
Copy link
Member

I have few minor suggestions to the state structure above:

@@ -8,7 +8,7 @@ interface GridRowConfig {
   id: GridRowId;
 
   /**
-   * The id of the group in which this row id
+   * The id of the group in which this row is
    */
   groupId: string;
 
@@ -18,15 +18,17 @@ interface GridRowConfig {
   depth: number;
 }
 
+type GridRowGroupId = string;
+
 /**
  * Equivalent of the current GridRowTreeNodeConfig for groups
  */
 interface GridRowGroup {
   /**
-   * The id of the group. It the group is created from a real row, it is the ID of the row.
+   * The id of the group. If the group is created from a real row, it is the ID of the row.
    * If not, it is an auto-generated ID
    */
-  id: string;
+  id: GridRowGroupId;
 
   /**
    * If the group is created from a real row (in the Tree Data), then we store its id.
@@ -42,7 +44,7 @@ interface GridRowGroup {
   /**
    * The id of the groups children of the current group
    */
-  childrenGroups?: string[];
+  childrenGroups?: GridRowGroupId[];
 
   /**
    * The id of the rows children of the current group
@@ -76,5 +78,5 @@ type GridRowConfigLookup = Record<GridRowId, GridRowConfig>;
 type GridRowGroupTree = { [groupId: string]: GridRowGroup };
 
 interface GridRowsState {
-  tree: GridRowTree;
+  tree: GridRowGroupTree;
 }

Does this make sense?

@m4theushw
Copy link
Member

Looking at the attributes of GridRowGroup, if a group is composed of other groups, would childrenRows contains all rows of all children groups or only the immediate rows?

Have you thought about borrowing the concept of FiberNode from React and applying it here? Everything (rows and groups) could be a node and what would differ them is the value of a tag attribute. Each node could point to its sibling in the correct sorting order. We could also store in the node whether it satisfies or not the current filtering criteria. Currently, these information are spread across a few different states and we have to always intersect them. I agree we should improve the tree structure, but this is also an opportunity to keep data into a single place.

@flaviendelangle
Copy link
Member Author

flaviendelangle commented May 13, 2022

@m4theushw

Looking at the attributes of GridRowGroup, if a group is composed of other groups, would childrenRows contains all rows of all children groups or only the immediate rows?

The wording is of course open to discussion
For me:

  • childrenGroups: the direct children (not grand children) groups
  • childrenRows: the direct children (not grand children) rows, do not contain the groups

Have you thought about borrowing the concept of FiberNode from React and applying it here? [...]

It would be a massive change but it's probably worth experimenting.

Something like this ?

interface GridTreeNode {
  id: GridRowId;
  depth: number;
  children?: GridRowId[];
  parent: GridRowId;
  sortedChildren?: GridRowId[];
  isPassingFilters?: boolean;
  isCollapsed?: boolean // if true the row is part of a collapsed group and therefore not visible
}

interface GridRowNode extends GridTreeNode {
  type: 'data-row',
}

interface GridGroupNode extends GridTreeNode {
  type: 'group',
  groupingKey: GridKeyValue
  groupingField: string | null
  // Should we also store the aggregated values here ?
}

// The footer for the aggregation, they are not filtered nor sorted
interface GridFooterNode extrends GridTreeNode {
  type: 'footer',
}

Unlike today, we would have a GridTreeNode for the top level group with a reserved name.


I think we should have a listing of the groups (without the data rows) in one place
To be able to loop across them to apply behaviors without looping across all the rows.
But it would be interesting to do a full POC (with Aggregation if possible) to see what's viable or not.


@cherniavskii yes your changes make sense, we will clearly have to clean this interface to make it as clear as possible

@flaviendelangle
Copy link
Member Author

One drawback of storing the filtering like I show above is that it has a higher cost to reset the filter.
Right now you just wipe the lookup.
With this approach, you have to loop across the rows to remove it.

@alexfauquette
Copy link
Member

It does not shock me that resetting filtering takes as much time as applying a new one.

About having to loop on each row, we are a bit trapped by the virtualization which requires computing the position of each rows in the page with a reducer.

So as soon as you modify the sorting/filtering we will have to loop on the remaining rows in the correct order to know the y coordinate of those rows

@flaviendelangle
Copy link
Member Author

The rows of the current page
If the pagination is enabled there is a major difference.

@m4theushw
Copy link
Member

Something like this ?

More or less, I would go further and not store the children as array but as a pointer to the first child, as below:

interface GridTreeNode {
  id: GridRowId;
  depth: number;
  child?: GridTreeNode; // the first row of a group, the second row goes into the nextNode of the first
  nextNode?: GridTreeNode; // the sibling row
  parent?: GridTreeNode;
  isPassingFilters?: boolean;
  isCollapsed?: boolean // if true the row is part of a collapsed group and therefore not visible
}

interface GridRowNode extends GridTreeNode {
  type: 'data-row',
}

interface GridGroupNode extends GridTreeNode {
  type: 'group',
  groupingKey: GridKeyValue
  groupingField: string | null
  // Should we also store the aggregated values here ?
}

// The footer for the aggregation, they are not filtered nor sorted
interface GridFooterNode extrends GridTreeNode {
  type: 'footer',
}

One drawback of storing the filtering like I show above is that it has a higher cost to reset the filter.

Yeah, but is it important to reset the filter faster than to apply the filter? If that's a problem we can have a second tree with all nodes visible, then we only switch it. I just think that the idea of using a tree for everything is better than to intersect a few states to get which rows should be visible and in which order.

@flaviendelangle
Copy link
Member Author

More or less, I would go further and not store the children as array but as a pointer to the first child, as below:

That's an even more massive change
I would need to implement it to really see if it's viable

@flaviendelangle
Copy link
Member Author

Some early feedbacks on #4927

I implemented a 1st version of the new structure, with partial tree update, but without the partial filtering / sorting / aggregation (it's hard to split #4927 into small PRs but partial filtering / sorting / aggregation can probably be implemented one by one).

  • Storing the actual siblings and / or children in the nodes is not viable because it creates to much interconnection between the nodes and with React immutable approach it's a living hell.
    When you want to update a node, you have to also update its siblings to register the new node.
    But then the siblings themselves have a new object reference (unless we mutate the node which is not a good idea), so you have to update there siblings as well, etc...
    Same thing for children / parents
    And in the end, every change to the tree is a full regeneration of the tree.

  • Partial updates of the tree is working fine. There is one performance bottleneck when removing nodes, I have a linear complexity to remove them for them parents (when removing N nodes in a group of P nodes, I have a complexity of N*P instead of P because I remove the nodes one by one. Batching the removal like we do in updateRows is probably doable but I need to be careful for edge cases.

  • The new structure would be a lot easier to play with for row reorder. We could probably enable row reordering with row grouping and only allow reordering inside a group.
    And for row reordering across group, the foundations will be easier to play with but it will still be hard to implement.

  • The Tree Data partial update is harder to achieve because groups provided by the user can become leaf rows when deleting all of there children (whereas for row grouping, a group with 0 rows is deleted). I will have to test more in depth this part.

  • Working with the throttled row update adds some complexity because we have to build an object describing the partial updates without loosing the updates done on the previous calls to updatesRows and not applied yet (for instance if you insert a row with updateRows, then modify it with updateRows during the throttling period, then in the end you just want to insert it. But if you modify it then delete it, then you just want to delete it, etc...)

  • The properties idToIdLookup, ids and idRowsLookup contains both the data rows and the auto generated rows. With partial updates it becomes very complicated to play with them. I removed ids from the state (it can be replaced either by idToIdLookup when the order is not needed or groupNode.children when the order is needed). For idToIdLookup I split it in two (one for the data rows and one for the auto generated rows). For idRowsLookup, I no longer store fake models for auto generated rows (which was a dirty hack, but removing them has some implications I need to clarify).

  • The aggregation footer management is harder with partial update because we need to be able to remove the footers and not just add them (like we do for grouping columns in a way). I did not implement that part yet.


New structure

Tthe JSDoc need to be improved

Tree

export type GridTreeNode = GridLeafNode | GridGroupNode | GridFooterNode;

export type GridRowTreeConfig = Record<GridRowId, GridTreeNode>;

export interface GridTreeBasicNode {
  /**
   * The uniq id of this node.
   */
  id: GridRowId;
  /**
   * Depth of this node in the tree.
   */
  depth: number;
}

export interface GridLeafNode extends GridTreeBasicNode {
  type: 'leaf';
  /**
   * The row id of the group containing this node.
   */
  parent: GridRowId;
  /**
   * The key used to group the children of this row.
   * Only used in the tree data, may-be renamed leafKey.
   */
  groupingKey: GridKeyValue | null;
}

export interface GridGroupNode extends GridTreeBasicNode {
  type: 'group';
  /**
   * If `true`, this node has been automatically generated by the grid.
   * In the row grouping, all groups are auto-generated
   * In the tree data, some groups can be passed in the rows
   */
  isAutoGenerated: boolean;
  /**
   * The key used to group the children of this row.
   */
  groupingKey: GridKeyValue | null;
  /**
   * The field used to group the children of this row.
   * Is `null` if no field has been used to group the children of this row.
   */
  groupingField: string | null;
  /**
   * The id of the children nodes.
   */
  children: GridRowId[];
  /**
   * The id of the children nodes, grouped by grouping field and grouping key.
   * This key is the main addition on the tree.
   * To be able to do partial update with good performances, we need to be able to easily convert the path of grouping criteria into a path of ids to now where to put the new rows.
   */
  childrenFromPath: {
    [groupingField: string]: {
      [groupingKey: string]: GridRowId;
    };
  };
  /**
   * If `true`, the children of this group are not visible.
   * @default false
   */
  childrenExpanded?: boolean;
  /**
   * The row id of the group containing this node (null for the root group).
   */
  parent: GridRowId | null;
}

export interface GridFooterNode extends GridTreeBasicNode {
  type: 'footer';
  /**
   * The row id of the group containing this node.
   */
  parent: GridRowId;
}

We will add properties for sorting / filtering and maybe aggregation on those nodes later if we go for @m4theushw approach (which I would be in favor of doing).

Row state

export interface GridRowsLookups {
  dataRowIdToModelLookup: GridRowsLookup;
  dataRowIdToIdLookup: Record<string, GridRowId>;
  autoGeneratedRowIdToIdLookup: Record<string, GridRowId>;
}

export interface GridRowTreeCreationValue {
  /**
   * Name of the algorithm used to group the rows
   * It is useful to decide which filtering / sorting algorithm to apply, to avoid applying tree-data filtering on a grouping-by-column dataset for instance.
   */
  groupingName: string;
  tree: GridRowTreeConfig;
  treeDepth: number;
  autoGeneratedRowIdToIdLookup: Record<string, GridRowId>;
}

export interface GridRowsState extends GridRowTreeCreationValue, GridRowsLookups {
  /**
   * Matches the value of the `loading` prop.
   */
  loading?: boolean;
  /**
   * Amount of rows before applying the filtering.
   * It also counts the expanded and collapsed children rows.
   */
  totalRowCount: number;
  /**
   * Amount of rows before applying the filtering.
   * It does not count the expanded children rows.
   */
  totalTopLevelRowCount: number;
  /**
   * Tree returned by the `rowTreeCreation` strategy processor.
   * It is used to re-apply the `hydrateRows` pipe processor without having to recreate the tree.
   * Unclear if we will still need it
   */
  groupingResponseBeforeRowHydration: GridRowTreeCreationValue;
}

@m4theushw
Copy link
Member

The properties idToIdLookup, ids and idRowsLookup contains both the data rows and the auto generated rows. With partial updates it becomes very complicated to play with them.

We don't need to store in idToIdLookup the IDs for auto-generated rows. The use case for this object is to translate IDs when using the cellModesModel prop. I don't think we need to allow to edit auto-generated rows.

@flaviendelangle
Copy link
Member Author

True, I will just rename it to always use the dataRow prefix for keys only containing the data rows then.

@oliviertassinari oliviertassinari added the RFC Request For Comments label Aug 12, 2022
@flaviendelangle flaviendelangle added the component: data grid This is the name of the generic UI component, not the React module! label Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change component: data grid This is the name of the generic UI component, not the React module! discussion feature: Row grouping Related to the data grid Row grouping feature feature: Tree data Related to the data grid Tree data feature RFC Request For Comments v6.x
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants