Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-23] Move SQL Lab storage out of browser localStorage #7748

Closed
betodealmeida opened this issue Jun 21, 2019 · 9 comments
Closed

[SIP-23] Move SQL Lab storage out of browser localStorage #7748

betodealmeida opened this issue Jun 21, 2019 · 9 comments
Labels
enhancement:request Enhancement request submitted by anyone from the community sip Superset Improvement Proposal

Comments

@betodealmeida
Copy link
Member

betodealmeida commented Jun 21, 2019

[SIP] Proposal for moving SQL Lab storage out of browser localStorage

Motivation

Currently, we store SQL Lab state in the browser localStorage, including tabs, their queries and results. The redux state is persisted to localStorage using the redux-localstorage library.

While the implementation is clean, it provides a few drawbacks:

  • The state is not preserved across browsers, of if the user clears the application data in the browser.
  • Upgrades might leave the state in a bad shape, preventing SQL Lab from working successfully. We observed this a few times at Lyft, and users would use incognito mode until we instructed them to delete the application data.
  • Storage is limited to 5 MB, hardcoded on the browser.

At Lyft we're currently migrating from BigQuery to Superset, and the project requires querying tables with nested fields (see the work on #7625, #7627 and #7693). Here's what we see when querying the first 100 rows from one of our tables:

Screen Shot 2019-06-19 at 1 23 01 PM

In this case, the query is running automatically for the data preview when the user selects the table in the table browser (left of SQL Lab) in order to inspect it. This makes the browser extremely sluggish, even crashing it.

Proposed Change

I propose moving the persistence of SQL Lab's state from the browser localStorage to the metadata database. State would be synced the following way:

backend -> frontend

  • On load, the bootstrap payload contains a list of tab IDs, and the active tab ID.
  • SQL Lab loads the active tab asynchronously by ID. This will load:
    • selected database
    • selected schema
    • any table schemas
    • query in textarea
    • results (if query has run)
    • any table previews
  • On tab switch, the corresponding tab is loaded asynchronously in a similar way.

frontend -> backend

  • Tabs are saved every time a query changes (user typing, eg), with debouncing.
  • Tabs are saved every time a query is executed.
  • Tabs are saved every time results are loaded.
  • Similar for other changes (database changed, schema, table preview).

These changes should be transparent to the user, with the exception that there will be an additional latency from having to request the state from the server, instead of having it in localStorage.

New or Changed Public Interfaces

For this work, we need to create the following models:

# superset/models/sql_lab.py
class TabState(Model):

    __tablename__ = 'tab_state'

    # basic info
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('ab_user.id'))
    label = Column(String(256))
    active = Column(Boolean, default=False)

    # tables that are open in the schema browser and their data previews
    table_schemas = relationship('TableSchema')

    # the query in the textarea, and results (if any)
    # we'll reuse the Query model, since it has everything we need
    # (note that this require having results_key set even for sync queries)
    query_id = Column(Integer, ForeignKey('query.id'))
    query = relationship('Query')


class TableSchema(Model):

    __tablename__ = 'table_schema'

    id = Column(Integer, primary_key=True)
    tab_state_id = Column(Integer, ForeignKey('tab_state.id'))

    # DB
    database_id = Column(Integer, ForeignKey('dbs.id'), nullable=False)
    database = relationship('Database', foreign_keys=[database_id])
    schema = Column(String(256))
    table = Column(String(256))
  
    # JSON describing the schema, partitions, latest partition, etc.
    results = Column(Text)

These will be exposed via automatically generated views by FAB.

Note that since we're loading the query results from the server, for synchronous queries we need to store the results in a results backend. We can fallback to a simple results backend (werkzeug.contrib.cache.BaseCache, eg) when none is set.

Optionally, we can store results ourselves in the database with a simple model:

class Results(Model):

    __tablename__ = 'results'

    # used as results_key for sync queries
    id = Column(Integer, primary_key=True)

    # JSON with the results payload
    results = Column(Text)

New dependencies

None.

Migration Plan and Compatibility

We should implement logic that moves the state from localStorage to the backend when we detect that it's stored in the browser.

Rejected Alternatives

None.

@issue-label-bot issue-label-bot bot added the enhancement:request Enhancement request submitted by anyone from the community label Jun 21, 2019
@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label #enhancement to this issue, with a confidence of 0.61. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@betodealmeida
Copy link
Member Author

@betodealmeida betodealmeida added the sip Superset Improvement Proposal label Jun 21, 2019
@betodealmeida betodealmeida changed the title [SIP-?] Move SQL Lab storage out of browser localStorage [SIP-23] Move SQL Lab storage out of browser localStorage Jun 21, 2019
@mistercrunch
Copy link
Member

Some notes:

  • how do we store/retrieve query history? Seems reasonable to fetch it async when navigating to that tab
  • nit: I'd rename Tab to TabState
  • thought: I'd add a json blob field in TabState that matches what goes into the redux store, denormalizing and removing the need for TableSchema altogether

@etr2460
Copy link
Member

etr2460 commented Jun 24, 2019

Thoughts...

  • What happens if someone has the same SQL Lab tab in two different browsers? Do we provide a cursor for updating the tabs/queries, or do we just let people overwrite it automatically? How do we sync the state of the backend to the frontend?
  • I was thinking about simply storing a json blob in TabState too, but it has the downside of making updates to the client store and state very difficult (wouldn't address your second drawback above).
  • Can the bootstrap payload include the tab details for the active tab too? That would improve load time I think
  • What's the tradeoff between falling back to a default results backend vs. not caching query results at all? I'm concerned with adding additional caching complexity that might not be necessary.

@khtruong
Copy link
Contributor

I also agree that adding a json blob would make it difficult to make updates to the store, especially since the state structure can change as often as we want it to.

@betodealmeida
Copy link
Member Author

betodealmeida commented Jun 25, 2019

@etr2460

What happens if someone has the same SQL Lab tab in two different browsers? Do we provide a cursor for updating the tabs/queries, or do we just let people overwrite it automatically? How do we sync the state of the backend to the frontend?

I'm not sure what's the best way of handling this. Currently the last edit is persisted, but not propagated to other open tabs. The implementation proposed would preserve that behavior.

We could use localStorage(or https://developer.mozilla.org/en-US/docs/Web/API/BroadcastChannel) to store if a given tab is being edited, so that when the user opens the same tab in a new browser tab they get a message warning them, or the tab textarea is rendered as read-only, with option to takeover. Any thoughts?

Can the bootstrap payload include the tab details for the active tab too? That would improve load time I think

Good point, thanks!

@DiggidyDave
Copy link
Contributor

DiggidyDave commented Jun 26, 2019

I suggest keeping the narrative clean, stick to the general principle of this SIP, and NOT use local storage. What about?

  • put a guid on TabState
  • have browser do conditional PUT for TabState, if the GUID doesn't match then fail write and query latest/correct state
  • on window.focus fetch latest state, to make sure user doesn't start working on top of stale state
  • on window.blur (visible but not focus) poll for updates
  • handle onvisibilitychangeto stop polling (if not visible) or transition to correct state if becoming visible

The last 3 bullets make a foundational assumption that a user will not be simultaneously editing in 2 windows. They may have 2 windows open, even side by side on the same screen, but they will only have focus and be actively typing in one of them.

EDIT to note: thie TL;DR of this is that when you are typing, if another editor window is open and visible, it will update at the polling frequency to reflect what you are typing in the focus window.

@DiggidyDave
Copy link
Contributor

Why are both of query_id and query required in TabState?

@betodealmeida
Copy link
Member Author

Why are both of query_id and query required in TabState?

query is just a lazy attribute, not a real column. Accessing it returns the Query object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement:request Enhancement request submitted by anyone from the community sip Superset Improvement Proposal
Projects
Status: Implemented / Done
Development

No branches or pull requests

5 participants