feat(cache-cell-results): added cell result caching to reduce cell configuration querying on loads #18581

Merged
merged 18 commits into from
Jun 23, 2020

Conversation

asalem1
Contributor

@asalem1 asalem1 commented Jun 17, 2020

Closes #18401

Problem

Dashboard cells are queried once when they are initially loaded, then again when the cell is opened for configuration.

Solution

Implemented a reducer-based cache that stores dashboard cell queries and their results, keyed by a hash of the query. Now, when a cell is opened for configuration, the action checks whether a result already exists for that hashed query and reuses it; otherwise the query executes. Once the dashboard unmounts, the cache is reset.
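
A minimal sketch of the idea described above, assuming a plain object keyed by the hashed query text. The action type strings and the `hashCode` body come from the diff excerpts in this conversation; the state shape, the `queryCacheReducer` name, and the sample query are illustrative assumptions, not the merged code.

```typescript
// Assumed state shape: hashed query ID -> cached result files.
type QueryCacheState = {queryResultsByQueryID: Record<string, string[]>}

type QueryCacheAction =
  | {type: 'SET_QUERY_RESULTS_BY_QUERY'; queryID: string; files: string[]}
  | {type: 'RESET_CACHED_QUERY_RESULTS'}

// String hash as it appears in the PR diff.
const hashCode = (s: string): number =>
  s.split('').reduce((a, b) => ((a << 5) - a + b.charCodeAt(0)) | 0, 0)

// Build a new empty state on every call so resets never share an object.
const emptyCache = (): QueryCacheState => ({queryResultsByQueryID: {}})

const queryCacheReducer = (
  state: QueryCacheState = emptyCache(),
  action: QueryCacheAction
): QueryCacheState => {
  switch (action.type) {
    case 'SET_QUERY_RESULTS_BY_QUERY':
      return {
        queryResultsByQueryID: {
          ...state.queryResultsByQueryID,
          [action.queryID]: action.files,
        },
      }
    case 'RESET_CACHED_QUERY_RESULTS':
      return emptyCache()
    default:
      return state
  }
}

// Usage: cache a result under the hashed query, then clear on unmount.
const exampleQueryID = String(hashCode('from(bucket: "example") |> range(start: -1h)'))
let cacheState = queryCacheReducer(undefined, {
  type: 'SET_QUERY_RESULTS_BY_QUERY',
  queryID: exampleQueryID,
  files: ['csv-result'],
})
cacheState = queryCacheReducer(cacheState, {type: 'RESET_CACHED_QUERY_RESULTS'})
```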

  • CHANGELOG.md updated with a link to the PR (not the Issue)
  • Rebased/mergeable

@asalem1 asalem1 changed the title Feat/conditionally query cell feat(cache-cell-results): added cell result caching to reduce cell configuration querying on loads Jun 17, 2020
@asalem1 asalem1 force-pushed the feat/conditionally-query-cell branch from cb0a5ae to 27e4c32 Compare June 17, 2020 22:05
| ReturnType<typeof resetCachedQueryResults>
| ReturnType<typeof setQueryResultsByQueryID>

export const hashCode = s =>
Contributor

it's usually a good idea to paste the link to where you copy/pasted internet code from, so people can get some context on it and, in the case of bugs, go right to the source.

}

return state
}
Contributor

is this file necessary? why can't the query results be stored in the time machine reducer?

Contributor Author

I had originally set the data in the time machine, but it became a little tricky accessing the data since timeMachine is partitioned into segmented subsets. The thinking with this feature is that we might also benefit from this caching across the platform in the future, so if the same query is run in data-explorer and a dashboard cell, we'd be able to cache those results and reduce querying.

@@ -344,6 +349,7 @@ const mstp = (state: AppState, props: OwnProps): StateProps => {

const mdtp: DispatchProps = {
notify: notifyAction,
setQueryResults: setQueryResultsByQueryID,
Contributor

what's the idea behind aliasing this? it's really confusing - there's already a method being exported from another file called setQueryResults. this really confounds the ability to see where setQueryResults is used - this represents a false positive of that.

Contributor Author

@asalem1 asalem1 Jun 18, 2020

Fair point. I think aliasing helps differentiate between the imported action vs the one that's accessible in the props. Without aliasing, I find that my workflow is:

  1. Import the action
  2. Try and call the action directly within the component
  3. Get confused for a bit, do some debugging, then realize the error of my ways

Having said that, I'm fine with not aliasing if it makes things clearer

Contributor

@ebb-tide ebb-tide Jun 18, 2020

To differentiate between the imported action and the one that's accessible in the props, we sometimes use setX vs. onSetX. I had personally set a bad example of this in this file. :(
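
A small sketch of the setX vs. onSetX convention mentioned above. The action creator body and the `DispatchProps` shape are assumptions for illustration; only the naming pattern is taken from the comment.

```typescript
// Action creator, as it might be imported from an actions file (assumed shape).
const setQueryResults = (queryID: string, files: string[]) => ({
  type: 'SET_QUERY_RESULTS_BY_QUERY' as const,
  queryID,
  files,
})

// The prop uses the on* prefix, so the bound prop and the imported
// action creator have visibly different names inside the component.
interface DispatchProps {
  onSetQueryResults: typeof setQueryResults
}

const mdtp: DispatchProps = {
  onSetQueryResults: setQueryResults,
}
```

Inside the component, the call site would then read `this.props.onSetQueryResults(...)`, which cannot be mistaken for a direct call to the unbound action creator.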

dispatch(executeQueries())
} catch (error) {
// if the files don't exist in the cache, we want to execute the query
console.error(error)
Contributor

if i'm understanding this correctly, this logs an error on cache misses? seems noisy - if that's the case i think we should just swallow this error and not log it.

Contributor Author

Good point.


@ErrorHandling
class DashboardPage extends Component<Props> {
public componentWillUnmount() {
this.props.resetCachedQueryResults()
Contributor

👍 thinking about cache invalidation

Contributor Author

Can't take any credit for it, it was all @ebb-tide 's idea

@@ -40,6 +43,7 @@ export interface AppState {
}
currentPage: CurrentPage
currentDashboard: CurrentDashboardState
data: DataState
Contributor

i'm going to be blunt here: data is a really poor name for a reducer - all reducers deal in data. i'm not sure if this reducer needs to exist, but if it does, what's wrong with calling it the cache reducer or the queryCache reducer?

Contributor Author

Those are really great names, I'll switch them up


@ErrorHandling
class DashboardPage extends Component<Props> {
public componentWillUnmount() {
this.props.resetCachedQueryResults()
Contributor

👍

}

case 'RESET_CACHED_QUERY_RESULTS': {
return initialState
Contributor

let's return a brand new object here just in case.
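
One possible reading of "just in case", sketched under the assumption that something outside the reducer could mutate the state object it receives. The names below are hypothetical and only contrast a shared constant with a freshly built object.

```typescript
type CacheState = {queryResultsByQueryID: Record<string, string[]>}

// Option 1: every reset hands out the same shared object.
const sharedInitialState: CacheState = {queryResultsByQueryID: {}}
const resetShared = (): CacheState => sharedInitialState

// Option 2: every reset builds a brand new object.
const resetFresh = (): CacheState => ({queryResultsByQueryID: {}})

// Simulate an accidental mutation of the state returned by a reset.
const firstShared = resetShared()
firstShared.queryResultsByQueryID['stale'] = ['old result']
const secondShared = resetShared()
// The stale key leaks into the next "reset" state.
console.log(Object.keys(secondShared.queryResultsByQueryID).length) // 1

const firstFresh = resetFresh()
firstFresh.queryResultsByQueryID['stale'] = ['old result']
const secondFresh = resetFresh()
// The fresh object is unaffected by the earlier mutation.
console.log(Object.keys(secondFresh.queryResultsByQueryID).length) // 0
```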

@@ -73,6 +74,7 @@ export const rootReducer = combineReducers<ReducerState>({
}),
currentPage: currentPageReducer,
currentDashboard: currentDashboardReducer,
data: dataReducer,
Contributor

maybe call this queryCache and queryCacheReducer? data can mean sooo many things... :/

Contributor Author

totally. Just seeing this. how about dashboardQueryCache? or are we thinking something a little less specific?

Contributor

I like queryCache, I can't tell if we will only use this in dashboards or not.

const state = getState()
const {view} = getActiveTimeMachine(state)
const queries = view.properties.queries.filter(({text}) => !!text.trim())
const queryID = get(queries, '[0].text', '')
Contributor

The function name setQueryResultsByQueryID suggests that you will be hashing first and then passing queryID rather than queryText to the function. I think you should be hashing on this line.

Contributor Author

Definitely. I was thinking that as I was walking through this time around. I'll go ahead and import it in and hash it here

const {view} = getActiveTimeMachine(state)
const queries = view.properties.queries.filter(({text}) => !!text.trim())
const queryID = get(queries, '[0].text', '')
if (queryID) {
Contributor

This should be where you look up whether the queryID exists in your cache.

Contributor

In TimeSeries we call setQueryResults; here we should read the results back from queryResults.
So if state.queryResults.Query.queryResultsByQueryID[queryID] exists, we should use it in the visualization and not executeQueries. (You might need to look into how to use it in the visualization.)

Contributor Author

I think there might be a misunderstanding stemming from naming two different functions the same thing, since this dispatch should be dispatching the action above:

https://github.com/influxdata/influxdb/pull/18581/files#diff-d32f5a03b5c68c499b895f2222a64da9R125

and not the action to set the data

Contributor

yes, that is pretty confusing :) I don't think we need a separate setQueryResultsByQueryID function here, just roll what it does into this action.
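
The lookup discussed in this thread could be sketched roughly as follows. This is a simplified, synchronous stand-in: `getResults`, the `Execute` callback, and the in-place cache write are assumptions for illustration (the real code dispatches Redux actions), while `hashCode` and `queryResultsByQueryID` come from the diff.

```typescript
type QueryCache = {queryResultsByQueryID: Record<string, string[]>}

// String hash as it appears in the PR diff.
const hashCode = (s: string): number =>
  s.split('').reduce((a, b) => ((a << 5) - a + b.charCodeAt(0)) | 0, 0)

// Hypothetical stand-in for running the query against the backend.
type Execute = (queryText: string) => string[]

const getResults = (
  cache: QueryCache,
  queryText: string,
  execute: Execute
): {files: string[]; fromCache: boolean} => {
  const queryID = String(hashCode(queryText))
  const cachedFiles = cache.queryResultsByQueryID[queryID]
  if (cachedFiles !== undefined && cachedFiles.length > 0) {
    // Cache hit: reuse the stored result and skip execution.
    return {files: cachedFiles, fromCache: true}
  }
  // Cache miss is an ordinary branch, so no try/catch or logging is needed.
  const files = execute(queryText)
  // Simplification: written in place here; real code would dispatch an action.
  cache.queryResultsByQueryID[queryID] = files
  return {files, fromCache: false}
}
```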

export const hashCode = s =>
s.split('').reduce((a, b) => ((a << 5) - a + b.charCodeAt(0)) | 0, 0)

export const setQueryResultsByQueryID = (queryID: string, files: string[]) =>
Contributor

It's really confusing when the variable is called queryID but you are actually passing in the queryText and converting it into the ID inside the function. Could you change the name of this function to setQueryResultsForQuery(queryText: string, files: string[])?

Contributor Author

Really great point. I ended up importing the hashCode function into the TimeSeries component to set the ID there, so now it's actually passing the ID in
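
For context, a short sketch of hashing at the call site so that the value passed along really is an ID. The `hashCode` body is the one from the diff; the sample query text and variable names are assumptions.

```typescript
// String hash from the PR diff: (a << 5) - a is a * 31, and | 0 keeps the
// running value within a signed 32-bit integer.
const hashCode = (s: string): number =>
  s.split('').reduce((a, b) => ((a << 5) - a + b.charCodeAt(0)) | 0, 0)

// Hash once where the query text is known, then pass only the ID onward.
const queryText = 'from(bucket: "example") |> range(start: -1h)'
const queryID = hashCode(queryText)

// The hash is deterministic, but distinct strings can collide.
console.log(hashCode('ab')) // 3105
console.log(hashCode('Aa') === hashCode('BB')) // true
```

Because collisions like the last line are possible, a cache keyed only by this hash could in principle return results for a different query; whether that matters in practice is a judgment call for the maintainers.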

Contributor

@ebb-tide ebb-tide left a comment

sorry I'm not done!

case 'SET_QUERY_RESULTS_BY_QUERY': {
return produce(state, draftState => {
const {queryID, files} = action
if (queryID && files.length) {
Contributor

I think we should be checking these conditions before firing SET_QUERY_RESULTS_BY_QUERY action, and then we can rely on them being here.

Contributor Author

Sounds good.
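
A sketch of moving the guard in front of the dispatch, as suggested above, so the reducer can assume a valid payload. `cacheQueryResults` and the `Dispatch` type are hypothetical names; the action type string is taken from the diff.

```typescript
type SetQueryResultsAction = {
  type: 'SET_QUERY_RESULTS_BY_QUERY'
  queryID: string
  files: string[]
}
type Dispatch = (action: SetQueryResultsAction) => void

// Returns true when an action was dispatched, false when the input was skipped.
const cacheQueryResults = (
  dispatch: Dispatch,
  queryID: string,
  files: string[]
): boolean => {
  // Validate before dispatching; the reducer no longer needs this check.
  if (!queryID || files.length === 0) {
    return false
  }
  dispatch({type: 'SET_QUERY_RESULTS_BY_QUERY', queryID, files})
  return true
}
```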

@asalem1 asalem1 force-pushed the feat/conditionally-query-cell branch from 5a61d99 to 57c49a0 Compare June 18, 2020 20:37
const {
alertBuilder: {id: checkID},
} = state

Contributor Author

Moved this down so it would be placed a little closer to where checkID is being used

): Promise<void> => {
try {
const state = getState()
const files = state.queryCache.queryResultsByQueryID[hashCode(queryID)]
Contributor

You are hashing queryID before sending it to this function, so you shouldn't be hashing here.

Contributor Author

Hmmm, I'm not sure if i am, but I definitely should be since it's a little misleading at this point

@@ -271,6 +280,12 @@ class TimeSeries extends Component<Props & WithRouterProps, State> {

this.pendingReload = false

const queryText = queries[0].text
Contributor

@ebb-tide ebb-tide Jun 18, 2020

This will not work if there is more than one query in queries

Contributor Author

I was wondering about that. I've never seen a situation where more than one query is being sent, but is that a possibility? If so, should we cache each one separately?

@asalem1 asalem1 requested a review from ebb-tide June 18, 2020 20:57
timeMachineID: TimeMachineID
) => (dispatch, getState: GetState): Promise<void> => {
try {
dispatch(getViewForTimeMachine(dashboardID, cellID, timeMachineID))
Contributor

I'm worried about race conditions here. We cannot await the results of a thunk, we just dispatch it, so if we depend on its results, as we do on line 151, we're hoping that getViewForTimeMachine has already done its thing and we can get the view. The solution is to integrate the getViewForTimeMachine and setQueryResultsForCell functions into one.

Contributor Author

I hear what you're saying, it's definitely a valid concern, but it seems like it might be overstepping the intent of the function. I'm curious as to how moving the dispatches within the scope of setQueryResultsForCell would prevent a race condition?
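
One way to read the suggestion, sketched with hypothetical names: when loading the view and reading the cache live in a single async function, the cache lookup can only start after the view has resolved, so the ordering no longer depends on timing. `loadView`, the `View` shape, and keying the cache by raw query text are all simplifying assumptions, not the PR's code.

```typescript
type View = {queries: string[]}

// Hypothetical async loader standing in for fetching a cell's view.
const loadView = (cellID: string): Promise<View> =>
  Promise.resolve({queries: [`query for ${cellID}`]})

// Combined function: the lookup is sequenced after the view is available.
const getViewAndResults = async (
  cellID: string,
  cache: Record<string, string[]>
): Promise<{view: View; files: string[] | undefined}> => {
  const view = await loadView(cellID)
  // Safe to read the view here because the await above has completed.
  const files = cache[view.queries[0]]
  return {view, files}
}
```

With two separate fire-and-forget calls, the second would read whatever state happened to exist at that moment; here the dependency is explicit in the control flow.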

@asalem1 asalem1 force-pushed the feat/conditionally-query-cell branch from 593db7b to cec9b0d Compare June 18, 2020 22:23
@asalem1 asalem1 requested a review from ebb-tide June 22, 2020 12:29
dispatch(executeQueries())
} catch (error) {
// if the files don't exist in the cache, we want to execute the query
dispatch(executeQueries())
Contributor

I don't think there is a relevant error here. We are already checking whether files exist on line 132, and executing queries on line 136 if they don't.

Contributor Author

that's a really good point that I didn't consider. Should we just remove this from the try/catch?

Contributor

I think so, yeah.

view,
}) => {
useEffect(() => {
// TODO split this up into "loadView" "setActiveTimeMachine"
// and something to tell the component to pull from the context
// of the dashboardID
getViewForTimeMachine(dashboardID, cellID, 'veo')
setQueryResultsForCell(dashboardID, cellID, 'veo')
Contributor

ok this is a nit, but this function name is a lot less descriptive to me. can we switch it to something that has a reference to view? maybe getViewAndResultsForVEO?

@asalem1 asalem1 force-pushed the feat/conditionally-query-cell branch from 638733e to ab3dfd3 Compare June 22, 2020 22:24
@asalem1 asalem1 dismissed hoorayimhelping’s stale review June 22, 2020 22:35

spoke offline and addressed the issues

@asalem1 asalem1 force-pushed the feat/conditionally-query-cell branch from 1b09f53 to 2179681 Compare June 23, 2020 12:32
asalem1 added 5 commits June 23, 2020 08:33
… separated some of the view thunk logic to.

Need to add more explicit typing and need to integrate cache expiration logic
…nt unmounts. Also separate thunk concerns to default query execution when files are undefined
asalem1 and others added 13 commits June 23, 2020 08:33
Adding context to where the hashing function was found
Renaming the data src to queryCache
Renaming the imported function in TimeSeries
Renaming some of the input parameters to be a little more relevant
…ased on the query text and renamed var to align with realistic output
… separated some of the view thunk logic to.

Need to add more explicit typing and need to integrate cache expiration logic
…nt unmounts. Also separate thunk concerns to default query execution when files are undefined
Adding context to where the hashing function was found
Renaming the data src to queryCache
Renaming the imported function in TimeSeries
Renaming some of the input parameters to be a little more relevant
…ctionality of getViewAndResultsForVeo to eliminate possible race condition
@asalem1 asalem1 force-pushed the feat/conditionally-query-cell branch from 2179681 to 655d5de Compare June 23, 2020 15:33
@asalem1 asalem1 merged commit c4fee5b into master Jun 23, 2020
@asalem1 asalem1 deleted the feat/conditionally-query-cell branch June 23, 2020 16:34
Development

Successfully merging this pull request may close these issues.

Don't reload query when opening a cell
3 participants