Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to the timeseries to read data instead of accessing mongodb directly #29

Closed
shankari opened this issue Apr 15, 2023 · 13 comments
Closed

Comments

@shankari
Copy link
Contributor

I had flagged this initially as a "best practice"
#7 (comment)
#7 (comment)

However, this turned into a P0 because AWS DocumentDB (which NREL uses as a managed instance of MongoDB, and is close but not 100% compatible with it) does not seems to support the projections that are used.

We get the error

File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/dash/_callback.py", line 447, in add_context
output_value = func(*func_args, **func_kwargs) # %% callback invoked %%
File "app_sidebar_collapsible.py", line 190, in update_store_trips
df = query_confirmed_trips(start_date_obj, end_date_obj)
File "/usr/src/app/utils/db_utils.py", line 66, in query_confirmed_trips
df = pd.json_normalize(list(query_result))
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/cursor.py", line 1207, in next
if len(self.__data) or self._refresh():
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/cursor.py", line 1124, in _refresh
self.__send_message(q)
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/cursor.py", line 1001, in __send_message
address=self.__address)
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1372, in _run_operation_with_response
exhaust=exhaust)
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1471, in _retryable_read
return func(session, server, sock_info, slave_ok)
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1366, in _cmd
unpack_res)
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/server.py", line 137, in run_operation_with_response
first, sock_info.max_wire_version)
File "/root/miniconda-4.12.0/envs/emission/lib/python3.7/site-packages/pymongo/helpers.py", line 168, in _check_command_response
max_wire_version)
pymongo.errors.OperationFailure: Internal server error, full error: {'ok': 0.0, 'operationTime': Timestamp(1681425547, 1), 'code': 42, 'errmsg': 'Internal server error'}

Per NREL IT,

It's aws managed db, do not expose logs directly like other application logs, it only provides metrics and auditing calls/events around the cluster, not that helpful for this issue.

The NREL core team makes sure that the timeseries calls work against both MongoDB and DocumentDB, so using them exclusively to make DB calls would reduce maintenance burden overall.

@shankari
Copy link
Contributor Author

With this fix, the dashboard loads in the NREL environment without errors.

Screenshot 2023-04-15 at 10 27 59 AM

@shankari
Copy link
Contributor Author

Keeping the issue open to track the hack removals by @AlirezaRa94

@shankari shankari mentioned this issue Apr 17, 2023
Merged
6 tasks
@shankari
Copy link
Contributor Author

Note that if there is no data, this fails with

ERROR:app_sidebar_collapsible:Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/dash_auth/basic_auth.py", line 36, in wrap
    response = f(*args, **kwargs)
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/dash/dash.py", line 1283, in dispatch
    ctx.run(
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/dash/_callback.py", line 447, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
  File "/usr/src/app/app_sidebar_collapsible.py", line 188, in update_store_trips
    df = query_confirmed_trips(start_date_obj, end_date_obj)
  File "/usr/src/app/utils/db_utils.py", line 85, in query_confirmed_trips
    df = df[["user_id", "data.start_fmt_time", "data.end_fmt_time", "data.distance", "data.duration", "data.start_loc.coordinates", "data.end_loc.coordinates"]]
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pandas/core/frame.py", line 3464, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pandas/core/indexing.py", line 1314, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis)
  File "/root/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pandas/core/indexing.py", line 1374, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['user_id', 'data.start_fmt_time', 'data.end_fmt_time', 'data.distance',\n       'data.duration', 'data.start_loc.coordinates',\n       'data.end_loc.coordinates'],\n      dtype='object')] are in the [columns]"

We may want to highlight that there is no data in this case instead of letting the error be swallowed silently

@shankari
Copy link
Contributor Author

shankari commented May 1, 2023

@AlirezaRa94 This does not have any errors, but:
(1) I cannot see any user labels in the trip table
(2) I see the start and end place, which are not needed/relevant since they are internal links to other objects in the data model
Screenshot 2023-05-01 at 12 07 31 PM

I am fixing (2), which is fairly straightforward but not sure why (1) doesn't work.

@shankari
Copy link
Contributor Author

shankari commented May 1, 2023

(1) does not work because there are no additional paths identified in the study-program config.
I am not sure how this was ever tested with the dynamic config.
@AlirezaRa94 please indicate "testing done" and "testing process" in the future

        "data_trips_columns_exclude": [],
        "additional_trip_columns": [],
        "data_uuids_columns_exclude": [],

@shankari
Copy link
Contributor Author

shankari commented May 1, 2023

I removed all the unnecessary fields using "data_trips_columns_exclude"
The good news is that they all were removed!
The bad news is that there are other fields that are not displayed correctly, namely the start and end fmt time are both missing.

Screenshot 2023-05-01 at 12 20 49 PM

@shankari
Copy link
Contributor Author

shankari commented May 1, 2023

After adding a bunch of logs, these values are being returned but not displayed. I note that the values that are displayed (except for user_id) are prefixed with data. Coincidence?

{'data.duration': 9358.15185546875, 'user_id': 'fc8f71a3-8c82-458d-bf97-18c3ee11a0f3', 'data.distance': 160547.62160904994, 'trip_start_time_str': '2022-12-31T08:44:23.982959-07:00', 'trip_end_time_str': '2022-12-31T11:20:22.134815-07:00', 'start_coordinates': [...], 'end_coordinates': [...]}

@shankari
Copy link
Contributor Author

shankari commented May 1, 2023

I can confirm that if I comment out this line

df = df.drop(columns=[named_col['path']])

from d977851#diff-1c6b8e6d103286796ce21a8276c4a4d8b258e29d6b9cc6df516a92accf4674d1R72

then I do see the expected columns. So either it is the data prefix or some other filtering going on

Screenshot 2023-05-01 at 3 02 02 PM

@shankari
Copy link
Contributor Author

shankari commented May 1, 2023

It is not the data prefix

I changed

    {'label': 'data.trip_end_time_str', 'path': 'data.end_fmt_time'},

and reinstated the drop line

df = df.drop(columns=[named_col['path']])

The records looked like

{'data.distance': 15145.48585499552, 'data.duration': 892.0510001182556, 'user_id': 'bd9cffc8-dbf1-402d-a479-f9f148ec9e60', 'trip_start_time_str': '2021-09-20T16:00:08-06:00', 'data.trip_end_time_str': '2021-09-20T16:15:00.051000-06:00', 'start_coordinates': [...], 'end_coordinates': [...]}

but the end str was not displayed

Screenshot 2023-05-01 at 3 13 37 PM

It looks like there is some additional filtering happening, potentially against the VALID_TRIP_COLS but I don't see where.
I am going to hack this by commenting out the drop statement (since we don't currently end up with duplicates) and will let @AlirezaRa94 fix it the right way once he returns

@shankari
Copy link
Contributor Author

shankari commented May 2, 2023

Aha! I think I figured it out. In pages/data.py, we see

    elif tab == 'tab-trips-datatable':
        data = store_trips["data"]
        columns = perm_utils.get_allowed_trip_columns()
        columns.update(
            col['label'] for col in perm_utils.get_allowed_named_trip_columns()
        )
        has_perm = perm_utils.has_permission('data_trips')

and the allowed trip columns are basically only the ones in VALID_TRIP_COLS

def get_allowed_trip_columns():
    columns = set(constants.VALID_TRIP_COLS)
    for column in permissions.get("data_trips_columns_exclude", []):
        columns.discard(column)
    return columns

@shankari
Copy link
Contributor Author

shankari commented May 2, 2023

@AlirezaRa94 This is seriously problematic because it means that we will not be able to display any labels, which means that the trip table is largely useless for admins.

@shankari
Copy link
Contributor Author

shankari commented May 2, 2023

Ok so refactored the code in a somewhat sloppy fashion to create a new get_display_trip_columns which returns the labels from the named columns

def get_display_trip_columns():
    columns = set()
    # logging.debug("get_display_trip_columns: curr set is %s" % columns)
    columns.update(get_allowed_trip_columns())
    # logging.debug("get_display_trip_columns: curr set is %s" % columns)
    columns.update(
        col['label'] for col in get_allowed_named_trip_columns()
    )
    # logging.debug("get_display_trip_columns: curr set is %s" % columns)
    columns.update(get_display_required_columns())
    # logging.debug("get_display_trip_columns: curr set is %s" % columns)
    return columns

But now I get an error

Invalid argument `data[0].start_coordinates` passed into DataTable.
Expected one of type [string, number, boolean].

Although the start coordinates seem to be formatted the same way in both cases.

{'user_id': 'bd9cffc8-dbf1-402d-a479-f9f148ec9e60', 'data.distance': 15145.48585499552, 'data.duration': 892.0510001182556, 'data.end_loc.coordinates': [....,....], 'trip_start_time_str': '2021-09-20T16:00:08-06:00', 'trip_end_time_str': '2021-09-20T16:15:00.051000-06:00', 'start_coordinates': [..., ....]}
{'data.distance': 15145.48585499552, 'user_id': 'bd9cffc8-dbf1-402d-a479-f9f148ec9e60', 'data.start_loc.coordinates': [...,...], 'data.end_loc.coordinates': [...,...], 'data.duration': 892.0510001182556, 'trip_start_time_str': '2021-09-20T16:00:08-06:00', 'trip_end_time_str': '2021-09-20T16:15:00.051000-06:00'}

@shankari
Copy link
Contributor Author

shankari commented May 2, 2023

Set up the multilabel defaults, corresponding to
#39

MULTILABEL_NAMED_COLS = [
    {'label': 'mode_confirm', 'path': 'data.user_input.mode_confirm'},
    {'label': 'purpose_confirm', 'path': 'data.user_input.purpose_confirm'},
    {'label': 'replaced_mode', 'path': 'data.user_input.replaced_mode'},
]

and although I have commented out the code to drop the paths, I don't see the paths and only the labels.
This seems to be working (sort of) for now, so will defer a more principled fix until @AlirezaRa94 returns

Screenshot 2023-05-01 at 10 28 27 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant