Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic server-side implementation of configurable expectation proposal #828

Merged
merged 5 commits into from
Jul 19, 2021

Conversation

GabrielKS
Copy link
Contributor

What this PR does:

  • Provides a sample expectation/notification configuration file and a complete schema for such files
  • Exposes functions to calculate expectation/notification settings for a given trip based on its end date and inferred labels
  • Adds a stage to the intake pipeline that adds expectation data to each trip
  • Sends expectation data to the client app (which currently ignores it)
  • Fixes a bug in TestLabelInferencePipeline.py
  • Includes unit tests for most of the above

What this PR doesn't do:

  • Validate the expectation/notification configuration file against the schema
  • Fully implement the expectation proposal — for now, the only expect types implemented are "all" and "none"; the others default to "all"
  • Implement any notification behavior aside from the logic to calculate what the notifications should be

+ Adds pipeline stage EXPECTATION_POPULATION, which will compute the user input expectation for each trip, between LABEL_INFERENCE and CREATE_CONFIRMED_OBJECTS
- The stage currently uses a placeholder algorithm that marks every trip as not expected
Tested by confirming that confirmed trips are as desired and that client behavior is unchanged
+ expectation_notification_config.py:
  * Loads the configuration file
  * Implements rule- and schedule-matching algorithms
  * Ultimately exposes methods to get trip- and collection mode-level settings
+ TestExpectationNotificationConfig.py: Tests all of that. Currently passing.

Schema changes:
- Removes draftDelay option
+ Notes that omitting a time zone is valid and has a specific meaning

None of this is yet connected to the pipeline.
The bug was due to the fact that the test datasets shankari_2015-07-22
and shankari_2015-jul-22 both exist and are different things. I was
loading shankari_2015-jul-22 and it was not actually loading any trips;
switching to shankari_2015-07-22 solved this.
+ Added a test case so if this happens in the future the tests actually fail
+ The expectation stage of the pipeline now consults the configuration to determine a trip's expectation status
- For now, the only expect types implemented are "all" and "none"; the others default to "all"
- Still no verification against the schema
+ Includes extensive unit tests

Tested by running the unit tests and confirming that client-side behavior is as expected.
@GabrielKS
Copy link
Contributor Author

The everlasting UI proposal, for reference — though it's a bit out of date and could use a new draft. A better reference for the config file is the schema itself.

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any code changes required; these are primarily requests for clarification/additional documentation to improve long-term maintainability.

conf/ux/expectations.conf.schema.json Show resolved Hide resolved
Comment on lines +25 to +28
"enabled": {
"description": "Whether or not to enable this mode; mode is enabled if this property is omitted",
"type": "boolean"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do you anticipate deployers will/should use this v/s setting up a schedule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought I'd add a way to "comment out" a mode. It's not strictly necessary. Do you think it adds excessive complexity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, that's fine. Just update the documentation in a future commit to highlight this.

"type": "string"
},
"confidenceThreshold": {
"description": "Only display yellow labels with confidence greater than this threshold (0 to display all, 1 to display none)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this "only display" or "only mark"?

From this
https://github.com/e-mission/e-mission-server/pull/828/files#diff-5e8452621bcf334e7ef59c9d10744703d736adbb6109ce4cced282bf9b8c0631R92

It looks like if the confidence is below the threshold, we always show it.

Concretely, I have a trip with confidence 0.6.
Am I correct that in the "intensive" mode, this will be marked ("trigger": -1,) and in the "relaxed" mode, this will be marked "yellow"? An inline example will help because, as you might have figured out, nobody RTFMs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually something I haven't figured out yet. If the confidence for a given label is below the threshold, the phone should always display that label as red and blank; that part I'm sure of (and have implemented accordingly on the client side). The current implementation has these below-threshold labels also counting as red for the purposes of rule matching as well, but I'm not sure what the best choice is there.

If we wanted to be maximally flexible, we would not have below-threshold labels match "trigger": -1 because there is a semantic difference between very low confidence inferences and no inferences at all, and if a study admin wanted to treat them the same they could just write one more rule to explicitly match these trips. The argument in favor of having below-threshold labels match "trigger": -1 (the status quo) is that it forces expectation/notification behavior to match what the user actually sees: to the user, red labels are red labels, if some trips that display red labels are appearing in To Label and some aren't (and it's not because of some random sample option), they might get confused. It's also simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is complex. Let's discuss it after tomorrow. I really want to deploy this today so we can walk through the new screen at tomorrow's deployer/beta tester meeting.

"type": "object",
"properties": {
"startDate": {
"description": "The base date from which to measure time; omit time zone to signify user's local time",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timezones are indeed tricky (remind me to tell you about the Brazilian timezone problem sometime), but I am concerned that this is over-engineering.

I anticipate that the related code is also tricky and brittle (now I understand why you had to do complicated time management), and it also introduces ambiguity.

Why do you think it is important to have "the user's local time" as opposed to using UTC, for example? Again:

  • we are designing this for deployers, not end users, AND
  • hopefully, we can auto-detect individual schedules in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for this part is actually relatively simple; the phone's time zone is embedded in the trip data object, so the server just inserts it if we're missing a time zone. (The complicated part is the schedule matching, regardless of time zones, and in particular figuring out how many months/weeks/days there are between two dates — see expectation_notification_config.py:64).

Let's say we want to do primary data collection for one week, Monday to Sunday, every eight weeks. For simplicity, say we want the user to label all trips at the end of the day under primary data collection and none under secondary data collection. Under the existing schema, we can implement this as a schedule that begins at 00:00 user's local time and lasts 7 days, and specify a startDate that is a Monday. Then the user will have trips appear on their To Label screen for exactly 7 days every 56 days; if they label trips at the end of each day as they are supposed to, they will be interacting with the app 1 in 8 days.

Say we remove the ability to specify a time in the user's local time zone. We implement the schedule as beginning at 00:00 UTC. If the user is in a US time zone, primary mode will begin on Sunday evening and last until the next Sunday evening. Though this is the same length of time and on average will cover the same number of trips, it now spans 8 days instead of 7. If the user is labeling at the end of each day, they will be interacting with the app 1 in 7 days. This is worse.

To summarize, the ability to use a user's local time zone minimizes the number of days the user has to interact with the app. Even if they'll be labeling the same number of trips either way, every day we can avoid making the user do something is a hassle avoided.

I suppose if we wanted to simplify we could only use local time and not give the option to specify a time zone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let's discuss this after tomorrow. Right now, I'm going to merge this and get the actual recommendations to show up.

emission/core/wrapper/confirmedtrip.py Show resolved Hide resolved
emission/analysis/userinput/expectations.py Show resolved Hide resolved
eaue._test_options = test_options_stash

def preprocess(self, trip):
# See eacilp.placeholder_predictor_2 for an explanation of the "fingerprint" technique
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this explanation here - only a one-liner about Timestamp2index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can flesh out that explanation in an upcoming commit.

Comment on lines +48 to +52
def _get_expectation_for_trip(trip):
raw_expectation = eace.get_expectation(trip)
# For now, expect always labeling unless the config file specifies no labeling at all
processed_expectation = not raw_expectation["type"] == "none"
return {"to_label": processed_expectation}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add in a comment about what the difference between raw and processed is and is intended to be?
My understanding is that the raw expectation is of the form

                    "expect": {
                        "type": "randomDays",
                        "value": 2
                    },

and the processed is a simple boolean true/false on whether trips need to be labeled or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Will add the documentation in an upcoming commit.

}
for trip in self.expected_trips:
fingerprint = trip["data"]["start_local_dt"]["hour"]*60+trip["data"]["start_local_dt"]["minute"]
if answers[fingerprint] is not None: self.assertEqual(trip["data"]["expectation"]["to_label"], answers[fingerprint])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this? won't the "randomDays" expectations also return True with your current implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the tests to enforce the actual intended behavior, not the current implementation. The actual intended behavior for the random options is not currently well-defined, so we don't test it — returning True for the random options is an implementation detail (and a placeholder, at that).



def testGetRuleForLabel(self):
# An answers entry is a (labels, index of corresponding rule under intensive mode, index of corresponding rule under relaxed mode) tuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: expand on this a bit to explain how you go from the list of all rules to the selected rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please handle this in a subsequent commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you mean.

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to go ahead and merge this now so we can deploy tonight, demo tomorrow and start getting feedback. I've left the "will handle in subsequent commit" comments unresolved for your reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants