We should discuss whether we want to include a validation step for the user data that checks if it matches the campaign specifications they have provided. The rationale is that this could avoid hard-to-detect / silent bugs where a user simply forgets to include a parameter in their campaign definition, which could lead to the worst-case scenario of producing recommendations that have been optimized using the wrong problem specs. I think there is a good chance that this can happen while people are still experimenting with their setup and trying different problem specifications.
So what we want to avoid is that someone starts with a configuration like this ...
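For illustration, here is a minimal sketch of such an initial setup, assuming BayBE-style building blocks (`NumericalDiscreteParameter`, `SearchSpace.from_product`, `NumericalTarget`, `SingleTargetObjective`); the parameter names and values are made up, and exact constructor signatures may differ between versions:

```python
import pandas as pd
from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# Hypothetical single-task setup: just a temperature parameter and a yield target.
parameters = [
    NumericalDiscreteParameter(name="Temperature", values=[10, 20, 30]),
]
searchspace = SearchSpace.from_product(parameters=parameters)
objective = SingleTargetObjective(target=NumericalTarget(name="Yield", mode="MAX"))
campaign = Campaign(searchspace=searchspace, objective=objective)

# Toy measurements whose columns match exactly what the campaign declares.
df = pd.DataFrame({"Temperature": [10, 20], "Yield": [0.5, 0.7]})
campaign.add_measurements(df)
```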
... and then thinks "great, works, now let me pull in the real data", overlooking that the latter contains additional context that is relevant for the model. For instance, they could swap out the dataframe for something like the following, resulting in a situation where the different tasks would be mixed up:
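Again only a hypothetical illustration, continuing the sketch above and assuming (as is the premise of this issue) that extra columns are currently passed through silently:

```python
# "Real" data with an extra Task column that is relevant for the model
# but was never declared (e.g. as a TaskParameter) in the campaign above.
df_real = pd.DataFrame(
    {
        "Temperature": [10, 20, 10, 20],
        "Task": ["A", "A", "B", "B"],
        "Yield": [0.5, 0.7, 0.1, 0.2],
    }
)

# Goes through without complaint: the Task column is silently ignored and the
# measurements of tasks A and B get mixed into a single model.
campaign.add_measurements(df_real)
```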
A simple (and at the same time user-friendly) approach could be to add an `allow_extra: bool = False` flag to `Campaign.add_measurements`, which users can still explicitly set to `True` if they are certain about what they are doing and want to skip the check, e.g. to keep metadata columns in their dataframe.
With the check activated, a simple explicit filtering in the spirit of `campaign.add_measurements(df.filter(campaign.columns))` would still do the job.
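To make the proposal concrete, a rough sketch of what the check could look like inside `Campaign.add_measurements` (hypothetical code, not the actual implementation, reusing the `campaign.columns` property mentioned above):

```python
# Hypothetical sketch of the proposed validation (not actual BayBE code).
def add_measurements(self, data: pd.DataFrame, allow_extra: bool = False) -> None:
    """Add measurements, rejecting unknown columns unless allow_extra=True."""
    if not allow_extra:
        unknown = set(data.columns) - set(self.columns)
        if unknown:
            raise ValueError(
                f"Columns {sorted(unknown)} are not part of the campaign "
                "specification. Drop them (e.g. via `df.filter(campaign.columns)`), "
                "adjust the campaign, or pass `allow_extra=True`."
            )
    # ... existing measurement-handling logic ...
```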
Potentially, other places could benefit from this as well.