Filtering on distinct_id in trends #7810

Closed
mariusandra opened this issue Dec 20, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@mariusandra
Collaborator

Is your feature request related to a problem?

I want to filter by distinct_id under trends, and only see what one user did. However the UI does not let me do that... and the API doesn't let me do that either.

Describe the solution you'd like

I would like to select distinct_id from a list of properties, and filter by it... Like it's a first class citizen.

Describe alternatives you've considered

  • Editing the URL and adding distinct_id to the filter in the URL manually. This worked, but it turns out distinct_id is hidden for a reason: only some integrations send it as a property that I can then filter on.
  • Sending a custom property with the user's ID, just so I could filter on it.
  • Using the events API directly, which lets me filter by distinct_id and person_id, but doesn't provide the aggregated data trends does.
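
For illustration, the custom-property workaround above could be sketched as follows. The helper name `with_user_id_property` and the property key `user_id` are hypothetical and not part of any PostHog library; the idea is only to mirror the distinct ID into a regular event property so that it becomes filterable.

```python
from typing import Any, Dict, Optional


def with_user_id_property(
    distinct_id: str,
    properties: Optional[Dict[str, Any]] = None,
    key: str = "user_id",  # hypothetical property name, pick your own
) -> Dict[str, Any]:
    """Return a copy of `properties` with the distinct ID mirrored under `key`."""
    merged = dict(properties or {})  # copy so the caller's dict is not mutated
    merged[key] = distinct_id        # overwrite so the mirror always matches
    return merged


props = with_user_id_property("user-123", {"plan": "pro"})
print(props)
```

The resulting dictionary would be passed as the event's properties to whatever capture call your client library provides. The cost of this workaround is that every event carries the ID twice.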

Additional context

This came from a request in the users' Slack.

To solve this, we should either

  • set distinct_id as a property on all events in the plugin server
  • update the queries to make filtering and breakdowns by distinct_id (and person_id?) possible

Thank you for your feature request – we love each and every one!

@mariusandra mariusandra added the enhancement New feature or request label Dec 20, 2021
@macobo
Contributor

macobo commented Dec 21, 2021

For context, the problem is kind of technical:

  • When sending payloads, client libraries sometimes send distinct_id as an event property
  • This caused it to show up under properties, but the ingestion pipeline correctly removed it
  • This resulted in broken queries

We could technically make it work without changing much, if we are sure we don't want to support another property like distinct_id, by adding a comment to the distinct_id column indicating to get_materialized_columns() that it's materialized.

@mariusandra
Collaborator Author

Interesting. I'm out of my depth here, but:

  • This way querying for the "distinct_id" property will always just defer to the actual "distinct_id" field?
  • Is this something that should be added into ClickHouse, or should this be patched onto the get_materialized_columns function in Python?

@mariusandra
Collaborator Author

There's also a case to be made for filtering by e.g. $time and actually hitting the event's true timestamp. This is less needed for insights, but there have been users who have requested it for e.g. the events list.

CC @pauldambra for #7804

@macobo
Contributor

macobo commented Dec 22, 2021

> This way querying for the "distinct_id" property will always just defer to the actual "distinct_id" field?

Yes

> Is this something that should be added into ClickHouse, or should this be patched onto the get_materialized_columns function in Python?

Kind of neither. You need to add a comment to the right ClickHouse column, like so: https://github.com/PostHog/posthog/blob/master/ee/clickhouse/materialized_columns/columns.py#L79-L82
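
As a rough illustration of what tagging a column through a comment could look like, here is a minimal sketch. It is not the code at the link above: the `column_materializer::` prefix and both function names are assumptions made for the example. Only the `ALTER TABLE ... COMMENT COLUMN` statement and the `name` and `comment` fields of `system.columns` are standard ClickHouse.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumption: the marker used inside the column comment. Not confirmed here.
COMMENT_PREFIX = "column_materializer::"

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def comment_column_sql(table: str, column: str, property_name: str) -> str:
    """Build DDL that tags `column` as the materialized form of `property_name`."""
    for identifier in (table, column):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    # Escape backslashes and single quotes for the SQL string literal.
    escaped = property_name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"ALTER TABLE {table} COMMENT COLUMN {column} "
        f"'{COMMENT_PREFIX}{escaped}'"
    )


def materialized_columns(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map property name to column name, given (name, comment) rows
    such as those selected from system.columns for one table."""
    return {
        comment[len(COMMENT_PREFIX):]: name
        for name, comment in rows
        if comment.startswith(COMMENT_PREFIX)
    }


print(comment_column_sql("events", "distinct_id", "distinct_id"))
print(materialized_columns([("distinct_id", "column_materializer::distinct_id"), ("uuid", "")]))
```

With a mapping like this, a query builder could read the real column whenever a filter names the `distinct_id` property, which is the "defer to the actual field" behaviour asked about above.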

However this takes us down a tricky route since:

  1. Some properties are not really properties, so they need to get side-loaded in, along with all the baggage that introduces (e.g. the 'pseudo-property' won't have any associated metadata or be shown in dropdowns)
  2. We will 'shadow' some real variables
  3. Not all queries support materialized columns yet: "Lifecycle query is slow" (#7382), "Speed up cohort property filtering" (#5854), and sessions. These properties would fail there.

Hence not suggesting we go down this route - just giving some technical insight.

@dbartholomae

What's the status on this? If I'm not able to filter by id, this makes e.g. feature flags almost completely useless for me, as I need to enable for certain companies/users only and id is the only unique identifier we have. The only workaround I could think of so far was to create a second userId property with the same value.

@mariusandra
Collaborator Author

Hey, you can add a HogQL filter on distinct_id. It's sadly not yet exposed as a selectable dropdown option, but a query like this should do the trick:

[image: screenshot of a HogQL filter on distinct_id]
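
The exact query in the screenshot is not recoverable here. As a hedged sketch, a HogQL expression comparing `distinct_id` to a literal could be built like this. The filter shape (`type` set to `hogql`, with the expression stored under `key`), the backslash escaping, and the function name are all assumptions for illustration, not a confirmed API contract.

```python
from typing import Dict


def distinct_id_hogql_filter(distinct_id: str) -> Dict[str, str]:
    """Build a filter whose HogQL expression matches a single distinct_id."""
    # Assumption: single-quoted strings with backslash escapes.
    escaped = distinct_id.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "type": "hogql",  # assumed filter type name
        "key": f"distinct_id = '{escaped}'",
    }


print(distinct_id_hogql_filter("user-123"))
```

In the UI, only the expression itself (for example `distinct_id = 'user-123'`) would be typed into the HogQL filter box.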

@dbartholomae

@mariusandra Thanks! What's the plan for supporting this? Especially for feature flags this seems to be a core feature.
