Cohort build - using payer-plan-period #4
Comments
Few considerations: If we want to go against this philosophy, then the other consideration is that SqlRender needs to be able to support the different flavors of XML/JSON/regex that are available in the different database platforms.
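To illustrate the dialect problem mentioned above, here is a minimal sketch of how a regex predicate is spelled on three platforms. The `REGEX_PREDICATE` table and the `regex_predicate` helper are hypothetical and are not part of SqlRender; they only show why a translation layer would need a per-platform rule.

```python
# Hypothetical helper, not part of SqlRender: each platform spells
# "column matches regular expression" differently.
REGEX_PREDICATE = {
    "postgresql": "{column} ~ '{pattern}'",
    "oracle": "REGEXP_LIKE({column}, '{pattern}')",
    "mysql": "{column} REGEXP '{pattern}'",
}


def regex_predicate(dialect: str, column: str, pattern: str) -> str:
    """Render a regex predicate for one dialect, or fail if none is known."""
    try:
        template = REGEX_PREDICATE[dialect]
    except KeyError:
        raise NotImplementedError(f"no regex predicate known for {dialect!r}")
    # Double single quotes so the pattern stays a valid SQL string literal.
    return template.format(column=column, pattern=pattern.replace("'", "''"))
```

A dialect missing from the table raises an error, which is the case a translator would have to handle for any platform without a usable regex function. Regex syntax itself also differs between engines, so the same pattern is not guaranteed to match the same rows everywhere.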
I don't think cdm working group can facilitate this @cgreich - thoughts? The fields in question are:
These are not coded concepts but are either full local names or local codes for local names in source data. We can't standardize them. Similar fields in the OMOP CDM are the _source_value fields like care_site_source_value, person_source_value, provider_source_value, specimen_source_value. We could add new fields that are just an integer representation of the _source_value, like payer_id and plan_id, similar to care_site_id, person_id, provider_id or specimen_id, but neither _source_value nor _id has a semantic meaning or represents a concept. Contrast this with _source_code and _concept_id: they have semantic meaning that may be standardized. _source_values cannot be used in standardized/network studies, but they are very valuable to local studies. E.g. care_site_source_value may contain a room in a hospital such as "third floor wing five #3451", which only local hospital staff understand. They want that information in their local studies. I think the _source_values in payer_plan_period are examples of a similar challenge. In the absence of standardization, are we really going against the philosophy, or is this a whole new problem that network research is not interested in solving but local data users are?
@chrisknoll I think it is a whole new problem, and one potential solution is to introduce the construct of factsets. Factsets are similar to concept sets, but instead of a collection of concept_ids they are a collection of fact ids. Fact ids are the primary keys of the OMOP CDM tables (e.g. payer_plan_period_id, visit_occurrence_id, observation_period_id, etc.). We store expressions that generate these primary keys, and these expressions are regular expressions on _source_value.
We then use the factsets in @criteriaQueries as follows
We could do the same for @QualifiedLimitFilter, @additionalCriteriaQuery, etc.
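To make the factset idea concrete, here is a minimal sketch under stated assumptions; it is not the snippet from the comment above. The `factsets` table layout (`factset_id`, `fact_id`), the sample rows, and the simplified `payer_plan_period` columns are all hypothetical. SQLite is used only because it runs in-process, and its `REGEXP` operator is backed by a Python function registered for the sketch.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): the pattern arrives first.
    return int(value is not None and re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

conn.executescript("""
-- Simplified payer_plan_period with made-up sample rows.
CREATE TABLE payer_plan_period (
    payer_plan_period_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    payer_source_value TEXT,
    plan_source_value TEXT
);
INSERT INTO payer_plan_period VALUES
    (1, 101, 'ACME', 'PPO Gold'),
    (2, 102, 'ACME', 'HMO Basic'),
    (3, 103, 'OTHER', 'PPO Silver');
-- Hypothetical factset table: one row per (factset, primary key) pair.
CREATE TEMP TABLE factsets (factset_id INTEGER, fact_id INTEGER);
""")

# Factset 0: payer plan periods whose plan_source_value starts with "PPO".
conn.execute(
    """
    INSERT INTO factsets (factset_id, fact_id)
    SELECT 0, payer_plan_period_id
    FROM payer_plan_period
    WHERE plan_source_value REGEXP ?
    """,
    ("^PPO",),
)

# A criteria-style query that restricts events to the factset.
rows = conn.execute(
    """
    SELECT ppp.person_id
    FROM payer_plan_period ppp
    JOIN factsets fs
      ON fs.fact_id = ppp.payer_plan_period_id AND fs.factset_id = 0
    ORDER BY ppp.person_id
    """
).fetchall()
person_ids = [r[0] for r in rows]
```

With the sample rows, `person_ids` holds the two persons whose plan name begins with "PPO". The regular expression is evaluated once when the factset is populated, and later queries only join on integer keys.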
Hi, @gowthamrao. I'll think this over, but my instinct tells me that this is leading us to an abstraction that makes the system less well-defined. Today we have different criteria types relegated to the specific domains that they contain. With 'fact sets', what's to say that we don't just throw all those types away and refer to everything as 'facts'? It sounds more flexible, but my experience has been that it's just less structured (and harder to manage). If I'm looking at this correctly, it looks like fact sets are just a way to alias results coming out of a criteria query? That's a different way to do it, but it doesn't really solve the problem of this issue, which is to incorporate payer_plan_period: we either get standard concepts to reference in the payer_plan_period table or we introduce some sort of text-mining capability in SqlRender. The former can be standardized; the latter is not standardized and is a lot more work to implement (from a SqlRender perspective). So, I'll continue to think this over, but I don't think this is a good idea.
Can you elaborate on this one? Maybe we have a disconnect here.
And then Primary events query becomes:
I'm copying how you said to use the factset table, but I changed it from a left join; I'm not sure what the advantage of left join + where ... is null is over doing a simple join. But my point is this: what's the point of creating factsets when you can just refer to the fields that you want directly in the criteria query? Part of my concern is not just usability across a network and clearly defined constructs (i.e. criteria types), but also performance when running the queries. When you create a factset table, you lose all query optimization and index context on the underlying tables you're creating facts from. So let's say you create a factset that resolves to a billion visits, but then gets whittled down to a few thousand after other patient criteria are applied: your database performance blows up because it can't optimize the data retrieval between the #factsets table and the underlying datastore. I know I'm getting into the weeds here, but all I can say is I don't think factsets solve a specific problem; they just abstract things.

I know this is off topic for this issue ('Payer plan periods in cohort defs'), but what I think you are circling is that you'd like a way to create 'criteria expressions' that can be reused interchangeably across different parts of a cohort expression. If that's something you are trying to do, the object model in circe-be can be the building blocks, and you just have to create a new structure that sits on top of them to create these reusable pieces and then materialize them into a circe cohort expression at execution time. This isn't what circe does; it is something that can be built on top of circe, but I think that would be either an extension to the circe-be capability or a completely external library that references circe-be. If what I've just said isn't something you're trying to accomplish, please disregard :)

-Chris
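The comparison above (factset table versus referring to the field directly) can be sketched as follows. This is an illustration only: the table layout, sample rows, and the `factsets` temp table are hypothetical, and SQLite with a Python-backed `REGEXP` function stands in for a real platform. It shows that both routes select the same rows; it does not measure the performance difference described above.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): the pattern arrives first.
    return int(value is not None and re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.executescript("""
-- Simplified payer_plan_period with made-up sample rows.
CREATE TABLE payer_plan_period (
    payer_plan_period_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    plan_source_value TEXT
);
INSERT INTO payer_plan_period VALUES
    (1, 101, 'PPO Gold'),
    (2, 102, 'HMO Basic'),
    (3, 103, 'PPO Silver');
""")

# Factset route: materialize the matching primary keys, then join back.
conn.execute("""
CREATE TEMP TABLE factsets AS
SELECT payer_plan_period_id AS fact_id
FROM payer_plan_period
WHERE plan_source_value REGEXP '^PPO'
""")
via_factset = conn.execute("""
SELECT ppp.person_id
FROM payer_plan_period ppp
JOIN factsets fs ON fs.fact_id = ppp.payer_plan_period_id
ORDER BY ppp.person_id
""").fetchall()

# Direct route: put the predicate in the criteria query itself.
direct = conn.execute("""
SELECT person_id
FROM payer_plan_period
WHERE plan_source_value REGEXP '^PPO'
ORDER BY person_id
""").fetchall()

same_result = via_factset == direct
```

In the direct route the predicate stays next to the base table, so the database planner can combine it with other patient criteria. The factset route forces the full match set to be materialized before anything else is applied.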
@chrisknoll what you said makes sense. Based on your input, here is a proposal to the CDM WG: OHDSI/CommonDataModel#120
@chrisknoll are these the exceptions
Yes, those are the exceptions.
@gowthamrao, if this issue is addressed in the recent PR for the payer plan period criteria, could you close this? Otherwise, please let me know what items remain that should keep this issue open. Thank you!
The OMOP PAYER_PLAN_PERIOD table (https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/StandardizedHealthEconomicsDataTables/PAYER_PLAN_PERIOD.md) is not a concept_id-based table.
Payer-plan-period only has _source_values (payer_source_value, plan_source_value and family_source_value). Can cohorts be built using some form of regular-expression search of _source_values? Alternatively, the _source_value may hold a JSON or XML object.
Can we use an XML, JSON or regular-expression search on the _source_value string as part of a criteria query, for example?