POST query support multiple value combinations from multiple fields #49

Closed
newgene opened this issue Aug 27, 2019 · 4 comments

@newgene
Member

newgene commented Aug 27, 2019

Here is a use case from MyVariant.info: biothings/myvariant.info#89.

Possible solution:

q="1|12345,2|1000", scopes="chrom|vcf.position"

Use "|" to separate the values of a combination in q and the corresponding fields of that combination in scopes.

A few other notes:

  • "|" may not be the best choice; we need to pick a separator that won't appear in the values themselves and won't clash with the query separator.
  • The number of values and the number of fields in a combination must match.
  • We should set a cap on the maximum number of value/field combinations (10 should be enough).
  • It should be 100% backward compatible with the current behavior.
  • Multiple fields can still match one value, i.e., this should work: q=val1|val2&scopes=fld1,fld2|fld3, where val1 should match either fld1 or fld2.
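
To make the proposal above concrete, here is a minimal sketch of how the "|"-separated form might be parsed. It is not the BioThings implementation: the function name, the MAX_COMBINATIONS constant, and the reading of the cap as "at most 10 fields groups per combination" are all assumptions made for illustration.

```python
MAX_COMBINATIONS = 10  # assumed cap, taken from the "10 should be enough" note


def parse_combined_query(q, scopes, term_sep=",", combo_sep="|"):
    """Split q and scopes into aligned value/field combinations.

    Hypothetical helper. Returns one entry per query term; each entry is a
    list of (value, fields) pairs, where fields is a list of field names.
    """
    # "fld1,fld2|fld3" -> [["fld1", "fld2"], ["fld3"]]
    field_groups = [group.split(term_sep) for group in scopes.split(combo_sep)]
    if len(field_groups) > MAX_COMBINATIONS:
        raise ValueError("too many value/field combinations")

    queries = []
    for term in q.split(term_sep):
        values = term.split(combo_sep)
        # The number of values must match the number of field groups.
        if len(values) != len(field_groups):
            raise ValueError(f"expected {len(field_groups)} values in {term!r}")
        queries.append(list(zip(values, field_groups)))
    return queries


# The example from the proposal: two terms, each a chrom/position pair.
combos = parse_combined_query("1|12345,2|1000", "chrom|vcf.position")

# Without "|" the result degrades to the current behavior:
# every value is matched against all listed fields.
legacy = parse_combined_query("val1,val2", "fld1,fld2")
```

With this reading, a request that never uses "|" yields a single field group, which is how the sketch keeps the existing q/scopes behavior unchanged.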
@namespacestd0
Contributor

Syntax shown in an example here:

{
  "q": [["CDK2", "c.314A>T"], ["CXCR4", "c.535G>C"]],
  "scopes": [["dbsnp.gene.symbol", "snpeff.ann.genename"], "snpeff.ann.hgvs_c"]
}

will basically be equivalent to

[
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": "CDK2",
                            "operator": "and",
                            "fields": [
                                "dbsnp.gene.symbol",
                                "snpeff.ann.genename"
                            ]
                        }
                    },
                    {
                        "multi_match": {
                            "query": "c.314A>T",
                            "operator": "and",
                            "fields": "snpeff.ann.hgvs_c"
                        }
                    }
                ]
            }
        }
    },
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": "CXCR4",
                            "operator": "and",
                            "fields": [
                                "dbsnp.gene.symbol",
                                "snpeff.ann.genename"
                            ]
                        }
                    },
                    {
                        "multi_match": {
                            "query": "c.535G>C",
                            "operator": "and",
                            "fields": "snpeff.ann.hgvs_c"
                        }
                    }
                ]
            }
        }
    }
]
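
The mapping from the nested q/scopes payload to the query bodies above can be sketched as follows. This is an illustration only, not the code from commit 89c0e0c; build_es_queries is a hypothetical name, and the sketch simply passes each scope through unchanged (a string stays a string, a list stays a list), as in the output shown above.

```python
def build_es_queries(q, scopes):
    """Build one bool/must query body per value combination in q.

    Hypothetical helper: q is a list of value lists, scopes is a list whose
    i-th entry (a field name or a list of field names) applies to the i-th
    value of every combination.
    """
    bodies = []
    for values in q:
        if len(values) != len(scopes):
            raise ValueError("each value combination must match scopes in length")
        must = [
            {"multi_match": {"query": value, "operator": "and", "fields": fields}}
            for value, fields in zip(values, scopes)
        ]
        bodies.append({"query": {"bool": {"must": must}}})
    return bodies


payload = {
    "q": [["CDK2", "c.314A>T"], ["CXCR4", "c.535G>C"]],
    "scopes": [["dbsnp.gene.symbol", "snpeff.ann.genename"], "snpeff.ann.hgvs_c"],
}
bodies = build_es_queries(payload["q"], payload["scopes"])
```

Each combination becomes its own body, so every value in a combination has to match (must), while a value scoped to several fields may match any one of them (multi_match).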

Implemented in 89c0e0c
Tests/examples in 996420a
Myvariant use cases/tests in biothings/myvariant.info@fc7fac7
Pending deployment.

@kevinxin90
Contributor

@namespacestd0 Is this feature deployed? I tried it on MyVariant and it does not seem to be working at the moment.

@namespacestd0
Contributor

This feature should be universally available on all major biothings instances now.

@newgene
Member Author

newgene commented Aug 4, 2022

For reference, these two tests provide some example queries:

def test_20_nested(self):
    """
    [
        {
            "query": [
                "cdk2",
                "9555"
            ],
            "_id": "101025892",
            "_score": 1.0840356,
            "entrezgene": "101025892",
            "name": "cyclin dependent kinase 2",
            "symbol": "CDK2",
            "taxid": 9555
        }
    ]
    """
    payload = {
        "q": [["cdk2", "9555"]],
        "scopes": ["symbol", "taxid"],
        "fields": ["symbol", "name", "taxid", "entrezgene", "ensemblgene"]
    }
    ans = self.query(method='POST', json=payload)
    assert len(ans) == 1

def test_21_nested(self):
    """
    [
        {
            "query": [
                "101025892",
                "9555"
            ],
            "_id": "101025892",
            "_score": 3.8134108,
            "entrezgene": "101025892",
            "name": "cyclin dependent kinase 2",
            "symbol": "CDK2",
            "taxid": 9555
        }
    ]
    """
    payload = {
        "q": [["101025892", "9555"]],
        "scopes": [["symbol", "entrezgene"], "taxid"],
        "fields": ["symbol", "name", "taxid", "entrezgene", "ensemblgene"]
    }
    ans = self.query(method='POST', json=payload)
    assert len(ans) == 1
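
Outside the test suite, the same kind of payload could be sent with the standard library roughly like this. It is a sketch under assumptions: the endpoint URL is a guess based on the gene fields used in the tests, and build_request and send are hypothetical helpers, not part of BioThings. Nothing is sent until send() is called.

```python
import json
import urllib.request

# Assumed endpoint; substitute the query URL of the BioThings API you use.
QUERY_URL = "https://mygene.info/v3/query"


def build_request(payload, url=QUERY_URL):
    """Wrap a nested q/scopes payload in a JSON POST request (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Perform the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same payload as test_21_nested: the first value may match either
# symbol or entrezgene, the second value must match taxid.
payload = {
    "q": [["101025892", "9555"]],
    "scopes": [["symbol", "entrezgene"], "taxid"],
    "fields": ["symbol", "name", "taxid", "entrezgene", "ensemblgene"],
}
request = build_request(payload)
# hits = send(request)  # uncomment to actually query the service
```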
