Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper merging of PFOCR edge attributes from multiple Records #463

Closed
tokebe opened this issue Jul 7, 2022 · 15 comments
Closed

Improper merging of PFOCR edge attributes from multiple Records #463

tokebe opened this issue Jul 7, 2022 · 15 comments

Comments

@tokebe
Copy link
Member

tokebe commented Jul 7, 2022

When a single edge is derived from multiple Records, the edge attributes appear to be merged improperly, at least in the case of the attribute figure_url, figure_title, pmc_reference from PFOCR. What appears to be happening is that only one figure_url is kept, instead of each being merged into an array.

To replicate

In the workspace:

  • Ensure all packages are on main branch and updated
    • npm run git checkout main & npm run git pull
  • Ensure local smartapi specs have been updated
    • npm run smartapi_sync

Query:

URL: http://localhost:3000/v1/smartapi/edeb26858bd27d0322af93e7a9e08761/query

Body:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"],
                    "ids": ["MESH:C000541"]
                },
                "n1": {
                    "categories": ["biolink:Gene"],
                    "ids": ["NCBIGene:208"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

This query should be returning 9 figures instead of the 1 that appears (attributes of the only edge in message.knowledge_graph.edges)

What has been confirmed

  • Record objects are being formed correctly (each record gets its proper mappedResponse object with correct attributes.

TODO

  • Figure out where in the process Record mappedResponse is turned into edge.attributes
  • Confirm whether this is a generalized merging issue or just requires special behavior for PFOCR attributes

Tagging @colleenXu @ariutta for additional details.

@tokebe tokebe added the good first issue Good for newcomers label Jul 7, 2022
@colleenXu
Copy link
Collaborator

colleenXu commented Jul 7, 2022

What appears to be happening is that only one figure_url is kept, instead of each being merged into an array.

and maybe only one figure_title, pmc_reference is being kept as well...

Basically, I think the "last" Record processed is what is kept (all prior ones are overwritten?)...and instead we want to make the values arrays and add elements to them...

Something like this is already done for other apis ingested through x-bte annotation, like semmeddb (think pubmed IDs) and MyVariant

@ariutta
Copy link
Collaborator

ariutta commented Jul 7, 2022

Maybe semmeddb would have an equivalent test case to see whether the problem happens there too?

@ariutta
Copy link
Collaborator

ariutta commented Jul 7, 2022

Without going step-by-step through the code, I'm not aware off the top of my head what would be causing this.

@rjawesome
Copy link
Contributor

rjawesome commented Jul 7, 2022

I think I figured out where the issue is being caused in the code, I can probably make a PR soon if I am correct?

@rjawesome
Copy link
Contributor

rjawesome commented Jul 7, 2022

This is the expected behavior, correct? (figure_url and pmc_reference are also arrays). The file I am currently editing for my fix is query_graph_hander/graph/kg_edge.js. Also, would we like to make these fields an array if there is a single value or just keep it as that single value?
image

@colleenXu
Copy link
Collaborator

the screenshot looks good, ask @tokebe whether you're working with the correct file.

For single values, I think it's fine to leave them as single values (not 1-element arrays). But if you see something different in the code (like elsewhere single values are converted into 1-element arrays), let us know...

@tokebe
Copy link
Member Author

tokebe commented Jul 8, 2022

Screenshot looks good, make a PR as soon as you're ready.

@colleenXu
Copy link
Collaborator

colleenXu commented Jul 8, 2022

on a related note @rjawesome @tokebe I think "sets" (aka unique values only) is more useful than "lists". I've noticed that problem with stuff from MyVariant; look at the civic stuff on edges in the following query.

However, maybe this is something separate enough to be a different issue?

POST to MyVariant specifically: http://localhost:3000/v1/smartapi/09c8782d9f4027712e65b95424adba79/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["DBSNP:rs121913521"],
                    "categories":["biolink:SequenceVariant"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

Example:

                        {
                            "attribute_type_id": "civic_clinical_significance",
                            "value": [
                                "Sensitivity/Response",
                                "Sensitivity/Response",
                                "Sensitivity/Response",
                                "Sensitivity/Response",
                                "Resistance",
                                "Resistance",
                                "Resistance",
                                "Sensitivity/Response",
                                "Sensitivity/Response",
                                "Sensitivity/Response"
                            ]
                        },

@rjawesome
Copy link
Contributor

rjawesome commented Jul 8, 2022

I can make each attribute into a Set, that should be no problem.

@rjawesome
Copy link
Contributor

rjawesome commented Jul 8, 2022

@colleenXu Hmm, your query seems to create another issue, apparently, sometimes an array is being put as the attribute in the record, for example for your query, these two arrays are in the two records that combine to form this edge

[
  [
      "Sensitivity/Response",
      "Sensitivity/Response",
      "Sensitivity/Response",
      "Resistance",
      "Sensitivity/Response",
      "Resistance",
      "Sensitivity/Response"
  ],
  [
      "Sensitivity/Response",
      "Sensitivity/Response",
      "Sensitivity/Response",
      "Sensitivity/Response",
      "Resistance",
      "Resistance",
      "Resistance",
      "Sensitivity/Response",
      "Sensitivity/Response",
      "Sensitivity/Response"
  ]
 ]                         

So, right now my code is making a set or array of these two. Would we like to flatten these arrays? (we can only do this if we are sure no values themselves are of the array type). The results you are showing are because the attributes are only taken from the last record

@colleenXu
Copy link
Collaborator

@rjawesome

I think what's happening is that this DBSNP ID actually corresponds to two hits in MyVariant, and the nested nature of the data is what's leading to this...

I think it'd be nice to flatten these arrays and then run a set operation / unique-values-only. I was hoping for something in the end like ["Sensitivity/Response", "Resistance"] for this particular MyVariant thing.

I currently can't think of cases where we wouldn't want to do this for an edge-attribute....but we'd have to test your stuff carefully once it's ready for testing.

@rjawesome
Copy link
Contributor

@colleenXu Should be ready for testing, flattening of arrays/usage of set has been implemented.

@colleenXu
Copy link
Collaborator

@rjawesome @tokebe If I try to run the example query from the first post, I get a status 500 response.

console logs
  bte:biothings-explorer-trapi:QueryResult initialQEdgeID: e01, initialQNodeIDToMatch: n0 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:442972_&_n1-NCBIGene:208 has 9 +1ms
  bte:biothings-explorer-trapi:QueryResult Successfully scored 0 results, couldn't score 1 results. +0ms
  bte:biothings-explorer-trapi:Graph pruning BTEGraph nodes/edges... +30ms
  bte:biothings-explorer-trapi:Graph pruned 0 nodes and 0 edges from BTEGraph. +0ms
  bte:biothings-explorer-trapi:error_handler ReferenceError: Arrays is not defined
  bte:biothings-explorer-trapi:error_handler     at KnowledgeGraph._createAttributes (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/knowledge_graph.js:162:28)
  bte:biothings-explorer-trapi:error_handler     at KnowledgeGraph._createEdge (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/knowledge_graph.js:174:30)
  bte:biothings-explorer-trapi:error_handler     at /Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/knowledge_graph.js:182:37
  bte:biothings-explorer-trapi:error_handler     at Array.map (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at KnowledgeGraph.update (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/knowledge_graph.js:181:37)
  bte:biothings-explorer-trapi:error_handler     at /Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/graph.js:110:24
  bte:biothings-explorer-trapi:error_handler     at Array.map (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at BTEGraph.notify (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/graph.js:109:26)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.query (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/index.js:625:23)
  bte:biothings-explorer-trapi:error_handler     at processTicksAndRejections (node:internal/process/task_queues:96:5) +0ms

@rjawesome
Copy link
Contributor

rjawesome commented Jul 26, 2022

Ah I must have made a typo when copying the code over to the other PR... Fix pushed.

@tokebe
Copy link
Member Author

tokebe commented Mar 28, 2023

Deployed to prod 🚀

@tokebe tokebe closed this as completed Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants