Develop to main sync (#98)
* Update _version.py (#86)

* Update _version.py

* Rti merge (#84)

* roger cli prepped for Merge Deploy

* Update Makefile to work with python env

* Update redisgraph-bulk-loader to fix issue with loading MODULE LIST

* Revert "Update redisgraph-bulk-loader to fix issue with loading MODULE LIST"

This reverts commit 7baf7ef.

* Finalized dev deployment of dug inside Catapult Merge, deployment yamls, code changes and configurations

* updated to reflect the Dug-Api updates to FastAPI

* adding multi-label redis by removing 'biolink:' on nodes; edges cannot be fixed after the update, so they need to be solved either by changing TranQL and Plater or by forking bulk-redisgraph to allow colons in the edges

* Working multi label redis nodes w/ no biolink label

* Latest code changes to deploy working Roger in Merge

* biolink data move to '.' separator

* updates to include new dug fixes, upgraded redis-bulk-loader, and changed biolink variables to specify their domain with a 'biolink.' prefix

* adding test roger code

* removed helm deployments

* change docker owner

* remove core.py

* remove dup dev config

* redis graph is not directly used removing cruft

* remove print statement

* remove logging files

* update requirements

* update requirements

* add redis graph.py

* fix import error for logger

* adding es scheme and ca_path config

* adding es scheme and ca_path config

* adding debug code

* removing debug

* adding nodes args

* adding biolink.

* adding biolink.

* Update requirements.txt

* Update .gitignore

* Update dug_utils.py

Handle Error when curie not found in validate

* Update __init__.py

* Update config.yaml

* Update dev-config.yaml

* Update docker-compose.yaml

* fixed docker-compose

* adding back postgres volume to docker compose

* env correction, docker compose updates

---------

Co-authored-by: Nathan Braswell <[email protected]>
Co-authored-by: esurface <[email protected]>
Co-authored-by: braswent <[email protected]>

* adding v5.0

* cde-links branch

* pin linkml

* Update config.yaml

collection_action to action

* pop total items before result

* print extracted elements

* Update requirements.txt

* Keep edge provenance (#94)

* Update kgx.py

* Update kgx.py

* Update kgx.py

can't delete edge keys while looping over them.

* just collect then update

* Update requirements.txt (#93)

---------

Co-authored-by: Nathan Braswell <[email protected]>
Co-authored-by: esurface <[email protected]>
Co-authored-by: braswent <[email protected]>
Co-authored-by: Howard Lander <[email protected]>
5 people authored Feb 19, 2024
1 parent de46493 commit a8feb24
Showing 5 changed files with 27 additions and 3 deletions.
6 changes: 5 additions & 1 deletion dags/dug_helpers/dug_utils.py
@@ -388,6 +388,7 @@ def _search_elements(self, curie, search_term):
             raise Exception(f"Validation error - Did not find {curie} for"
                             f"Search term: {search_term}")
         else:
+            del response['total_items']
             for element_type in response:
                 all_elements_ids = [e['id'] for e in
                                     reduce(lambda x, y: x + y['elements'], response[element_type], [])]
@@ -434,12 +435,15 @@ def crawl_concepts(self, concepts, data_set_name):
                 casting_config = query['casting_config']
                 tranql_source = query['tranql_source']
                 dug_element_type = query['output_dug_type']
-                extracted_dug_elements += crawler.expand_to_dug_element(
+                new_elements = crawler.expand_to_dug_element(
                     concept=concept,
                     casting_config=casting_config,
                     dug_element_type=dug_element_type,
                     tranql_source=tranql_source
                 )
+                log.debug("extracted:")
+                log.debug(str(list([el.get_searchable_dict() for el in new_elements])))
+                extracted_dug_elements += new_elements
             concept.clean()
             percent_complete = int((counter / total) * 100)
             if percent_complete % 10 == 0:
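The first hunk above drops the `total_items` entry before looping over the response. A minimal sketch of why that matters, assuming a response shaped the way the loop implies: a dict that maps each element type to a list of groups, each with an `elements` list, plus a scalar `total_items` count. The sample data and the `collect_element_ids` helper are hypothetical and not taken from Dug.

```python
from functools import reduce


def collect_element_ids(response):
    """Return {element_type: [ids]} for a search response (hypothetical helper)."""
    # Work on a shallow copy so the caller's dict is untouched, then drop the
    # scalar count; left in place, the loop below would try to reduce over an int.
    groups_by_type = dict(response)
    groups_by_type.pop("total_items", None)

    ids_by_type = {}
    for element_type, groups in groups_by_type.items():
        # Flatten the per-group 'elements' lists into one list, as the diff does.
        elements = reduce(lambda acc, group: acc + group["elements"], groups, [])
        ids_by_type[element_type] = [e["id"] for e in elements]
    return ids_by_type


# Made-up response used only for illustration.
sample = {
    "total_items": 3,
    "variable": [
        {"elements": [{"id": "v1"}, {"id": "v2"}]},
        {"elements": [{"id": "v3"}]},
    ],
}
print(collect_element_ids(sample))
```

The sketch uses `pop` with a default so that a response without the key does not raise `KeyError`; the commit itself uses `del`, which assumes the key is always present.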
12 changes: 12 additions & 0 deletions dags/metadata.yaml
@@ -97,6 +97,18 @@ kgx:
       files:
         - cde/annotated_edges_v4.0.jsonl
         - cde/annotated_nodes_v4.0.jsonl
+    - version: v5.0
+      name: baseline-graph
+      format: jsonl
+      files:
+        - baseline-5.0/edges_v5.0.jsonl
+        - baseline-5.0/nodes_v5.0.jsonl
+    - version: v5.0
+      name: cde-graph
+      format: jsonl
+      files:
+        - cde/annotated_edges_v5.0.jsonl
+        - cde/annotated_nodes_v5.0.jsonl
 dug_inputs:
   versions:
     - name: bdc
4 changes: 2 additions & 2 deletions dags/roger/config/config.yaml
@@ -16,7 +16,7 @@ annotation_base_data_uri: https://stars.renci.org/var/dug/
 
 kgx:
   biolink_model_version: v3.1.2
-  dataset_version: v4.0
+  dataset_version: v5.0
   merge_db_id: 1
   merge_db_temp_dir: workspace
   data_sets:
@@ -85,7 +85,7 @@ indexing:
       desc: "summary"
       collection_name: "cde_category"
       collection_id: "cde_category"
-      collection_action: "files"
+      action: "files"
 
 elasticsearch:
   host: elasticsearch
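The two YAML changes above go together: `metadata.yaml` gains `v5.0` entries, and `config.yaml` now requests `v5.0` through `dataset_version`. The diff does not show how Roger resolves a version to files, so the sketch below is only an illustration of such a lookup over the same entry structure. The entries are inlined as Python dicts, and `files_for` is a hypothetical helper, not a Roger function.

```python
def files_for(entries, version, name):
    """Return the file list of the entry matching version and name (hypothetical)."""
    for entry in entries:
        if entry["version"] == version and entry["name"] == name:
            return entry["files"]
    raise KeyError(f"no entry for {name} {version}")


# Entries copied from the v5.0 additions to metadata.yaml.
entries = [
    {"version": "v5.0", "name": "baseline-graph", "format": "jsonl",
     "files": ["baseline-5.0/edges_v5.0.jsonl", "baseline-5.0/nodes_v5.0.jsonl"]},
    {"version": "v5.0", "name": "cde-graph", "format": "jsonl",
     "files": ["cde/annotated_edges_v5.0.jsonl", "cde/annotated_nodes_v5.0.jsonl"]},
]

dataset_version = "v5.0"  # the new value of kgx.dataset_version in config.yaml
print(files_for(entries, dataset_version, "cde-graph"))
```

With a lookup of this kind, requesting a version that has no matching entry fails loudly, which is one reason to add the metadata entries in the same commit as the version bump.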
7 changes: 7 additions & 0 deletions dags/roger/models/kgx.py
@@ -478,6 +478,13 @@ def merge(self):
                         edges['subject'] + edges['predicate'] +
                         edges['object'] +
                         edges.get("biolink:primary_knowledge_source", ""))
+                    keys_to_del = set()
+                    for key in edges:
+                        if key.startswith('biolink:'):
+                            keys_to_del.add(key)
+                    for k in keys_to_del:
+                        edges[k.replace('biolink:', '')] = edges[k]
+                        del edges[k]
                     stream.write(json.dumps(edges).decode('utf-8') + '\n')
 
         write_merge_metric['edges_writing_time'] = time.time() - start_edge_jsonl
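The hunk above follows the "collect, then update" approach named in the commit message: changing a dict's key set while iterating over it is unsafe in Python and can raise `RuntimeError`, so the keys are gathered first and renamed afterwards. A minimal sketch with a made-up edge record follows; `strip_biolink_prefix` is a hypothetical name, not a function in `kgx.py`.

```python
def strip_biolink_prefix(edge, prefix="biolink:"):
    """Rename every key that starts with `prefix`, in place (hypothetical helper)."""
    # Pass 1: only read the dict, collecting the keys that need renaming.
    keys_to_rename = [key for key in edge if key.startswith(prefix)]
    # Pass 2: mutate the dict now that iteration over it has finished.
    for key in keys_to_rename:
        edge[key[len(prefix):]] = edge.pop(key)
    return edge


# Made-up edge used only for illustration.
edge = {
    "subject": "A",
    "predicate": "biolink:related_to",
    "object": "B",
    "biolink:primary_knowledge_source": "infores:example",
}
print(strip_biolink_prefix(edge))
```

Only keys are renamed; values such as the predicate keep their `biolink:` prefix, which matches the behavior of the loop in the diff. The sketch slices off the leading prefix instead of calling `replace`, a small deviation that avoids touching a second occurrence of the prefix later in a key.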
1 change: 1 addition & 0 deletions requirements.txt
@@ -10,3 +10,4 @@ git+https://github.com/helxplatform/[email protected]
 orjson
 kg-utils==0.0.6
 bmt==1.1.0
+linkml-runtime==1.6.0
