Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rti merge #84

Merged
merged 41 commits into from
Oct 31, 2023
Merged
Show file tree
Hide file tree
Changes from 39 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
290ca53
roger cli preped for Merge Deploy
Mar 23, 2023
1e46a36
Update Makefile to work with python env
Mar 29, 2023
7baf7ef
Update redisgraph-bulk-loader to fix issue with loading MODULE LIST
Mar 29, 2023
0f5e35e
Revert "Update redisgraph-bulk-loader to fix issue with loading MODUL…
Mar 29, 2023
ee17823
Finalized dev deployment of dug inside Catapult Merge, deployment yam…
Mar 31, 2023
135d54b
updated to reflect the Dug-Api updates to FastAPI
Apr 17, 2023
bb148a4
adding multi label redis by removing 'biolink:' on nodes, edges canno…
May 5, 2023
b6ea2c7
Working multi label redis nodes w/ no biolink label
braswent May 24, 2023
5b85480
Latest code changes to deploy working Roger in Merge
braswent May 25, 2023
0083b60
biolink data move to '.' separator
braswent Jun 9, 2023
c7b3a71
updates to include new dug fixes, upgraded redis-bulk-loader and made…
braswent Jun 20, 2023
50f5b23
adding test roger code
braswent Jun 21, 2023
444fd16
removed helm deployments
braswent Aug 9, 2023
ce54b73
Merge remote-tracking branch 'rti/develop' into rti-merge
YaphetKG Sep 29, 2023
28fb94b
change docker owner
YaphetKG Sep 29, 2023
142e874
remove core.py
YaphetKG Sep 29, 2023
3b4ed65
remove dup dev config
YaphetKG Sep 29, 2023
d6a779a
redis graph is not directly used removing cruft
YaphetKG Sep 29, 2023
1df8ed5
remove print statement
YaphetKG Sep 29, 2023
e608fa1
remove logging files
YaphetKG Sep 29, 2023
d2cee55
update requriemtns
YaphetKG Sep 29, 2023
89ba673
update requriemtns
YaphetKG Sep 29, 2023
82161c3
add redis graph.py
YaphetKG Oct 2, 2023
8ef6b49
fix import error for logger
YaphetKG Oct 2, 2023
d03acb0
adding es scheme and ca_path config
YaphetKG Oct 2, 2023
16e3d3a
adding es scheme and ca_path config
YaphetKG Oct 2, 2023
0beba39
adding debug code
YaphetKG Oct 3, 2023
5725045
removing debug
YaphetKG Oct 3, 2023
ae8b0fd
adding nodes args
YaphetKG Oct 3, 2023
a9e422f
adding biolink.
YaphetKG Oct 3, 2023
5c37b91
adding biolink.
YaphetKG Oct 3, 2023
3c1f88a
Update requirements.txt
YaphetKG Oct 25, 2023
209cd8c
Update .gitignore
braswent Oct 25, 2023
80f80fe
Update dug_utils.py
braswent Oct 25, 2023
50f58c7
Update __init__.py
braswent Oct 25, 2023
e167e21
Update config.yaml
braswent Oct 25, 2023
a4af022
Update dev-config.yaml
braswent Oct 25, 2023
a8c6b43
Update docker-compose.yaml
braswent Oct 25, 2023
9a23d47
fixed docker-compose
braswent Oct 25, 2023
a66fb63
adding back postgres volume to docker compose
braswent Oct 25, 2023
cb1454e
env correction , docker compose updates
YaphetKG Oct 31, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 6 additions & 7 deletions .env
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,14 @@ DATA_DIR=./local_storage

DUG_LOG_LEVEL=INFO

ELASTIC_PASSWORD=12345
ELASTIC_API_HOST=elasticsearch
ELASTIC_USERNAME=elastic
ELASTICSEARCH_PASSWORD=12345
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_USERNAME=elastic

NBOOST_API_HOST=nboost

REDIS_PASSWORD=12345
REDIS_HOST=redis
REDIS_PORT=6379

REDISGRAPH_PASSWORD=weak
REDISGRAPH_HOST=merge-redis-master
REDISGRAPH_PORT=6379
TRANQL_ACCESS_LOG=access.log
TRANQL_ERROR_LOG=error.log
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
# Git ignore bioler plate from https://github.com/github/gitignore/blob/master/Python.gitignore


# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
Expand Down Expand Up @@ -108,6 +107,7 @@ celerybeat.pid
*.sage.py

# Environments
.secrets-env
.venv
env/
venv/
Expand Down Expand Up @@ -149,4 +149,4 @@ cython_debug/
dags/roger/data
local_storage
logs
tests/integration/data/bulk/
tests/integration/data/bulk/
5 changes: 4 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
PYTHON = PYTHONPATH=dags /usr/bin/env python3
PYTHON = $(shell which python3)
PYTHONPATH = dags
VERSION_FILE = ./dags/_version.py
VERSION = $(shell cut -d " " -f 3 ${VERSION_FILE})
DOCKER_REPO = docker.io
Expand All @@ -17,10 +18,12 @@ help:
mk_dirs:
mkdir -p {logs,plugins}
mkdir -p local_storage/elastic
mkdir -p local_storage/redis

rm_dirs:
rm -rf logs/*
rm -rf local_storage/elastic/*
rm -rf local_storage/redis/*
rm -rf ./dags/roger/data/*

#install: Install application along with required packages to local environment
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -533,7 +533,7 @@ Open localhost:8080 in a browser.

Then run:
```
python tranql_translator.py
python tranql_translate.py
```
The Airflow interface shows the workflow:
![image](https://user-images.githubusercontent.com/306971/97787955-b968f680-1b8b-11eb-86cc-4d93842eafd3.png)
Expand Down
4 changes: 2 additions & 2 deletions bin/docker_backend/docker-compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,9 @@ services:
- roger-network
environment:
- REDIS_PASSWORD=$ROGERENV_REDISGRAPH_PASSWORD
entrypoint: /usr/local/bin/gunicorn --workers=2 --bind=0.0.0.0:8081 --name=tranql --timeout=600 tranql.api:app
entrypoint: /usr/local/bin/gunicorn --workers=2 --bind=0.0.0.0:8001 --name=tranql --timeout=600 tranql.api:app
ports:
- 8081:8081
- 8001:8001
volumes:
- ./tranql-schema.yaml:/tranql/tranql/conf/schema.yaml
#################################################################################
Expand Down
2 changes: 1 addition & 1 deletion bin/roger
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ export PYTHONPATH=$ROGER_HOME:$ROGER_HOME/../kgx
export DB_NAME=test

roger () {
python $ROGER_HOME/roger/core.py $*
python $ROGER_HOME/dags/roger/core.py $*
}

kgx () {
Expand Down
9 changes: 8 additions & 1 deletion cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,15 @@
from dug_helpers.dug_utils import DugUtil, get_topmed_files, get_dbgap_files, get_sparc_files, get_anvil_files, get_nida_files
import sys
import argparse
import os
import time


log = get_logger()

if __name__ == "__main__":

start = time.time()
log.info(f"Start TIME:{start}")
parser = argparse.ArgumentParser(description='Roger common cli tool.')
""" Common CLI. """
parser.add_argument('-d', '--data-root', help="Root of data hierarchy", default=None)
Expand Down Expand Up @@ -102,4 +105,8 @@
if args.validate_concepts:
DugUtil.validate_indexed_concepts(config=config)

end = time.time()
time_elapsed = end - start
log.info(f"Completion TIME:{time_elapsed}")

sys.exit (0)
31 changes: 22 additions & 9 deletions dags/dug_helpers/dug_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -381,10 +381,17 @@ def _search_elements(self, curie, search_term):
query=search_term
))
ids_dict = []
for element_type in response:
all_elements_ids = [e['id'] for e in
reduce(lambda x, y: x + y['elements'], response[element_type], [])]
ids_dict += all_elements_ids
if 'total_items' in response:
if response['total_items'] == 0:
log.error(f"No search elements returned for variable search: {self.variables_index}.")
log.error(f"Concept id : {curie}, Search term: {search_term}")
raise Exception(f"Validation error - Did not find {curie} for"
f"Search term: {search_term}")
else:
for element_type in response:
all_elements_ids = [e['id'] for e in
reduce(lambda x, y: x + y['elements'], response[element_type], [])]
ids_dict += all_elements_ids
return ids_dict

def crawl_concepts(self, concepts, data_set_name):
Expand Down Expand Up @@ -511,18 +518,24 @@ def validate_indexed_concepts(self, elements, concepts):

searched_element_ids = self._search_elements(curie, search_term)

present = bool(len([x for x in sample_elements[curie] if x in searched_element_ids]))
if not present:
log.error(f"Did not find expected variable {element.id} in search result.")
if curie not in sample_elements:
log.error(f"Did not find Curie id {curie} in Elements.")
log.error(f"Concept id : {concept.id}, Search term: {search_term}")
raise Exception(f"Validation error - Did not find {element.id} for"
f" Concept id : {concept.id}, Search term: {search_term}")
else:
present = bool(len([x for x in sample_elements[curie] if x in searched_element_ids]))
if not present:
log.error(f"Did not find expected variable {element.id} in search result.")
log.error(f"Concept id : {concept.id}, Search term: {search_term}")
raise Exception(f"Validation error - Did not find {element.id} for"
f" Concept id : {concept.id}, Search term: {search_term}")

def clear_index(self, index_id):
exists = self.search_obj.es.indices.exists(index_id)
exists = self.search_obj.es.indices.exists(index=index_id)
if exists:
log.info(f"Deleting index {index_id}")
response = self.event_loop.run_until_complete(self.search_obj.es.indices.delete(index_id))
response = self.event_loop.run_until_complete(self.search_obj.es.indices.delete(index=index_id))
log.info(f"Cleared Elastic : {response}")
log.info("Re-initializing the indicies")
self.index_obj.init_indices()
Expand Down
4 changes: 4 additions & 0 deletions dags/roger/config/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,8 @@ class ElasticsearchConfig(DictLike):
username: str = "elastic"
password: str = ""
nboost_host: str = ""
scheme: str = "http"
ca_path: str = ""



Expand Down Expand Up @@ -174,6 +176,8 @@ def to_dug_conf(self) -> DugConfig:
elastic_host=self.elasticsearch.host,
elastic_password=self.elasticsearch.password,
elastic_username=self.elasticsearch.username,
elastic_scheme=self.elasticsearch.scheme,
elastic_ca_path=self.elasticsearch.ca_path,
redis_host=self.redisgraph.host,
redis_password=self.redisgraph.password,
redis_port=self.redisgraph.port,
Expand Down
4 changes: 3 additions & 1 deletion dags/roger/config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,8 @@ elasticsearch:
username: elastic
password: ""
nboost_host: ""
scheme: "http"
ca_path: ""

validation:
queries:
Expand Down Expand Up @@ -154,4 +156,4 @@ lakefs_config:
enabled: false
access_key_id: ""
secret_access_key: ""
host: ""
host: ""
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ data_root: "/Users/schreepc/Projects/helxplatform/roger/roger/test/data"
dug_data_root: dug_helpers/dug_data/topmed_data
base_data_uri: https://stars.renci.org/var/kgx_data/trapi-1.0/
kgx:
biolink_model_version: 1.5.0
biolink_model_version: test

#https://github.com/RedisGraph/redisgraph-bulk-loader/blob/master/redisgraph_bulk_loader/bulk_insert.py#L43
bulk_loader:
Expand Down Expand Up @@ -115,4 +115,4 @@ validation:
- var: "[N-]=[N+]=[N-]"
- var: "[Ag+]"
- var: "[Zn+2]"
- var: "[C-]#[O+]"
- var: "[C-]#[O+]"
24 changes: 15 additions & 9 deletions dags/roger/core/bulkload.py
Original file line number Diff line number Diff line change
Expand Up @@ -341,20 +341,26 @@ def insert (self):
args = []
if len(nodes) > 0:
bulk_path_root = storage.bulk_path('nodes') + os.path.sep
nodes_with_type = [
f"{ x.replace(bulk_path_root, '').split('.')[0].replace('~', ':')} {x}"
for x in nodes]
nodes_with_type = []
for x in nodes:
"""
These lines prep nodes bulk load by:
1) appending to labels 'biolink.'
2) combine labels to create a multilabel redis node i.e. "biolink.OrganismalEntity:biolink.SubjectOfInvestigation"
"""
file_name_type_part = x.replace(bulk_path_root, '').split('.')[0].split('~')[1]
all_labels = "biolink." + file_name_type_part + ":" + ":".join([f'biolink.{v.lstrip("biolink:")}' for v in self.biolink.toolkit.get_ancestors("biolink:" + file_name_type_part, reflexive=False, formatted=True )] )
nodes_with_type.append(f"{all_labels} {x}")
args.extend(("-N " + " -N ".join(nodes_with_type)).split())
if len(edges) > 0:
bulk_path_root = storage.bulk_path('edges') + os.path.sep
edges_with_type = [
f"{x.replace(bulk_path_root, '').strip(os.path.sep).split('.')[0].replace('~', ':')} {x}"
for x in edges]
edges_with_type = [f"biolink.{x.replace(bulk_path_root, '').strip(os.path.sep).split('.')[0].split('~')[1]} {x}"
for x in edges]
# Edge label now no longer has 'biolink:'
args.extend(("-R " + " -R ".join(edges_with_type)).split())
args.extend([f"--separator={self.separator}"])
args.extend([f"--host={redisgraph['host']}"])
args.extend([f"--port={redisgraph['port']}"])
args.extend([f"--password={redisgraph['password']}"])
log.debug(f"--redis-url=redis://:{redisgraph['password']}@{redisgraph['host']}:{redisgraph['port']}")
args.extend([f"--redis-url=redis://:{redisgraph['password']}@{redisgraph['host']}:{redisgraph['port']}"])
args.extend(['--enforce-schema'])
args.extend([f"{redisgraph['graph']}"])
""" standalone_mode=False tells click not to sys.exit() """
Expand Down
55 changes: 23 additions & 32 deletions dags/roger/core/redis_graph.py
Original file line number Diff line number Diff line change
@@ -1,51 +1,44 @@
"Graph abstraction layer over redisgraph python module"

import copy

import redis
from redisgraph import Node, Edge, Graph
# from redisgraph import Node, Edge, Graph
# https://redis-py.readthedocs.io/en/v4.5.1/redismodules.html#redisgraph-commands
from redis.commands.graph.node import Node
from redis.commands.graph.edge import Edge

from roger.logger import get_logger

logger = get_logger ()


class RedisGraph:
""" Graph abstraction over RedisGraph

A thin wrapper but provides us some options.
"""

def __init__(self, host='localhost', port=6379, graph='default',
password=''):
""" Graph abstraction over RedisGraph. A thin wrapper but provides us some options. """

def __init__(self, host='localhost', port=6379, graph='default', password=''):
""" Construct a connection to Redis Graph. """
self.r = redis.Redis(host=host, port=port, password=password)
self.redis_graph = Graph(graph, self.r)
self.redis_graph = self.r.graph(graph)

def add_node (self, identifier=None, label=None, properties=None):
""" Add a node with the given label and properties. """
logger.debug (
f"--adding node id:{identifier} label:{label} prop:{properties}")
logger.debug (f"--adding node id:{identifier} label:{label} prop:{properties}")
if identifier and properties:
properties['id'] = identifier
node = Node(node_id=identifier, alias=identifier,
label=label, properties=properties)
node = Node(node_id=identifier, alias=identifier, label=label, properties=properties)
self.redis_graph.add_node(node)
return node

def get_edge (self, start, end, predicate=None):
"Get an edge from the graph with the specified start and end ids"
""" Get an edge from the graph with the specified start and end identifiers. """
result = None
for edge in self.redis_graph.edges:
if edge.src_node.id == start and edge.dest_node.id == end:
result = edge
break
return result

def add_edge (self, start, predicate, end, properties={}):
"Add edge with given predicate and properties btw start and end nodes"
logger.debug (
f"--adding edge start:{start} pred:{predicate} "
f"end:{end} prop:{properties}")
""" Add an edge with the given predicate and properties between start and end nodes. """
logger.debug (f"--adding edge start:{start} pred:{predicate} end:{end} prop:{properties}")
if isinstance(start, str) and isinstance(end, str):
start = Node(node_id = start, label='thing')
end = Node(node_id = end, label='thing')
Expand All @@ -56,27 +49,25 @@ def add_edge (self, start, predicate, end, properties={}):
return edge

def has_node (self, identifier):
"Does the graph have a node with this ID"
return identifier in self.redis_graph.nodes

def get_node (self, identifier, properties=None):
"Retrieve the node with the given id"
return self.redis_graph.nodes[identifier]

def commit (self):
""" Commit modifications to the graph. """
self.redis_graph.commit()

def query (self, query):
""" Query and return result set. """
result = self.redis_graph.query(query)
result.pretty_print()
print(result)
return result

def delete (self):
""" Delete the named graph. """
self.redis_graph.delete()

def test ():
rg = RedisGraph ()
p = { 'a' : 4,
Expand All @@ -91,10 +82,10 @@ def test ():
if last is not None:
rg.add_edge (node, 'link', last)
last = node
rg.commit ()
rg.query ("MATCH (obj:yeah)-[:link]->(j:yeah) RETURN obj.a, obj.b, obj.x")
rg.query ("MATCH (a) RETURN a")
rg.commit ()
rg.query ("""MATCH (obj:yeah)-[:link]->(j:yeah) RETURN obj.a, obj.b, obj.x""")
rg.query ("""MATCH (a) RETURN a""")
rg.delete ()

# rg.query ("""MATCH (a { id : 'chemical_substance' }) RETURN a""")
#test ()
#test ()
Loading