rti-dug merge #322

Merged: 30 commits, Oct 24, 2023
Showing changes from 27 of the 30 commits.

Commits
33b008e
updated docker compose MERGE images
Mar 1, 2023
013297d
doc: Update README.md
Mar 1, 2023
75aaecb
feat: Update elasticsearch to 8.5.2 and dug_api to work with the new …
Mar 1, 2023
0409e42
feat: Change from bitnami/redis:7.0.9 to redis/redis-stack:6.2.4-v2
Mar 2, 2023
d11fc68
fix: Fix Dockerfile RUN apt-get and cache clear
Mar 2, 2023
fd2dc47
fix: Update Makefile to work with python env
Mar 29, 2023
e4caab3
fix: Updates use of ElasticSearch function body param
Mar 29, 2023
15f8bf1
Revert "fix: Updates use of ElasticSearch function body param"
Mar 29, 2023
e256e4e
fix: Updates use of ElasticSearch function body param
Mar 29, 2023
65888b0
feat: Updates README with release information
Mar 29, 2023
7973187
Functional Dug APIs -- Added Node-RED testing suite
Mar 31, 2023
a55c057
working roger deployment with Elastic search updates
braswent Jun 16, 2023
a59a4e1
fixing tranql/crawl step into elasticsearch
braswent Jun 26, 2023
f4cd5af
Merge remote-tracking branch 'rti/develop' into dug-merge
YaphetKG Sep 27, 2023
70de5f3
workflow python 3.10
YaphetKG Sep 27, 2023
a8e90fa
adding scheme to elastic hosts
YaphetKG Sep 27, 2023
5400918
aiohttp addition for es
YaphetKG Sep 27, 2023
2a0c670
adding scheme to mock server
YaphetKG Sep 27, 2023
009cf0b
adding env var
YaphetKG Sep 27, 2023
3af15b4
fix typo
YaphetKG Sep 27, 2023
51a1f93
fix another typo
YaphetKG Sep 27, 2023
b4e2610
ssl support
YaphetKG Sep 29, 2023
6c7ddb8
adding ssl config for indexing obj
YaphetKG Oct 3, 2023
76e4612
bump url lib
YaphetKG Oct 3, 2023
5d8b5c8
adding index parameters
YaphetKG Oct 4, 2023
55413fd
adding index parameters
YaphetKG Oct 4, 2023
1b6a790
fix assignment error
YaphetKG Oct 4, 2023
01736d4
fix es.get
YaphetKG Oct 24, 2023
80df535
fix expanding queries to get cdes
YaphetKG Oct 24, 2023
2888678
BREAKING CHANGE
YaphetKG Oct 24, 2023
18 changes: 0 additions & 18 deletions .env

This file was deleted.

6 changes: 3 additions & 3 deletions .github/workflows/code-checks.yml
@@ -45,7 +45,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
python-version: '3.10'

# Currently actions/setup-python supports caching
# but the cache is not as robust as cache action.
@@ -106,7 +106,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
python-version: '3.10'

- name: Install Requirements
run: |
@@ -127,7 +127,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
python-version: '3.10'

- name: Install Requirements
run: |
5 changes: 5 additions & 0 deletions .gitignore
@@ -106,6 +106,7 @@ celerybeat.pid
*.sage.py

# Environments
.env
.venv
env/
venv/
@@ -148,6 +149,9 @@ Thumbs.db
.idea/
.vscode/

# Trivy
trivy/

db/*
normalized_inputs.txt
concept_file.json
@@ -157,3 +161,4 @@ monarch_results.txt
anno_fails.txt
data/elastic/
crawl/
.env
4 changes: 2 additions & 2 deletions Dockerfile
@@ -3,12 +3,12 @@
# A container for the core semantic-search capability.
#
######################################################
FROM python:3.9.6-slim
FROM python:3.10.10-slim

# Install required packages
RUN apt-get update && \
apt-get install -y curl make vim && \
rm -rf /var/cache/apk/*
rm -rf /var/cache/apt/*

# Create a non-root user.
ENV USER dug
4 changes: 2 additions & 2 deletions Makefile
@@ -1,8 +1,8 @@
PYTHON = /usr/bin/env python3
PYTHON = $(shell which python3)
VERSION_FILE = ./src/dug/_version.py
VERSION = $(shell cut -d " " -f 3 ${VERSION_FILE})
DOCKER_REPO = docker.io
DOCKER_OWNER = helxplatform
DOCKER_OWNER = rti
DOCKER_APP = dug
DOCKER_TAG = ${VERSION}
DOCKER_IMAGE = ${DOCKER_OWNER}/${DOCKER_APP}:$(DOCKER_TAG)
4 changes: 3 additions & 1 deletion README.md
@@ -52,7 +52,7 @@ export REDIS_HOST=localhost
Then you can actually crawl the data:

```shell
dug crawl data/test_variables_v1.0.csv -p "TOPMedTag"
dug crawl tests/integration/data/test_variables_v1.0.csv -p "TOPMedTag"
````

After crawling, you can search:
@@ -287,4 +287,6 @@ Once the test is complete, a command line search shows the contents of the index
TOPMed phenotypic concept data is [here](https://github.com/helxplatform/dug/tree/master/data).


## Release

To release, commit the change and select feature.
Empty file modified data/bdc_dbgap_download.sh (mode 100644 → 100755).
4 changes: 4 additions & 0 deletions debug.sh
@@ -0,0 +1,4 @@
source .env
export $(cut -d= -f1 .env)
export ELASTIC_API_HOST=localhost
export REDIS_HOST=localhost
11 changes: 5 additions & 6 deletions docker-compose.yaml
@@ -56,8 +56,7 @@ services:
##
#################################################################################
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:7.6.1
platform: "linux/amd64"
image: docker.elastic.co/elasticsearch/elasticsearch:8.5.2
networks:
- dug-network
environment:
@@ -73,18 +72,18 @@
#################################################################################
##
## A memory cache for results of high volume service requests.
## https://redis.io/docs/stack/get-started/install/docker/
##
#################################################################################
redis:
platform: "linux/amd64"
image: 'bitnami/redis:5.0.8'
image: 'redis/redis-stack:6.2.4-v2'
networks:
- dug-network
environment:
- REDIS_PASSWORD=$REDIS_PASSWORD
- REDIS_ARGS=--requirepass $REDIS_PASSWORD
- REDIS_DISABLE_COMMANDS=FLUSHDB,FLUSHALL
volumes:
- $DATA_DIR/redis:/bitnami/redis/data
- $DATA_DIR/redis:/data
ports:
- '6379:6379'

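Note on the Redis swap above: bitnami's image read `REDIS_PASSWORD` and `REDIS_DISABLE_COMMANDS` directly, while `redis/redis-stack` takes server flags through `REDIS_ARGS` (hence `--requirepass`) and stores data under `/data`. A minimal sketch for sanity-checking the new container with the `redis==4.5.1` client pinned in requirements.txt; the host, port, and password defaults here are illustrative, not dug's configuration:

```python
import os

import redis  # redis==4.5.1 per requirements.txt

# Illustrative defaults; the compose file passes the real password via
# REDIS_ARGS=--requirepass $REDIS_PASSWORD on the redis-stack service.
client = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
    password=os.environ.get("REDIS_PASSWORD", ""),
)
print(client.ping())  # True if auth against redis-stack succeeds
```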
11 changes: 8 additions & 3 deletions requirements.txt
@@ -1,24 +1,29 @@
elasticsearch[async]==7.16.3
aiohttp
asyncio
fastapi==0.95.0
uvicorn==0.23.2
elasticsearch[async]==8.5.2
gunicorn
itsdangerous
Jinja2
jsonschema
MarkupSafe
ormar==0.12.1
mistune==2.0.3
pluggy==1.0.0
pyrsistent==0.17.3
pytest
pytz==2021.1
PyYAML==6.0
redis==4.4.4
requests==2.31.0
# old redis==4.4.2
redis==4.5.1
requests-cache==0.9.8
six==1.16.0

# Click for command line arguments
# We use Click 7.0 because that's what one of the pinned packages above use.
click
httpx>=0.24.1
bmt==1.1.0
bmt==1.1.0
urllib3>=1.26.17
15 changes: 6 additions & 9 deletions setup.cfg
@@ -2,7 +2,6 @@
name = dug
version = attr: dug.__version__
author = Renaissance Computing Institute
author_email = [email protected]
description = Digging up data
long_description = file: README.md
long_description_content_type = text/markdown
@@ -18,25 +17,23 @@ classifiers =
package_dir =
= src
packages = find:
python_requires = >=3.9
python_requires = >=3.10
include_package_data = true
install_requires =
elasticsearch>=7.0.0,<8.0.0
elasticsearch==8.5.2
pluggy
requests
requests_cache==0.5.2
redis>=3.0.0
requests_cache==0.9.8
redis==4.5.1

[options.entry_points]
console_scripts =
dug = dug.cli:main

[options.extras_require]
rest =
Flask
flask_cors
flask_restful
flasgger
fastapi==0.95.0
uvicorn==0.23.2
gunicorn
jsonschema

2 changes: 1 addition & 1 deletion src/dug/_version.py
@@ -1 +1 @@
__version__ = "2.9.9dev"
__version__ = "2.11.1.dev"
5 changes: 3 additions & 2 deletions src/dug/cli.py
@@ -117,8 +117,9 @@ def search(args):
dug = Dug(factory)
# dug = Dug()
response = dug.search(args.target, args.query, **args.kwargs)
jsonResponse = json.dumps(response, indent = 2)
print(jsonResponse)
# Using json.dumps raises 'TypeError: Object of type ObjectApiResponse is not JSON serializable'
#jsonResponse = json.dumps(response, indent = 2)
print(response)

def datatypes(args):
config = Config.from_env()
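The commented-out `json.dumps` call above fails because elasticsearch-py 8.x wraps results in `ObjectApiResponse` instead of returning a plain dict. If JSON output is still wanted, a sketch of one workaround (assuming the 8.x client, where responses expose the underlying mapping via `.body`):

```python
import json

def print_json(response):
    # ObjectApiResponse in elasticsearch-py 8.x carries the payload in .body;
    # fall back to the object itself for plain-dict responses.
    payload = getattr(response, "body", response)
    print(json.dumps(payload, indent=2, default=str))
```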
4 changes: 4 additions & 0 deletions src/dug/config.py
@@ -17,6 +17,8 @@ class Config:
elastic_host: str = "elasticsearch"
elastic_port: int = 9200
elastic_username: str = "elastic"
elastic_scheme: str = "http"
elastic_ca_path: str = ""

redis_host: str = "redis"
redis_port: int = 6379
@@ -99,6 +101,8 @@ def from_env(cls):
env_vars = {
"elastic_host": "ELASTIC_API_HOST",
"elastic_port": "ELASTIC_API_PORT",
"elastic_scheme": "ELASTIC_API_SCHEME",
"elastic_ca_path": "ELASTIC_CA_PATH",
"elastic_username": "ELASTIC_USERNAME",
"elastic_password": "ELASTIC_PASSWORD",
"redis_host": "REDIS_HOST",
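With these two fields, a deployment can switch the Elasticsearch connection to TLS purely through the environment: `ELASTIC_API_SCHEME` picks the protocol and `ELASTIC_CA_PATH` points at the CA bundle used to verify the cluster certificate. A usage sketch with illustrative values:

```python
import os

from dug.config import Config

# Illustrative values; real deployments set these outside the process.
os.environ["ELASTIC_API_SCHEME"] = "https"
os.environ["ELASTIC_CA_PATH"] = "/etc/ssl/certs/es-ca.pem"

cfg = Config.from_env()
assert cfg.elastic_scheme == "https"
assert cfg.elastic_ca_path == "/etc/ssl/certs/es-ca.pem"
```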
2 changes: 1 addition & 1 deletion src/dug/core/annotate.py
@@ -143,7 +143,7 @@ def expand_identifier(self, identifier, query_factory, kg_filename, include_all_

# Case: Skip if empty KG
try:
if not len(response["message"]["knowledge_graph"]["nodes"]):
if response["message"] == 'Internal Server Error' or len(response["message"]["knowledge_graph"]["nodes"]) == 0:
logger.debug(f"Did not find a knowledge graph for {query}")
logger.debug(f"{self.url} returned response: {response}")
return []
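The widened guard handles TranQL responses where `response["message"]` is the string `'Internal Server Error'` rather than a dict, which the old expression tripped over when it tried to index into a string. A standalone sketch of the same defensive shape (not dug's API, just the pattern):

```python
def kg_is_missing(response: dict) -> bool:
    # Treat an error string, a non-dict message, or an empty node set
    # as "no knowledge graph" so the caller can skip this query.
    message = response.get("message")
    if not isinstance(message, dict):
        return True
    return len(message.get("knowledge_graph", {}).get("nodes", {})) == 0
```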
18 changes: 14 additions & 4 deletions src/dug/core/async_search.py
@@ -3,6 +3,7 @@
import logging
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_scan
import ssl

from dug.config import Config

@@ -38,13 +39,22 @@ def __init__(self, cfg: Config, indices=None):

self.indices = indices
self.hosts = [{'host': self._cfg.elastic_host,
'port': self._cfg.elastic_port}]
'port': self._cfg.elastic_port,
'scheme': self._cfg.elastic_scheme}]

logger.debug(f"Authenticating as user "
f"{self._cfg.elastic_username} "
f"to host:{self.hosts}")

self.es = AsyncElasticsearch(hosts=self.hosts,
if self._cfg.elastic_scheme == "https":
ssl_context = ssl.create_default_context(
cafile=self._cfg.elastic_ca_path
)
self.es = AsyncElasticsearch(hosts=self.hosts,
http_auth=(self._cfg.elastic_username,
self._cfg.elastic_password),
ssl_context=ssl_context)
else:
self.es = AsyncElasticsearch(hosts=self.hosts,
http_auth=(self._cfg.elastic_username,
self._cfg.elastic_password))

@@ -256,7 +266,7 @@ async def search_concepts(self, query, offset=0, size=None, types=None,
aggregations['type-count']['buckets']
}
search_results.update({'total_items': total_items['count']})
search_results['concept_types'] = concept_types
search_results.update({'concept_types': concept_types})
return search_results

async def search_variables(self, concept="", query="", size=None,
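Two 8.x client requirements drive the `__init__` change above: each host dict must now carry an explicit `scheme`, and HTTPS verification goes through an `ssl.SSLContext` built from the configured CA file. A minimal connection sketch with illustrative credentials and CA path (dug itself pulls these from `Config`, as shown in the diff):

```python
import asyncio
import ssl

from elasticsearch import AsyncElasticsearch

async def main():
    # Illustrative values only.
    ctx = ssl.create_default_context(cafile="/etc/ssl/certs/es-ca.pem")
    es = AsyncElasticsearch(
        hosts=[{"host": "localhost", "port": 9200, "scheme": "https"}],
        http_auth=("elastic", "changeme"),
        ssl_context=ctx,
    )
    print(await es.ping())  # True if TLS verification and auth succeed
    await es.close()

asyncio.run(main())
```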
4 changes: 2 additions & 2 deletions src/dug/core/crawler.py
@@ -209,7 +209,7 @@ def expand_to_dug_element(self,
curie_filter = casting_config["curie_prefix"]
attribute_mapping = casting_config["attribute_mapping"]
array_to_string = casting_config["list_field_choose_first"]
target_node_type_snake_case = biolink_snake_case(target_node_type.replace("biolink:", ""))
target_node_type_snake_case = biolink_snake_case(target_node_type.replace("biolink.", ""))
for ident_id, identifier in concept.identifiers.items():

# Check to see if the concept identifier has types defined, this is used to create
@@ -219,7 +219,7 @@

# convert the first type to snake case to be used in tranql query.
# first type is the leaf type, this is coming from Node normalization.
node_type = biolink_snake_case(get_formatted_biolink_name(identifier.types).replace("biolink:", ""))
node_type = biolink_snake_case(get_formatted_biolink_name(identifier.types).replace("biolink.", ""))
try:
# Tranql query factory currently supports select node types as valid query
# Types missing from QueryFactory.data_types will be skipped with this try catch
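The stripped prefix changes from the CURIE form `biolink:` to the dotted form `biolink.`, suggesting `get_formatted_biolink_name` now returns dotted class paths, before the class name is snake-cased for the TranQL query. A sketch of the transformation, assuming `biolink_snake_case` is a standard CamelCase-to-snake_case conversion (dug's real helper may differ in edge cases):

```python
import re

def biolink_snake_case(name: str) -> str:
    # Assumed behavior: "PhenotypicFeature" -> "phenotypic_feature".
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

print(biolink_snake_case("biolink.PhenotypicFeature".replace("biolink.", "")))
# phenotypic_feature
```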
24 changes: 17 additions & 7 deletions src/dug/core/index.py
@@ -4,6 +4,7 @@
import logging

from elasticsearch import Elasticsearch
import ssl

from dug.config import Config

@@ -20,12 +21,21 @@ def __init__(self, cfg: Config, indices=None):
logger.debug(f"Connecting to elasticsearch host: {self._cfg.elastic_host} at port: {self._cfg.elastic_port}")

self.indices = indices
self.hosts = [{'host': self._cfg.elastic_host, 'port': self._cfg.elastic_port}]
self.hosts = [{'host': self._cfg.elastic_host, 'port': self._cfg.elastic_port, 'scheme': self._cfg.elastic_scheme}]

logger.debug(f"Authenticating as user {self._cfg.elastic_username} to host:{self.hosts}")

self.es = Elasticsearch(hosts=self.hosts,
http_auth=(self._cfg.elastic_username, self._cfg.elastic_password))
if self._cfg.elastic_scheme == "https":
ssl_context = ssl.create_default_context(
cafile=self._cfg.elastic_ca_path
)
self.es = Elasticsearch(
hosts=self.hosts,
http_auth=(self._cfg.elastic_username, self._cfg.elastic_password),
ssl_context=ssl_context)
else:
self.es = Elasticsearch(
hosts=self.hosts,
http_auth=(self._cfg.elastic_username, self._cfg.elastic_password))
self.replicas = self.get_es_node_count()

if self.es.ping():
@@ -184,7 +194,7 @@ def update_doc(self, index, doc, doc_id):

def index_concept(self, concept, index):
# Don't re-index if already in index
if self.es.exists(index, concept.id):
if self.es.exists(index=index, id=concept.id):
return
""" Index the document. """
self.index_doc(
@@ -193,15 +203,15 @@ def index_concept(self, concept, index):
doc_id=concept.id)

def index_element(self, elem, index):
if not self.es.exists(index, elem.id):
if not self.es.exists(index=index, id=elem.id):
# If the element doesn't exist, add it directly
self.index_doc(
index=index,
doc=elem.get_searchable_dict(),
doc_id=elem.id)
else:
# Otherwise update to add any new identifiers that weren't there last time around
results = self.es.get(index, elem.id)
results = self.es.get(index=index, doc_id=elem.id)
identifiers = results['_source']['identifiers'] + list(elem.concepts.keys())
doc = {"doc": {}}
doc['doc']['identifiers'] = list(set(identifiers))
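The `exists`/`get` edits above adapt to elasticsearch-py 8.x, where these calls are keyword-only. A sketch of the migrated call pattern with illustrative index and document names; note the 8.x keyword for `get` is `id`, not `doc_id`, which the later "fix es.get" commit in this PR appears to address:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=[{"host": "localhost", "port": 9200, "scheme": "http"}])

doc_id = "concept-1"  # illustrative
if not es.exists(index="concepts_index", id=doc_id):
    es.index(index="concepts_index", id=doc_id, document={"name": "demo"})
else:
    result = es.get(index="concepts_index", id=doc_id)
    print(result["_source"])
```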