annotator modules added by passing config val #90

Merged
merged 58 commits into from
Jan 29, 2024
eac8e6a
annotator modules added by passing config val
braswent Nov 6, 2023
3880dbc
Merge branch 'pipeline_parameterize_restructure' into rebased-annotat…
YaphetKG Jan 4, 2024
2480001
fix merge conflict
YaphetKG Jan 4, 2024
b0028c5
following same pattern as parsers , modify configs
YaphetKG Jan 4, 2024
5113953
fix to dug config method
YaphetKG Jan 4, 2024
c59bfb0
fix old dug pipeline for backward compatiblity
YaphetKG Jan 4, 2024
e0dcd93
correct default annotator type
YaphetKG Jan 4, 2024
ae48080
reflective changes
YaphetKG Jan 4, 2024
5fd9168
typo extra quotes
YaphetKG Jan 4, 2024
5ae2d3a
annotator type not being picked up from config
YaphetKG Jan 5, 2024
9ca1e38
remove annotate simple , log env value for lakefs enabled
YaphetKG Jan 16, 2024
348be81
testing lakefs off
YaphetKG Jan 16, 2024
528768d
add more logging
YaphetKG Jan 16, 2024
df89638
add more logging
YaphetKG Jan 16, 2024
683e35b
post init for config to parse to boolean
YaphetKG Jan 17, 2024
6428075
put back task calls
YaphetKG Jan 17, 2024
ec54cc8
revert some changes
YaphetKG Jan 17, 2024
d557c2f
adding new pipeline
YaphetKG Jan 19, 2024
83815ab
lakefs io support for merge task
YaphetKG Jan 19, 2024
80ddf59
fix name
YaphetKG Jan 22, 2024
93dcbe9
add io params for kg tasks
YaphetKG Jan 22, 2024
3676cc9
wire up i/o paths for merge
YaphetKG Jan 23, 2024
0a8be3e
fix variable name
YaphetKG Jan 23, 2024
5a010d8
print files
YaphetKG Jan 23, 2024
1e6a9a2
few debug logs
YaphetKG Jan 23, 2024
c1ae51a
few debug logs
YaphetKG Jan 23, 2024
eb7fdc1
treat path as path not str
YaphetKG Jan 23, 2024
6c49a0c
few debug logs
YaphetKG Jan 23, 2024
7fdf08a
some fixes
YaphetKG Jan 23, 2024
07c5bd9
logging edge files
YaphetKG Jan 23, 2024
999d4a6
bug fix knowledge has edge
YaphetKG Jan 23, 2024
5a8f629
re-org graph structure
YaphetKG Jan 23, 2024
c5d2b0e
adding pathing for other tasks
YaphetKG Jan 23, 2024
96c8ff5
pagenation logic fix for avalon
YaphetKG Jan 23, 2024
62e8a0c
update lakefs client code
YaphetKG Jan 23, 2024
9f03265
fix glob for get kgx files
YaphetKG Jan 23, 2024
749deba
fix up get merged objects
YaphetKG Jan 23, 2024
f4adf0d
send down fake commit id for metadata
YaphetKG Jan 24, 2024
ce4c84f
working on edges schema
YaphetKG Jan 24, 2024
dcfd3db
bulk create nodes I/O
YaphetKG Jan 24, 2024
e43b5f1
find schema file
YaphetKG Jan 24, 2024
db58018
bulk create edges I/O
YaphetKG Jan 24, 2024
44ab91a
bulk create edges I/O
YaphetKG Jan 24, 2024
27fc0e4
bulk load io
YaphetKG Jan 24, 2024
2319a46
no outputs for final tasks
YaphetKG Jan 24, 2024
6429caa
add recursive glob
YaphetKG Jan 24, 2024
3c84f6a
fix globbing
YaphetKG Jan 24, 2024
f8db982
oops
YaphetKG Jan 24, 2024
8ab1a5c
delete dags
YaphetKG Jan 24, 2024
ce59d53
pin dug to latest release
YaphetKG Jan 25, 2024
3434d1a
cruft cleanup
YaphetKG Jan 25, 2024
60f90ba
re-org kgx config
YaphetKG Jan 25, 2024
6f2c0cc
add support for multiple initial repos
YaphetKG Jan 25, 2024
eef984e
fix comma
YaphetKG Jan 25, 2024
f90f13f
create dir to download to
YaphetKG Jan 25, 2024
4fd8ae2
swap branch and repo
YaphetKG Jan 25, 2024
0c057c1
clean up dirs
YaphetKG Jan 26, 2024
9ac704a
fix up other pipeline 👌
YaphetKG Jan 26, 2024
3 changes: 1 addition & 2 deletions .gitignore
@@ -1,7 +1,6 @@
# Git ignore bioler plate from https://github.com/github/gitignore/blob/master/Python.gitignore
.secret-env
Merge-helm/
Merge-Dug-Architecture.md
.vscode/

# Byte-compiled / optimized / DLL files
__pycache__/
2 changes: 2 additions & 0 deletions dags/roger/config/__init__.py
@@ -88,6 +88,7 @@ class BulkLoaderConfig(DictLike):

 @dataclass
 class AnnotationConfig(DictLike):
+    annotator_type: str = "annotator_monarch"
     annotator: str = "https://api.monarchinitiative.org/api/nlp/annotate/entities?min_length=4&longest_only=false&include_abbreviation=false&include_acronym=false&include_numbers=false&content="
     normalizer: str = "https://nodenormalization-sri.renci.org/get_normalized_nodes?curie="
     synonym_service: str = "https://onto.renci.org/synonyms/"
@@ -195,6 +196,7 @@ def to_dug_conf(self) -> DugConfig:
             redis_port=self.redisgraph.port,
             nboost_host=self.elasticsearch.nboost_host,
             preprocessor=self.annotation.preprocessor,
+            annotator_type=self.annotation.annotator_type,
             annotator={
                 'url': self.annotation.annotator,
             },
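
The two hunks in this file add an `annotator_type` field with a default value and forward it when the dug configuration is built. A minimal sketch of that pattern follows. The classes `AnnotationSettings` and `DownstreamConfig`, the function `to_downstream`, the example URL, and the value `annotator_custom` are hypothetical stand-ins for illustration, not roger or dug APIs.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationSettings:
    # Hypothetical stand-in for AnnotationConfig; only two fields are shown.
    annotator_type: str = "annotator_monarch"
    annotator: str = "https://example.org/annotate?content="


@dataclass
class DownstreamConfig:
    # Hypothetical stand-in for the configuration object handed to dug.
    annotator_type: str
    annotator: dict = field(default_factory=dict)


def to_downstream(settings: AnnotationSettings) -> DownstreamConfig:
    """Forward the annotator type alongside the annotator URL."""
    return DownstreamConfig(
        annotator_type=settings.annotator_type,
        annotator={"url": settings.annotator},
    )


default_conf = to_downstream(AnnotationSettings())
custom_conf = to_downstream(AnnotationSettings(annotator_type="annotator_custom"))
print(default_conf.annotator_type)
print(custom_conf.annotator_type)
```

With this shape, a deployment that never sets the field keeps the default, while one that sets it sees the chosen value reach the downstream configuration unchanged.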
5 changes: 5 additions & 0 deletions dags/roger/core/bulkload.py
@@ -42,6 +42,11 @@ def tables_up_to_date (self):
             targets=glob.glob (storage.bulk_path ("nodes/**.csv")) + \
                     glob.glob (storage.bulk_path ("edges/**.csv")))

+    def create (self):
+        """Used in the CLI on args.create_bulk"""
+        self.create_nodes_csv_file()
+        self.create_edges_csv_file()
+
     def create_nodes_csv_file(self):
         if self.tables_up_to_date ():
             log.info ("up to date.")
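
The docstring on `create` says it is used by the CLI when `args.create_bulk` is set. A minimal sketch of how such a dispatch could look is given here. `BulkLoadSketch`, `run`, and the `--create-bulk` flag spelling are assumptions made for illustration; the real CLI wiring is not part of this diff.

```python
import argparse


class BulkLoadSketch:
    """Hypothetical stand-in for the bulk loader; records which steps ran."""

    def __init__(self):
        self.steps = []

    def create_nodes_csv_file(self):
        self.steps.append("nodes")

    def create_edges_csv_file(self):
        self.steps.append("edges")

    def create(self):
        # Same shape as the method added in the diff: nodes first, then edges.
        self.create_nodes_csv_file()
        self.create_edges_csv_file()


def run(argv):
    parser = argparse.ArgumentParser()
    # The flag name is an assumption; argparse exposes it as args.create_bulk.
    parser.add_argument("--create-bulk", action="store_true")
    args = parser.parse_args(argv)
    loader = BulkLoadSketch()
    if args.create_bulk:
        loader.create()
    return loader.steps


print(run(["--create-bulk"]))
print(run([]))
```

Keeping a single `create` entry point means the CLI only needs to know one method name, and the order of the node and edge steps stays in one place.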
17 changes: 13 additions & 4 deletions dags/roger/pipelines/base.py
@mbacon-renci mbacon-renci Jan 17, 2024

Seems like if we're getting rid of the default of monarch, the docstring should change too. (I may have missed this getting replaced elsewhere tho)

EDIT: okay not sure if that change was in the PR, GitHub decided to show me a diff where this was relevant but now it's applying this comment to a different block, not sure what happened there, this may be moot.

@@ -14,11 +14,12 @@

 import requests

-from dug.core import get_parser, get_plugin_manager, DugConcept
-from dug.core.annotate import DugAnnotator, ConceptExpander
+from dug.core import get_parser, get_annotator, get_plugin_manager, DugConcept
+from dug.core.concept_expander import ConceptExpander
 from dug.core.crawler import Crawler
 from dug.core.factory import DugFactory
 from dug.core.parsers import Parser, DugElement
+from dug.core.annotators import Annotator
 from dug.core.async_search import Search
 from dug.core.index import Index

@@ -130,8 +131,9 @@ def __init__(self, config: RogerConfig, to_string=True):
         log.addHandler(self.string_handler)
         self.s3_utils = S3Utils(self.config.s3_config)

-        self.annotator: DugAnnotator = self.factory.build_annotator()
-
+        self.annotator: Annotator = get_annotator(
+            dug_plugin_manager.hook, self.get_annotator_name(dug_conf)
+        )
         self.tranqlizer: ConceptExpander = self.factory.build_tranqlizer()

         graph_name = self.config["redisgraph"]["graph"]
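
The hunk above replaces a fixed factory call with a lookup that selects the annotator by its configured name. The general idea of a name-keyed lookup can be sketched as follows. This is not dug's implementation, which goes through its plugin manager; `REGISTRY`, `lookup_annotator`, and both annotator classes are hypothetical.

```python
class MonarchLikeAnnotator:
    """Hypothetical annotator class used only for this sketch."""


class OtherAnnotator:
    """Second hypothetical annotator."""


# Hypothetical registry; dug builds its own set through its plugin manager.
REGISTRY = {
    "annotator_monarch": MonarchLikeAnnotator,
    "annotator_other": OtherAnnotator,
}


def lookup_annotator(name: str):
    """Return an annotator instance for a configured name, or fail loudly."""
    try:
        return REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise ValueError(f"unknown annotator {name!r}; known: {known}") from None


annotator = lookup_annotator("annotator_monarch")
print(type(annotator).__name__)
```

Failing with the list of known names makes a misspelled `annotator_type` in the configuration easy to diagnose.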
@@ -195,6 +197,13 @@ def get_parser_name(self):
         can also be overriden.
         """
         return getattr(self, 'parser_name', self.pipeline_name)

+    def get_annotator_name(dug_conf):
+        """
+        Access method for annotator_name
+        Defaults to annotator_monarch unless specified using annotation.annotator_type in the configuration file.
+        """
+        return getattr(dug_conf, "annotator_type", "annotator_monarch")
+
     def annotate_files(self, parsable_files, output_data_path=None):
         """
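
In this commit, `get_annotator_name` is declared with a single `dug_conf` parameter, while the `__init__` hunk calls it as `self.get_annotator_name(dug_conf)`. Assuming the function sits in the class body next to `get_parser_name`, Python would bind the instance to `dug_conf` and the call would raise `TypeError`. A minimal sketch of the mismatch and one possible correction is shown here. The classes `AsWritten` and `Corrected` are hypothetical, and the correction is an assumption, not necessarily what later commits in this pull request did.

```python
from types import SimpleNamespace


class AsWritten:
    # Mirrors the signature in the hunk: no `self` parameter.
    def get_annotator_name(dug_conf):
        return getattr(dug_conf, "annotator_type", "annotator_monarch")


class Corrected:
    # One possible fix (an assumption, not necessarily what was merged).
    def get_annotator_name(self, dug_conf):
        return getattr(dug_conf, "annotator_type", "annotator_monarch")


conf = SimpleNamespace(annotator_type="annotator_custom")

try:
    AsWritten().get_annotator_name(conf)
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

print(outcome)
print(Corrected().get_annotator_name(conf))
# A config object without the attribute falls back to the default.
print(Corrected().get_annotator_name(SimpleNamespace()))
```

The `getattr` fallback also bears on the review comment above: as long as the default argument stays `"annotator_monarch"`, the docstring's claim about the default remains accurate.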
3 changes: 2 additions & 1 deletion requirements.txt
@@ -5,7 +5,8 @@ flatten-dict
 redisgraph-bulk-loader==0.12.3
 pytest
 PyYAML
-git+https://github.com/helxplatform/dug@dug-merge
+# git+https://github.com/helxplatform/dug@dug-merge # Version used by mbacon for dev
+git+https://github.com/helxplatform/dug@329-annotator-modules # Version used for annotator modules
 orjson
 kg-utils==0.0.6
 bmt==1.1.0