Pipeline parameterize restructure (#95)

* roger cli prepped for Merge Deploy
* Update Makefile to work with python env
* Update redisgraph-bulk-loader to fix issue with loading MODULE LIST
* Revert "Update redisgraph-bulk-loader to fix issue with loading MODULE LIST"
  This reverts commit 7baf7ef.
* Finalized dev deployment of dug inside Catapult Merge, deployment yamls, code changes and configurations
* updated to reflect the Dug-Api updates to FastAPI
* adding multi label redis by removing 'biolink:' on nodes; edges cannot be fixed after update, so they need to be solved either by changing TranQL AND Plater or by forking bulk-redisgraph to allow colons in the edges
* Working multi label redis nodes w/ no biolink label
* Latest code changes to deploy working Roger in Merge
* biolink data move to '.' separator
* updates to include new dug fixes, upgraded redis-bulk-loader and made changes to biolink variables to specify its domain with a 'biolink.'
* adding test roger code
* removed helm deployments
* change docker owner
* remove core.py
* remove dup dev config
* redis graph is not directly used; removing cruft
* remove print statement
* remove logging files
* update requirements
* update requirements
* add redis graph.py
* fix import error for logger
* adding es scheme and ca_path config
* adding es scheme and ca_path config
* Parameterized annotate tasks with input_data_path and output_data_path
* adding debug code
* removing debug
* adding nodes args
* adding biolink.
* adding biolink.
* Parameterized annotate tasks with input_data_path and output_data_path (#85)
* adding lakefs changes to roger-2.0
* point avalon to vg1 branch
* change avalon dep
* update airflow
* fix avalon tag typo
* update jenkins to tag version on main branch only
* update jenkins to tag version
* update jenkins to tag version
* psycopg2 installation
* add cncf k8s req
* use airflow non-slim
* simplified for testing
* simplified for testing
* change dag name
* Erroneous parameter passed, should not be None
* adding pre-exec
* adding pre-exec
* adding pre-exec
* typo preexec
* typo preexec
* fix context
* get files from repo
* get files from repo
* get files from repo
* get files from repo
* First shot at moving pipeline into base class and implementing. Anvil pipeline not complete
* Syntax fix, docker image version bump to airflow 2.7.2-python3.11
* update storage dir
* update remove dir code
* update remove dir code
* remote path to *
* fix input dir for annotators
* fix input dir for annotators
* fix input dir for annotators
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* kwargs to task
* adding branch info on lakefs config
* callback push to branch
* back to relative import
* reformat temp branch name based on unique task id
* add logging
* add logging
* convert posix path to str for avalon
* add extra / to root path
* New dag created using DugPipeline subclasses
* EmptyOperator imported from wrong place
* import and syntax fixes
* utterly silly syntax error
* Added anvil to default input data sets for testing purposes
* adding / to local path
* commit meta task args empty string
* add merge logic
* add merge logic
* upstream task dir pull for downstream task
* Switched from subdag to taskgroup because latest Airflow deprecated subdag
* Added BACPAC pipeline object
* Temporarily ignoring configuration variable for enabled datasets for testing
* Passed dag in to create task group to see if it helps dag errors
* Fixed silly syntax error
* adding input / output dir params for make kgx
* Trying different syntax to make taskgroups work.
* adding input / output dir params for make kgx
* Parsing, syntax, pylint fixes
* adding input / output dir params for make kgx
* Added pipeline name to task group name to ensure uniqueness
* oops, moved something out of scope. Fixed
* Filled out pipeline with methods from dug_utils. Needs data path changes
* Finished implementing input_data_path and output_data_path handling, pylint cleanup
* Update requirements.txt
* adding toggle to avoid sending config obj
* adding toggle to avoid sending config obj
* disable to string for test
* control pipelines for testing
* add self to anvil get files
* add log stream to make it available
* typo fix
* correcting branch id
* adding source repo
* adding source repo
* patch name-resolver response
* don't pass input repo and branch to pre-exec if not overridden
* don't pass input repo and branch to pre-exec if not overridden
* don't pass input repo and branch to pre-exec if not overridden
* dug pipeline edit
* find recursively
* find recursively
* setup output path for crawling
* all task functions should have input and output params
* adding annotation as upstream for validate index
* revamp create task and task wrapper
* add validate concepts index task
* adding concept validation
* add index_variables task as dependency for validate concepts
* add index_variables task as dependency for validate concepts
* await client exist
* await client exist
* concepts not getting picked up for indexing
* concepts not getting picked up for indexing
* fix search elements
* converting annotation output to json
* json format annotation outputs
* adding support for json format elements and concepts read
* json back to dug objects
* fixing index variables with json objects
* indentation and new line for better change detection :?
* indentation and new line for better change detection
* treat dictionary concepts as dictionary
* read concepts json as a dict
* concepts files are actually file paths
* debug message
* make output jsonable
* clear up dir after commit, and delete unmerged branch even if no changes
* don't clear indexes, parallel dataset processing will be taxed
* memory leak?
* memory leak?
* memory leak?
* dumping pickles to debug locally
* find out why concepts are being added to every other element
* find out why concepts are being added to every other element
* pointless shuffle 🤷♂️
* revert back in time
* back to sanitize dug
* output just json for annotation
* adding jsonpickle
* jsonpickle 🥒
* unpickle for index
* unpickle for validate index
* crawling fixes
* crawling fixes
* crawling validation fixes
* fix index concepts
* fix makekgx
* adding other bdc pipelines
* adding pipeline parameters to be able to configure per instance
* fix
* add input dataset for pipelines
* Adding README to document how to create data set-specific pipelines
* catch up on base.py
* Added dbgap and nida pipelines
* fix import errors
* annotator modules added by passing config val (#90)
* annotator modules added by passing config val
* fix merge conflict
* following same pattern as parsers, modify configs
* fix to dug config method
* fix old dug pipeline for backward compatibility
* correct default annotator type
* reflective changes
* typo extra quotes
* annotator type not being picked up from config
* remove annotate simple, log env value for lakefs enabled
* testing lakefs off
* add more logging
* add more logging
* post init for config to parse to boolean
* put back task calls
* revert some changes
* adding new pipeline
* lakefs io support for merge task
* fix name
* add io params for kg tasks
* wire up i/o paths for merge
* fix variable name
* print files
* few debug logs
* few debug logs
* treat path as path not str
* few debug logs
* some fixes
* logging edge files
* bug fix knowledge has edge
* re-org graph structure
* adding pathing for other tasks
* pagination logic fix for avalon
* update lakefs client code
* fix glob for get kgx files
* fix up get merged objects
* send down fake commit id for metadata
* working on edges schema
* bulk create nodes I/O
* find schema file
* bulk create edges I/O
* bulk create edges I/O
* bulk load io
* no outputs for final tasks
* add recursive glob
* fix globbing
* oops
* delete dags
* pin dug to latest release
* cruft cleanup
* re-org kgx config
* add support for multiple initial repos
* fix comma
* create dir to download to
* swap branch and repo
* clean up dirs
* fix up other pipeline 👌

---------

Co-authored-by: YaphetKG <[email protected]>

* Add heal parsers (#96)
* annotator modules added by passing config val
* fix merge conflict
* following same pattern as parsers, modify configs
* fix to dug config method
* fix old dug pipeline for backward compatibility
* correct default annotator type
* reflective changes
* typo extra quotes
* annotator type not being picked up from config
* remove annotate simple, log env value for lakefs enabled
* testing lakefs off
* add more logging
* add more logging
* post init for config to parse to boolean
* put back task calls
* revert some changes
* adding new pipeline
* lakefs io support for merge task
* fix name
* add io params for kg tasks
* wire up i/o paths for merge
* fix variable name
* print files
* few debug logs
* few debug logs
* treat path as path not str
* few debug logs
* some fixes
* logging edge files
* bug fix knowledge has edge
* re-org graph structure
* adding pathing for other tasks
* pagination logic fix for avalon
* update lakefs client code
* fix glob for get kgx files
* fix up get merged objects
* send down fake commit id for metadata
* working on edges schema
* bulk create nodes I/O
* find schema file
* bulk create edges I/O
* bulk create edges I/O
* bulk load io
* no outputs for final tasks
* add recursive glob
* fix globbing
* oops
* delete dags
* pin dug to latest release
* cruft cleanup
* re-org kgx config
* add support for multiple initial repos
* fix comma
* create dir to download to
* swap branch and repo
* clean up dirs
* fix up other pipeline 👌
* add remaining pipelines
* adding ctn parser
* change merge strategy
* merge init fix
* debug dir
* fix topmed file read
* fix topmed file read
* return file names as strings
* topmed kgx builder custom
* topmed kgx builder custom
* add skip
* get files pattern recursive
* version pin avalon
* pin dug

---------

Co-authored-by: braswent <[email protected]>

* Add heal parsers (#97)
* annotator modules added by passing config val
* fix merge conflict
* following same pattern as parsers, modify configs
* fix to dug config method
* fix old dug pipeline for backward compatibility
* correct default annotator type
* reflective changes
* typo extra quotes
* annotator type not being picked up from config
* remove annotate simple, log env value for lakefs enabled
* testing lakefs off
* add more logging
* add more logging
* post init for config to parse to boolean
* put back task calls
* revert some changes
* adding new pipeline
* lakefs io support for merge task
* fix name
* add io params for kg tasks
* wire up i/o paths for merge
* fix variable name
* print files
* few debug logs
* few debug logs
* treat path as path not str
* few debug logs
* some fixes
* logging edge files
* bug fix knowledge has edge
* re-org graph structure
* adding pathing for other tasks
* pagination logic fix for avalon
* update lakefs client code
* fix glob for get kgx files
* fix up get merged objects
* send down fake commit id for metadata
* working on edges schema
* bulk create nodes I/O
* find schema file
* bulk create edges I/O
* bulk create edges I/O
* bulk load io
* no outputs for final tasks
* add recursive glob
* fix globbing
* oops
* delete dags
* pin dug to latest release
* cruft cleanup
* re-org kgx config
* add support for multiple initial repos
* fix comma
* create dir to download to
* swap branch and repo
* clean up dirs
* fix up other pipeline 👌
* add remaining pipelines
* adding ctn parser
* change merge strategy
* merge init fix
* debug dir
* fix topmed file read
* fix topmed file read
* return file names as strings
* topmed kgx builder custom
* topmed kgx builder custom
* add skip
* get files pattern recursive
* version pin avalon
* pin dug

---------

Co-authored-by: braswent <[email protected]>

* Radx pipeline (#99)
* point to large download
* fix schema path
* debug bulk input dir
* fix schema read
* fix schema read
* fix schema read
* commenting setup dir for test
* adding logs
* fix path stuff
* add commented stuff back in
* testing radx parser
* adding parser
* skip indexing vars with no id
* adding indexes as part of bulk loader parameters
* fix id index cli arg
* fix local cli
* dug latest

---------

Co-authored-by: Nathan Braswell <[email protected]>
Co-authored-by: esurface <[email protected]>
Co-authored-by: braswent <[email protected]>
Co-authored-by: Michael T. Bacon <[email protected]>
Co-authored-by: Michael T Bacon <[email protected]>