diff --git a/README.md b/README.md
new file mode 100644
index 000000000..2426e1ce8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,368 @@
+# A dynamic benchmark for gene regulatory network (GRN) inference
+
+Benchmarking GRN inference methods. The full documentation is hosted on
+[ReadTheDocs](https://openproblems-grn-task.readthedocs.io/en/latest/index.html).
+[![Documentation
+Status](https://readthedocs.org/projects/grn-inference-benchmarking/badge/?version=latest)](https://grn-inference-benchmarking.readthedocs.io/en/latest/?badge=latest)
+
+Path to source:
+[`src`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src)
+
+## README
+
+## Installation
+
+You need to have Docker, Java, and Viash installed. Follow [these
+instructions](https://openproblems.bio/documentation/fundamentals/requirements)
+to install the required dependencies.
+
+## Download resources
+
+``` bash
+git clone git@github.com:openproblems-bio/task_grn_inference.git
+
+cd task_grn_inference
+
+# download resources
+scripts/download_resources.sh
+```
+
+## Infer a GRN
+
+``` bash
+viash run src/methods/dummy/config.vsh.yaml -- \
+  --multiomics_rna resources/grn-benchmark/multiomics_rna.h5ad \
+  --multiomics_atac resources/grn-benchmark/multiomics_atac.h5ad \
+  --prediction output/dummy.csv
+```
+
+Similarly, run the command for other methods.
+
+## Evaluate a GRN
+
+``` bash
+scripts/benchmark_grn.sh --grn resources/grn-benchmark/models/collectri.csv
+```
+
+Similarly, run the command for other GRN models.
+
+## Add a method
+
+To add a method to the repository, follow the instructions in the
+`scripts/add_a_method.sh` script.
+
+## Motivation
+
+GRNs are essential for understanding cellular identity and behavior.
+They are simplified models of gene expression regulation, a complex
+process involving multiple layers of control, from transcription to
+post-transcriptional modifications, and incorporating various
+regulatory elements and non-coding RNAs. Gene transcription is
+controlled by a regulatory complex that includes transcription factors
+(TFs), cis-regulatory elements (CREs) like promoters and enhancers, and
+essential co-factors. High-throughput datasets, covering thousands of
+genes, facilitate the use of machine learning approaches to decipher
+GRNs. The advent of single-cell sequencing technologies, such as
+scRNA-seq, has made it possible to infer GRNs from a single experiment
+due to the abundance of samples. This allows researchers to infer
+condition-specific GRNs, such as for different cell types or diseases,
+and to study potential regulatory factors associated with these
+conditions. Combining chromatin accessibility data with gene expression
+measurements has led to the development of enhancer-driven GRN (eGRN)
+inference pipelines, which offer significantly improved accuracy over
+single-modality methods.
+
+## Description
+
+Here, we present a dynamic benchmark platform for GRN inference. This
+platform provides curated datasets for GRN inference and evaluation,
+standardized evaluation protocols and metrics, computational
+infrastructure, and a dynamically updated leaderboard to track
+state-of-the-art methods. It evaluates novel GRNs in the cloud, assigns
+them competition scores, and stores the results for future comparisons,
+reflecting new developments over time.
+
+The platform supports the integration of new datasets and protocols.
+When a new feature is added, previously evaluated GRNs are re-assessed,
+and the leaderboard is updated accordingly.
+The aim is to evaluate both the accuracy and completeness of inferred
+GRNs. The platform is designed for both single-modality and multi-omics
+GRN inference and is, ultimately, community-driven. So far, six eGRN
+inference methods have been integrated: SCENIC+, CellOracle, FigR,
+scGLUE, GRaNIE, and ANANSE.
+
+Due to its flexible nature, the platform can incorporate various
+benchmark datasets and evaluation methods, using either prior knowledge
+or feature-based approaches. In the current version, due to the absence
+of standardized prior knowledge, we use a feature-based approach to
+benchmark GRNs. Our evaluation utilizes standardized datasets for GRN
+inference and evaluation, employing multiple regression analysis
+approaches to assess both accuracy and comprehensiveness.
+
+## Authors & contributors
+
+| name               | roles       |
+|:-------------------|:------------|
+| Jalil Nourisa      | author      |
+| Robrecht Cannoodt  | author      |
+| Antoine Passemiers | contributor |
+| Christian Arnold   | contributor |
+| Marco Stock        | contributor |
+
+## API
+
+``` mermaid
+flowchart LR
+  file_perturbation_h5ad("perturbation")
+  comp_control_method[/"Control Method"/]
+  comp_metric[/"Label"/]
+  file_prediction("GRN")
+  file_score("Score")
+  file_multiomics_rna_h5ad("multiomics rna")
+  comp_method[/"Method"/]
+  file_multiomics_atac_h5ad("multiomics atac")
+  comp_method_r[/"Method r"/]
+  file_perturbation_h5ad---comp_control_method
+  file_perturbation_h5ad---comp_metric
+  comp_control_method-->file_prediction
+  comp_metric-->file_score
+  file_prediction---comp_metric
+  file_multiomics_rna_h5ad---comp_method
+  comp_method-->file_prediction
+  file_multiomics_atac_h5ad---comp_method
+  comp_method_r-->file_prediction
+```
+
+## File format: perturbation
+
+Perturbation dataset for benchmarking.
+
+Example file: `resources_test/grn-benchmark/perturbation_data.h5ad`
+
+Format:
+ + AnnData object + obs: 'cell_type', 'sm_name', 'donor_id', 'plate_name', 'row', 'well', 'cell_count' + layers: 'n_counts', 'pearson', 'lognorm' + +
+ +Slot description: + +
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
+| `obs["sm_name"]` | `string` | The primary name for the (parent) compound (in a standardized representation) as chosen by LINCS. This is provided to map the data in this experiment to the LINCS Connectivity Map data. |
+| `obs["donor_id"]` | `string` | Donor ID. |
+| `obs["plate_name"]` | `string` | Plate name (6 levels). |
+| `obs["row"]` | `string` | Row name on the plate. |
+| `obs["well"]` | `string` | Well name on the plate. |
+| `obs["cell_count"]` | `string` | Number of single cells pseudobulked. |
+| `layers["n_counts"]` | `double` | Pseudobulked values using the mean approach. |
+| `layers["pearson"]` | `double` | (*Optional*) Normalized values using Pearson residuals. |
+| `layers["lognorm"]` | `double` | (*Optional*) Normalized values using the shifted logarithm. |
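+
+As a quick usage illustration, the sketch below loads this file with
+`anndata` and inspects the slots listed above. It assumes the test
+resources have been downloaded to `resources_test/` (see the download
+instructions above); it is not part of the official pipeline.
+
+``` python
+import anndata as ad
+
+adata = ad.read_h5ad("resources_test/grn-benchmark/perturbation_data.h5ad")
+print(adata.obs[["cell_type", "sm_name", "donor_id"]].head())
+print(list(adata.layers))  # expected: 'n_counts', 'pearson', 'lognorm'
+```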
+
+## Component type: Control Method
+
+Path:
+[`src/control_methods`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src/control_methods)
+
+A control method.
+
+Arguments:
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--perturbation_data` | `file` | Perturbation dataset for benchmarking. |
+| `--layer` | `string` | (*Optional*) Which layer of the perturbation data to use to find TF-gene relationships. Default: `scgen_pearson`. |
+| `--prediction` | `file` | (*Output*) GRN prediction. |
+| `--tf_all` | `file` | (*Optional*) NA. |
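+
+For intuition, a control method only needs to map these inputs to a
+`source, target, weight` table. The sketch below builds a trivial
+random baseline; it is purely illustrative and is not one of the
+actual control methods shipped in `src/control_methods`.
+
+``` python
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+adata = ad.read_h5ad("resources_test/grn-benchmark/perturbation_data.h5ad")
+tf_all = np.loadtxt("resources_test/prior/tf_all.csv", dtype=str)
+tfs = np.intersect1d(tf_all, adata.var_names)
+
+# connect every TF to ten random genes with random weights
+rng = np.random.default_rng(0)
+n_per_tf = 10
+prediction = pd.DataFrame({
+    "source": np.repeat(tfs, n_per_tf),
+    "target": rng.choice(adata.var_names.to_numpy(), size=len(tfs) * n_per_tf),
+    "weight": rng.normal(size=len(tfs) * n_per_tf),
+})
+prediction.to_csv("output/random_control.csv", index=False)
+```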
+
+## Component type: Label
+
+Path:
+[`src/metrics`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src/metrics)
+
+A metric to evaluate the performance of the inferred GRN.
+
+Arguments:
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--perturbation_data` | `file` | (*Optional*) Perturbation dataset for benchmarking. |
+| `--prediction` | `file` | GRN prediction. |
+| `--score` | `file` | (*Optional, Output*) File indicating the score of a metric. |
+| `--reg_type` | `string` | (*Optional*) Name of the regression model to use. Default: `ridge`. |
+| `--subsample` | `integer` | (*Optional*) Number of samples randomly drawn from the perturbation data. Default: `-2`. |
+| `--max_workers` | `integer` | (*Optional*) NA. Default: `4`. |
+| `--method_id` | `string` | (*Optional*) NA. |
+| `--tf_all` | `file` | (*Optional*) NA. |
+| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
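+
+Conceptually, the regression-based metrics ask how well the expression
+of a target gene in the perturbation data can be predicted from its
+putative regulators. The sketch below illustrates that idea for a
+single, hypothetical gene using a cross-validated ridge model; the
+actual metric components in `src/metrics` implement a more elaborate
+protocol (subsampling, layer selection, TF filtering).
+
+``` python
+import anndata as ad
+import numpy as np
+import pandas as pd
+from sklearn.linear_model import Ridge
+from sklearn.model_selection import cross_val_score
+
+adata = ad.read_h5ad("resources_test/grn-benchmark/perturbation_data.h5ad")
+grn = pd.read_csv("resources_test/grn_models/collectri.csv")
+
+gene = "MYC"  # hypothetical example; pick any gene with annotated regulators
+regulators = grn.loc[grn["target"] == gene, "source"].unique()
+regulators = [g for g in regulators if g in adata.var_names and g != gene]
+
+X = adata[:, regulators].layers["lognorm"]
+y = adata[:, [gene]].layers["lognorm"]
+y = y.toarray().ravel() if hasattr(y, "toarray") else np.ravel(y)
+print(cross_val_score(Ridge(), X, y, scoring="r2").mean())
+```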
+
+## File format: GRN
+
+GRN prediction.
+
+Example file: `resources_test/grn_models/collectri.csv`
+
+Format:
+ + Tabular data + 'source', 'target', 'weight' + +
+ +Slot description: + +
+ +| Column | Type | Description | +|:---------|:---------|:----------------------| +| `source` | `string` | Source of regulation. | +| `target` | `string` | Target of regulation. | +| `weight` | `float` | Weight of regulation. | + +
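+
+Reading and sanity-checking a prediction in this format takes only a
+few lines of `pandas` (a minimal sketch, assuming the example file
+above is available locally):
+
+``` python
+import pandas as pd
+
+grn = pd.read_csv("resources_test/grn_models/collectri.csv")
+assert {"source", "target", "weight"} <= set(grn.columns)
+top = grn.loc[grn["weight"].abs().nlargest(10).index]
+print(top)  # ten strongest putative regulations by absolute weight
+```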
+ +## File format: Score + +File indicating the score of a metric. + +Example file: `resources_test/scores/score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---|:---|:---| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
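+
+A minimal sketch for inspecting a score file with `anndata` (assuming
+the example file above exists; the ids and values may be stored as
+scalars or arrays, hence `atleast_1d`):
+
+``` python
+import anndata as ad
+import numpy as np
+
+score = ad.read_h5ad("resources_test/scores/score.h5ad")
+ids = np.atleast_1d(score.uns["metric_ids"])
+values = np.atleast_1d(score.uns["metric_values"])
+for metric_id, value in zip(ids, values):
+    print(score.uns["method_id"], metric_id, value)
+```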
+ +## File format: multiomics rna + +RNA expression for multiomics data. + +Example file: `resources_test/grn-benchmark/multiomics_rna.h5ad` + +Format: + +
+ + AnnData object + obs: 'cell_type', 'donor_id' + +
+ +Slot description: + +
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
+| `obs["donor_id"]` | `string` | Donor ID. |
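+
+Methods typically normalize this expression matrix before inferring a
+GRN. The sketch below mirrors the preprocessing used in this
+repository's notebooks (`scanpy` shifted-log normalization); it is an
+illustration, not a required step of the benchmark itself.
+
+``` python
+import anndata as ad
+import scanpy as sc
+
+rna = ad.read_h5ad("resources_test/grn-benchmark/multiomics_rna.h5ad")
+sc.pp.normalize_total(rna)
+sc.pp.log1p(rna)
+print(rna.obs["cell_type"].value_counts())
+```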
+
+## Component type: Method
+
+Path:
+[`src/methods`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src/methods)
+
+A GRN inference method.
+
+Arguments:
+ +| Name | Type | Description | +|:---|:---|:---| +| `--multiomics_rna` | `file` | (*Optional*) RNA expression for multiomics data. | +| `--multiomics_atac` | `file` | (*Optional*) Peak data for multiomics data. | +| `--prediction` | `file` | (*Optional, Output*) GRN prediction. | +| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. | +| `--num_workers` | `integer` | (*Optional*) NA. Default: `4`. | +| `--tf_all` | `file` | (*Optional*) NA. | +| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. | + +
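+
+In practice, a method component is a script that reads the inputs
+above and writes a `source, target, weight` CSV to `--prediction`.
+The skeleton below shows that contract with a naive correlation-based
+stand-in (illustrative only and feasible on the small test data; it is
+not one of the integrated methods):
+
+``` python
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+rna = ad.read_h5ad("resources_test/grn-benchmark/multiomics_rna.h5ad")
+tf_all = np.loadtxt("resources_test/prior/tf_all.csv", dtype=str)
+tfs = np.intersect1d(tf_all, rna.var_names)
+
+X = rna.X.toarray() if hasattr(rna.X, "toarray") else np.asarray(rna.X)
+corr = pd.DataFrame(
+    np.corrcoef(X, rowvar=False),  # gene-by-gene correlation
+    index=rna.var_names, columns=rna.var_names,
+)
+prediction = (
+    corr[tfs]
+    .rename_axis("target")
+    .reset_index()
+    .melt(id_vars="target", var_name="source", value_name="weight")
+)
+prediction.to_csv("output/my_method.csv", index=False)
+```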
+ +## File format: multiomics atac + +Peak data for multiomics data. + +Example file: `resources_test/grn-benchmark/multiomics_atac.h5ad` + +Format: + +
+ + AnnData object + obs: 'cell_type', 'donor_id' + +
+ +Slot description: + +
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
+| `obs["donor_id"]` | `string` | Donor ID. |
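+
+The ATAC modality is meant to be used together with the RNA file
+above. A quick sanity check (a sketch, assuming both test files are
+present) is that the two modalities carry matching cell annotations:
+
+``` python
+import anndata as ad
+
+rna = ad.read_h5ad("resources_test/grn-benchmark/multiomics_rna.h5ad")
+atac = ad.read_h5ad("resources_test/grn-benchmark/multiomics_atac.h5ad")
+print(rna.obs["cell_type"].value_counts())
+print(atac.obs["cell_type"].value_counts())
+print(atac.var_names[:5])  # peak identifiers
+```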
+
+## Component type: Method r
+
+Path:
+[`src/methods_r`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src/methods_r)
+
+A GRN inference method implemented in R.
+
+Arguments:
+ +| Name | Type | Description | +|:----------------------|:----------|:-------------------------------------------| +| `--multiomics_rna_r` | `file` | (*Optional*) NA. | +| `--multiomics_atac_r` | `file` | (*Optional*) NA. | +| `--prediction` | `file` | (*Optional, Output*) GRN prediction. | +| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. | +| `--num_workers` | `integer` | (*Optional*) NA. Default: `4`. | + +
+ diff --git a/nextflow.config b/nextflow.config new file mode 100644 index 000000000..a3fd6d7fe --- /dev/null +++ b/nextflow.config @@ -0,0 +1,3 @@ +process.container = 'nextflow/bash:latest' + +process.errorStrategy = "ignore" \ No newline at end of file diff --git a/notebooks/process_results.ipynb b/notebooks/process_results.ipynb index e6345995d..402d643c3 100644 --- a/notebooks/process_results.ipynb +++ b/notebooks/process_results.ipynb @@ -51,7 +51,7 @@ "metadata": {}, "outputs": [], "source": [ - "base_folder = '../../task_grn_benchmark/resources/results/subsample_200_gb/'" + "base_folder = '../../task_grn_inference/resources/results/subsample_200_gb/'" ] }, { diff --git a/runs.ipynb b/runs.ipynb index bb1f3e1a0..93bc2f67a 100644 --- a/runs.ipynb +++ b/runs.ipynb @@ -13,6 +13,61 @@ "grn_models = ['collectri','granie', 'figr', 'celloracle', 'scglue', 'scenicplus']" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## GRN inference: multiomics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### scGLUE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "download: s3://openproblems-data/resources/grn/results/scglue/state.yaml to resources/results/scglue/state.yaml\n", + "download: s3://openproblems-data/resources/grn/results/scglue/trace.txt to resources/results/scglue/trace.txt\n", + "download: s3://openproblems-data/resources/grn/results/scglue/output/grn.csv to resources/results/scglue/output/grn.csv\n" + ] + } + ], + "source": [ + "!aws s3 sync s3://openproblems-data/resources/grn/results/scglue ./resources/results/scglue" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[37mReading input files\u001b[0m\n", + "\u001b[37mCalculate basic stats\u001b[0m\n", + "\u001b[37mOutputting stats to : /viash_automount/mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_benchmark/output/stats.json\u001b[0m\n", + "\u001b[37mTopological analysis\u001b[0m\n", + "\u001b[37mPlotting tf-gene in degree, dir: /viash_automount/mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_benchmark/output/tf_gene_indegree.png\u001b[0m\n", + "\u001b[37mPlotting tf-gene out degree, dir: /viash_automount/mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_benchmark/output/tf_gene_outdegree.png\u001b[0m\n" + ] + } + ], + "source": [ + "!viash run src/exp_analysis/config.vsh.yaml -- --tf_gene_net resources/results/scglue/output/grn.csv" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -32,7 +87,7 @@ "temporaryFolder: /tmp/viash_hub_repo10489538114835231235 uri: https://github.com/openproblems-bio/openproblems-v2.git\n", "Cloning into '.'...\n", "checkout out: List(git, checkout, origin/main_build, --, .) 
0 \n", - "\u001b[37mExporting run_grn_evaluation (workflows) =nextflow=> /mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_benchmark/target/nextflow/workflows/run_grn_evaluation\u001b[0m\n", + "\u001b[37mExporting run_grn_evaluation (workflows) =nextflow=> /mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_inference/target/nextflow/workflows/run_grn_evaluation\u001b[0m\n", "\u001b[33mNot all configs built successfully\u001b[0m\n", "\u001b[33m 39 configs were disabled\u001b[0m\n", "\u001b[32m 1/1 configs built successfully\u001b[0m\n" @@ -90,7 +145,7 @@ "temporaryFolder: /tmp/viash_hub_repo10268443552668753296 uri: https://github.com/openproblems-bio/openproblems-v2.git\n", "Cloning into '.'...\n", "checkout out: List(git, checkout, origin/main_build, --, .) 0 \n", - "\u001b[37mExporting run_robustness_analysis (workflows) =nextflow=> /mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_benchmark/target/nextflow/workflows/run_robustness_analysis\u001b[0m\n", + "\u001b[37mExporting run_robustness_analysis (workflows) =nextflow=> /mnt/c/Users/nourisa/Documents/testProjs/ongoing/task_grn_inference/target/nextflow/workflows/run_robustness_analysis\u001b[0m\n", "\u001b[33mNot all configs built successfully\u001b[0m\n", "\u001b[33m 39 configs were disabled\u001b[0m\n", "\u001b[32m 1/1 configs built successfully\u001b[0m\n" @@ -1746,143 +1801,315 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "import anndata as ad \n", - "import numpy as np\n", - "import pandas as pd\n", - "import anndata as ad\n", - "import scanpy as sc\n", - "import sys\n", - "import numpy as np\n", - "from sklearn.preprocessing import StandardScaler\n", - "from tqdm import tqdm\n", - "\n", - "par = {\n", - " \"multiomics_rna\": \"resources/grn-benchmark/multiomics_rna.h5ad\",\n", - " \"layer\": \"pearson\",\n", - " \"tf_all\": \"resources/prior/tf_all.csv\"\n", - "}\n", - "def create_positive_control(X: np.ndarray, groups: np.ndarray):\n", - " grns = []\n", - " for group in tqdm(np.unique(groups), desc=\"Processing groups\"):\n", - " X_sub = X[groups == group, :]\n", - " X_sub = StandardScaler().fit_transform(X_sub)\n", - " grn = np.dot(X_sub.T, X_sub) / X_sub.shape[0]\n", - " grns.append(grn)\n", - " return np.mean(grns, axis=0)\n", - "multiomics_rna = ad.read_h5ad(par[\"multiomics_rna\"])\n", - "gene_names = multiomics_rna.var_names.to_numpy()\n", - "tf_all = np.loadtxt(par['tf_all'], dtype=str)\n", - "groups = multiomics_rna.obs.cell_type\n", - "tf_all = np.intersect1d(tf_all, gene_names)" + "!bash scripts/run_robust_analys_causal.sh " ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 3, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "download: s3://openproblems-data/resources/grn/results/robust_analy_causal/state.yaml to resources/results/robust_analy_causal/state.yaml\n", + "download: s3://openproblems-data/resources/grn/results/robust_analy_causal/scores.yaml to resources/results/robust_analy_causal/scores.yaml\n", + "download: s3://openproblems-data/resources/grn/results/robust_analy_causal/trace.txt to resources/results/robust_analy_causal/trace.txt\n", + "download: s3://openproblems-data/resources/grn/results/robust_analy_causal/metric_configs.yaml to resources/results/robust_analy_causal/metric_configs.yaml\n" + ] + } + ], "source": [ - "sc.pp.normalize_total(multiomics_rna)\n", - "sc.pp.log1p(multiomics_rna)\n", - 
"sc.pp.scale(multiomics_rna)" + "!aws s3 sync s3://openproblems-data/resources/grn/results/robust_analy_causal ./resources/results/robust_analy_causal" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 10, "metadata": {}, "outputs": [ { - "name": "stderr", + "name": "stdout", + "output_type": "stream", + "text": [ + "reg1-corr-5\n", + "reg1-corr-3\n", + "reg1-corr-2\n", + "reg1-corr-10\n", + "reg1-corr-1\n", + "reg1-corr-9\n", + "reg1-corr-8\n", + "reg1-corr-6\n", + "reg1-corr-7\n", + "reg1-corr-4\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ex(False)_tf(-1)ex(True)_tf(-1)Mean
corr_50.1604330.1687580.164596
corr_30.1918520.2021960.197024
corr_20.2026650.2135070.208086
corr_100.1664900.1752330.170861
corr_10.2226730.2346880.228680
corr_90.1825400.1922680.187404
corr_80.1867100.1966920.191701
corr_60.2172030.2290860.223144
corr_70.2380840.2509730.244528
corr_40.1788510.1879960.183423
\n", + "
" + ], + "text/plain": [ + " ex(False)_tf(-1) ex(True)_tf(-1) Mean\n", + "corr_5 0.160433 0.168758 0.164596\n", + "corr_3 0.191852 0.202196 0.197024\n", + "corr_2 0.202665 0.213507 0.208086\n", + "corr_10 0.166490 0.175233 0.170861\n", + "corr_1 0.222673 0.234688 0.228680\n", + "corr_9 0.182540 0.192268 0.187404\n", + "corr_8 0.186710 0.196692 0.191701\n", + "corr_6 0.217203 0.229086 0.223144\n", + "corr_7 0.238084 0.250973 0.244528\n", + "corr_4 0.178851 0.187996 0.183423" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", "output_type": "stream", "text": [ - "Processing groups: 100%|██████████| 4/4 [00:55<00:00, 13.82s/it]\n" + "\u001b[37mRead data\u001b[0m\n", + "\u001b[37m\u001b[0m\n", + "\u001b[37mNoramlize data\u001b[0m\n", + "\u001b[37mCreate corr net\u001b[0m\n", + "\u001b[37mProcessing groups: 0%| | 0/4 [00:00 5\u001b[0m \u001b[43mnet_corr\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto_csv\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43mf\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43moutput/causal/grns/net_corr_\u001b[39;49m\u001b[38;5;132;43;01m{\u001b[39;49;00m\u001b[43mi\u001b[49m\u001b[38;5;132;43;01m}\u001b[39;49;00m\u001b[38;5;124;43m.csv\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/util/_decorators.py:333\u001b[0m, in \u001b[0;36mdeprecate_nonkeyword_arguments..decorate..wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 327\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(args) \u001b[38;5;241m>\u001b[39m num_allow_args:\n\u001b[1;32m 328\u001b[0m warnings\u001b[38;5;241m.\u001b[39mwarn(\n\u001b[1;32m 329\u001b[0m msg\u001b[38;5;241m.\u001b[39mformat(arguments\u001b[38;5;241m=\u001b[39m_format_argument_list(allow_args)),\n\u001b[1;32m 330\u001b[0m \u001b[38;5;167;01mFutureWarning\u001b[39;00m,\n\u001b[1;32m 331\u001b[0m stacklevel\u001b[38;5;241m=\u001b[39mfind_stack_level(),\n\u001b[1;32m 332\u001b[0m )\n\u001b[0;32m--> 333\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/core/generic.py:3967\u001b[0m, in \u001b[0;36mNDFrame.to_csv\u001b[0;34m(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, lineterminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)\u001b[0m\n\u001b[1;32m 3956\u001b[0m df \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(\u001b[38;5;28mself\u001b[39m, ABCDataFrame) \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mto_frame()\n\u001b[1;32m 3958\u001b[0m formatter \u001b[38;5;241m=\u001b[39m DataFrameFormatter(\n\u001b[1;32m 3959\u001b[0m frame\u001b[38;5;241m=\u001b[39mdf,\n\u001b[1;32m 3960\u001b[0m header\u001b[38;5;241m=\u001b[39mheader,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 3964\u001b[0m decimal\u001b[38;5;241m=\u001b[39mdecimal,\n\u001b[1;32m 
3965\u001b[0m )\n\u001b[0;32m-> 3967\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mDataFrameRenderer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mformatter\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto_csv\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 3968\u001b[0m \u001b[43m \u001b[49m\u001b[43mpath_or_buf\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3969\u001b[0m \u001b[43m \u001b[49m\u001b[43mlineterminator\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlineterminator\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3970\u001b[0m \u001b[43m \u001b[49m\u001b[43msep\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msep\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3971\u001b[0m \u001b[43m \u001b[49m\u001b[43mencoding\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mencoding\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3972\u001b[0m \u001b[43m \u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3973\u001b[0m \u001b[43m \u001b[49m\u001b[43mcompression\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcompression\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3974\u001b[0m \u001b[43m \u001b[49m\u001b[43mquoting\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mquoting\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3975\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolumns\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcolumns\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3976\u001b[0m \u001b[43m \u001b[49m\u001b[43mindex_label\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mindex_label\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3977\u001b[0m \u001b[43m \u001b[49m\u001b[43mmode\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3978\u001b[0m \u001b[43m \u001b[49m\u001b[43mchunksize\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchunksize\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3979\u001b[0m \u001b[43m \u001b[49m\u001b[43mquotechar\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mquotechar\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3980\u001b[0m \u001b[43m \u001b[49m\u001b[43mdate_format\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdate_format\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3981\u001b[0m \u001b[43m \u001b[49m\u001b[43mdoublequote\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdoublequote\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3982\u001b[0m \u001b[43m \u001b[49m\u001b[43mescapechar\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mescapechar\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3983\u001b[0m \u001b[43m \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstorage_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3984\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/io/formats/format.py:1014\u001b[0m, in \u001b[0;36mDataFrameRenderer.to_csv\u001b[0;34m(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, lineterminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)\u001b[0m\n\u001b[1;32m 993\u001b[0m created_buffer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[1;32m 995\u001b[0m csv_formatter \u001b[38;5;241m=\u001b[39m CSVFormatter(\n\u001b[1;32m 996\u001b[0m path_or_buf\u001b[38;5;241m=\u001b[39mpath_or_buf,\n\u001b[1;32m 997\u001b[0m 
lineterminator\u001b[38;5;241m=\u001b[39mlineterminator,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1012\u001b[0m formatter\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfmt,\n\u001b[1;32m 1013\u001b[0m )\n\u001b[0;32m-> 1014\u001b[0m \u001b[43mcsv_formatter\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1016\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m created_buffer:\n\u001b[1;32m 1017\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(path_or_buf, StringIO)\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/io/formats/csvs.py:270\u001b[0m, in \u001b[0;36mCSVFormatter.save\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 251\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m get_handle(\n\u001b[1;32m 252\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfilepath_or_buffer,\n\u001b[1;32m 253\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmode,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 258\u001b[0m ) \u001b[38;5;28;01mas\u001b[39;00m handles:\n\u001b[1;32m 259\u001b[0m \u001b[38;5;66;03m# Note: self.encoding is irrelevant here\u001b[39;00m\n\u001b[1;32m 260\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwriter \u001b[38;5;241m=\u001b[39m csvlib\u001b[38;5;241m.\u001b[39mwriter(\n\u001b[1;32m 261\u001b[0m handles\u001b[38;5;241m.\u001b[39mhandle,\n\u001b[1;32m 262\u001b[0m lineterminator\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlineterminator,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 267\u001b[0m quotechar\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mquotechar,\n\u001b[1;32m 268\u001b[0m )\n\u001b[0;32m--> 270\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_save\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/io/formats/csvs.py:275\u001b[0m, in \u001b[0;36mCSVFormatter._save\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 273\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_need_to_save_header:\n\u001b[1;32m 274\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_save_header()\n\u001b[0;32m--> 275\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_save_body\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/io/formats/csvs.py:313\u001b[0m, in \u001b[0;36mCSVFormatter._save_body\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 311\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m start_i \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m end_i:\n\u001b[1;32m 312\u001b[0m \u001b[38;5;28;01mbreak\u001b[39;00m\n\u001b[0;32m--> 313\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_save_chunk\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstart_i\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mend_i\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/anaconda3/envs/py10/lib/python3.10/site-packages/pandas/io/formats/csvs.py:324\u001b[0m, in \u001b[0;36mCSVFormatter._save_chunk\u001b[0;34m(self, start_i, end_i)\u001b[0m\n\u001b[1;32m 321\u001b[0m data \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(res\u001b[38;5;241m.\u001b[39m_iter_column_arrays())\n\u001b[1;32m 
323\u001b[0m ix \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdata_index[slicer]\u001b[38;5;241m.\u001b[39m_get_values_for_csv(\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_number_format)\n\u001b[0;32m--> 324\u001b[0m \u001b[43mlibwriters\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwrite_csv_rows\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 325\u001b[0m \u001b[43m \u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 326\u001b[0m \u001b[43m \u001b[49m\u001b[43mix\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 327\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mnlevels\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 328\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcols\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 329\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwriter\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 330\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32mwriters.pyx:56\u001b[0m, in \u001b[0;36mpandas._libs.writers.write_csv_rows\u001b[0;34m()\u001b[0m\n", - "\u001b[0;31mKeyboardInterrupt\u001b[0m: " + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing groups: 100%|██████████| 4/4 [00:43<00:00, 10.93s/it]\n" ] } ], "source": [ - "for i in range(n_iter):\n", - " net_corr = net.sample(len(tfs), axis=1)\n", - " net_corr = net_corr.reset_index().melt(id_vars='index', var_name='source', value_name='weight')\n", - " net_corr.rename(columns={'index': 'target'}, inplace=True)\n", - " net_corr.to_csv(f'output/causal/grns/net_corr_{i}.csv')" + "def corr_grn(X: np.ndarray, groups: np.ndarray):\n", + " grns = []\n", + " for group in tqdm(np.unique(groups), desc=\"Processing groups\"):\n", + " X_sub = X[groups == group, :]\n", + " X_sub = StandardScaler().fit_transform(X_sub)\n", + " grn = np.dot(X_sub.T, X_sub) / X_sub.shape[0]\n", + " grns.append(grn)\n", + " return np.mean(grns, axis=0)\n", + "groups = multiomics_rna.obs.cell_type\n", + "corr_net = corr_grn(X, groups)" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 79, + "metadata": {}, + "outputs": [], + "source": [ + "corr_net = pd.DataFrame(corr_net, index=multiomics_rna.var_names, columns=multiomics_rna.var_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "metadata": {}, + "outputs": [], + "source": [ + "tfs = corr_net.abs().sum(axis=0).argsort()[::-1][:1000].index.to_numpy()\n", + "corr_net_sub = corr_net[tfs]" + ] + }, + { + "cell_type": "code", + "execution_count": 100, "metadata": {}, "outputs": [], "source": [ - "net_pc = net[tfs]\n", - "net_pc = net_pc.reset_index().melt(id_vars='index', var_name='source', value_name='weight')\n", - "net_pc.rename(columns={'index': 'target'}, inplace=True)\n", - "net_pc.to_csv('output/causal/grns/net_pc.csv')" + "corr_net_sub = corr_net_sub.reset_index().melt(id_vars='location', var_name='source', value_name='weight')\n", + "corr_net_sub.rename(columns={'location': 'target'}, inplace=True)\n", + "corr_net_sub.to_csv('output/causal/grns/corr_net_sub.csv')" ] }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 101, "metadata": {}, "outputs": [ { @@ -1895,1880 +2122,13 @@ "ex(False)_tf(-1)\n", "\n", "Processing groups: 0%| | 0/5 [00:00> Downloading resources" -# viash run 
src/common/sync_test_resources/config.vsh.yaml -- \ -# --input "s3://openproblems-data/resources_test/grn/" \ -# --output "resources_test" \ -# --delete - viash run src/common/sync_test_resources/config.vsh.yaml -- \ - --input "resources_test" \ - --output "s3://openproblems-data/resources_test/grn/"\ + --input "s3://openproblems-data/resources_test/grn/" \ + --output "resources_test" \ --delete + +# viash run src/common/sync_test_resources/config.vsh.yaml -- \ +# --input "resources_test" \ +# --output "s3://openproblems-data/resources_test/grn/"\ +# --delete diff --git a/scripts/render_readme.sh b/scripts/render_readme.sh old mode 100644 new mode 100755 index 176978554..d72583a94 --- a/scripts/render_readme.sh +++ b/scripts/render_readme.sh @@ -2,10 +2,8 @@ set -e -[[ ! -d ../openproblems-v2 ]] && echo "You need to clone the openproblems-v2 repository next to this repository" && exit 1 - -../openproblems-v2/bin/create_task_readme \ +viash run src/common/create_task_readme/config.vsh.yaml -- \ --task "grn_benchmark" \ --task_dir "src" \ - --github_url "https://github.com/openproblems-bio/task_grn_benchmark/tree/main/" \ + --github_url "https://github.com/openproblems-bio/task_grn_inference/tree/main/" \ --output "README.md" diff --git a/scripts/repo/run_grn_evaluation_all_layers.sh b/scripts/repo/run_grn_evaluation_all_layers.sh index b5e474f6c..f4435c94f 100644 --- a/scripts/repo/run_grn_evaluation_all_layers.sh +++ b/scripts/repo/run_grn_evaluation_all_layers.sh @@ -77,7 +77,7 @@ HERE # -params-file ${param_file} # ./tw-windows-x86_64.exe launch ` -# https://github.com/openproblems-bio/task_grn_benchmark.git ` +# https://github.com/openproblems-bio/task_grn_inference.git ` # --revision build/main ` # --pull-latest ` # --main-script target/nextflow/workflows/run_grn_evaluation/main.nf ` diff --git a/scripts/run_benchmark_single_omics.sh b/scripts/run_benchmark_single_omics.sh old mode 100644 new mode 100755 index 1860e9a35..fa7d88368 --- a/scripts/run_benchmark_single_omics.sh +++ b/scripts/run_benchmark_single_omics.sh @@ -39,7 +39,7 @@ HERE # -params-file ${param_file} # ./tw-windows-x86_64.exe launch ` -# https://github.com/openproblems-bio/task_grn_benchmark.git ` +# https://github.com/openproblems-bio/task_grn_inference.git ` # --revision build/main ` # --pull-latest ` # --main-script target/nextflow/workflows/run_benchmark_single_omics/main.nf ` diff --git a/scripts/run_grn_evaluation.sh b/scripts/run_grn_evaluation.sh old mode 100644 new mode 100755 index 6cc7011cf..eeec944a1 --- a/scripts/run_grn_evaluation.sh +++ b/scripts/run_grn_evaluation.sh @@ -86,7 +86,7 @@ nextflow run . 
\
   -params-file ${param_file}
 
 # ./tw-windows-x86_64.exe launch `
-#   https://github.com/openproblems-bio/task_grn_benchmark.git `
+#   https://github.com/openproblems-bio/task_grn_inference.git `
 #   --revision build/main `
 #   --pull-latest `
 #   --main-script target/nextflow/workflows/run_grn_evaluation/main.nf `
diff --git a/scripts/run_pc_vs_nc.sh b/scripts/run_pc_vs_nc.sh
old mode 100644
new mode 100755
index 055d43f98..cc2e0e5f5
--- a/scripts/run_pc_vs_nc.sh
+++ b/scripts/run_pc_vs_nc.sh
@@ -72,7 +72,7 @@ HERE
 #   -params-file ${param_file}
 
 # ./tw-windows-x86_64.exe launch `
-#   https://github.com/openproblems-bio/task_grn_benchmark.git `
+#   https://github.com/openproblems-bio/task_grn_inference.git `
 #   --revision build/main `
 #   --pull-latest `
 #   --main-script target/nextflow/workflows/run_grn_evaluation/main.nf `
diff --git a/scripts/run_process_perturbation_tw.sh b/scripts/run_process_perturbation_tw.sh
old mode 100644
new mode 100755
index fe2d6e2ff..5f92d1631
--- a/scripts/run_process_perturbation_tw.sh
+++ b/scripts/run_process_perturbation_tw.sh
@@ -14,7 +14,7 @@ publish_dir: "$publish_dir"
 HERE
 
-  ./tw-windows-x86_64.exe launch https://github.com/openproblems-bio/task_grn_benchmark.git `
+  ./tw-windows-x86_64.exe launch https://github.com/openproblems-bio/task_grn_inference.git `
     --revision build/main --pull-latest `
     --main-script target/nextflow/workflows/process_perturbation/main.nf `
     --workspace 53907369739130 --compute-env 6TeIFgV5OY4pJCk8I0bfOh `
diff --git a/scripts/run_robust_analys.sh b/scripts/run_robust_analys.sh
old mode 100644
new mode 100755
diff --git a/scripts/run_robust_analys_causal.sh b/scripts/run_robust_analys_causal.sh
old mode 100644
new mode 100755
index cc3fc8494..db5f738ca
--- a/scripts/run_robust_analys_causal.sh
+++ b/scripts/run_robust_analys_causal.sh
@@ -1,8 +1,8 @@
 #!/bin/bash
-viash ns build --parallel
+# viash ns build --parallel
 
 RUN_ID="robust_analy_causal"
-resources_dir="resources"
-# resources_dir="s3://openproblems-data/resources/grn"
+# resources_dir="resources"
+resources_dir="s3://openproblems-data/resources/grn"
 
 publish_dir="${resources_dir}/results/${RUN_ID}"
 
@@ -34,7 +34,7 @@ HERE
 layers=("pearson")  # Array containing the layer(s)
 
 for layer in "${layers[@]}"; do  # Iterate over each layer in the array
-    for iter in {1..100}; do  # Loop from 1 to 100 iterations
+    for iter in {1..10}; do  # Loop from 1 to 10 iterations
         append_entry "$iter" "$layer"  # Execute the append_entry command
     done
 done
 
@@ -46,10 +46,10 @@ output_state: "state.yaml"
 publish_dir: "$publish_dir"
 HERE
 
-nextflow run . 
\ +# -main-script target/nextflow/workflows/run_robustness_analysis_causal/main.nf \ +# -profile docker \ +# -with-trace \ +# -c src/common/nextflow_helpers/labels_ci.config \ +# -params-file ${param_file} diff --git a/scripts/upload_resources.sh b/scripts/upload_resources.sh old mode 100644 new mode 100755 diff --git a/src/api/comp_control_method.yaml b/src/api/comp_control_method.yaml index 852171ed4..f048d3ec5 100644 --- a/src/api/comp_control_method.yaml +++ b/src/api/comp_control_method.yaml @@ -27,11 +27,13 @@ functionality: type: file required: false direction: input - default: resources/prior/tf_all.csv + example: resources_test/prior/tf_all.csv test_resources: - type: python_script path: /src/common/component_tests/run_and_check_output.py - - path: /resources/grn-benchmark - dest: resources/grn-benchmark \ No newline at end of file + - path: /resources_test/grn-benchmark + dest: resources_test/grn-benchmark + - path: /resources_test/prior + dest: resources_test/prior \ No newline at end of file diff --git a/src/api/comp_method.yaml b/src/api/comp_method.yaml index 64ba87ad7..975a24727 100644 --- a/src/api/comp_method.yaml +++ b/src/api/comp_method.yaml @@ -12,12 +12,10 @@ functionality: __merge__: file_multiomics_rna_h5ad.yaml required: false direction: input - default: resources_test/grn-benchmark/multiomics_rna.h5ad - name: --multiomics_atac __merge__: file_multiomics_atac_h5ad.yaml required: false direction: input - default: resources_test/grn-benchmark/multiomics_atac.h5ad must_exist: false - name: --prediction __merge__: file_prediction.yaml @@ -34,7 +32,7 @@ functionality: default: 4 - name: --tf_all type: file - example: resources/prior/tf_all.csv + example: resources_test/prior/tf_all.csv required: false - name: --max_n_links type: integer @@ -43,5 +41,9 @@ functionality: test_resources: - type: python_script path: /src/common/component_tests/run_and_check_output.py - - path: /resources/grn-benchmark - dest: resources/grn-benchmark \ No newline at end of file + - path: /resources_test/grn-benchmark + dest: resources_test/grn-benchmark + - path: /resources_test/prior + dest: resources_test/prior + - path: /resources_test/supplementary + dest: resources_test/supplementary diff --git a/src/api/comp_method_r.yaml b/src/api/comp_method_r.yaml index 5efed25ef..55ce2593f 100644 --- a/src/api/comp_method_r.yaml +++ b/src/api/comp_method_r.yaml @@ -12,17 +12,16 @@ functionality: type: file required: false direction: input - default: resources_test/grn-benchmark/multiomics_rna.rds + example: resources_test/grn-benchmark/multiomics_rna.rds - name: --multiomics_atac_r type: file required: false direction: input - default: resources_test/grn-benchmark/multiomics_atac.rds + example: resources_test/grn-benchmark/multiomics_atac.rds - name: --prediction __merge__: file_prediction.yaml required: false direction: output - default: output/grn.csv - name: --temp_dir type: string direction: input @@ -34,5 +33,5 @@ functionality: test_resources: - type: python_script path: /src/common/component_tests/run_and_check_output.py - - path: /resources/grn-benchmark - dest: resources/grn-benchmark \ No newline at end of file + - path: /resources_test/grn-benchmark + dest: resources_test/grn-benchmark \ No newline at end of file diff --git a/src/api/comp_metric.yaml b/src/api/comp_metric.yaml index e88a1b6ca..6d6545998 100644 --- a/src/api/comp_metric.yaml +++ b/src/api/comp_metric.yaml @@ -12,32 +12,25 @@ functionality: __merge__: file_perturbation_h5ad.yaml required: false direction: input - default: 
resources/grn-benchmark/perturbation_data.h5ad - name: --prediction __merge__: file_prediction.yaml required: true direction: input - default: resources/grn_models/collectri.csv - name: --score __merge__: file_score.yaml required: false direction: output - default: out/score.h5ad - name: --reg_type type: string direction: input default: ridge description: name of regretion to use multiple: false - info: - test_default: ridge - name: --subsample type: integer direction: input default: -2 description: number of samples randomly drawn from perturbation data - info: - test_default: 200 - name: --max_workers type: integer direction: input @@ -50,7 +43,7 @@ functionality: - name: --tf_all type: file direction: input - default: 'resources/prior/tf_all.csv' + example: resources_test/prior/tf_all.csv - name: --apply_tf type: boolean required: false @@ -61,5 +54,7 @@ functionality: test_resources: - type: python_script path: /src/common/component_tests/run_and_check_output.py - - path: /resources/grn-benchmark - dest: resources/grn-benchmark \ No newline at end of file + - path: /resources_test/grn-benchmark + dest: resources_test/grn-benchmark + - path: /resources_test/prior + dest: resources_test/prior diff --git a/src/api/file_model.yaml b/src/api/file_model.yaml index dd46c7636..b7a9cf2c3 100644 --- a/src/api/file_model.yaml +++ b/src/api/file_model.yaml @@ -1,5 +1,5 @@ type: file -example: resources/grn-benchmark/model/ +example: resources_test/grn-benchmark/model/ info: label: Model summary: "Optional model output. If no value is passed, the model will be removed at the end of the run." diff --git a/src/api/file_multiomics_atac_h5ad.yaml b/src/api/file_multiomics_atac_h5ad.yaml index de891bf67..3acf53026 100644 --- a/src/api/file_multiomics_atac_h5ad.yaml +++ b/src/api/file_multiomics_atac_h5ad.yaml @@ -1,5 +1,5 @@ type: file -example: resources/grn-benchmark/multiomics_atac.h5ad +example: resources_test/grn-benchmark/multiomics_atac.h5ad info: label: multiomics atac summary: "Peak data for multiomics data." diff --git a/src/api/file_multiomics_rna_h5ad.yaml b/src/api/file_multiomics_rna_h5ad.yaml index 061601e15..8197a74bf 100644 --- a/src/api/file_multiomics_rna_h5ad.yaml +++ b/src/api/file_multiomics_rna_h5ad.yaml @@ -1,5 +1,5 @@ type: file -example: resources/grn-benchmark/multiomics_rna.h5ad +example: resources_test/grn-benchmark/multiomics_rna.h5ad info: label: multiomics rna summary: "RNA expression for multiomics data." diff --git a/src/api/file_perturbation_h5ad.yaml b/src/api/file_perturbation_h5ad.yaml index 878ba8ba9..a8445ad2e 100644 --- a/src/api/file_perturbation_h5ad.yaml +++ b/src/api/file_perturbation_h5ad.yaml @@ -1,5 +1,5 @@ type: file -example: resources/grn-benchmark/perturbation_data.h5ad +example: resources_test/grn-benchmark/perturbation_data.h5ad info: label: perturbation summary: "Perturbation dataset for benchmarking." 
diff --git a/src/api/file_prediction.yaml b/src/api/file_prediction.yaml index fbd8e9dad..df13d9b24 100644 --- a/src/api/file_prediction.yaml +++ b/src/api/file_prediction.yaml @@ -1,5 +1,5 @@ type: file -example: resources/grn-benchmark/grn_models/collectri.csv +example: resources_test/grn_models/collectri.csv info: label: GRN summary: "GRN prediction" diff --git a/src/api/file_score.yaml b/src/api/file_score.yaml index a6bf71a40..94b8b846d 100644 --- a/src/api/file_score.yaml +++ b/src/api/file_score.yaml @@ -1,5 +1,5 @@ type: file -example: resources/grn-benchmark/score.csv +example: resources_test/scores/score.h5ad info: label: Score summary: "File indicating the score of a metric." diff --git a/src/api/task_info.yaml b/src/api/task_info.yaml index 700384fc8..11bfe8d53 100644 --- a/src/api/task_info.yaml +++ b/src/api/task_info.yaml @@ -9,7 +9,9 @@ description: | Due to its flexible nature, the platform can incorporate various benchmark datasets and evaluation methods, using either prior knowledge or feature-based approaches. In the current version, due to the absence of standardized prior knowledge, we use a feature-based approach to benchmark GRNs. Our evaluation utilizes standardized datasets for GRN inference and evaluation, employing multiple regression analysis approaches to assess both accuracy and comprehensiveness. -summary: Benchmarking GRN inference methods +summary: | + Benchmarking GRN inference methods + The full documentation is hosted on [ReadTheDocs](https://openproblems-grn-task.readthedocs.io/en/latest/index.html). [![Documentation Status](https://readthedocs.org/projects/grn-inference-benchmarking/badge/?version=latest)](https://grn-inference-benchmarking.readthedocs.io/en/latest/?badge=latest) readme: | ## Installation @@ -19,9 +21,9 @@ readme: | ## Download resources ```bash - git clone git@github.com:openproblems-bio/task_grn_benchmark.git + git clone git@github.com:openproblems-bio/task_grn_inference.git - cd task_grn_benchmark + cd task_grn_inference # download resources scripts/download_resources.sh @@ -58,5 +60,18 @@ authors: info: github: rcannood orcid: "0000-0003-3641-729X" + - name: Antoine Passimier + roles: [ contributor ] + info: + github: AntoinePassemiers + - name: Christian Arnold + roles: [ contributor ] + info: + github: chrarnold + - name: Marco Stock + roles: [ contributor ] + info: + github: stkmrc + diff --git a/src/api/comp_test.yaml b/src/api/unit_test.yaml similarity index 100% rename from src/api/comp_test.yaml rename to src/api/unit_test.yaml diff --git a/src/common/component_tests/run_and_check_output.py b/src/common/component_tests/run_and_check_output.py index 2f51739a8..77b4544b7 100644 --- a/src/common/component_tests/run_and_check_output.py +++ b/src/common/component_tests/run_and_check_output.py @@ -109,10 +109,16 @@ def run_and_check_outputs(arguments, cmd): clean_name = re.sub("^--", "", arg["name"]) new_arg["clean_name"] = clean_name + # use example to find test resource file if arg["type"] == "file": + value = None if arg["direction"] == "input": - value = f"{meta['resources_dir']}/{arg['example'][0]}" + example = arg.get("example") + if example: + if isinstance(example, list): + example = example[0] + value = f"{meta['resources_dir']}/{example}" else: example = arg.get("example", ["example"])[0] ext_res = re.search(r"\.(\w+)$", example) @@ -124,6 +130,9 @@ def run_and_check_outputs(arguments, cmd): elif "test_default" in arg_info: new_arg["value"] = arg_info["test_default"] + if arg["required"]: + assert new_arg.get("value") 
is not None, f"Argument '{clean_name}' is required but has no value" + arguments.append(new_arg) fun_info = config["functionality"].get("info") or {} @@ -154,6 +163,7 @@ def run_and_check_outputs(arguments, cmd): value = arg["value"] if arg["multiple"] and isinstance(value, list): value = arg["multiple_sep"].join(value) - cmd.extend([arg["name"], str(value)]) + if value: + cmd.extend([arg["name"], str(value)]) run_and_check_outputs(argset_args, cmd) \ No newline at end of file diff --git a/src/common/create_task_readme/config.vsh.yaml b/src/common/create_task_readme/config.vsh.yaml new file mode 100644 index 000000000..d268974ce --- /dev/null +++ b/src/common/create_task_readme/config.vsh.yaml @@ -0,0 +1,69 @@ +functionality: + name: create_task_readme + namespace: common + description: | + Create a README for the task. + argument_groups: + - name: Inputs + arguments: + - type: string + name: --task + description: Which task the component will be added to. + example: denoising + required: false + - type: file + name: --task_dir + description: Path to the task directory. + default: src/tasks/${VIASH_PAR_TASK} + required: false + - type: file + name: --viash_yaml + description: | + Path to the project config file. Needed for knowing the relative location of a file to the project root. + default: "_viash.yaml" + - type: string + name: --github_url + description: | + URL to the GitHub repository. Needed for linking to the source code. + default: "https://github.com/openproblems-bio/openproblems-v2/tree/main/" + - name: Outputs + arguments: + - type: file + name: --output + direction: output + description: Path to the component directory. Suggested location is `src/tasks//README.md`. + default: src/tasks/${VIASH_PAR_TASK}/README.md + resources: + - type: r_script + path: script.R + - path: /src/common/helper_functions/read_and_merge_yaml.R + - path: /src/common/helper_functions/read_api_files.R + - path: /src/common/helper_functions/strip_margin.R + test_resources: + - type: r_script + path: test.R + - path: /src + dest: openproblems-v2/src + - path: /_viash.yaml + dest: openproblems-v2/_viash.yaml +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + packages: [dplyr, purrr, rlang, glue, yaml, fs, cli, igraph, rmarkdown, processx] + - type: apt + packages: [jq, curl] + - type: docker + # download and install quarto-*-linux-amd64.deb from latest release + run: | + release_info=$(curl -s https://api.github.com/repos/quarto-dev/quarto-cli/releases/latest) && \ + download_url=$(printf "%s" "$release_info" | jq -r '.assets[] | select(.name | test("quarto-.*-linux-amd64.deb")) | .browser_download_url') && \ + curl -sL "$download_url" -o /opt/quarto.deb && \ + dpkg -i /opt/quarto.deb && \ + rm /opt/quarto.deb + - type: native + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] + diff --git a/src/common/create_task_readme/render_all.sh b/src/common/create_task_readme/render_all.sh new file mode 100644 index 000000000..e44195c1e --- /dev/null +++ b/src/common/create_task_readme/render_all.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +set -e + +TASK_IDS=`ls src/tasks` + +for task_id in $TASK_IDS; do + echo ">> Processing $task_id" + viash run src/common/create_task_readme/config.vsh.yaml -- --task $task_id +done \ No newline at end of file diff --git a/src/common/create_task_readme/script.R b/src/common/create_task_readme/script.R new file mode 100644 index 000000000..1d76320d0 --- /dev/null +++ b/src/common/create_task_readme/script.R @@ -0,0 +1,142 @@ 
+library(rlang, quietly = TRUE, warn.conflicts = FALSE) +library(purrr, quietly = TRUE, warn.conflicts = FALSE) +library(dplyr, quietly = TRUE, warn.conflicts = FALSE) + +## VIASH START +par <- list( + "task" = "grn_benchmark", + "task_dir" = "src", + "output" = "README.md", + "viash_yaml" = "_viash.yaml", + "github_url" = "https://github.com/openproblems-bio/task_grn_inference/tree/main/" +) +meta <- list( + "resources_dir" = "src/common/helper_functions", + "temp_dir" = "temp/" +) +## VIASH END + +if (is.null(par$task) && is.null(par$task_dir)) { + stop("Either 'task' or 'task_dir' must be provided") +} +if (is.null(par$viash_yaml)) { + stop("Argument 'viash_yaml' must be provided") +} +if (is.null(par$output)) { + stop("Argument 'output' must be provided") +} + +# import helper function +source(paste0(meta["resources_dir"], "/read_and_merge_yaml.R")) +source(paste0(meta["resources_dir"], "/strip_margin.R")) +source(paste0(meta["resources_dir"], "/read_api_files.R")) + +cat("Read task info\n") +task_api <- read_task_api(par[["task_dir"]]) + +# determine ordering +root <- .task_graph_get_root(task_api) + +r_graph <- tryCatch({ + render_task_graph(task_api, root) +}, error = function(e) { + stop("Failed to render task graph: ", e$message) +}) + +cat("Render API details\n") +order <- names(igraph::bfs(task_api$task_graph, root)$order) +r_details <- map_chr( + order, + function(file_name) { + tryCatch({ + if (file_name %in% names(task_api$comp_specs)) { + render_component(task_api$comp_specs[[file_name]]) + } else { + render_file(task_api$file_specs[[file_name]]) + } + }, error = function(e) { + stop("Failed to render API details: ", e$message) + }) + } +) + +cat("Render authors\n") +authors_str <- + if (nrow(task_api$authors) > 0) { + paste0( + "\n## Authors & contributors\n\n", + task_api$authors %>% knitr::kable() %>% paste(collapse = "\n"), + "\n" + ) + } else { + "" + } +readme_str <- + if (is.null(task_api$task_info$readme) || is.na(task_api$task_info$readme)) { + "" + } else { + paste0( + "\n## README\n\n", + task_api$task_info$readme, + "\n" + ) + } + +cat("Generate qmd content\n") +relative_path <- par[["task_dir"]] %>% + gsub(paste0(dirname(par[["viash_yaml"]]), "/*"), "", .) %>% + gsub("/*$", "", .) 
+source_url <- paste0(par[["github_url"]], relative_path) +qmd_content <- strip_margin(glue::glue(" + §--- + §title: \"{task_api$task_info$label}\" + §format: gfm + §--- + § + § + § + §{task_api$task_info$summary} + § + §Path to source: [`{relative_path}`]({source_url}) + § + §{readme_str} + § + §## Motivation + § + §{task_api$task_info$motivation} + § + §## Description + § + §{task_api$task_info$description} + §{authors_str} + §## API + § + §{r_graph} + § + §{paste(r_details, collapse = '\n\n')} + § + §"), symbol = "§") + +cat("Write README.qmd to file\n") +qmd_file <- tempfile( + pattern = "README_", + fileext = ".qmd", + tmpdir = meta$temp_dir +) + +if (!dir.exists(meta$temp_dir)) { + dir.create(meta$temp_dir, recursive = TRUE) +} +writeLines(qmd_content, qmd_file) + +cat("Render README.qmd to README.md\n") +out <- processx::run( + command = "quarto", + args = c("render", qmd_file, "--output", "-"), + echo = TRUE +) + +writeLines(out$stdout, par$output) diff --git a/src/common/create_task_readme/test.R b/src/common/create_task_readme/test.R new file mode 100644 index 000000000..9af1fe973 --- /dev/null +++ b/src/common/create_task_readme/test.R @@ -0,0 +1,30 @@ +requireNamespace("assertthat", quietly = TRUE) + +## VIASH START +## VIASH END + +opv2 <- paste0(meta$resources_dir, "/openproblems-v2") +output_path <- "output.md" + +cat(">> Running the script as test\n") +system(paste( + meta["executable"], + "--task", "label_projection", + "--output", output_path, + "--task_dir", paste0(opv2, "/src/tasks/label_projection"), + "--viash_yaml", paste0(opv2, "/_viash.yaml") +)) + +cat(">> Checking whether output files exist\n") +assertthat::assert_that(file.exists(output_path)) + +cat(">> Checking file contents\n") +lines <- readLines(output_path) +assertthat::assert_that(any(grepl("# Label projection", lines))) +assertthat::assert_that(any(grepl("# Description", lines))) +assertthat::assert_that(any(grepl("# Motivation", lines))) +assertthat::assert_that(any(grepl("# Authors", lines))) +assertthat::assert_that(any(grepl("flowchart LR", lines))) +assertthat::assert_that(any(grepl("# File format:", lines))) + +cat("All checks succeeded!\n") diff --git a/src/common/helper_functions/read_and_merge_yaml.R b/src/common/helper_functions/read_and_merge_yaml.R new file mode 100644 index 000000000..932d3feb9 --- /dev/null +++ b/src/common/helper_functions/read_and_merge_yaml.R @@ -0,0 +1,144 @@ +#' Read a Viash YAML +#' +#' If the YAML contains a "__merge__" key anywhere in the yaml, +#' the path specified in that YAML will be read and the two +#' lists will be merged. This is a recursive procedure. +#' +#' @param path Path to Viash YAML +read_and_merge_yaml <- function(path, project_path = .ram_find_project(path)) { + path <- normalizePath(path, mustWork = FALSE) + data <- tryCatch({ + suppressWarnings(yaml::read_yaml(path)) + }, error = function(e) { + stop("Could not read ", path, ". 
Error: ", e) + }) + .ram_process_merge(data, data, path, project_path) +} + +.ram_find_project <- function(path) { + path <- normalizePath(path, mustWork = FALSE) + check <- paste0(dirname(path), "/_viash.yaml") + if (file.exists(check)) { + dirname(check) + } else if (check == "//_viash.yaml") { + NULL + } else { + .ram_find_project(dirname(check)) + } +} + +.ram_is_named_list <- function(obj) { + is.null(obj) || (is.list(obj) && (length(obj) == 0 || !is.null(names(obj)))) +} + +.ram_process_merge <- function(data, root_data, path, project_path) { + if (.ram_is_named_list(data)) { + # check whether children have `__merge__` entries + processed_data <- lapply(data, function(dat) { + .ram_process_merge(dat, root_data, path, project_path) + }) + processed_data <- lapply(names(data), function(nm) { + dat <- data[[nm]] + .ram_process_merge(dat, root_data, path, project_path) + }) + names(processed_data) <- names(data) + + # if current element has __merge__, read list2 yaml and combine with data + new_data <- + if ("__merge__" %in% names(processed_data) && !.ram_is_named_list(processed_data$`__merge__`)) { + new_data_path <- .ram_resolve_path( + path = processed_data$`__merge__`, + project_path = project_path, + parent_path = dirname(path) + ) + read_and_merge_yaml(new_data_path, project_path) + } else if ("$ref" %in% names(processed_data) && !.ram_is_named_list(processed_data$`$ref`)) { + ref_parts <- strsplit(processed_data$`$ref`, "#")[[1]] + + # resolve the path in $ref + x <- + if (ref_parts[[1]] == "") { + root_data + } else { + new_data_path <- .ram_resolve_path( + path = ref_parts[[1]], + project_path = project_path, + parent_path = dirname(path) + ) + new_data_path <- normalizePath(new_data_path, mustWork = FALSE) + + # read in the new data + tryCatch({ + suppressWarnings(yaml::read_yaml(new_data_path)) + }, error = function(e) { + stop("Could not read ", new_data_path, ". 
Error: ", e) + }) + } + x_root <- x + + + # Navigate the path and retrieve the referenced data + ref_path_parts <- unlist(strsplit(ref_parts[[2]], "/")) + for (part in ref_path_parts) { + if (part == "") { + next + } else if (part %in% names(x)) { + x <- x[[part]] + } else { + stop("Could not find ", processed_data$`$ref`, " in ", path) + } + } + + # postprocess the new data + if (ref_parts[[1]] == "") { + x + } else { + .ram_process_merge(x, x_root, new_data_path, project_path) + } + } else { + list() + } + + .ram_deep_merge(new_data, processed_data) + } else if (is.list(data)) { + lapply(data, function(dat) { + .ram_process_merge(dat, root_data, path, project_path) + }) + } else { + data + } +} + +.ram_resolve_path <- function(path, project_path, parent_path) { + ifelse( + grepl("^/", path), + paste0(project_path, "/", path), + fs::path_abs(path, parent_path) + ) +} + +.ram_deep_merge <- function(list1, list2) { + if (.ram_is_named_list(list1) && .ram_is_named_list(list2)) { + # if list1 and list2 are objects, recursively merge + keys <- unique(c(names(list1), names(list2))) + out <- lapply(keys, function(key) { + if (key %in% names(list1)) { + if (key %in% names(list2)) { + .ram_deep_merge(list1[[key]], list2[[key]]) + } else { + list1[[key]] + } + } else { + list2[[key]] + } + }) + names(out) <- keys + out + } else if (is.list(list1) && is.list(list2)) { + # if list1 and list2 are both lists, append + c(list1, list2) + } else { + # else override list1 with list2 + list2 + } +} \ No newline at end of file diff --git a/src/common/helper_functions/read_and_merge_yaml.py b/src/common/helper_functions/read_and_merge_yaml.py new file mode 100644 index 000000000..b74995aed --- /dev/null +++ b/src/common/helper_functions/read_and_merge_yaml.py @@ -0,0 +1,52 @@ +def read_and_merge_yaml(path): + """Read a Viash YAML + + If the YAML contains a "__merge__" key anywhere in the yaml, + the path specified in that YAML will be read and the two + lists will be merged. This is a recursive procedure. 
+ + Arguments: + path -- Path to the Viash YAML""" + from ruamel.yaml import YAML + + yaml = YAML(typ='safe', pure=True) + + with open(path, 'r') as stream: + data = yaml.load(stream) + return _ram_process_merge(data, path) + +def _ram_deep_merge(dict1, dict2): + if isinstance(dict1, dict) and isinstance(dict2, dict): + keys = set(list(dict1.keys()) + list(dict2.keys())) + out = {} + for key in keys: + if key in dict1: + if key in dict2: + out[key] = _ram_deep_merge(dict1[key], dict2[key]) + else: + out[key] = dict1[key] + else: + out[key] = dict2[key] + return out + elif isinstance(dict1, list) and isinstance(dict2, list): + return dict1 + dict2 + else: + return dict2 + +def _ram_process_merge(data, path): + import os + if isinstance(data, dict): + processed_data = {k: _ram_process_merge(v, path) for k, v in data.items()} + + if "__merge__" in processed_data: + new_data_path = os.path.join(os.path.dirname(path), processed_data["__merge__"]) + new_data = read_and_merge_yaml(new_data_path) + else: + new_data = {} + + return _ram_deep_merge(new_data, processed_data) + elif isinstance(data, list): + return [_ram_process_merge(dat, path) for dat in data] + else: + return data + diff --git a/src/common/helper_functions/read_anndata_partial.py b/src/common/helper_functions/read_anndata_partial.py new file mode 100644 index 000000000..efbea0592 --- /dev/null +++ b/src/common/helper_functions/read_anndata_partial.py @@ -0,0 +1,77 @@ +import warnings +from pathlib import Path +import anndata as ad +import h5py +from scipy.sparse import csr_matrix +from anndata.experimental import read_elem, sparse_dataset + + +def read_anndata( + file: str, + backed: bool = False, + **kwargs +) -> ad.AnnData: + """ + Read anndata file + :param file: path to anndata file in h5ad format + :param kwargs: AnnData parameter to group mapping + """ + assert Path(file).exists(), f'File not found: {file}' + + f = h5py.File(file, 'r') + kwargs = {x: x for x in f} if not kwargs else kwargs + if len(f.keys()) == 0: + return ad.AnnData() + # check if keys are available + for name, slot in kwargs.items(): + if slot not in f: + warnings.warn( + f'Cannot find "{slot}" for AnnData parameter `{name}` from "{file}"' + ) + adata = read_partial(f, backed=backed, **kwargs) + if not backed: + f.close() + + return adata + + +def read_partial( + group: h5py.Group, + backed: bool = False, + force_sparse_types: [str, list] = None, + **kwargs +) -> ad.AnnData: + """ + Partially read h5py groups + :params group: file group + :params force_sparse_types: encoding types to convert to sparse_dataset via csr_matrix + :params backed: read sparse matrix as sparse_dataset + :params **kwargs: dict of slot_name: slot, by default use all available slot for the h5py file + :return: AnnData object + """ + if force_sparse_types is None: + force_sparse_types = [] + elif isinstance(force_sparse_types, str): + force_sparse_types = [force_sparse_types] + slots = {} + if backed: + print('Read as backed sparse matrix...') + + for slot_name, slot in kwargs.items(): + print(f'Read slot "{slot}", store as "{slot_name}"...') + if slot not in group: + warnings.warn(f'Slot "{slot}" not found, skip...') + slots[slot_name] = None + else: + elem = group[slot] + iospec = ad._io.specs.get_spec(elem) + if iospec.encoding_type in ("csr_matrix", "csc_matrix") and backed: + slots[slot_name] = sparse_dataset(elem) + elif iospec.encoding_type in force_sparse_types: + slots[slot_name] = csr_matrix(read_elem(elem)) + if backed: + slots[slot_name] = sparse_dataset(slots[slot_name]) 
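+                    # Note: `sparse_dataset` expects an on-disk group, so wrapping
+                    # the in-memory csr_matrix created above may not work in all
+                    # anndata versions; the `backed` + `force_sparse_types`
+                    # combination is best treated as untested.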
+ else: + slots[slot_name] = read_elem(elem) + return ad.AnnData(**slots) + diff --git a/src/common/helper_functions/read_api_files.R b/src/common/helper_functions/read_api_files.R new file mode 100644 index 000000000..1b829bf02 --- /dev/null +++ b/src/common/helper_functions/read_api_files.R @@ -0,0 +1,522 @@ + +anndata_struct_names <- c("obs", "var", "obsm", "obsp", "varm", "varp", "layers", "uns") + +read_file_spec <- function(path) { + spec <- read_and_merge_yaml(path) + out <- list( + info = read_file_info(spec, path) + ) + if (out$info$file_type == "h5ad" || "slots" %in% names(spec$info)) { + out$info$file_type <- "h5ad" + out$slots <- read_anndata_slots(spec, path) + } + if (out$info$file_type == "csv" || out$info$file_type == "tsv" || out$info$file_type == "parquet") { + out$columns <- read_tabular_columns(spec, path) + } + out +} +read_file_info <- function(spec, path) { + # TEMP: make it readable + spec$info$slots <- NULL + df <- list_as_tibble(spec) + if (list_contains_tibble(spec$info)) { + df <- dplyr::bind_cols(df, list_as_tibble(spec$info)) + } + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$description <- df$description %||% NA_character_ %>% as.character + df$summary <- df$summary %||% NA_character_ %>% as.character + as_tibble(df) +} +read_anndata_slots <- function(spec, path) { + map_df( + anndata_struct_names, + function(struct_name, slot) { + slot <- spec$info$slots[[struct_name]] + if (is.null(slot)) return(NULL) + df <- map_df(slot, as.data.frame) + df$struct <- struct_name + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$required <- df$required %||% TRUE %|% TRUE + df$multiple <- df$multiple %||% FALSE %|% FALSE + as_tibble(df) + } + ) +} +read_tabular_columns <- function(spec, path) { + map_df( + spec$info$columns, + function(column) { + df <- list_as_tibble(column) + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$required <- df$required %||% TRUE %|% TRUE + df$multiple <- df$multiple %||% FALSE %|% FALSE + as_tibble(df) + } + ) +} + +format_file_format <- function(spec) { + if (spec$info$file_type == "h5ad") { + example <- spec$slots %>% + group_by(struct) %>% + summarise( + str = paste0(unique(struct), ": ", paste0("'", name, "'", collapse = ", ")) + ) %>% + arrange(match(struct, anndata_struct_names)) + + c(" AnnData object", paste0(" ", example$str)) + } else if (spec$info$file_type == "csv" || spec$info$file_type == "tsv" || spec$info$file_type == "parquet") { + example <- spec$columns %>% + summarise( + str = paste0("'", name, "'", collapse = ", ") + ) + + c(" Tabular data", paste0(" ", example$str)) + } else { + "" + } +} + +format_file_format_as_kable <- function(spec) { + if (spec$info$file_type == "h5ad") { + spec$slots %>% + mutate( + tag_str = pmap_chr(lst(required), function(required) { + out <- c() + if (!required) { + out <- c(out, "Optional") + } + if (length(out) == 0) { + "" + } else { + paste0("(_", paste(out, collapse = ", "), "_) ") + } + }) + ) %>% + transmute( + Slot = paste0("`", struct, "[\"", name, "\"]`"), + Type = paste0("`", type, "`"), + Description = paste0( + tag_str, + description %>% gsub(" *\n *", " ", .) %>% gsub("\\. *$", "", .), + "." 
+ ) + ) %>% + knitr::kable() + } else if (spec$info$file_type == "csv" || spec$info$file_type == "tsv" || spec$info$file_type == "parquet") { + spec$columns %>% + mutate( + tag_str = pmap_chr(lst(required), function(required) { + out <- c() + if (!required) { + out <- c(out, "Optional") + } + if (length(out) == 0) { + "" + } else { + paste0("(_", paste(out, collapse = ", "), "_) ") + } + }) + ) %>% + transmute( + Column = paste0("`", name, "`"), + Type = paste0("`", type, "`"), + Description = paste0( + tag_str, + description %>% gsub(" *\n *", " ", .) %>% gsub("\\. *$", "", .), + "." + ) + ) %>% + knitr::kable() + } else { + "" + } +} + +list_contains_tibble <- function(li) { + is.list(li) && any(sapply(li, is.atomic)) +} + +list_as_tibble <- function(li) { + as.data.frame(li[sapply(li, is.atomic)], check.names = FALSE) +} + +read_comp_spec <- function(path) { + spec_yaml <- read_and_merge_yaml(path) + list( + info = read_comp_info(spec_yaml, path), + args = read_comp_args(spec_yaml, path) + ) +} + +read_comp_info <- function(spec_yaml, path) { + # TEMP: make it readable + spec_yaml$functionality$arguments <- NULL + spec_yaml$functionality$argument_groups <- NULL + + df <- list_as_tibble(spec_yaml$functionality) + if (nrow(df) == 0) { + df <- data.frame(a = 1)[, integer(0)] + } + if (list_contains_tibble(spec_yaml$functionality$info)) { + df <- dplyr::bind_cols(df, list_as_tibble(spec_yaml$functionality$info)) + } + if (list_contains_tibble(spec_yaml$functionality$info$type_info)) { + df <- dplyr::bind_cols(df, list_as_tibble(spec_yaml$functionality$info$type_info)) + } + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + as_tibble(df) +} + +read_comp_args <- function(spec_yaml, path) { + arguments <- spec_yaml$functionality$arguments + for (arg_group in spec_yaml$functionality$argument_groups) { + arguments <- c(arguments, arg_group$arguments) + } + map_df(arguments, function(arg) { + df <- list_as_tibble(arg) + if (list_contains_tibble(arg$info)) { + df <- dplyr::bind_cols(df, list_as_tibble(arg$info)) + } + df$test_default <- NULL + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$arg_name <- gsub("^-*", "", arg$name) + df$direction <- df$direction %||% "input" %|% "input" + df$parent <- df$`__merge__` %||% NA_character_ %>% basename() %>% gsub("\\.yaml", "", .) + df$required <- df$required %||% FALSE %|% FALSE + df$default <- df$default %||% NA_character_ %>% as.character + df$example <- df$example %||% NA_character_ %>% as.character + df$description <- df$description %||% NA_character_ %>% as.character + df$summary <- df$summary %||% NA_character_ %>% as.character + df + }) +} + +format_comp_args_as_tibble <- function(spec) { + if (nrow(spec$args) == 0) return("") + spec$args %>% + mutate( + tag_str = pmap_chr(lst(required, direction), function(required, direction) { + out <- c() + if (!required) { + out <- c(out, "Optional") + } + if (direction == "output") { + out <- c(out, "Output") + } + if (length(out) == 0) { + "" + } else { + paste0("(_", paste(out, collapse = ", "), "_) ") + } + }) + ) %>% + transmute( + Name = paste0("`--", arg_name, "`"), + Type = paste0("`", type, "`"), + Description = paste0( + tag_str, + (summary %|% description) %>% gsub(" *\n *", " ", .) %>% gsub("\\. 
*$", "", .), + ".", + ifelse(!is.na(default), paste0(" Default: `", default, "`."), "") + ) + ) %>% + knitr::kable() +} + +# path <- "src/datasets/api/comp_processor_knn.yaml" +render_component <- function(spec) { + if (is.character(spec)) { + spec <- read_comp_spec(spec) + } + + strip_margin(glue::glue(" + §## Component type: {spec$info$label} + § + §Path: [`src/{spec$info$namespace}`](https://github.com/openproblems-bio/openproblems/tree/main/src/{spec$info$namespace}) + § + §{spec$info$summary} + § + §Arguments: + § + §:::{{.small}} + §{paste(format_comp_args_as_tibble(spec), collapse = '\n')} + §::: + § + §"), symbol = "§") +} + +# path <- "src/datasets/api/file_pca.yaml" +render_file <- function(spec) { + if (is.character(spec)) { + spec <- read_file_spec(spec) + } + + if (!"label" %in% names(spec$info)) { + spec$info$label <- basename(spec$info$example) + } + + example <- + if (is.null(spec$info$example) || is.na(spec$info$example)) { + "" + } else { + paste0("Example file: `", spec$info$example, "`") + } + + description <- + if (is.null(spec$info$description) || is.na(spec$info$description)) { + "" + } else { + paste0("Description:\n\n", spec$info$description) + } + + strip_margin(glue::glue(" + §## File format: {spec$info$label} + § + §{spec$info$summary %||% ''} + § + §{example} + § + §{description} + § + §Format: + § + §:::{{.small}} + §{paste(format_file_format(spec), collapse = '\n')} + §::: + § + §Slot description: + § + §:::{{.small}} + §{paste(format_file_format_as_kable(spec), collapse = '\n')} + §::: + § + §"), symbol = "§") +} + +# path <- "src/tasks/denoising" +read_task_api <- function(path) { + cli::cli_inform("Looking for project root") + project_path <- .ram_find_project(path) + api_dir <- paste0(path, "/api") + + cli::cli_inform("Reading task info") + task_info_yaml <- list.files(api_dir, pattern = "task_info.ya?ml", full.names = TRUE) + assertthat::assert_that(length(task_info_yaml) == 1) + task_info <- + tryCatch({ + read_and_merge_yaml(task_info_yaml, project_path) + }, error = function(e) { + stop("Failed to read task info yaml: ", e$message) + }) + + cli::cli_inform("Reading task authors") + authors <- + tryCatch({ + map_df(task_info$authors, function(aut) { + aut$roles <- paste(aut$roles, collapse = ", ") + list_as_tibble(aut) + }) + }, error = function(e) { + stop("Failed to read task authors: ", e$message) + }) + + cli::cli_inform("Reading component yamls") + comp_yamls <- list.files(api_dir, pattern = "comp_.*\\.ya?ml", full.names = TRUE) + comps <- map(comp_yamls, function(yaml) { + tryCatch({ + read_comp_spec(yaml) + }, error = function(e) { + stop("Failed to read component yaml: ", e$message) + }) + }) + comp_info <- map_df(comps, "info") + comp_args <- map_df(comps, "args") + names(comps) <- basename(comp_yamls) %>% gsub("\\..*$", "", .) + + cli::cli_inform("Reading file yamls") + file_yamls <- .ram_resolve_path( + path = na.omit(unique(comp_args$`__merge__`)), + project_path = project_path, + parent_path = api_dir + ) + files <- map(file_yamls, function(yaml) { + tryCatch({ + read_file_spec(yaml) + }, error = function(e) { + stop("Failed to read file yaml: ", e$message) + }) + }) + names(files) <- basename(file_yamls) %>% gsub("\\..*$", "", .) 
+ file_info <- map_df(files, "info") + file_slots <- map_df(files, "slots") + + cli::cli_inform("Generating task graph") + task_graph <- tryCatch({ + create_task_graph(file_info, comp_info, comp_args) + }, error = function(e) { + stop("Failed to create task graph: ", e$message) + }) + + list( + task_info = task_info, + file_specs = files, + file_info = file_info, + file_slots = file_slots, + comp_specs = comps, + comp_info = comp_info, + comp_args = comp_args, + task_graph = task_graph, + authors = authors + ) +} + + +create_task_graph <- function(file_info, comp_info, comp_args) { + clean_id <- function(id) { + gsub("graph", "graaf", id) + } + nodes <- + bind_rows( + file_info %>% + mutate(id = file_name, label = label, is_comp = FALSE), + comp_info %>% + mutate(id = file_name, label = label, is_comp = TRUE) + ) %>% + select(id, label, everything()) %>% + mutate(str = paste0( + " ", + clean_id(id), + ifelse(is_comp, "[/\"", "(\""), + label, + ifelse(is_comp, "\"/]", "\")") + )) + edges <- bind_rows( + comp_args %>% + filter(type == "file", direction == "input") %>% + mutate( + from = parent, + to = file_name, + arrow = "---" + ), + comp_args %>% + filter(type == "file", direction == "output") %>% + mutate( + from = file_name, + to = parent, + arrow = "-->" + ) + ) %>% + select(from, to, everything()) %>% + mutate(str = paste0(" ", clean_id(from), arrow, clean_id(to))) + + edges <- edges %>% filter(!is.na(from)) + + igraph::graph_from_data_frame( + edges, + vertices = nodes, + directed = TRUE + ) +} + +.task_graph_get_root <- function(task_api) { + root <- names(which(igraph::degree(task_api$task_graph, mode = "in") == 0)) + if (length(root) > 1) { + warning( + "There should probably only be one node with in-degree equal to 0.\n", + " Nodes with in-degree == 0: ", paste(root, collapse = ", ") + ) + } + root[[1]] +} + +render_task_graph <- function(task_api, root = .task_graph_get_root(task_api)) { + order <- names(igraph::bfs(task_api$task_graph, root)$order) + + vdf <- igraph::as_data_frame(task_api$task_graph, "vertices") %>% + arrange(match(name, order)) + edf <- igraph::as_data_frame(task_api$task_graph, "edges") %>% + arrange(match(from, order), match(to, order)) + + strip_margin(glue::glue(" + §```mermaid + §flowchart LR + §{paste(vdf$str, collapse = '\n')} + §{paste(edf$str, collapse = '\n')} + §``` + §"), symbol = "§") +} + + + +# Recursive function to process each property with indentation +.render_example_process_property <- function(prop, prop_name = NULL, indent_level = 0) { + if (is.null(prop_name)) { + prop_name <- "" + } + + out <- c() + + # define helper variables + indent_spaces <- strrep(" ", indent_level) + next_indent_spaces <- strrep(" ", indent_level + 2) + + # add comment if available + if ("description" %in% names(prop)) { + comment <- gsub("\n", paste0("\n", indent_spaces, "# "), stringr::str_trim(prop$description)) + out <- c(out, indent_spaces, "# ", comment, "\n") + } + + # add variable + out <- c(out, indent_spaces, prop_name, ": ") + + if (prop$type == "object" && "properties" %in% names(prop)) { + # Handle object with properties + prop_names <- setdiff(names(prop$properties), "additionalProperties") + sub_props <- unlist(lapply(prop_names, function(sub_prop_name) { + prop_out <- .render_example_process_property( + prop$properties[[sub_prop_name]], + sub_prop_name, + indent_level + 2 + ) + c(prop_out, "\n") + })) + c(out, "\n", sub_props[-length(sub_props)]) + } else if (prop$type == "array") { + if (is.list(prop$items) && "properties" %in% names(prop$items)) 
{ + # Handle array of objects + array_items_yaml <- unlist(lapply(names(prop$items$properties), function(item_prop_name) { + prop_out <- .render_example_process_property( + prop$items$properties[[item_prop_name]], + item_prop_name, + indent_level + 4 + ) + c(prop_out, "\n") + })) + c(out, "\n", next_indent_spaces, "- ", array_items_yaml[-1]) + } else { + # Handle simple array + c(out, "[ ... ]") + } + } else { + c(out, "...") + } +} + +# Function for rendering an example yaml based on a JSON schema +render_example <- function(json_schema) { + if (!"properties" %in% names(json_schema)) { + return("") + } + text <- + unlist(lapply(names(json_schema$properties), function(prop_name) { + out <- .render_example_process_property( + json_schema$properties[[prop_name]], + prop_name, + 0 + ) + c(out, "\n") + })) + + paste(text, collapse = "") +} \ No newline at end of file diff --git a/src/common/helper_functions/setup_logger.py b/src/common/helper_functions/setup_logger.py new file mode 100644 index 000000000..ae71eb961 --- /dev/null +++ b/src/common/helper_functions/setup_logger.py @@ -0,0 +1,12 @@ +def setup_logger(): + import logging + from sys import stdout + + logger = logging.getLogger() + logger.setLevel(logging.INFO) + console_handler = logging.StreamHandler(stdout) + logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s") + console_handler.setFormatter(logFormatter) + logger.addHandler(console_handler) + + return logger \ No newline at end of file diff --git a/src/common/helper_functions/strip_margin.R b/src/common/helper_functions/strip_margin.R new file mode 100644 index 000000000..3830d58d7 --- /dev/null +++ b/src/common/helper_functions/strip_margin.R @@ -0,0 +1,3 @@ +strip_margin <- function(text, symbol = "\\|") { + gsub(paste0("(^|\n)[ \t]*", symbol), "\\1", text) +} \ No newline at end of file diff --git a/src/common/helper_functions/strip_margin.py b/src/common/helper_functions/strip_margin.py new file mode 100644 index 000000000..fbfb39dec --- /dev/null +++ b/src/common/helper_functions/strip_margin.py @@ -0,0 +1,3 @@ +def strip_margin(text: str) -> str: + import re + return re.sub("(^|\n)[ \t]*\|", "\\1", text) \ No newline at end of file diff --git a/src/common/helper_functions/subset_anndata.py b/src/common/helper_functions/subset_anndata.py new file mode 100644 index 000000000..80bd16087 --- /dev/null +++ b/src/common/helper_functions/subset_anndata.py @@ -0,0 +1,83 @@ +"""Helper functions related to subsetting AnnData objects based on the file format +specifications in the .config.vsh.yaml and slot mapping overrides.""" + +def read_config_slots_info(config_file, slot_mapping = {}): + """Read the .config.vsh.yaml to find out which output slots need to be copied to which output file. + + Arguments: + config_file -- Path to the .config.vsh.yaml file (required). + slot_mapping -- Which slots to retain. Must be a dictionary whose keys are the names + of the AnnData structs, and values is another dictionary with destination value + names as keys and source value names as values. 
+ Example of slot_mapping: + ``` + slot_mapping = { + "layers": { + "counts": par["layer_counts"], + }, + "obs": { + "cell_type": par["obs_cell_type"], + "batch": par["obs_batch"], + } + } + ``` + """ + import yaml + import re + + # read output spec from yaml + with open(config_file, "r") as object_name: + config = yaml.safe_load(object_name) + + output_struct_slots = {} + + # fetch info on which slots should be copied to which file + for arg in config["functionality"]["arguments"]: + # argument is an output file with a slot specification + if arg["direction"] == "output" and arg.get("info", {}).get("slots"): + object_name = re.sub("--", "", arg["name"]) + + struct_slots = arg['info']['slots'] + out = {} + for (struct, slots) in struct_slots.items(): + out_struct = {} + for slot in slots: + # if slot_mapping[struct][slot['name']] exists, use that as the source slot name + # otherwise use slot['name'] + source_slot = slot_mapping.get(struct, {}).get(slot["name"], slot["name"]) + out_struct[slot["name"]] = source_slot + out[struct] = out_struct + + output_struct_slots[object_name] = out + + return output_struct_slots + +# create new anndata objects according to api spec +def subset_anndata(adata, slot_info): + """Create new anndata object according to slot info specifications. + + Arguments: + adata -- An AnnData object to subset (required) + slot_info -- Which slots to retain, typically one of the items in the output of read_config_slots_info. + Must be a dictionary whose keys are the names of the AnnData structs, and values is another + dictionary with destination value names as keys and source value names as values. + """ + import pandas as pd + import anndata as ad + + structs = ["layers", "obs", "var", "uns", "obsp", "obsm", "varp", "varm"] + kwargs = {} + + for struct in structs: + slot_mapping = slot_info.get(struct, {}) + data = {dest : getattr(adata, struct)[src] for (dest, src) in slot_mapping.items()} + if len(data) > 0: + if struct in ['obs', 'var']: + data = pd.concat(data, axis=1) + kwargs[struct] = data + elif struct in ['obs', 'var']: + # if no columns need to be copied, we still need an 'obs' and a 'var' + # to help determine the shape of the adata + kwargs[struct] = getattr(adata, struct).iloc[:,[]] + + return ad.AnnData(**kwargs) \ No newline at end of file diff --git a/src/exp_analysis/config.vsh.yaml b/src/exp_analysis/config.vsh.yaml index 49f9ba356..f6969e898 100644 --- a/src/exp_analysis/config.vsh.yaml +++ b/src/exp_analysis/config.vsh.yaml @@ -11,6 +11,7 @@ functionality: required: false direction: input default: resources/grn-benchmark/perturbation_data.h5ad + example: resources_test/grn-benchmark/perturbation_data.h5ad - name: --tf_gene_net type: file required: true @@ -23,7 +24,8 @@ functionality: type: file required: false direction: input - default: resources/supplements/annot_peak_database.csv + must_exist: false + example: resources/supplements/annot_peak_database.csv # - name: --annot_gene_database # type: file # required: false diff --git a/src/exp_analysis/script.py b/src/exp_analysis/script.py index 053efd573..966dc4dec 100644 --- a/src/exp_analysis/script.py +++ b/src/exp_analysis/script.py @@ -26,13 +26,14 @@ perturbation_data = ad.read_h5ad(par["perturbation_data"]) tf_gene_net = pd.read_csv(par["tf_gene_net"]) # peak_gene_net = pd.read_csv(par["peak_gene_net"]) -annot_peak_database = pd.read_csv(par["annot_peak_database"]) +# annot_peak_database = pd.read_csv(par["annot_peak_database"]) # hvgs = pd.read_csv(par["hvgs"]) # peak_gene_net['source'] = 
peak_gene_net['peak'] info_obj = Explanatory_analysis(net=tf_gene_net) print("Calculate basic stats") stats = info_obj.calculate_basic_stats() +print("Outputting stats to :", par['stats']) with open(par['stats'], 'w') as ff: json.dump(stats, ff) # print("Annotation of peaks") @@ -45,6 +46,8 @@ tf_gene_in = info_obj.tf_gene.in_deg tf_gene_out = info_obj.tf_gene.out_deg +print("Plotting tf-gene in degree, dir: ", par['tf_gene_indegee_fig']) +print("Plotting tf-gene out degree, dir: ", par['tf_gene_outdegee_fig']) fig, ax = info_obj.plot_cdf(tf_gene_in, title='In degree TF-gene') fig.savefig(par['tf_gene_indegee_fig'], dpi=300, bbox_inches='tight', format='png') fig, ax = info_obj.plot_cdf(tf_gene_out, title='Out degree TF-gene') diff --git a/src/methods/multi_omics/celloracle/config.vsh.yaml b/src/methods/multi_omics/celloracle/config.vsh.yaml index 048587560..bc976fe5f 100644 --- a/src/methods/multi_omics/celloracle/config.vsh.yaml +++ b/src/methods/multi_omics/celloracle/config.vsh.yaml @@ -14,10 +14,10 @@ functionality: type: file direction: output default: output/celloracle/base_grn.csv - - name: --links - type: file - direction: output - default: output/celloracle/links.celloracle.links + # - name: --links + # type: file + # direction: output + # default: output/celloracle/links.celloracle.links resources: - type: python_script path: script.py diff --git a/src/methods/multi_omics/celloracle/main.py b/src/methods/multi_omics/celloracle/main.py index d091cd9f7..90f1d7f16 100644 --- a/src/methods/multi_omics/celloracle/main.py +++ b/src/methods/multi_omics/celloracle/main.py @@ -55,6 +55,8 @@ def base_grn(par) -> None: def preprocess_rna(par) -> None: print("Processing rna data") adata = ad.read_h5ad(par['multiomics_rna']) + if True: #only one cluster + adata.obs['cell_type'] = 'one_cell_type' adata.layers["counts"] = adata.X.copy() sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all') n_top_genes = min([3000, adata.shape[1]]) diff --git a/src/methods/multi_omics/celloracle/script.py b/src/methods/multi_omics/celloracle/script.py index 95c8bc468..7e7b5629b 100644 --- a/src/methods/multi_omics/celloracle/script.py +++ b/src/methods/multi_omics/celloracle/script.py @@ -15,7 +15,7 @@ # meta = { # "resources_dir":'resources' # } - +par['links'] = f"{par['temp_dir']}/links.celloracle.links" sys.path.append(meta["resources_dir"]) from main import main diff --git a/src/methods/multi_omics/celloracle_ns/run.sh b/src/methods/multi_omics/celloracle_ns/run.sh index 2af30c8ff..bd5b3160a 100644 --- a/src/methods/multi_omics/celloracle_ns/run.sh +++ b/src/methods/multi_omics/celloracle_ns/run.sh @@ -33,7 +33,7 @@ if [ "$submit" = true ]; then ./tw-windows-x86_64.exe launch ` - https://github.com/openproblems-bio/task_grn_benchmark.git ` + https://github.com/openproblems-bio/task_grn_inference.git ` --revision build/main ` --pull-latest ` --main-script target/nextflow/workflows/run_grn_inference/main.nf ` @@ -51,7 +51,7 @@ fi ./tw-windows-x86_64.exe launch ` - https://github.com/openproblems-bio/task_grn_benchmark.git ` + https://github.com/openproblems-bio/task_grn_inference.git ` --revision build/main ` --pull-latest ` --main-script target/nextflow/workflows/grn_inference_celloracle/main.nf ` diff --git a/src/methods/multi_omics/granie_ns/run.sh b/src/methods/multi_omics/granie_ns/run.sh index c4abb32db..9ee2b676f 100644 --- a/src/methods/multi_omics/granie_ns/run.sh +++ b/src/methods/multi_omics/granie_ns/run.sh @@ -25,7 +25,7 @@ HERE ./tw-windows-x86_64.exe launch ` - 
https://github.com/openproblems-bio/task_grn_benchmark.git ` + https://github.com/openproblems-bio/task_grn_inference.git ` --revision build/main ` --pull-latest ` --main-script target/nextflow/workflows/grn_inference_scglue/main.nf ` diff --git a/src/methods/multi_omics/scenicplus_ns/run.sh b/src/methods/multi_omics/scenicplus_ns/run.sh index 5edebabce..eef0d8c0a 100644 --- a/src/methods/multi_omics/scenicplus_ns/run.sh +++ b/src/methods/multi_omics/scenicplus_ns/run.sh @@ -32,7 +32,7 @@ nextflow run . \ # ./tw-windows-x86_64.exe launch ` -# https://github.com/openproblems-bio/task_grn_benchmark.git ` +# https://github.com/openproblems-bio/task_grn_inference.git ` # --revision build/main ` # --pull-latest ` # --main-script target/nextflow/workflows/grn_inference_scenicplus/main.nf ` diff --git a/src/methods/multi_omics/scglue/config.vsh.yaml b/src/methods/multi_omics/scglue/config.vsh.yaml index cfc0337a4..8b9d3f33e 100644 --- a/src/methods/multi_omics/scglue/config.vsh.yaml +++ b/src/methods/multi_omics/scglue/config.vsh.yaml @@ -13,12 +13,13 @@ functionality: arguments: - name: --annotation_file type: file - default: resources/supplementary/gencode.v45.annotation.gtf.gz + # default: resources/supplementary/gencode.v45.annotation.gtf.gz + example: resources_test/supplementary/gencode.v45.annotation.gtf.gz required: false direction: input - name: --motif_file type: file - default: resources/supplementary/JASPAR2022-hg38.bed.gz + example: resources_test/supplementary/JASPAR2022-hg38.bed.gz required: false direction: input @@ -34,7 +35,7 @@ platforms: image: nvcr.io/nvidia/pytorch:24.06-py3 setup: - type: python - packages: [ scglue==0.3.2, pyscenic==0.12.1, numpy==1.23.4, scanpy, networkx, pyarrow, cytoolz, scikit-misc, cuda-python] + packages: [ scglue==0.3.2, pyscenic==0.12.1, numpy==1.23.4, scanpy, networkx, pyarrow, cytoolz, scikit-misc, cuda-python, faiss-gpu] - type: apt packages: [bedtools] diff --git a/src/methods/multi_omics/scglue_ns/run.sh b/src/methods/multi_omics/scglue_ns/run.sh index 4371b8197..6677eb0d3 100644 --- a/src/methods/multi_omics/scglue_ns/run.sh +++ b/src/methods/multi_omics/scglue_ns/run.sh @@ -24,7 +24,7 @@ HERE # ./tw-windows-x86_64.exe launch ` -# https://github.com/openproblems-bio/task_grn_benchmark.git ` +# https://github.com/openproblems-bio/task_grn_inference.git ` # --revision build/main ` # --pull-latest ` # --main-script target/nextflow/workflows/grn_inference_scglue/main.nf ` diff --git a/src/methods/single_omics/scsgl/config.vsh.yaml b/src/methods/single_omics/scsgl/config.vsh.yaml index d781caff5..a69351cc5 100644 --- a/src/methods/single_omics/scsgl/config.vsh.yaml +++ b/src/methods/single_omics/scsgl/config.vsh.yaml @@ -24,7 +24,7 @@ platforms: apt-get install -y r-base time && \ Rscript -e "install.packages('pcaPP')" - type: python - packages: [ anndata, numba==0.53.1, scipy==1.6.3, pandas==1.2.4, rpy2==3.4.4, numpy==1.20.2, scikit-learn==0.24.1 ] + packages: [ anndata, numba==0.53.1, scipy==1.6.3, pandas==1.2.4, rpy2==3.4.4, numpy==1.20.2, scikit-learn==0.24.1, pyyaml ] - type: native - type: nextflow directives: diff --git a/src/process_data/explanatory_analysis/hvgs/config.novsh.yaml b/src/process_data/explanatory_analysis/hvgs/config.novsh.yaml index c355cf54f..684dafa0c 100644 --- a/src/process_data/explanatory_analysis/hvgs/config.novsh.yaml +++ b/src/process_data/explanatory_analysis/hvgs/config.novsh.yaml @@ -32,7 +32,7 @@ functionality: platforms: - type: docker - image: ghcr.io/openproblems-bio/base_images/r:1.1.0 + image: 
openproblems/base_r:1.0.0 setup: - type: r bioc: [scry] diff --git a/src/process_data/perturbation/batch_correction_evaluation/config.vsh.yaml b/src/process_data/perturbation/batch_correction_evaluation/config.vsh.yaml index 60bad31dc..1f50f5ad9 100644 --- a/src/process_data/perturbation/batch_correction_evaluation/config.vsh.yaml +++ b/src/process_data/perturbation/batch_correction_evaluation/config.vsh.yaml @@ -24,7 +24,7 @@ functionality: platforms: - type: docker # image: ghcr.io/openproblems-bio/base_python:1.0.4 - image: ghcr.io/openproblems-bio/base_images/r:1.1.0 + image: openproblems/base_r:1.0.0 setup: - type: python diff --git a/src/process_data/perturbation/batch_correction_scgen/config.vsh.yaml b/src/process_data/perturbation/batch_correction_scgen/config.vsh.yaml index 5aecacedb..f61b32495 100644 --- a/src/process_data/perturbation/batch_correction_scgen/config.vsh.yaml +++ b/src/process_data/perturbation/batch_correction_scgen/config.vsh.yaml @@ -1,4 +1,4 @@ -__merge__: ../../../api/comp_test.yaml +__merge__: ../../../api/unit_test.yaml functionality: name: batch_correction_scgen diff --git a/src/process_data/perturbation/batch_correction_seurat/config.vsh.yaml b/src/process_data/perturbation/batch_correction_seurat/config.vsh.yaml index cd4903b8b..61166e2eb 100644 --- a/src/process_data/perturbation/batch_correction_seurat/config.vsh.yaml +++ b/src/process_data/perturbation/batch_correction_seurat/config.vsh.yaml @@ -1,4 +1,4 @@ -__merge__: ../../../api/comp_test.yaml +__merge__: ../../../api/unit_test.yaml functionality: name: batch_correction_seurat @@ -30,7 +30,7 @@ functionality: required: true required: true direction: input - default: resources/grn-benchmark/perturbation_data.h5ad + example: resources_test/grn-benchmark/perturbation_data.h5ad - name: --perturbation_data_bc type: file info: @@ -39,7 +39,7 @@ functionality: __merge__: ../../../api/file_perturbation_h5ad.yaml required: false direction: output - default: resources/grn-benchmark/perturbation_data.h5ad + example: resources_test/grn-benchmark/perturbation_data.h5ad resources: diff --git a/src/process_data/perturbation/normalization/config.vsh.yaml b/src/process_data/perturbation/normalization/config.vsh.yaml index 297456d9e..9d45aaf66 100644 --- a/src/process_data/perturbation/normalization/config.vsh.yaml +++ b/src/process_data/perturbation/normalization/config.vsh.yaml @@ -1,4 +1,4 @@ -__merge__: ../../../api/comp_test.yaml +__merge__: ../../../api/unit_test.yaml functionality: name: normalization diff --git a/src/process_data/perturbation/sc_counts/config.vsh.yaml b/src/process_data/perturbation/sc_counts/config.vsh.yaml index b51ffd3f9..0198f38a2 100644 --- a/src/process_data/perturbation/sc_counts/config.vsh.yaml +++ b/src/process_data/perturbation/sc_counts/config.vsh.yaml @@ -1,4 +1,4 @@ -__merge__: ../../../api/comp_test.yaml +__merge__: ../../../api/unit_test.yaml functionality: name: sc_counts diff --git a/src/robustness_analysis/causal/config.vsh.yaml b/src/robustness_analysis/causal/config.vsh.yaml index 7537b320b..00d3d23b4 100644 --- a/src/robustness_analysis/causal/config.vsh.yaml +++ b/src/robustness_analysis/causal/config.vsh.yaml @@ -9,16 +9,24 @@ functionality: type: file direction: input example: resources_test/grn-benchmark/multiomics_rna.h5ad + default: resources/grn-benchmark/multiomics_rna.h5ad - name: --tf_all type: file direction: input example: resources_test/prior/tf_all.csv + default: resources/prior/tf_all.csv - name: --prediction type: file direction: output example: 
resources_test/grn_models/collectri.csv + default: output/prediction.csv + + - name: --causal + type: boolean + direction: input + default: false resources: - type: python_script diff --git a/src/robustness_analysis/causal/script.py b/src/robustness_analysis/causal/script.py index 006e9bb31..08ac633e0 100644 --- a/src/robustness_analysis/causal/script.py +++ b/src/robustness_analysis/causal/script.py @@ -39,11 +39,14 @@ def create_corr_net(X: np.ndarray, groups: np.ndarray): print('Create corr net') net = create_corr_net(multiomics_rna.X, groups) net = pd.DataFrame(net, index=gene_names, columns=gene_names) - -net_corr = net.sample(len(tf_all), axis=1) +if par['causal']: + net_corr = net[tf_all] +else: + net_corr = net.sample(len(tf_all), axis=1) net_corr = net_corr.reset_index().melt(id_vars='index', var_name='source', value_name='weight') net_corr.rename(columns={'index': 'target'}, inplace=True) -print('Output noised GRN') + +print('Output GRN') net_corr.to_csv(par['prediction']) diff --git a/src/workflows/process_multiomics/config.vsh.yaml b/src/workflows/process_multiomics/config.vsh.yaml index 6c0265c43..d93858fb8 100644 --- a/src/workflows/process_multiomics/config.vsh.yaml +++ b/src/workflows/process_multiomics/config.vsh.yaml @@ -1,5 +1,5 @@ -__merge__: ../../api/comp_test.yaml +__merge__: ../../api/unit_test.yaml functionality: name: process_multiomics diff --git a/src/workflows/process_perturbation/config.vsh.yaml b/src/workflows/process_perturbation/config.vsh.yaml index d7e52ad42..b61b72668 100644 --- a/src/workflows/process_perturbation/config.vsh.yaml +++ b/src/workflows/process_perturbation/config.vsh.yaml @@ -1,5 +1,5 @@ -__merge__: ../../api/comp_test.yaml +__merge__: ../../api/unit_test.yaml functionality: name: process_perturbation diff --git a/src/workflows/run_benchmark_single_omics/config.vsh.yaml b/src/workflows/run_benchmark_single_omics/config.vsh.yaml index bfc0b1709..435341415 100644 --- a/src/workflows/run_benchmark_single_omics/config.vsh.yaml +++ b/src/workflows/run_benchmark_single_omics/config.vsh.yaml @@ -71,7 +71,7 @@ functionality: path: ../../api/task_info.yaml dependencies: - name: common/extract_metadata - repository: openproblemsv2 + repository: openproblems - name: metrics/regression_2 - name: metrics/regression_1 - name: control_methods/positive_control @@ -85,10 +85,10 @@ functionality: - name: grn_methods/scsgl - name: grn_methods/tigress repositories: - - name: openproblemsv2 + - name: openproblems type: github - repo: openproblems-bio/openproblems-v2 - tag: main_build + repo: openproblems-bio/openproblems + tag: v2.0.0 platforms: - type: nextflow directives: diff --git a/src/workflows/run_grn_evaluation/config.vsh.yaml b/src/workflows/run_grn_evaluation/config.vsh.yaml index 1c913823a..0dd99f945 100644 --- a/src/workflows/run_grn_evaluation/config.vsh.yaml +++ b/src/workflows/run_grn_evaluation/config.vsh.yaml @@ -81,16 +81,16 @@ functionality: path: ../../api/task_info.yaml dependencies: - name: common/extract_metadata - repository: openproblemsv2 + repository: openproblems - name: metrics/regression_2 - name: metrics/regression_1 - name: control_methods/positive_control - name: control_methods/negative_control repositories: - - name: openproblemsv2 + - name: openproblems type: github - repo: openproblems-bio/openproblems-v2 - tag: main_build + repo: openproblems-bio/openproblems + tag: v2.0.0 platforms: - type: nextflow directives: diff --git a/src/workflows/run_robustness_analysis/config.vsh.yaml 
b/src/workflows/run_robustness_analysis/config.vsh.yaml index a6e88305e..90ac3583c 100644 --- a/src/workflows/run_robustness_analysis/config.vsh.yaml +++ b/src/workflows/run_robustness_analysis/config.vsh.yaml @@ -77,15 +77,15 @@ functionality: path: ../../api/task_info.yaml dependencies: - name: common/extract_metadata - repository: openproblemsv2 + repository: openproblems - name: metrics/regression_1 - name: metrics/regression_2 - name: robustness_analysis/noise_grn repositories: - - name: openproblemsv2 + - name: openproblems type: github - repo: openproblems-bio/openproblems-v2 - tag: main_build + repo: openproblems-bio/openproblems + tag: v2.0.0 platforms: - type: nextflow directives: diff --git a/src/workflows/run_robustness_analysis_causal/config.vsh.yaml b/src/workflows/run_robustness_analysis_causal/config.vsh.yaml index d8eed5c66..e72991fe1 100644 --- a/src/workflows/run_robustness_analysis_causal/config.vsh.yaml +++ b/src/workflows/run_robustness_analysis_causal/config.vsh.yaml @@ -65,15 +65,15 @@ functionality: path: ../../api/task_info.yaml dependencies: - name: common/extract_metadata - repository: openproblemsv2 + repository: openproblems - name: metrics/regression_1 - name: metrics/regression_2 - name: robustness_analysis/causal_grn repositories: - - name: openproblemsv2 + - name: openproblems type: github - repo: openproblems-bio/openproblems-v2 - tag: main_build + repo: openproblems-bio/openproblems + tag: v2.0.0 platforms: - type: nextflow directives: diff --git a/src/workflows/run_robustness_analysis_causal/main.nf b/src/workflows/run_robustness_analysis_causal/main.nf index 8ed336618..6c32c942a 100644 --- a/src/workflows/run_robustness_analysis_causal/main.nf +++ b/src/workflows/run_robustness_analysis_causal/main.nf @@ -15,7 +15,7 @@ workflow run_wf { // construct list of metrics metrics = [ regression_1, - regression_1 + regression_2 ] /***************************