Commit

Add dataset_organism to training dataset files
lazappi committed Nov 27, 2024
1 parent 8db6af0 commit a46394e
Showing 3 changed files with 9 additions and 2 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -48,6 +48,8 @@

* Add a base method API schema (PR #24)

+* Add `dataset_organism` to training input files (PR #24)
+
## BUG FIXES

* Update the nextflow workflow dependencies (PR #17).
4 changes: 4 additions & 0 deletions src/api/file_train.yaml
@@ -15,3 +15,7 @@ info:
name: dataset_id
description: "A unique identifier for the dataset"
required: true
+- name: dataset_organism
+  type: string
+  description: The organism of the sample in the dataset.
+  required: false
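The new schema entry marks `dataset_organism` as optional, while `dataset_id` stays required. A minimal sketch of what that distinction means for a consumer of the training file is shown here. It uses a plain dict in place of `adata.uns`, and the `UNS_FIELDS` layout and `check_uns` helper are hypothetical names introduced for illustration, not part of the repository.

```python
# Field specs mirroring the two `uns` entries visible in file_train.yaml.
# This list layout is an illustrative assumption, not the project's own loader format.
UNS_FIELDS = [
    {"name": "dataset_id", "type": "string", "required": True},
    {"name": "dataset_organism", "type": "string", "required": False},
]


def check_uns(uns, fields=UNS_FIELDS):
    """Return a list of problems found in `uns`; an empty list means it conforms."""
    problems = []
    for field in fields:
        name = field["name"]
        if name not in uns:
            # Only required fields are reported when absent.
            if field["required"]:
                problems.append(f"missing required field: {name}")
            continue
        if field["type"] == "string" and not isinstance(uns[name], str):
            problems.append(f"field {name} should be a string")
    return problems


# An absent optional field is accepted; an absent required field is reported.
print(check_uns({"dataset_id": "demo"}))
print(check_uns({"dataset_organism": "homo_sapiens"}))
```

Under these assumptions, a file without `dataset_organism` still passes the check, which matches `required: false` in the schema.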
5 changes: 3 additions & 2 deletions src/data_processors/process_dataset/script.py
@@ -45,7 +45,7 @@
obs_filt = np.ones(dtype=np.bool_, shape=adata_output.n_obs)
obs_index = np.random.choice(np.where(obs_filt)[0], par["n_obs_limit"], replace=False)
adata_output = adata_output[obs_index].copy()

# remove all layers except for counts
print(">> Remove all layers except for counts", flush=True)
for key in list(adata_output.layers.keys()):
@@ -70,11 +70,12 @@

# copy adata to train_set, test_set
print(">> Create AnnData output objects", flush=True)
+train_uns_keys = ["dataset_id", "dataset_organism"]
 output_train = ad.AnnData(
     layers={"counts": X_train},
     obs=adata_output.obs[[]],
     var=adata_output.var[[]],
-    uns={"dataset_id": adata_output.uns["dataset_id"]}
+    uns={key: adata_output.uns[key] for key in train_uns_keys}
 )
test_uns_keys = ["dataset_id", "dataset_name", "dataset_url", "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"]
output_test = ad.AnnData(
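The change to `output_train` builds `uns` with a dict comprehension over `train_uns_keys`. One consequence worth noting: indexing with `uns[key]` raises `KeyError` when a listed key is absent from the input, even though the schema declares `dataset_organism` as `required: false`. The sketch below illustrates that behaviour with a plain dict standing in for `adata_output.uns`. `copy_uns_strict` and `copy_uns_tolerant` are hypothetical helpers, and the tolerant variant is only one possible alternative, not something the commit implements.

```python
train_uns_keys = ["dataset_id", "dataset_organism"]


def copy_uns_strict(uns, keys):
    # Same shape as the comprehension in the diff: a missing key raises KeyError.
    return {key: uns[key] for key in keys}


def copy_uns_tolerant(uns, keys):
    # Hypothetical alternative: silently skip keys the input does not provide.
    return {key: uns[key] for key in keys if key in uns}


# Stand-in for adata_output.uns; it carries an extra key and lacks dataset_organism.
source_uns = {"dataset_id": "demo", "dataset_name": "Demo dataset"}

# Only the listed keys that exist are copied; dataset_name is dropped.
print(copy_uns_tolerant(source_uns, train_uns_keys))

try:
    copy_uns_strict(source_uns, train_uns_keys)
except KeyError as err:
    print("strict copy failed on missing key:", err)
```

Whether the strict behaviour is desirable depends on whether upstream loaders always populate `dataset_organism`; the diff alone does not say.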
