Bgc ocean new tool (#148)

* Add ingestor tool * Add test data files * fix lint rm directory * fix test * fix test * Rm useless test data file * Improve id, name, description help * Add new tool on bgc ocean * Delete tools/ocean/bgc_harmonizer.xml * Delete tools/ocean/test-data/D6901758_001.nc * Delete tools/ocean/test-data/D6901758_002.nc * Delete tools/ocean/test-data/D6901758_003.nc * Delete tools/ocean/test-data/D6901758_004.nc * Delete tools/ocean/test-data/D6901758_005.nc * Create .shed.yml * rm useless if
galaxyecology · Dec 5, 2024 · 124fc52 · 124fc52
1 parent fa49586
commit 124fc52
Show file tree

Hide file tree

Showing 7 changed files with 141 additions and 0 deletions.
diff --git a/tools/bgc_ocean/.shed.yml b/tools/bgc_ocean/.shed.yml
@@ -0,0 +1,15 @@
+categories:
+    - Ecology
+owner: ecology
+remote_repository_url: https://github.com/galaxyecology/tools-ecology/tree/master/tools/bgc_ocean
+homepage_url: https://github.com/Marie59/FE-ft-ESG/tree/main/ocean
+long_description: |
+  Process in-situ and biogechemical oceanographic Argo or Glider data for the Earth System
+type: unrestricted
+auto_tool_repositories:
+  name_template: "{{ tool_id }}"
+  description_template: "Wrapper for ocean biogechemical data tool: {{ tool_name }}."
+suite:
+  name: "bgc_ocean_suite"
+  description: "A suite of tools for the ocean biogeochemical compartment of the Earth System"
+  type: unrestricted
diff --git a/tools/bgc_ocean/bgc_harmonizer.xml b/tools/bgc_ocean/bgc_harmonizer.xml
@@ -0,0 +1,126 @@
+<tool id="harmonize_insitu_to_netcdf" name="QCV harmonizer" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.01" license="MIT">
+    <description>and aggregator of insitu marine physical and biogeochemical data</description>
+    <macros>
+        <token name="@TOOL_VERSION@">1.1</token>
+        <token name="@VERSION_SUFFIX@">0</token>
+    </macros>
+    <requirements>
+        <container type="docker">pokapok/qcv_ingester:@TOOL_VERSION@</container>
+    </requirements>
+    <command detect_errors="exit_code"><![CDATA[
+        export HOME=\$PWD &&
+
+        #for $i, $infile in enumerate($infiles):
+            cp '$infile' '/runtime/data/original_data/work/ga_la_xy/${infile.element_identifier}' &&
+        #end for
+
+        /app/launchers/start-app.sh GALAXY &&
+        
+        cp /runtime/data/harmonized_data/work/ga_la_xy_harm_agg.nc '$output_net'
+    ]]></command>
+    <inputs>
+        <param name="infiles" type="data" format="netcdf" multiple="true" label="Input the netcdf data files" help="This files can netcdf raw Argo or Gliders datafiles following CMEMS convention."/>
+    </inputs>
+    <outputs>
+        <data name="output_net" format="netcdf" from_work_dir="/runtime/data/harmonized_data/work/*.nc" label="${tool.name} netcdf data" />
+    </outputs>
+    <tests>
+        <test expect_num_outputs="1">
+            <param name="infiles" value="D6901758_001.nc,D6901758_002.nc,D6901758_003.nc,D6901758_004.nc,D6901758_005.nc"/>
+            <output name="output_net">
+                <assert_contents>
+            	    <has_size value="427535" delta="0"/>
+            	</assert_contents>
+            </output>
+        </test>
+    </tests>
+    <help><![CDATA[
+
+.. class:: infomark
+
+**What it does**
+General presentation
+
+The cerb_harmonizer tool aggregates and harmonizes marine in-situ data following the needs of the UseCase 2.1-BCG - Fair-Ease. It converts files of individual or already aggregated data profiles into concatened single file with harmonized vocabullary needed for the project.
+Profiles are concatenated along the time dimension in the order given by the lising : if BGC data preceed CORE profiles, all BGC data will preceed CORE data. Use cycle_number variable to associate them again.
+Vocabulary translations are below. On the left, the writing of each variable in the output file, on the right, all possible translations found over data exploration.
+
+Coordinates and variables :
+
+    "lon" : ["longitude", "LONGITUDE", "LON", "lon"],
+    "lat" : ["latitude", "LATITUDE", "LAT", "lat"],
+    "time" : ["date", "time", "TIME", "JULD"],
+    "time_qc" : ["date_qc", "time_qc", "TIME_QC", "JULD_QC"],
+    "depth" : ["DEPTH", "depth"],
+    "cycle_number" : ["CYCLE_NUMBER"],
+    "ref_time" : ["REFERENCE_DATE_TIME"],
+    "pos_qc" : ["POSITION_QC",]
+
+--
+
+    "temperature" : ["TEMP", "TEMPERATURE"],
+    "salinity" : ["PSAL", "PRACTICAL_SALINITY"],
+    "oxygen" : ["DOXY"],
+    "pressure" : ["PRES"],
+    "chlorophylle" : ["CHLA"],
+    "nitrate" : ["NO3", "NITRATE", "n_an"],
+    "bbp700" : ["BBP700"],
+
+All variables are tagged with a suffix to indicate the state of the data (_raw, _dmadjusted, _rtadjusted)
+
+--
+
+Variables extracted from meta files are :
+
+    LAUNCH_DATE
+    PLATFORM_TYPE
+    PI_NAME meta variables are stored as metadata
+
+--
+
+Units :
+
+    "degree_east" : ["degree_east"],
+    "degree_north" : ["degree_north"],
+    "degree_celsius" : ["degree_celsius", "degree_Celsius"], # temperature
+    "psu" : ["psu", "practical_salinity_unit", "PSU"], # salinity
+    "micromol/l" : ["micromole.l-1", "micromole_per_liter", "μmol.l-1", "μmol/l", "μmol/L", "μmol.L-1"], # concentration liters
+    "mg/m3" : ["mg/m3",],
+    "micromole/kg" : ["micromole/kg",],
+    "m-1" : ["m-1"],
+    "decibar" : ["decibar", "dbar"], # pressure
+
+Arbitrarily,
+
+    dates are written following : "seconds since 1950-01-01T00:00:00 in julian calendar"
+    longitude is set between : -180° and 180°
+    latitude is set between : -90° and 90°
+
+WARNING : This application works only platform by platform. For example, it is possible to aggregate and harmonize a whole argo trajectory but one at a time. If two argo trajectories are needed to process, this tool needs to be run 2 times.
+
+**Input**
+
+a list of files in a txt file named cerb_listing.txt copntaining only filenames without paths. The tool will find the files automatically. A listing example here :
+
+BD6901580_003.nc BD6901580_004.nc BD6901580_005.nc BD6901580_006.nc D6901580_003.nc D6901580_004.nc D6901580_005.nc D6901580_006.nc 6901580_meta.nc
+
+paths of where are the data (volumes) containing configurations, listings and data. Paths are :
+
+    config path : where your textfile containing the list of files names is : it contains the listing cerb_listing.txt
+    data_path : highest folder including all the files to harmonize written in the textfile listing
+
+**Output**
+
+A concatenated and harmonized netcdf file
+
+    ]]></help>
+    <citations>
+        <citation type="bibtex">
+            @Manual{,
+            title = {QCV Ingester},
+            author = {Pokapok},
+            year = {2024},
+            note = {https://gitlab.com/pokapok-projects/PKP8-OGI-BGC/qcv_ingester}
+        </citation>    
+    </citations>
+</tool>
diff --git a/tools/bgc_ocean/test-data/D6901758_001.nc b/tools/bgc_ocean/test-data/D6901758_001.nc
diff --git a/tools/bgc_ocean/test-data/D6901758_002.nc b/tools/bgc_ocean/test-data/D6901758_002.nc
diff --git a/tools/bgc_ocean/test-data/D6901758_003.nc b/tools/bgc_ocean/test-data/D6901758_003.nc
diff --git a/tools/bgc_ocean/test-data/D6901758_004.nc b/tools/bgc_ocean/test-data/D6901758_004.nc
diff --git a/tools/bgc_ocean/test-data/D6901758_005.nc b/tools/bgc_ocean/test-data/D6901758_005.nc