Update to use deletions.csv file internally (#161)
* refactor: Update to use deletions.csv file internally

Changed code routine to create 'deletions.csv' on each run
of the solve routine. This is created from the outliers found
after specifying the residual limits. This file is used
to filter out outliers within the dataset.

Additionally, the program will now by default extract the
residuals.csv file and compute the distance from center outliers.

* docs: Update README content
lsetiawan authored Oct 4, 2023
1 parent 3c9e1a6 commit 0854d95
Showing 7 changed files with 157 additions and 128 deletions.
87 changes: 56 additions & 31 deletions README.md
Original file line number Diff line number Diff line change
@@ -32,20 +32,23 @@ Once the software is installed, you should be able to get to the GNATSS Command
using the command `gnatss`. For example: `gnatss --help`, will get you to the main GNSS-A Processing in Python help page.

```console
Usage: gnatss [OPTIONS] COMMAND [ARGS]...

GNSS-A Processing in Python

Options:
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to
copy it or customize the installation.
--help Show this message and exit.

Commands:
run Runs the full pre-processing routine for GNSS-A
Usage: gnatss [OPTIONS] COMMAND [ARGS]...

GNSS-A Processing in Python

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion [bash|zsh|fish|powershell|pwsh] Install completion for the specified │
│ shell. │
│ [default: None] │
│ --show-completion [bash|zsh|fish|powershell|pwsh] Show completion for the specified │
│ shell, to copy it or customize the │
│ installation. │
│ [default: None] │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────╮
│ run Runs the full pre-processing routine for GNSS-A │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
```

## Pre-processing solve routine
@@ -54,19 +57,40 @@ Currently there's a single command available in the CLI, `run`, which will run the full pre-processing routine.
You can retrieve the helper text for this command by running `gnatss run --help`.

```console
Usage: gnatss run [OPTIONS]

Runs the full pre-processing routine for GNSS-A

Note: Currently only supports 3 transponders

Options:
--config-yaml TEXT Custom path to configuration yaml file.
**Currently only support local files!**
--extract-res / --no-extract-res
Flag to extract residual files from run.
[default: no-extract-res]
--help Show this message and exit.
Usage: gnatss run [OPTIONS]

Runs the full pre-processing routine for GNSS-A
Note: Currently only supports 3 transponders

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────╮
│ --config-yaml TEXT Custom path to configuration │
│ yaml file. **Currently only │
│ support local files!** │
│ [default: None] │
│ --extract-dist-center --no-extract-dist-center Flag to extract distance │
│ from center from run. │
│ [default: │
│ no-extract-dist-center] │
│ --extract-process-dataset --no-extract-process-data… Flag to extract process │
│ results. │
│ [default: │
│ no-extract-process-dataset] │
│ --distance-limit FLOAT Distance in meters from │
│ center beyond which points │
│ will be excluded from │
│ solution. Note that this │
│ will override the value set │
│ as configuration. │
│ [default: None] │
│ --residual-limit FLOAT Maximum residual in │
│ centimeters beyond which │
│ data points will be excluded │
│ from solution. Note that │
│ this will override the value │
│ set as configuration. │
│ [default: None] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────╯
```

*Currently the pre-processing routine has been tested to support only 3 transponders, but this will be expanded in the future.*
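
For illustration, a run combining several of these options might look like the following (the config path and limit value here are placeholders, not taken from the source):

```console
$ gnatss run --config-yaml ./config.yaml --extract-dist-center --residual-limit 100
```

The `--residual-limit` value overrides the corresponding setting in the configuration yaml file, as noted in the help text above.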
@@ -121,17 +145,18 @@ solver:
path: /path/to/**/WG_*/pxp_tt # this option will take in glob patterns
gps_solution:
path: /path/to/**/posfilter/POS_FREED_TRANS_TWTT # this option will take in glob patterns
deletions:
path: /path/to/deletns.dat


output:
path: /my/output/dir/
```
### Deletions file

On each run, the solve routine writes the final resulting deletions file to the output directory specified in the configuration yaml file. This file is in Comma Separated Value (CSV) format and is called `deletions.csv`.
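
As a minimal sketch of how such a deletions file screens out data (the `starttime`/`endtime` column names and the values below are hypothetical, not the actual GNATSS schema), each row defines a time range, and any observation whose time falls inside a range is dropped:

```python
import csv
import io

# Hypothetical deletions.csv content: start/end time ranges in seconds
deletions_csv = """starttime,endtime
100.0,200.0
350.0,350.0
"""

# Parse the deletion ranges into (start, end) pairs
ranges = [
    (float(row["starttime"]), float(row["endtime"]))
    for row in csv.DictReader(io.StringIO(deletions_csv))
]

# Keep only observations whose time falls outside every deletion range
times = [50.0, 150.0, 300.0, 350.0, 400.0]
kept = [t for t in times if not any(start <= t <= end for start, end in ranges)]
print(kept)  # [50.0, 300.0, 400.0]
```

A single-point outlier becomes a zero-length range (start equal to end), which is how the routine can fold individual outliers into the same file as manually specified time windows.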

### Residual file

The run command now extracts the residual file by default on each run; the `--extract-res` flag is no longer needed.
This will output the final resulting residual file to the output directory specified in the configuration yaml file.
This file will be in Comma Separated Value (CSV) format called `residuals.csv`.
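
The residual-limit screening that feeds the deletions file can be sketched as follows. This is an illustration only, not the actual GNATSS implementation; the limit and residual values are made up:

```python
# Flag any point whose absolute residual, in centimeters, exceeds the
# configured limit; flagged points are excluded from the solution.
residual_limit_cm = 10.0  # would normally come from config or --residual-limit

residuals_cm = [1.2, -0.5, 15.3, 9.9, -12.0]
outliers = [r for r in residuals_cm if abs(r) > residual_limit_cm]
inliers = [r for r in residuals_cm if abs(r) <= residual_limit_cm]
print(outliers)  # [15.3, -12.0]
print(inliers)   # [1.2, -0.5, 9.9]
```

The outlier times are then written out and merged into `deletions.csv` so that subsequent runs exclude those points automatically.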

29 changes: 12 additions & 17 deletions src/gnatss/cli.py
@@ -29,9 +29,6 @@ def run(
None,
help="Custom path to configuration yaml file. **Currently only support local files!**",
),
extract_res: Optional[bool] = typer.Option(
False, help="Flag to extract residual files from run."
),
extract_dist_center: Optional[bool] = typer.Option(
False, help="Flag to extract distance from center from run."
),
@@ -76,8 +73,6 @@ def run(
_, _, resdf, dist_center_df, process_ds, outliers_df = main(
config,
all_files_dict,
extract_res=extract_res,
extract_dist_center=extract_dist_center,
extract_process_dataset=extract_process_dataset,
)

@@ -90,18 +85,18 @@
)
dist_center_df.to_csv(dist_center_csv, index=False)

if extract_res:
# Write out to residuals.csv file
res_csv = output_path / CSVOutput.residuals.value
typer.echo(f"Saving the latest residuals to {str(res_csv.absolute())}")
resdf.to_csv(res_csv, index=False)

if len(outliers_df) > 0:
outliers_csv = output_path / CSVOutput.outliers.value
typer.echo(
f"Saving the latest residual outliers to {str(outliers_csv.absolute())}"
)
outliers_df.to_csv(outliers_csv, index=False)
# Write out to residuals.csv file
res_csv = output_path / CSVOutput.residuals.value
typer.echo(f"Saving the latest residuals to {str(res_csv.absolute())}")
resdf.to_csv(res_csv, index=False)

# Write out to outliers.csv file
if len(outliers_df) > 0:
outliers_csv = output_path / CSVOutput.outliers.value
typer.echo(
f"Saving the latest residual outliers to {str(outliers_csv.absolute())}"
)
outliers_df.to_csv(outliers_csv, index=False)

if extract_process_dataset:
# Write out to process_dataset.nc file
1 change: 1 addition & 0 deletions src/gnatss/configs/io.py
@@ -19,6 +19,7 @@ class CSVOutput(StrEnum):
outliers = "outliers.csv"
residuals = "residuals.csv"
dist_center = "dist_center.csv"
deletions = "deletions.csv"


class InputData(BaseModel):
4 changes: 2 additions & 2 deletions src/gnatss/configs/solver.py
@@ -45,8 +45,8 @@ class SolverInputs(BaseModel):
gps_solution: InputData = Field(
..., description="GPS solution data path specification."
)
deletions: InputData = Field(
..., description="Deletions file for unwanted data points."
deletions: Optional[InputData] = Field(
None, description="Deletions file for unwanted data points."
)


60 changes: 36 additions & 24 deletions src/gnatss/loaders.py
@@ -283,33 +283,40 @@ def load_deletions(
pd.DataFrame
Deletion ranges data pandas dataframe
"""
from .utilities.time import AstroTime
output_path = Path(config.output.path)
# TODO: Add additional files to be used for deletions
default_deletions = output_path / CSVOutput.deletions.value
if file_path:
from .utilities.time import AstroTime

cut_df = pd.read_fwf(file_path, header=None)
# Date example: 28-JUL-22 12:30:00
cut_df[constants.DEL_STARTTIME] = pd.to_datetime(
cut_df[0] + "T" + cut_df[1], format="%d-%b-%yT%H:%M:%S"
)
cut_df[constants.DEL_ENDTIME] = pd.to_datetime(
cut_df[2] + "T" + cut_df[3], format="%d-%b-%yT%H:%M:%S"
)
# Got rid of the other columns
# TODO: Parse the other columns
cut_columns = cut_df.columns[0:-2]
cut_df.drop(columns=cut_columns, inplace=True)

# Convert time string to j2000,
# assuming that they're in Terrestrial Time (TT) scale
cut_df[constants.DEL_STARTTIME] = cut_df[constants.DEL_STARTTIME].apply(
lambda row: AstroTime(row, scale=time_scale).unix_j2000
)
cut_df[constants.DEL_ENDTIME] = cut_df[constants.DEL_ENDTIME].apply(
lambda row: AstroTime(row, scale=time_scale).unix_j2000
)
cut_df = pd.read_fwf(file_path, header=None)
# Date example: 28-JUL-22 12:30:00
cut_df[constants.DEL_STARTTIME] = pd.to_datetime(
cut_df[0] + "T" + cut_df[1], format="%d-%b-%yT%H:%M:%S"
)
cut_df[constants.DEL_ENDTIME] = pd.to_datetime(
cut_df[2] + "T" + cut_df[3], format="%d-%b-%yT%H:%M:%S"
)
# Got rid of the other columns
# TODO: Parse the other columns
cut_columns = cut_df.columns[0:-2]
cut_df.drop(columns=cut_columns, inplace=True)

# Convert time string to j2000,
# assuming that they're in Terrestrial Time (TT) scale
cut_df[constants.DEL_STARTTIME] = cut_df[constants.DEL_STARTTIME].apply(
lambda row: AstroTime(row, scale=time_scale).unix_j2000
)
cut_df[constants.DEL_ENDTIME] = cut_df[constants.DEL_ENDTIME].apply(
lambda row: AstroTime(row, scale=time_scale).unix_j2000
)
elif default_deletions.exists():
cut_df = pd.read_csv(default_deletions)
else:
cut_df = pd.DataFrame()

# Try to find outliers.csv file if there is one run already
# this currently assumes that the file is in the output directory
output_path = Path(config.output.path)
outliers_csv = output_path / CSVOutput.outliers.value
if outliers_csv.exists():
import typer
@@ -318,9 +325,14 @@
outliers_df = pd.read_csv(outliers_csv)
outlier_cut = pd.DataFrame.from_records(
outliers_df[constants.TT_TIME].apply(lambda row: (row, row)).to_numpy(),
columns=cut_df.columns,
columns=[constants.DEL_STARTTIME, constants.DEL_ENDTIME],
)
# Include the outlier cut into here
cut_df = pd.concat([cut_df, outlier_cut])
outliers_csv.unlink()

# Export to a deletions csv
if not cut_df.empty:
cut_df.to_csv(output_path / CSVOutput.deletions.value, index=False)

return cut_df
