Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: --skip-errors option for skipping commits that rase an exception in the convert function #34

Open
simonw opened this issue Dec 6, 2021 · 4 comments
Labels
enhancement New feature or request research

Comments

@simonw
Copy link
Owner

simonw commented Dec 6, 2021

I could add an option that skips any commits that don't parse correctly with the provided conversion function. This would be the quickest possible way of parsing a history where all but a few commits work.

@simonw simonw added enhancement New feature or request research labels Dec 6, 2021
@simonw
Copy link
Owner Author

simonw commented Dec 6, 2021

I'd like to get rid of this code, this feature could help there:

if not content.strip():
# Skip empty JSON files
continue

@simonw
Copy link
Owner Author

simonw commented Dec 6, 2021

Options for the name:

  • --skip-error-commits
  • --skip-bad-commits
  • --skip-errors - I think I like this one best, it's succinct but memorable enough
  • --ignore-errors
  • --ignore-bad-commits

The related existing options (that this should be consistent with) are:

  • --ignore-duplicate-ids
  • --skip HASH
  • --start-at
  • --start-after

Maybe add a separate small section to the README about ways of skipping bad commits too.

@simonw simonw changed the title Idea: --skip-error-commits option Idea: --skip-errors option for skipping commits that rase an exception in the convert function Dec 6, 2021
@simonw
Copy link
Owner Author

simonw commented Dec 6, 2021

FARA_All_Registrants.csv may be a good file to develop this against.

Here's a prototype diff for this feature (though it should also capture exceptions raised by items = list(convert_function(content))):

diff --git a/git_history/cli.py b/git_history/cli.py
index b119004..08a2f04 100644
--- a/git_history/cli.py
+++ b/git_history/cli.py
@@ -74,6 +74,9 @@ def cli():
 @click.option(
     "skip_hashes", "--skip", multiple=True, help="Skip specific commit hashes"
 )
+@click.option(
+    "--skip-errors", is_flag=True, help="If a version has a parse error, skip it"
+)
 @click.option(
     "--full-versions",
     is_flag=True,
@@ -134,6 +137,7 @@ def file(
     start_at,
     start_after,
     skip_hashes,
+    skip_errors,
     full_versions,
     csv_,
     dialect,
@@ -255,7 +259,13 @@ def file(
                     ).keys()
                 )
                 # Validate all items in the commit have ID columns - raises ClickException if not
-                validate_items_have_id_columns(items, ids, git_hash)
+                try:
+                    validate_items_have_id_columns(items, ids, git_hash)
+                except click.ClickException:
+                    if skip_errors:
+                        continue
+                    else:
+                        raise
 
                 # Use this to detect IDs that are duplicated in the same commit
                 item_ids_seen_in_this_commit = set()

@simonw
Copy link
Owner Author

simonw commented Dec 6, 2021

  File "/Users/simon/Dropbox/Development/git-history/git_history/cli.py", line 233, in file
    items = list(convert_function(content))
  File "<string>", line 3, in fn
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 187, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter

It would be good if this was caught and the commit hash was shown in the error message.

simonw added a commit that referenced this issue Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request research
Projects
None yet
Development

No branches or pull requests

1 participant