Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grd failure #161

Closed
wants to merge 22 commits into from
Closed

Grd failure #161

wants to merge 22 commits into from

Conversation

areeda
Copy link
Collaborator

@areeda areeda commented Jun 30, 2023

Limit merges to metric day boundaries
Don't use guardian channels from raw frames for more than 1 hour
get flake8 and pytests to pass

Joseph Areeda and others added 14 commits September 20, 2022 15:26
* Address issue gwpy#126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>
* Do not cross metric day boundaries.

* add log file arg delete empty directories when done

* Tweak remove empty dir removal

* Tweak remove empty dir removal again

* Merge day boundary (gwpy#146)

* Address issue gwpy#126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* rebase agaist last approved PR

* rebase against last approved PR

* rebase against last approved PR again, fix flake8

* Fix a bug in remove empty directories.

Co-authored-by: Joseph Areeda <[email protected]>
* Address issue gwpy#126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>
* Do not cross metric day boundaries.

* Merge day boundary (gwpy#146)

* Address issue gwpy#126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* Check point merge (gwpy#147)

* Do not cross metric day boundaries.

* add log file arg delete empty directories when done

* Tweak remove empty dir removal

* Tweak remove empty dir removal again

* Merge day boundary (gwpy#146)

* Address issue gwpy#126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* rebase agaist last approved PR

* rebase against last approved PR

* rebase against last approved PR again, fix flake8

* Fix a bug in remove empty directories.

Co-authored-by: Joseph Areeda <[email protected]>

* Merge day boundary (gwpy#146)

* Address issue gwpy#126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* minor doc changes

* Fix a bug where an xml.gz file could get compressed again in merge-with-gaps

* Implement a periodic vacate to address permanent D-state (uninterupptible wait) causing jobs to fail to complete

* Always create a log file. If not specified put one in the output directory

* Fix a problem with periodic vacate.

* Up the periodic vacate time to 3 hrs

* Found a job killing typo

* Add time limits to post processing also

* Don't save segments.txt file if no sgments founds because we don't know if it's an issue of not finding them or a valid not analyzable state.

* disable periodic vacate to demo the problem.

* Fix reported version in some utilities. Only update segments.txt if omicron is actually run.

* Clarify relative imports. and add details to a few log messages

* Resolve flake8 issues

---------

Co-authored-by: Joseph Areeda <[email protected]>
tweak logging to better underst guardian channel usage-
Comment on lines 550 to 551
logger.critical(f'configuration file does not exist: {config_file.absolute()}')
exit(4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this using sys.exit and not raising an exception?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, how about FileNotFoundError wit the same text?

Comment on lines +793 to +794
if ((online and statechannel) or (statechannel and not stateflag) or (statechannel and args.no_segdb)) and \
dataduration <= 3600:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be easier to read (and maintain) by splitting over many lines

Suggested change
if ((online and statechannel) or (statechannel and not stateflag) or (statechannel and args.no_segdb)) and \
dataduration <= 3600:
if dataduration <= 3600 and (
(online and statechannel)
or (statechannel and not stateflag)
or (statechannel and args.no_segdb)
):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, why 3600 seconds, why not 1800, or 7200?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this effectively forbid using a statevector when processing more than 1 hour of data?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been a troublesome issue that I'm still searching for a solution.
The original problem was that the segment file was updated regardless of whether the online job succeeded. So when we had delays we didn't process anything. Now the problem is if we get too far behind, a way too common occurrence we get stuck reading hours of raw frame files until evicted with no checkpointing.

I'm guessing at a robust limit so perhaps the "solution" is to make it another command line argument.

Comment on lines +22 to +23
from .. import io
from .. import const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change made?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly but that looks like how I'd respond to a flake8 complaint

…st processing jobs. Also implement that the "right" way instead of my hacky paren-child model.

Verify we work with dqsegdb 1.2.1
@codecov
Copy link

codecov bot commented Sep 30, 2023

Codecov Report

Merging #161 (ed442fb) into master (99416c0) will increase coverage by 0.07%.
Report is 11 commits behind head on master.
The diff coverage is 26.19%.

❗ Current head ed442fb differs from pull request most recent head ab58bd2. Consider uploading reports for the commit ab58bd2 to get more accurate results

@@            Coverage Diff             @@
##           master     #161      +/-   ##
==========================================
+ Coverage   35.16%   35.23%   +0.07%     
==========================================
  Files          30       30              
  Lines        2944     2941       -3     
==========================================
+ Hits         1035     1036       +1     
+ Misses       1909     1905       -4     
Flag Coverage Δ
Conda 35.23% <26.19%> (+0.07%) ⬆️
Linux 35.23% <26.19%> (+0.07%) ⬆️
macOS 35.23% <26.19%> (+0.06%) ⬆️
python3.10 35.23% <26.19%> (+0.07%) ⬆️
python3.11 35.23% <26.19%> (?)
python3.9 35.24% <26.19%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
omicron/log.py 84.75% <100.00%> (+0.26%) ⬆️
omicron/tests/test_io.py 100.00% <100.00%> (ø)
omicron/tests/test_log.py 100.00% <100.00%> (ø)
omicron/tests/test_parameters.py 100.00% <100.00%> (ø)
omicron/cli/process.py 11.92% <3.12%> (+0.10%) ⬆️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@areeda areeda marked this pull request as draft October 2, 2023 18:30
@pep8speaks
Copy link

pep8speaks commented Oct 12, 2023

Hello @areeda! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1106:35: F541 f-string is missing placeholders

Comment last updated at 2023-10-12 14:48:43 UTC

@areeda areeda closed this Dec 13, 2023
@areeda areeda deleted the grd-failure branch December 13, 2023 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants