Acquire a lock prior to top level bids file modification #348

Merged: 4 commits into nipy:master on Oct 29, 2019

stilley2
Contributor

Uses SoftFileLock from py-filelock. Potentially addresses #340, although I'm not sure whether SoftFileLock is guaranteed to work on distributed filesystems.

Currently the timeout is hardcoded, but if #344 gets merged then we can use the new method of specifying bids options to make the timeout configurable.
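
For readers unfamiliar with the idea, a soft (existence-based) file lock can be sketched with the standard library alone. This is not py-filelock's implementation; `soft_lock` and its parameters are hypothetical names, and the sketch assumes that exclusive file creation (`O_EXCL`) is atomic on the target filesystem, which is exactly the open question for distributed filesystems.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def soft_lock(path, timeout=-1.0, poll=0.05):
    """Hold an existence-based lock on `path` (illustrative sketch only).

    A negative `timeout` waits indefinitely; otherwise TimeoutError is
    raised once `timeout` seconds have passed without acquiring the lock.
    """
    start = time.monotonic()
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
            break
        except FileExistsError:
            if 0 <= timeout <= time.monotonic() - start:
                raise TimeoutError("could not acquire lock %s" % path)
            time.sleep(poll)
    try:
        yield path
    finally:
        # Release: close our handle and remove the lock file.
        os.close(fd)
        os.remove(path)
```

If a process dies while holding such a lock, the file stays behind and has to be deleted by hand, which is why a timeout and a warning are useful.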

@codecov-io
codecov-io commented May 18, 2019

Codecov Report

Merging #348 into master will increase coverage by 0.02%.
The diff coverage is 90.9%.

@@            Coverage Diff             @@
##           master     #348      +/-   ##
==========================================
+ Coverage   74.56%   74.59%   +0.02%     
==========================================
  Files          35       35              
  Lines        2744     2751       +7     
==========================================
+ Hits         2046     2052       +6     
- Misses        698      699       +1

| Impacted Files | Coverage Δ |
|---|---|
| heudiconv/info.py | 100% <ø> (ø) ⬆️ |
| heudiconv/convert.py | 79.1% <90.9%> (+0.17%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dcc590d...13be87a.

Member

@mgxd mgxd left a comment

Thanks @stilley2 - left a couple of small comments

@@ -32,6 +33,8 @@
compress_dicoms
)

LOCKFILE = 'heudiconv.lock'
LOCKFILE_TIMEOUT = 10
Member

I'd prefer not having a timeout by default, but allowing users to override it via an environment variable.

populate_bids_templates(anon_outdir,
getattr(heuristic, 'DEFAULT_FIELDS', {}))
with SoftFileLock(op.join(anon_outdir, LOCKFILE),
timeout=LOCKFILE_TIMEOUT):
Member

maybe something like

timeout = float(os.environ.get("HEUDICONV_LOCKFILE_TIMEOUT", -1))
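
A slightly fuller sketch of that idea, assuming the variable is spelled `HEUDICONV_FILELOCK_TIMEOUT` as in the 0.6.0 release notes, and that a negative timeout means "wait indefinitely" as it does in filelock. The helper name `lockfile_timeout` is made up for illustration. Environment values arrive as strings, so the value is converted to a float before use:

```python
import os


def lockfile_timeout(environ=None, default=-1.0):
    """Return the lock timeout in seconds (hypothetical helper).

    Reads HEUDICONV_FILELOCK_TIMEOUT from `environ` (os.environ when not
    given) and falls back to `default` if it is unset or blank.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("HEUDICONV_FILELOCK_TIMEOUT")
    if raw is None or not raw.strip():
        return default
    # Environment values are always strings; convert explicitly.
    return float(raw)
```

Passing a plain dict as `environ` keeps the helper easy to test without touching the real environment.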

@@ -202,14 +205,16 @@ def prep_conversion(sid, dicoms, outdir, heuristic, converter, anon_sid,
clear_temp_dicoms(item_dicoms)

if bids:
Member

We should log a warning to users if this file exists before entering the context manager, maybe something like:

Existing lockfile found in {path to LOCKFILE} - waiting for the lock to be released. To set a timeout limit, set the HEUDICONV_FILELOCK_TIMEOUT environment variable to a value in seconds. If this process hangs, it may require manual deletion of the {LOCKFILE}.
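
A rough sketch of such a check, using only the standard library. `warn_if_locked` and the logger name are hypothetical, not heudiconv's actual code; `LOCKFILE` mirrors the constant added in this PR's diff.

```python
import logging
import os.path as op

LOCKFILE = "heudiconv.lock"  # same value as the constant in this PR's diff
lgr = logging.getLogger("heudiconv.sketch")  # hypothetical logger name


def warn_if_locked(outdir):
    """Warn and return True if a lockfile already exists in `outdir`."""
    lockpath = op.join(outdir, LOCKFILE)
    if not op.exists(lockpath):
        return False
    lgr.warning(
        "Existing lockfile found in %s - waiting for the lock to be released. "
        "To set a timeout limit, set the HEUDICONV_FILELOCK_TIMEOUT "
        "environment variable to a value in seconds. If this process hangs, "
        "it may require manual deletion of %s.",
        outdir, lockpath,
    )
    return True
```

The check is advisory only: another process may create the lockfile between this check and the actual acquisition, so the lock itself remains the source of truth.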

@@ -1,3 +1,4 @@
from filelock import SoftFileLock
Member

Note: I'm personally not familiar with filelock (I've used fasteners when needed), but it seems to be actively maintained, so OK.

@yarikoptic
Member

yarikoptic commented Oct 28, 2019

@stilley2 - would you have time in the near future to finalize this PR, i.e. to address @mgxd's comments and resolve the conflicts?
It seems that more people are running into #362, so we'd better have it addressed, and locking is the way to go. Thank you for your contribution!

@stilley2
Contributor Author

Sorry, this fell off my radar. I've added your suggestions.

Member

@yarikoptic yarikoptic left a comment

looks great to me. thank you! I will leave it to @mgxd to give the final blessing/merge

Member

@mgxd mgxd left a comment

LGTM, one small comment

Co-Authored-By: Mathias Goncalves
@mgxd
Member

mgxd commented Oct 29, 2019

thanks, let's try it out

@mgxd mgxd merged commit b12e931 into nipy:master Oct 29, 2019
@yarikoptic yarikoptic added this to the 0.6 milestone Nov 22, 2019
yarikoptic added a commit that referenced this pull request Dec 16, 2019
This is largely a bug fix.  Metadata and order of `_key-value` fields in BIDS
could change from the result of converting using previous versions, thus minor
version boost.
14 people contributed to this release -- thanks
[everyone](https://github.com/nipy/heudiconv/graphs/contributors)!

Enhancement

- Use [etelemetry](https://pypi.org/project/etelemetry) to inform about most
  recent available version of heudiconv. Please set `NO_ET` environment variable
  if you want to disable it ([#369][])
- BIDS:
  - `--bids` flag became an option. It can (optionally) accept `notop` value
    to avoid creation of top level files (`CHANGES`, `dataset_description.json`,
    etc) as a workaround during parallel execution to avoid race conditions etc.
    ([#344][])
  - Generate basic `.json` files with descriptions of the fields for
    `participants.tsv` and `_scans.tsv` files ([#376][])
  - Use `filelock` while writing top level files. Use the
    `HEUDICONV_FILELOCK_TIMEOUT` environment variable to change the default
    timeout value ([#348][])
  - `_PDT2` was added as a suffix for multi-echo (really "multi-modal")
    sequences ([#345][])
- Calls to `dcm2niix` would include full output path to make it easier to
  discern in the logs what file it is working on ([#351][])
- With recent [datalad]() (>= 0.10), created DataLad dataset will use
  `--fake-dates` functionality of DataLad to not leak data conversion dates,
  which might be close to actual data acquisition/patient visit ([#352][])
- Support multi-echo EPI `_phase` data ([#373][] fixes [#368][])
- Log location of a bad .json file to ease troubleshooting ([#379][])
- Add basic pypi classifiers for the package ([#380][])

Fixed

- Sorting `_scans.tsv` files lacking valid dates field should not cause a crash
  ([#337][])
- Multi-echo file detection based on the number of echoes ([#339][])
- BIDS
  - Use `EchoTimes` from the associated multi-echo files if `EchoNumber` tag is
    missing ([#366][] fixes [#347][])
  - Tolerate empty ContentTime and/or ContentDate in DICOMs ([#372][]) and place
    "n/a" if value is missing ([#390][])
  - Do not crash, and store the original .json file, if "JSON prettification"
    fails ([#342][])
- ReproIn heuristic
  - tolerate WIP prefix on Philips scanners ([#343][])
  - allow for use of `(...)` instead of `{...}` since `{}` are not allowed
    ([#343][])
  - Support bipolar fieldmaps by providing them with `_epi` not `_magnitude`.
    "Loose" BIDS `_key-value` pairs might come now after `_dir-` even if they
    came first before ([#358][] fixes [#357][])
- All heuristics saved under `.heudiconv/` under `heuristic.py` name, to avoid
  discrepancy during reconversion ([#354][] fixes [#353][])
- Do not crash (with TypeError) while trying to sort absent file list ([#360][])
- heudiconv requires nipype >= 1.0.0 ([#364][]) and blacklists `1.2.[12]` ([#375][])

* tag 'v0.6.0': (60 commits)
  Version boost to 0.6.0
  DOC: populate detailed changelog for 0.6.0 and tune up formatting in previous one
  Fix miscellaneous typos in ReproIn heuristic file.
  BF: fix check for the sbatch (SLURM) not being available
  ENH: make test-compare-two-versions take any two worktrees, and just show diff if results already known
  Update heudiconv/convert.py
  apply @mgxd 's suggestions, adding a warning and a timeout environment variable
  need str typecast
  Use empty string not None
  Empty acq_time results in empty cell not 'n/a'
  DOC: Clarify tarball session handling
  remove repetitive import statement
  respond to review - add explicit py2 check - change file saving strategy - use logger instead of print
  fix remaning py2 errors
  MNT: Add Python support metadata to package
  fix some python2/3 incompatibilities
  add return data (accidently removed return)
  make content unicode
  test that load_json provides filename if invalid
  explicitly name invalid json
  ...
yarikoptic added a commit that referenced this pull request Dec 16, 2019
[0.6.0] - 2019-12-16

* tag 'v0.6.0':
  Boost perspective release date in changelog to today
  ENH(TST): Fix version to older pytest to ease backward compatibility testing
  RF: use tmpdir not tmp_path fixture
  FIX: minor typo in CHANGELOG.md