-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local archive mode (fixes #6) #88
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so two concerns here lead off to other tickets that would be nice to take care of before this. Sometimes you feel like doing that, sometimes you don't... if you don't want to do it I think I'll reserve a day or two for myself to hopefully get to them. If neither of us do then yeah, we can swing back around to dealing with them later.
- There should be a single function holding the code for loading the configuration file. Presently there are two places it is loaded. This function is where defaults should be set.
- The
grep
ing of thedf -aBK
output should just be done in Python and the shell removed from the subprocessing.
Co-authored-by: Kyle Altendorf <[email protected]>
Co-authored-by: Kyle Altendorf <[email protected]>
I'm going to withdraw my request to hold this up on the other changes. I don't think merging this first makes it any harder to get everything done so no reason to wait. Sorry for that. Anyways, let me take another pass here and resolve those discussions. |
@jkbecker, where are we at here? So there's some catch up to do with other changes, like the configuration stuff. I think I wanted the comments in the configuration file to be merged with the existing ones. Maybe a separate local archive config section? Or, maybe this is now caught up in the general 'allow arbitrary archiving actions'? #110 |
Yeah, I feel like it's caught up in the broader discussion, but it could be used as a transitional thing as long as it is disabled by default and opt-in? I won't have the bandwidth to expand on this or make it a more generally-applicable setup in the next few weeks to come. |
welp, that was straightforward Co-authored-by: Kyle Altendorf <[email protected]>
Co-authored-by: Kyle Altendorf <[email protected]>
Co-authored-by: Kyle Altendorf <[email protected]>
Co-authored-by: lopesmcc <[email protected]>
I stitched this local archiving pull request in to my setup and it didn't work. The box I was running it on was receiving rsync plots already from another node - get_running_archive_jobs was seeing them and refusing to start a local archive. I don't know if this is already fixed somewhere in here, but if not - I tracked it down to dest = rsync_dest(arch_cfg, '/') where the / was matching the args for any running rsync process since they all start with a / . To fix it changed that block like this (key part is replacing / with actual rsync path) if arch_cfg.mode == 'remote':
dest = rsync_dest(arch_cfg, '/')
elif arch_cfg.mode == 'local':
dest = rsync_dest(arch_cfg, arch_cfg.rsyncd_path)
else:
raise KeyError(f'Archive mode must be "remote" or "local" ({arch_cfg.mode!r} given). Please inspect plotman.yaml.') |
@fihzy, sorry, I made some changes on my end before looking at your comment. Where is this code snippet from? Is it from |
Yeah, it's from get_running_archive_jobs(). I do pass in that 2nd argument, and the originally submitted patch, rsync_dest() is already updated to use the second argument for mode=remote. So that original code and my tiny patch makes the local archiving work on my box that is doing double duty as plotter + storage for another node that is rsync'ing to it. So satisfied with this software, thanks for all your work on it. I wish I had a better grip on git to interact and submit a patch so as not to make this harder :-) |
archive_tool: str = 'rsync' | ||
archive_cmd: str = '{} {} {} {}' | ||
parameters: str = '--bwlimit=80000 --no-compress --remove-source-files -P' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, my thought here is to:
- Keep stuff together as much as possible, but explicitly separate
rsync
since we need to use it to identify the process (maybe don't separate it though?). - To use environment variables to provide the relevant fields to the command.
- To decouple ourselves from the
df
format.
Since we already depended on the shell and are now exposing it to the user hazarding even more dependence and expectations, we should allow configuration of what shell (generally any command, even Python) will interpret the command.
Now that I've written this up, I think I roughly create a CI run step... Configurable shell, environment variables, and commands... Anyways.
Not addressed or checked points:
- Something about
df
and block size andK
and... - Consideration of what separator to use between the path and the available space.
- Details of identifying the transfer process with a
shell
like Python, or really just in general.
As an example to implement the present rsyncd
functionality, maybe:
path: /mnt/my_archive_drives
disk_space_shell: bash
disk_space: |
df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
transfer_shell: bash
process_name: rsync
transfer: |
${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "rsync://user@host:1234/rsyncd_site/${destination}"
(I'm sure awk
can grep
but I'm lazy at the moment and like stacking. It's the UNIX way. And this is about the idea at this point...)
Or for local rsync:
path: /mnt/my_archive_drives
disk_space_shell: bash
disk_space: |
df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
transfer_shell: bash
process_name: rsync
transfer: |
${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "${path}/${destination}"
(yeah, some of those slashes are not mandatory but duplicates are ignored so easiest to add extras and make things read sensibly...)
Maybe this is wonderfully controllable. Maybe this is silly. Maybe we can go in this direction and leave out complexities like the configurable shell, just mandate bash. So, maybe this is a more sensible step down the path.
path: /mnt/my_archive_drives
disk_space: df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
process_name: rsync
transfer: ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "${path}/${destination}"
Development has continued in #627 building on this base. Thanks for getting it started... |
Hey,
I needed a local archive mode, so I implemented one.
What can I say,