Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local archive mode (fixes #6) #88

Closed
wants to merge 28 commits into from
Closed

Local archive mode (fixes #6) #88

wants to merge 28 commits into from

Conversation

jkbecker
Copy link
Contributor

@jkbecker jkbecker commented Apr 7, 2021

Hey,

I needed a local archive mode, so I implemented one.

What can I say,
image

Copy link
Collaborator

@altendky altendky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so two concerns here lead off to other tickets that would be nice to take care of before this. Sometimes you feel like doing that, sometimes you don't... if you don't want to do it I think I'll reserve a day or two for myself to hopefully get to them. If neither of us do then yeah, we can swing back around to dealing with them later.

  1. There should be a single function holding the code for loading the configuration file. Presently there are two places it is loaded. This function is where defaults should be set.
  2. The greping of the df -aBK output should just be done in Python and the shell removed from the subprocessing.

config.yaml Outdated Show resolved Hide resolved
config.yaml Outdated Show resolved Hide resolved
src/plotman/archive.py Outdated Show resolved Hide resolved
src/plotman/archive.py Outdated Show resolved Hide resolved
src/plotman/archive.py Outdated Show resolved Hide resolved
src/plotman/archive.py Outdated Show resolved Hide resolved
jkbecker and others added 2 commits April 7, 2021 09:38
Co-authored-by: Kyle Altendorf <[email protected]>
Co-authored-by: Kyle Altendorf <[email protected]>
@altendky
Copy link
Collaborator

altendky commented Apr 8, 2021

I'm going to withdraw my request to hold this up on the other changes. I don't think merging this first makes it any harder to get everything done so no reason to wait. Sorry for that. Anyways, let me take another pass here and resolve those discussions.

@altendky
Copy link
Collaborator

@jkbecker, where are we at here? So there's some catch up to do with other changes, like the configuration stuff. I think I wanted the comments in the configuration file to be merged with the existing ones. Maybe a separate local archive config section? Or, maybe this is now caught up in the general 'allow arbitrary archiving actions'? #110

@jkbecker
Copy link
Contributor Author

Yeah, I feel like it's caught up in the broader discussion, but it could be used as a transitional thing as long as it is disabled by default and opt-in?

I won't have the bandwidth to expand on this or make it a more generally-applicable setup in the next few weeks to come.

jkbecker and others added 2 commits April 28, 2021 17:11
src/plotman/configuration.py Outdated Show resolved Hide resolved
src/plotman/archive.py Outdated Show resolved Hide resolved
jkbecker and others added 2 commits April 28, 2021 22:59
src/plotman/archive.py Outdated Show resolved Hide resolved
@fihzy
Copy link
Contributor

fihzy commented May 8, 2021

I stitched this local archiving pull request in to my setup and it didn't work. The box I was running it on was receiving rsync plots already from another node - get_running_archive_jobs was seeing them and refusing to start a local archive. I don't know if this is already fixed somewhere in here, but if not - I tracked it down to dest = rsync_dest(arch_cfg, '/') where the / was matching the args for any running rsync process since they all start with a / . To fix it changed that block like this (key part is replacing / with actual rsync path)

    if arch_cfg.mode == 'remote':
        dest = rsync_dest(arch_cfg, '/')
    elif arch_cfg.mode == 'local':
        dest = rsync_dest(arch_cfg, arch_cfg.rsyncd_path)
    else:
        raise KeyError(f'Archive mode must be "remote" or "local" ({arch_cfg.mode!r} given). Please inspect plotman.yaml.')

@jkbecker
Copy link
Contributor Author

jkbecker commented May 8, 2021

@fihzy, sorry, I made some changes on my end before looking at your comment. Where is this code snippet from? Is it from get_running_archive_jobs? The rsync_dest() function will actually ignore the second argument completely, so it doesn't really matter what goes in there. Or am I looking in the wrong spot?

@fihzy
Copy link
Contributor

fihzy commented May 8, 2021

@fihzy, sorry, I made some changes on my end before looking at your comment. Where is this code snippet from? Is it from get_running_archive_jobs? The rsync_dest() function will actually ignore the second argument completely, so it doesn't really matter what goes in there. Or am I looking in the wrong spot?

Yeah, it's from get_running_archive_jobs(). I do pass in that 2nd argument, and the originally submitted patch, rsync_dest() is already updated to use the second argument for mode=remote. So that original code and my tiny patch makes the local archiving work on my box that is doing double duty as plotter + storage for another node that is rsync'ing to it. So satisfied with this software, thanks for all your work on it. I wish I had a better grip on git to interact and submit a patch so as not to make this harder :-)

Comment on lines +55 to +57
archive_tool: str = 'rsync'
archive_cmd: str = '{} {} {} {}'
parameters: str = '--bwlimit=80000 --no-compress --remove-source-files -P'
Copy link
Collaborator

@altendky altendky May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, my thought here is to:

  • Keep stuff together as much as possible, but explicitly separate rsync since we need to use it to identify the process (maybe don't separate it though?).
  • To use environment variables to provide the relevant fields to the command.
  • To decouple ourselves from the df format.

Since we already depended on the shell and are now exposing it to the user hazarding even more dependence and expectations, we should allow configuration of what shell (generally any command, even Python) will interpret the command.

Now that I've written this up, I think I roughly create a CI run step... Configurable shell, environment variables, and commands... Anyways.

Not addressed or checked points:

  • Something about df and block size and K and...
  • Consideration of what separator to use between the path and the available space.
  • Details of identifying the transfer process with a shell like Python, or really just in general.

As an example to implement the present rsyncd functionality, maybe:

path: /mnt/my_archive_drives
disk_space_shell: bash
disk_space: |
    df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
transfer_shell: bash
process_name: rsync
transfer: |
    ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "rsync://user@host:1234/rsyncd_site/${destination}"

(I'm sure awk can grep but I'm lazy at the moment and like stacking. It's the UNIX way. And this is about the idea at this point...)

Or for local rsync:

path: /mnt/my_archive_drives
disk_space_shell: bash
disk_space: |
    df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
transfer_shell: bash
process_name: rsync
transfer: |
    ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "${path}/${destination}"

(yeah, some of those slashes are not mandatory but duplicates are ignored so easiest to add extras and make things read sensibly...)

Maybe this is wonderfully controllable. Maybe this is silly. Maybe we can go in this direction and leave out complexities like the configurable shell, just mandate bash. So, maybe this is a more sensible step down the path.

path: /mnt/my_archive_drives
disk_space: df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
process_name: rsync
transfer: ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "${path}/${destination}"

@altendky altendky mentioned this pull request May 22, 2021
11 tasks
@altendky
Copy link
Collaborator

Development has continued in #627 building on this base. Thanks for getting it started... :]

@altendky altendky closed this May 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants