Local archive mode (fixes #6) #88

Closed
wants to merge 28 commits into from
Changes from 25 commits
28 commits
fbca19a
local/remote switch for rsync
jkbecker Apr 2, 2021
a39f39a
fix local rsync dest
jkbecker Apr 2, 2021
fc7ea41
add notes to config.yaml
jkbecker Apr 2, 2021
f3281d6
Merge branch 'development' of github.com:ericaltendorf/plotman into a…
jkbecker Apr 2, 2021
ad86f13
Merge branch 'setup' into archivelocal
jkbecker Apr 4, 2021
715267b
Merge branch 'setup' of github.com:jkbecker/plotman into archivelocal
jkbecker Apr 5, 2021
6882e10
Merge branch 'development' of github.com:jkbecker/plotman into archiv…
jkbecker Apr 7, 2021
829661c
Update config.yaml
jkbecker Apr 7, 2021
aae3654
Update src/plotman/archive.py
jkbecker Apr 7, 2021
9f205df
Merge branch 'development' of github.com:ericaltendorf/plotman into a…
jkbecker Apr 28, 2021
c570f7c
Merge branch 'archivelocal' of github.com:jkbecker/plotman into archi…
jkbecker Apr 28, 2021
c7d8ce2
adapt config template
jkbecker Apr 28, 2021
68b683f
adapt error message
jkbecker Apr 28, 2021
9d5ae8a
doc fix
jkbecker Apr 28, 2021
2d917fa
whitespace, defaults
jkbecker Apr 28, 2021
43ead3d
Update src/plotman/configuration.py
jkbecker Apr 28, 2021
4aee539
Merge branch 'development' into archivelocal
altendky Apr 28, 2021
194dc6d
Update src/plotman/configuration.py
jkbecker Apr 29, 2021
c5d5ad6
Update src/plotman/archive.py
jkbecker Apr 29, 2021
59de0c6
Update src/plotman/archive.py
altendky Apr 29, 2021
7f15c43
Merge branch 'development' of github.com:ericaltendorf/plotman into a…
jkbecker May 8, 2021
11d06a7
new config file format supporting local archive mode
jkbecker May 8, 2021
6df447e
prepare for legacy vs extendable new configs
jkbecker May 8, 2021
b97c385
merging
jkbecker May 8, 2021
4c92cb1
better example value for local mode
jkbecker May 8, 2021
57ebfe5
some archive tests
jkbecker May 8, 2021
160b3f9
rsync_dest -> arch_dest refactoring
jkbecker May 8, 2021
fff7c22
Merge branch 'development' into archivelocal
altendky May 17, 2021
50 changes: 34 additions & 16 deletions src/plotman/archive.py
@@ -86,8 +86,12 @@ def compute_priority(phase, gb_free, n_plots):

 def get_archdir_freebytes(arch_cfg):
     archdir_freebytes = {}
-    df_cmd = ('ssh %s@%s df -aBK | grep " %s/"' %
-        (arch_cfg.rsyncd_user, arch_cfg.rsyncd_host, arch_cfg.rsyncd_path) )
+    if arch_cfg.mode == 'legacy':
+        df_cmd = ('ssh %s@%s df -aBK | grep " %s/"' %
+            (arch_cfg.rsyncd_user, arch_cfg.rsyncd_host, arch_cfg.rsyncd_path) )
+    else:
+        arch_cfg_custom = getattr(arch_cfg, arch_cfg.mode)
+        df_cmd = (arch_cfg_custom.df_cmd.format(arch_cfg_custom.path))
     with subprocess.Popen(df_cmd, shell=True, stdout=subprocess.PIPE) as proc:
         for line in proc.stdout.readlines():
             fields = line.split()
@@ -100,12 +104,15 @@ def get_archdir_freebytes(arch_cfg):
     return archdir_freebytes

 def rsync_dest(arch_cfg, arch_dir):
-    rsync_path = arch_dir.replace(arch_cfg.rsyncd_path, arch_cfg.rsyncd_module)
-    if rsync_path.startswith('/'):
-        rsync_path = rsync_path[1:] # Avoid dup slashes. TODO use path join?
-    rsync_url = 'rsync://%s@%s:12000/%s' % (
-            arch_cfg.rsyncd_user, arch_cfg.rsyncd_host, rsync_path)
-    return rsync_url
+    if arch_cfg.mode == 'legacy':
+        rsync_path = arch_dir.replace(arch_cfg.rsyncd_path, arch_cfg.rsyncd_module)
+        if rsync_path.startswith('/'):
+            rsync_path = rsync_path[1:] # Avoid dup slashes. TODO use path join?
+        return 'rsync://%s@%s:12000/%s' % (
+                arch_cfg.rsyncd_user, arch_cfg.rsyncd_host, rsync_path)
+    else:
+        arch_cfg_custom = getattr(arch_cfg, arch_cfg.mode)
+        return arch_cfg_custom.path

 # TODO: maybe consolidate with similar code in job.py?
 def get_running_archive_jobs(arch_cfg):
@@ -115,7 +122,11 @@ def get_running_archive_jobs(arch_cfg):
     dest = rsync_dest(arch_cfg, '/')
     for proc in psutil.process_iter(['pid', 'name']):
         with contextlib.suppress(psutil.NoSuchProcess):
-            if proc.name() == 'rsync':
+            if arch_cfg.mode == 'legacy':
+                proc_name = 'rsync'
+            else:
+                proc_name = getattr(arch_cfg, arch_cfg.mode).archive_tool
+            if proc.name() == proc_name:
                 args = proc.cmdline()
                 for arg in args:
                     if arg.startswith(dest):
@@ -156,7 +167,7 @@ def archive(dir_cfg, all_jobs):
     archdir_freebytes = get_archdir_freebytes(dir_cfg.archive)
     if not archdir_freebytes:
         return(False, 'No free archive dirs found.')

     archdir = ''
     available = [(d, space) for (d, space) in archdir_freebytes.items() if
                  space > 1.2 * plot_util.get_k32_plotsize()]
@@ -168,10 +179,17 @@ def archive(dir_cfg, all_jobs):
         return(False, 'No archive directories found with enough free space')

     msg = 'Found %s with ~%d GB free' % (archdir, freespace / plot_util.GB)
-
-    bwlimit = dir_cfg.archive.rsyncd_bwlimit
-    throttle_arg = ('--bwlimit=%d' % bwlimit) if bwlimit else ''
-    cmd = ('rsync %s --no-compress --remove-source-files -P %s %s' %
-            (throttle_arg, chosen_plot, rsync_dest(dir_cfg.archive, archdir)))
-
+    if dir_cfg.archive.mode == 'legacy':
+        bwlimit = dir_cfg.archive.rsyncd_bwlimit
+        throttle_arg = ('--bwlimit=%d' % bwlimit) if bwlimit else ''
+        cmd = ('rsync %s --no-compress --remove-source-files -P %s %s' %
+                (throttle_arg, chosen_plot, rsync_dest(dir_cfg.archive, archdir)))
+    else:
+        arch_cfg_custom = getattr(dir_cfg.archive, dir_cfg.archive.mode)
+        cmd = arch_cfg_custom.archive_cmd.format(
+            arch_cfg_custom.archive_tool,
+            arch_cfg_custom.parameters,
+            chosen_plot,
+            rsync_dest(dir_cfg.archive, archdir)
+        )
     return (True, cmd)
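
To make the local-mode branch concrete, here is a small sketch of how the default `ArchiveLocal` strings from this PR expand through `str.format`. The plot path and the `/farm` archive path are made-up example values, and `build_local_commands` is a hypothetical helper, not a function in plotman.

```python
# Defaults copied from the ArchiveLocal class in this PR's configuration.py diff.
DF_CMD = 'df -BK | grep " {}/"'
ARCHIVE_TOOL = 'rsync'
ARCHIVE_CMD = '{} {} {} {}'
PARAMETERS = '--bwlimit=80000 --no-compress --remove-source-files -P'


def build_local_commands(path, chosen_plot):
    """Hypothetical helper: expand the templates the way the local-mode branch does."""
    df_cmd = DF_CMD.format(path)
    # As the diff stands, rsync_dest() returns the configured path in local
    # mode, so the destination is `path` rather than the chosen archive dir.
    dest = path
    archive_cmd = ARCHIVE_CMD.format(ARCHIVE_TOOL, PARAMETERS, chosen_plot, dest)
    return df_cmd, archive_cmd


# Example values only; neither path comes from a real setup.
df_cmd, archive_cmd = build_local_commands('/farm', '/mnt/dst/plot-k32-example.plot')
print(df_cmd)
print(archive_cmd)
```

Because the templates are positional (`{}`), the order of the arguments passed to `format` is part of the config contract: a user who overrides `archive_cmd` has to keep tool, parameters, source, and destination in that order.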
15 changes: 15 additions & 0 deletions src/plotman/configuration.py
@@ -48,6 +48,14 @@ def get_validated_configs(config_text, config_path):

 # Data models used to deserializing/formatting plotman.yaml files.

+@attr.frozen
+class ArchiveLocal:
+    path: str
+    df_cmd: str = 'df -BK | grep " {}/"'
+    archive_tool: str = 'rsync'
+    archive_cmd: str = '{} {} {} {}'
+    parameters: str = '--bwlimit=80000 --no-compress --remove-source-files -P'
Comment on lines +55 to +57
Collaborator

@altendky altendky May 9, 2021

So, my thought here is to:

  • Keep things together as much as possible, but explicitly separate out rsync, since we need its name to identify the process (though maybe it doesn't need separating after all).
  • Use environment variables to pass the relevant fields to the command.
  • Decouple ourselves from the df output format.

Since we already depend on the shell and are now exposing it to the user, which risks even more dependence and expectations, we should allow configuring which shell (or more generally any command, even Python) interprets the command.

Now that I've written this up, I think I've roughly recreated a CI run step: configurable shell, environment variables, and commands. Anyway.

Points not addressed or checked:

  • Something about df and block size and K and...
  • Consideration of what separator to use between the path and the available space.
  • Details of identifying the transfer process with a shell like Python, or really just in general.

As an example to implement the present rsyncd functionality, maybe:

path: /mnt/my_archive_drives
disk_space_shell: bash
disk_space: |
    df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
transfer_shell: bash
process_name: rsync
transfer: |
    ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "rsync://user@host:1234/rsyncd_site/${destination}"

(I'm sure awk can do the grep itself, but I'm lazy at the moment and like stacking commands. It's the UNIX way. And this is only about the idea at this point.)

Or for local rsync:

path: /mnt/my_archive_drives
disk_space_shell: bash
disk_space: |
    df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
transfer_shell: bash
process_name: rsync
transfer: |
    ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "${path}/${destination}"

(Yes, some of those slashes are not mandatory, but duplicates are ignored, so it's easiest to add extras and make things read sensibly.)

Maybe this is wonderfully controllable. Maybe this is silly. Maybe we can go in this direction but leave out complexities like the configurable shell and just mandate bash. So maybe this is a more sensible first step down that path:

path: /mnt/my_archive_drives
disk_space: df -BK | grep " ${path}/" | awk '{ print $6 ":" $4 }'
process_name: rsync
transfer: ${process_name} --bwlimit=80000 --no-compress --remove-source-files -P "${source}" "${path}/${destination}"
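
A rough sketch of how plotman might consume a config like this, assuming the field names from the examples above (`disk_space`, `path`) and the `mount_point:available` output format with the `K` suffix that `df -BK` produces. `run_configured_command` and `parse_disk_space` are hypothetical names; nothing like them exists in plotman, and the sample text is invented.

```python
import os
import subprocess


def run_configured_command(command, variables, shell='bash'):
    """Hypothetical: run a user-configured command with fields passed as env vars."""
    env = {**os.environ, **variables}
    completed = subprocess.run(
        [shell, '-c', command],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout


def parse_disk_space(output):
    """Parse 'mount_point:<n>K' lines into {mount_point: free_bytes}."""
    free_bytes = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so a mount point containing ':' still parses.
        mount_point, _, available = line.rpartition(':')
        free_bytes[mount_point] = int(available.rstrip('K')) * 1024
    return free_bytes


# Invented sample of what the disk_space command above would print.
sample = '/mnt/my_archive_drives/a:1024K\n/mnt/my_archive_drives/b:2048K\n'
print(parse_disk_space(sample))
```

In use, the output of `run_configured_command(disk_space, {'path': '/mnt/my_archive_drives'})` would be fed to `parse_disk_space`. Splitting on the last colon is one possible answer to the separator question listed above as unaddressed, not a settled choice.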


 @attr.frozen
 class Archive:
     rsyncd_module: str
@@ -56,6 +64,13 @@ class Archive:
     rsyncd_host: str
     rsyncd_user: str
     index: int = 0 # If not explicit, "index" will default to 0
+    mode: str = desert.ib(
+        default='legacy',
+        marshmallow_field=marshmallow.fields.String(
+            validate=marshmallow.validate.OneOf(choices=['legacy', 'local'])
+        ),
+    )
+    local: Optional[ArchiveLocal] = None

 @attr.frozen
 class TmpOverrides:
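
One case the diff leaves open: `mode` is validated against `['legacy', 'local']`, but `local` defaults to `None`, so `getattr(arch_cfg, arch_cfg.mode).path` in archive.py would raise `AttributeError` when `mode: local` is set without a `local:` section. Below is a minimal sketch of a cross-field check, written with the standard library's `dataclasses` rather than the `attr`/`desert` stack the PR uses. The class names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchiveLocalSketch:
    """Stand-in for ArchiveLocal, reduced to the one required field."""
    path: str


@dataclass(frozen=True)
class ArchiveSketch:
    """Stand-in for Archive, reduced to the two fields this PR adds."""
    mode: str = 'legacy'
    local: Optional[ArchiveLocalSketch] = None

    def __post_init__(self):
        # Same choices as the OneOf validator in the diff.
        if self.mode not in ('legacy', 'local'):
            raise ValueError("mode must be 'legacy' or 'local', got %r" % (self.mode,))
        # Cross-field check: local mode needs its own section.
        if self.mode == 'local' and self.local is None:
            raise ValueError("mode 'local' requires a 'local:' section with a path")


ok = ArchiveSketch(mode='local', local=ArchiveLocalSketch(path='/farm'))
print(ok.local.path)
```

Failing at load time with a message that names the missing section is easier to act on than an `AttributeError` raised later from the archiving loop.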
8 changes: 7 additions & 1 deletion src/plotman/resources/plotman.yaml
@@ -80,7 +80,13 @@ directories:
 # have four plotters, you could set this to 0, 1, 2, and 3, on
 # the 4 machines, or 0, 1, 0, 1.
 # index: 0
-
+# Optional switch to enable local archiving (defaults to remote if absent).
+# Note: rsyncd_module, rsyncd_host and rsyncd_user are ignored in local mode.
+mode: legacy # legacy (config above) or local
+#
+# Local mode:
+# local:
+#   path: '/farm'

 # Plotting scheduling parameters
 scheduling:
Expand Down