
OSD Migration for changed disk slots #1302

Merged
merged 3 commits into from Aug 27, 2018

Conversation


@agraul agraul commented Aug 23, 2018

Fixes #1215

Please have a look already; there are still a few things to do (I've noted them below), but the core business logic should stay more or less the same (unless you find some problems 😉)

Description:
OSD Migration currently only works if the same slot is used. This works
because the generation of proposals is deterministic, so if all disks
are in the same position we end up with the same proposal.

This commit changes that behaviour: instead of creating a new proposal,
the old one is used and adapted. This happens in the following steps:

  1. When salt-run replace.osd X is used, the matching proposal file (e.g.
    "data1.ceph.yml" in /srv/pillar/proposals/...) gets the "-replace"
    suffix and a new attribute is written for the disks that get replaced
  2. When salt-run proposals.populate is used (e.g. in stage.1), all
    proposal files that have the "-replace" suffix get parsed and the
    disks in the proposal are compared to the currently installed disks
    on the node.
  3. Disks in the old proposal that have the attribute inserted in 1) and
    are still present (i.e. replaced in place) plus new disks (disks
    that are not in the old proposal but present) are considered free.
  4. The disks in the old proposal get iterated over and, if they are
    labeled for replacement, they are swapped with one of the free disks
    (see the sketch after this list).
  5. This modified proposal gets saved in the place of the original one
    and can be used by later stages
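
A minimal, self-contained sketch of the swap in steps 3 and 4; the function name and data layout are illustrative only, not the actual DeepSea code:

    def swap_replaced_disks(old_proposal, present_disks):
        """Return a copy of the proposal with disks marked for replacement
        swapped for free disks.

        old_proposal: dict mapping device path -> disk config (may contain the
                      "replace" attribute written by replace.osd)
        present_disks: device paths currently installed on the node
        """
        present = set(present_disks)

        # Step 3: free disks are marked disks that are still present (replaced
        # in place) plus present disks the old proposal does not know about.
        free = [d for d, cfg in old_proposal.items()
                if cfg.get('replace') and d in present]
        free += sorted(d for d in present if d not in old_proposal)

        # Step 4: iterate over the old proposal and swap marked disks.
        new_proposal = {}
        for disk, cfg in old_proposal.items():
            if cfg.get('replace'):
                if not free:
                    raise RuntimeError('no unused disk left to replace {}'.format(disk))
                replacement = free.pop(0)
                new_proposal[replacement] = {k: v for k, v in cfg.items() if k != 'replace'}
            else:
                new_proposal[disk] = cfg
        return new_proposal

A quick check with made-up device names:

    old = {'/dev/disk/by-id/ata-OLD': {'format': 'bluestore', 'replace': True},
           '/dev/disk/by-id/ata-KEEP': {'format': 'bluestore'}}
    print(swap_replaced_disks(old, ['/dev/disk/by-id/ata-KEEP', '/dev/disk/by-id/ata-NEW']))
    # {'/dev/disk/by-id/ata-NEW': {'format': 'bluestore'},
    #  '/dev/disk/by-id/ata-KEEP': {'format': 'bluestore'}}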

Things that should still be done:

  • Have a way to roll back the "destroyed" mark in the CRUSH map if renaming the proposal file or inserting "replace: true" into it fails
  • Clean up and add more unittests
  • Add functional tests

Checklist:

  • Added unittests and/or functional tests
  • Adapted documentation
  • Referenced issues or internal bugtracker
  • Ran integration tests successfully (trigger with @susebot run teuthology)

    3. Remove proposal file with "-replace" in its name
    """

    # import ipdb; ipdb.set_trace()
Member Author (@agraul):

I'll remove this, that should not be in here

@agraul agraul (Member, Author) commented Aug 23, 2018

@jan--f jan--f (Contributor) left a comment

Just from reading through it, I think this is on a good path. Some docs on how this is expected to be used never hurt. Looking forward to giving this a spin!

    :param profile_dir: Profile directory, e.g. "/srv/pillar/ceph/proposals/profile-default"
    """
    ToReplace = namedtuple('ToReplace', ['fullpath', 'filename'])
    dir = '{}/stack/default/ceph/minions'.format(profile_dir)
Contributor:

dir is a Python built-in function; maybe use dir_ or _dir to avoid confusion?

Member Author (@agraul):

oh, right! Will change that.

"""
ToReplace = namedtuple('ToReplace', ['fullpath', 'filename'])
dir = '{}/stack/default/ceph/minions'.format(profile_dir)
files = [f for f in os.listdir(dir) if isfile(join(dir, f))]
Contributor:

You could add the f.endswith('-replace') check here and thus only iterate once over all profile files. Something like:
files = [f for f in os.listdir(dir) if isfile(join(dir, f)) and f.endswith('-replace')]

Member Author (@agraul):

Good catch, that way we don't need two iterations over all profiles.

return ""


def _strip_enddigits(string):
Contributor:

Does this do something different than string.rstrip('0123456789')? If I'm not missing anything, this method is unnecessary.

Member Author (@agraul):

No, this is a case of "code it before properly checking whether there is a built-in that does the same" 😆 I'll adapt the code.
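
For reference, str.rstrip with a character set strips any trailing run of those characters, which is what _strip_enddigits was doing:

    >>> 'sda2'.rstrip('0123456789')
    'sda'
    >>> 'vdb'.rstrip('0123456789')
    'vdb'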

        for minion in minions_to_replace:
            ReplaceDiskOn(minion).replace()
        return True
    else:
Contributor:

Does this mean that if a replacement is ongoing no new proposals are created? Wondering if that'll bite us at some point when someone replaces a disk and adds a new node in one go...

Member Author (@agraul):

That is actually a good point I did not think about. I guess we could remove the minions in minions_to_replace from the targets and run proposal.generate on the rest.
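
A rough sketch of that idea; the function and parameter names are hypothetical, and the real replacement/generation code is stubbed out with injected callables:

    def populate(targets, minions_to_replace, replace_on, generate_for):
        """Handle pending replacements first, then generate brand-new proposals
        only for the remaining targets, so replacing a disk and adding a node
        can happen in the same run."""
        for minion in minions_to_replace:
            replace_on(minion)                    # adapt the existing proposal
        remaining = [t for t in targets if t not in minions_to_replace]
        if remaining:
            generate_for(remaining)               # fresh proposals for the rest
        return True

    populate(['data1.ceph', 'data2.ceph', 'data3.ceph'],
             minions_to_replace=['data1.ceph'],
             replace_on=lambda m: print('replacing disks on', m),
             generate_for=lambda ms: print('generating proposals for', ms))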

@agraul agraul force-pushed the wip-sorting-proposal branch 2 times, most recently from 840041f to 0c930cc on August 23, 2018 13:46
@@ -52,23 +53,24 @@ def osd(*args, **kwargs):

    local = salt.client.LocalClient()
    host_osds = local.cmd('I@roles:storage', 'osd.list', tgt_type='compound')
    assert isinstance(host_osds, list)
Contributor:

local.cmd('I@roles:storage', 'osd.list', tgt_type='compound') returns a dict of host: list_of_osds
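
Illustration of the shape of that return value and a check that matches it (host names and OSD IDs made up):

    host_osds = {
        'data1.ceph': ['0', '1', '2'],
        'data2.ceph': ['3', '4'],
    }
    assert isinstance(host_osds, dict)
    assert all(isinstance(osds, list) for osds in host_osds.values())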

@@ -1793,8 +1793,7 @@ def partitions(self, osd_id):

         for device_type in ['journal', 'block', 'block.db', 'block.wal', 'block_dmcrypt']:
             result = self._uuid_device("{}/{}".format(mount_dir, device_type))
-            if result:
-                _partitions[device_type] = result
+            _partitions[device_type] = result
Contributor:

The if result: check is still needed, as the partitions method also feeds the grains.
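
A small standalone illustration of why the guard matters; the helper and data are made up, not the real method:

    def collect_partitions(uuid_lookup, device_types):
        """Only record device types that actually resolve to a partition."""
        partitions = {}
        for device_type in device_types:
            result = uuid_lookup(device_type)
            if result:                            # without this guard, empty results
                partitions[device_type] = result  # would end up in the dict/grains
        return partitions

    lookup = {'block': '/dev/disk/by-partuuid/abc', 'journal': ''}.get
    print(collect_partitions(lookup, ['journal', 'block', 'block.db']))
    # {'block': '/dev/disk/by-partuuid/abc'}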

If no /dev/disk/by-id symlink is found, e.g. because that path does not
exist (as is the case in VMs), the device name that was provided in the
first place is returned.

Signed-off-by: Alexander Graul <[email protected]>
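
A minimal sketch of that fallback (a hypothetical helper using only the standard library, not the actual module code):

    import glob
    import os

    def device_to_by_id(device):
        """Return a /dev/disk/by-id path pointing at device, or device itself
        if no such symlink exists (e.g. inside VMs)."""
        by_id_dir = '/dev/disk/by-id'
        if os.path.isdir(by_id_dir):
            real = os.path.realpath(device)
            for link in glob.glob(os.path.join(by_id_dir, '*')):
                if os.path.realpath(link) == real:
                    return link
        return device  # fall back to the name that was passed in
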
@agraul agraul force-pushed the wip-sorting-proposal branch 2 times, most recently from 0de3628 to f0c2e89 on August 24, 2018 11:35
@agraul agraul force-pushed the wip-sorting-proposal branch from f0c2e89 to bd6f3f5 on August 24, 2018 11:54
@jschmid1 (Contributor) commented:

Tried:

  1. salt-run replace.osd 1
    stage 1,2,3
    got the expected result

  2. salt-run replace.osd 1 5 (5 being on another host)
    stage 1,2,3
    got the expected result

  3. salt-run replace.osd 1 5 (5 being on another host)
    a node is about to be added ( not deployed, no proposal generated )
    stage 1,2,3
    got the expected result

@jschmid1 (Contributor) commented:

lgtm after those two flapping unittests are fixed.

@jschmid1 (Contributor) commented:

Real hardware yields some problems. In-depth explanation to follow.

agraul added 2 commits August 27, 2018 16:54
OSD Migration currently only works if the same slot is used. This works
because the generation of proposals is deterministic, so if all disks
are in the same position we end up with the same proposal.

This commit changes that behaviour: instead of creating a new proposal,
the old one is used and adapted. This happens in the following steps:

1) When `salt-run replace.osd X` is used, the matching proposal file (e.g.
    "data1.ceph.yml" in /srv/pillar/proposals/...) gets the "-replace"
    suffix and a new attribute is written for the disks that get replaced
2) When `salt-run proposals.populate` is used (e.g. in stage.1), all
    proposal files that have the "-replace" suffix get parsed and the
    disks in the proposal are compared to the currently installed disks
    on the node.
3) Disks in the old proposal that have the attribute inserted in 1) and
    are still present (i.e. replaced in place) plus new disks (disks
    that are not in the old proposal but present) are considered free.
4) The disks in the old proposal get iterated over and, if they are
    labeled for replacement, they are swapped with one of the free disks.
5) This modified proposal gets saved in the place of the original one
    and can be used by later stages

Once the replacement in the proposal went through, new proposals are
generated on all other nodes in case replacements and node additions
happen at once.

Signed-off-by: Alexander Graul <[email protected]>
@agraul agraul force-pushed the wip-sorting-proposal branch from d18ac8f to 02858eb on August 27, 2018 14:54
@jschmid1 (Contributor) commented:

@susebot run teuthology

@agraul agraul (Member, Author) commented Aug 27, 2018

retest this please

@susebot susebot (Collaborator) commented Aug 27, 2018

Commit 02858eb is NOT OK for suite suse:tier1.
Check tests results in the Jenkins job: http://158.69.90.90:8080/job/deepsea-pr/98/

@jschmid1 (Contributor) commented:

8/9 failures are infrastructure-related.

Successfully merging this pull request may close these issues: proposal.generate sorting