Remove creation of unused partitions, implement recommendation #1188
Conversation
This PR is a prerequisite for the wip-migrate branch. The remove/replace smoketests can fail since their repetitive nature destroys and creates OSDs a few times, and the accumulation of orphaned partitions causes the smoketest to fail. I will ask Igor to review this change as well, since the changes are based on my discussions with him with regard to the preferred method of configuring bluestore on separate devices.
PR is OK http://storage-ci.suse.de:8080/job/deepsea-pr/72/
Just to make sure I understand, it's now no longer possible to have separate data, wal and db. It's either data, or data with separate wal, or data with separate db?
PR is OK http://storage-ci.suse.de:8080/job/deepsea-pr/73/
PR is OK http://storage-ci.suse.de:8080/job/deepsea-pr/75/
Does susebot run every time somebody makes a comment?

@tserong You can still have data, separate wal, separate db (i.e. 3 devices), but if only using two devices with the wal and db on the same device, we have only been making our lives harder. There's no performance difference: if the wal ran out of space, it still spilled over to the db, and we had two partitions to manage per OSD. This will give a management aspect similar to filestore: is my journal on the same device or a different device? Specifying just the db is apparently enough for the wal to use the db as well, so conversations are easier. One partition per OSD on a shared device should be simpler. We still keep the wal-by-itself option on the odd chance somebody has really small SSDs and is willing to keep the db on the main disk for a specific hardware setup.
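For reference, here is a minimal sketch of the three layouts described above, written as per-device profile data of the kind discussed in this thread (Python dict form; the device names are examples, not taken from this PR):

layouts = {
    # Everything on the data device; ceph-disk collocates the db and wal.
    'data only': {
        '/dev/vdb': {'format': 'bluestore'},
    },
    # Recommended shared-SSD setup: specify only the db.  The wal lives
    # inside the db, so there is a single partition per OSD on the SSD.
    'data + db': {
        '/dev/vdb': {'format': 'bluestore', 'db': '/dev/vdf'},
    },
    # Three devices: data, separate wal, separate db.
    'data + wal + db': {
        '/dev/vdb': {'format': 'bluestore',
                     'wal': '/dev/vde',
                     'db': '/dev/vdf'},
    },
}

for name, devices in layouts.items():
    print(name, '->', devices)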
@@ -898,8 +898,10 @@ def _bluestore_partitions(self):
 log.warning(("WAL size is unsupported for same device of "
              "{}".format(self.osd.device)))
 else:
-# Create wal of wal_size on wal device
-self.create(self.osd.wal, [('wal', self.osd.wal_size)])
+log.info(("WAL will reside on same device {} as db - "
Perhaps I missed something, but I don't see the difference in handling the following two cases:
- wal == db
- wal != db

IMO both are getting here, but there is no wal creation for case 2.
Right, the wal would not be created in either case, but the reason is different for the admin. The first message is that ceph-disk does not support specifying a wal on the same device as the OSD. The second is that it's unnecessary.
+log.info(("WAL will reside on same device {} as db - "
+          "recommend removing the WAL entry from the "
+          "configuration for device "
+          "{}").format(self.osd.db, self.osd.device))
 else:
 # pylint: disable=line-too-long
 log.warning("No size specified for wal {}. Using default sizes.".format(self.osd.wal))
Unrelated to this PR, but just in case: are this line and the similar handling for an absent db_size relevant? It looks like the respective devices wouldn't be created at all, rather than "using default sizes".
If a size is not specified, we cannot create a partition. However, ceph-disk will accept a device without the partition and create partitions itself. So the "default sizes" are whatever ceph-disk will use.
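To make that concrete, here is a rough sketch of the difference (illustrative only: prepare_cmd is a hypothetical helper, not DeepSea's actual code, and the ceph-disk flags shown are the usual bluestore ones as I understand them):

from typing import List, Optional

def prepare_cmd(data_dev: str, db_dev: str,
                db_partition: Optional[str] = None) -> List[str]:
    """Build an illustrative ceph-disk call for an OSD with a separate db.

    If a db_size was given in the profile, DeepSea pre-creates the
    partition and that partition is passed; with no size, the whole
    device is passed and ceph-disk carves out the partition itself,
    sized by its own defaults (e.g. bluestore_block_db_size in ceph.conf).
    """
    target = db_partition if db_partition else db_dev
    return ['ceph-disk', 'prepare', '--bluestore',
            '--block.db', target, data_dev]

# db_size given: hand over the pre-made partition.
print(prepare_cmd('/dev/vdb', '/dev/vdf', db_partition='/dev/vdf1'))
# No size given: hand over the whole device, ceph-disk uses its defaults.
print(prepare_cmd('/dev/vdb', '/dev/vdf'))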
@swiftgist No, susebot does not do that on each comment; I have triggered the test job for this PR again on Jenkins.
It reads to me as if you can have three separate devices, but only if you don't specify wal_size or db_size. If either wal_size or db_size is specified, it seems to be forcing the wal onto the same device as the db. Is that correct?
Also, see DeepSea/srv/salt/_modules/osd.py Line 910 in 2e465f0
if self.osd.db == self.osd.device , not if self.osd.wal == self.osd.device .
Force-pushed from 2e465f0 to a2de6c8.
Fixed line 910. Also, fixed the indent, removed old comments and wal creation when no wal is defined without sizes.
With respect to no sizes, the …
Force-pushed from a2de6c8 to 45e16bd.
What about if someone wants to specify sizes for both db and wal, but wants to have three separate devices? That seems to not be possible. Or is that not a sensible thing for someone to want? Or do I still not quite understand the logic? :-)
No, that would work too. With sizes, the … BTW, the …
@tserong are we good to merge?
With the disclaimer that I haven't yet been able to test multi-device OSDs myself, yes, I'm happy this has received enough scrutiny (this one just hurt my brain trying to hold it in my head, for some reason :-/)
I just tried the following setup:

ceph:
  storage:
    osds:
      /dev/vdb:
        format: bluestore
        wal: /dev/vde
        db: /dev/vdf
        db_size: 1G
        wal_size: 2G
      /dev/vdc:
        format: bluestore
        wal: /dev/vde
        db: /dev/vdf
        db_size: 4G
        wal_size: 3G
      /dev/vdd:
        format: bluestore
        wal: /dev/vde
        db: /dev/vdf
which results in:
And the following logs:
Whilst osd.report still complains:
I'm not too convinced that this is the right way to go. We more or less silently forbid certain configurations that most users would anticipate to be correct.
This comment is referencing my last comment above: I managed to get /dev/vdb and /dev/vdc deployed as expected (only one partition gets created on the db /dev/vdf for each). EDIT: /dev/vdd did not work as expected. Potential issues:
This PR is considered to be a prerequisite for #1216 with the argument:
IIRC the smoketests use remove.osd, which should take care of orphaned partitions? If we do have orphaned partitions, there is a bug in the removal. If we don't have enough space on the partitions in the first place, we should think about increasing the size of the disks on our testing machines (qcow2 is a sparse format and does not take up space anyway). I'm still not convinced that we should try to mangle a user-created configuration. IMO we should do exactly what the user told DeepSea to do. If he went for it, he shall have it. I'm not against a warning message that tells him this is not the most efficient configuration to go for, but we do what he asked for. I'd like to hear your thoughts on that @tserong @swiftgist @smithfarm
@jschmid1 The bug isn't in the removal. The removal cannot wipe all partitions on a secondary device and can only remove the ones referenced. The bug is currently in the partition creation, because it creates an unreferenced partition. The rest was to bring the behavior in line with what has been expected/desired per recent talks on the mailing lists. I would have to look closer, but the spacing between …
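To illustrate the orphaned-partition point (a sketch only; the helper and device names below are hypothetical, not DeepSea code):

def orphaned(created_on_shared_dev, referenced_by_osd):
    # A per-OSD removal only knows about partitions the deployed OSD
    # references; anything created at prepare time but never used by
    # ceph-disk is invisible to it and accumulates across
    # remove/replace cycles.
    return sorted(set(created_on_shared_dev) - set(referenced_by_osd))

# Example: a db and a wal partition were created on the shared SSD,
# but the deployed OSD ended up referencing only the db partition.
created = ['/dev/vdf1', '/dev/vdf2']
referenced = ['/dev/vdf1']
print(orphaned(created, referenced))  # -> ['/dev/vdf2']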
Forgive my naivety, but isn't that exactly what we need? We need to remove the partitions that the passed OSD is pointing to. Why do we need to remove all partitions on a secondary device?
In which circumstances?
right, I see that. I'd still say we either stick to the old behavior or don't 'advise' anything at all.
The spaces may have sneaked in during copy/paste.
With #1244 we will get a better picture of what currently works and what doesn't.
Any progress @swiftgist?
Closing. Created https://github.com/SUSE/DeepSea/tree/fix-extra-partition
@swiftgist is that merged into some existing/open PR?
The bluestore partition process has been working from original
assumptions. The refactor of bluestore args created a disconnect
between the two steps. Partitions are getting created, but never
used nor removed.
This also implements the recommendation to use a single DB partition
since the WAL will naturally use an existing DB. This reduces the
number of managed partitions from two to one for each OSD. Note
that osd.report is intentionally not changed to give the admin an
indication that the configuration can be improved.
Signed-off-by: Eric Jackson [email protected]