Optionally mount Instance Storage on boot #557
👏🏼
This would be a huge help to us! Anything I can do to help move this along? (By the way, I think we'd also want the option to move /var/lib/buildkite to local storage, so that compilation would also see the faster I/O.)
@benesch I'd be very interested to see whether you actually saw meaningful performance improvements from this change. I guess there is only one way to find out!
Hi @lox, I am keen to try this PR. Anything I can do to help move this along? We will be trying this on c5.24xlarge with about 100 Buildkite agents. I think this PR is critical for our idea to work.
Hi @lox, I tried this commit and the instance failed to start the Docker daemon with the following error. The Docker daemon.json file seems to be empty.
I tried this branch with the following steps:
After launch, I can see that the ephemeral drive is created 👍. The instance type in this case is m5d.24xlarge.
@ans0600 Really appreciate the testing! I'll get this rebased and see if I can figure out what is going on.
Force-pushed from 89d7481 to 805bb00.
Force-pushed from 805bb00 to a0e5c2d.
@lox Just wanted to check whether the PR is ready to be tested again.
@jradtilbrook or @ans0600, if either of you gets a chance to test this out, let me know!
For those following along, you can test it out with https://s3.amazonaws.com/buildkite-aws-stack/add-instance-storage/aws-stack.yml. Would love feedback!
I'm under the pump with other things at work, so I don't think I'll be able to try this out any time soon, sorry. Just letting you know in case you were relying on or waiting for me.
No worries @jradtilbrook, some other folks asked about it!
I just gave this a try and confirmed that it is still failing to spin up new instances. The issue appears to be that the jq filter syntax being used (https://github.com/buildkite/elastic-ci-stack-for-aws/pull/557/files#diff-75f0630f614eebc548f6a4beeda4f879R23) does not support key names containing hyphens; unquoted key names may only contain alphanumeric characters and underscores. To fix it, we just need to add double quotes around the key name:
In fact, it looks like we do exactly that for the "userns-remap" feature in that same script. I applied the patch above, built an AMI, and confirmed that the instance stays running. However, when running a build, I ran into the next error:
Which I fixed by changing this:
Now everything appears to be working.
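To illustrate the jq quoting issue described above: jq's unquoted `.key` shorthand only accepts identifier characters, so `.data-root` is parsed as `.data - root` and fails to compile, while `."data-root"` (or `.["data-root"]`) works. The following is a minimal sketch, not the script from this PR. It assumes the key being set is Docker's `data-root` option and uses a hypothetical helper name, `set_docker_data_root`. As a general precaution, it treats an empty daemon.json as `{}` and only replaces the file when jq succeeds, because jq reading an empty file exits successfully with no output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's script): point Docker's data-root
# (assumed key name) at a directory, editing daemon.json with jq.
set_docker_data_root() {
  local config="$1" dir="$2" tmp

  # jq prints nothing (and exits 0) for an empty file, so seed it with {}.
  if [[ ! -s "$config" ]]; then
    echo '{}' > "$config"
  fi

  tmp="$(mktemp)" || return 1
  # Hyphenated keys must be quoted: ."data-root" or .["data-root"].
  # Unquoted, .data-root would be parsed as .data - root and fail.
  if jq --arg dir "$dir" '."data-root" = $dir' "$config" > "$tmp"; then
    # Replace the config only after jq has succeeded.
    mv "$tmp" "$config"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, `set_docker_data_root /etc/docker/daemon.json /ephemeral/docker` would keep existing keys such as `"userns-remap"` and add `"data-root": "/ephemeral/docker"`.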
Hi @lox, I've:
I've replaced our "builder" stack (mostly Docker/yarn/webpack) with this new stack, keeping most of the parameters the same and just changing the instance type:
With a cold Docker cache, the build went from ~60 min down to ~16 min. So this is a massive improvement for us!
I had to apply the following patch to fix the striped volume creation and mount:

```diff
--- a/packer/linux/conf/bin/bk-mount-instance-storage.sh
+++ b/packer/linux/conf/bin/bk-mount-instance-storage.sh
@@ -10,16 +10,13 @@ if [[ "${BUILDKITE_ENABLE_INSTANCE_STORAGE:-false}" != "true" ]] ; then
   exit 0
 fi
 
+# Install nvme-cli to list NVMe SSD instance store volumes
+yum -y -d0 install nvme-cli
+
 devicemount=/ephemeral
 logicalname=/dev/md0
-candidates=( '/dev/nvme1n1' )
-devices=()
+devices=($(nvme list | grep "Amazon EC2 NVMe Instance Storage"| cut -f1 -d' '))
 
-for candidate in "${candidates[@]}" ; do
-  if [[ -b $candidate ]] ; then
-    devices+=("$candidate")
-  fi
-done
 
 if [[ "${#devices[@]}" -gt 0 ]] ; then
   mkdir -p "$devicemount"
@@ -35,13 +32,13 @@ if [[ "${#devices[@]}" -eq 1 ]] ; then
   fi
 elif [[ "${#devices[@]}" -gt 1 ]] ; then
-  yes | mdadm \
+  mdadm \
     --create "$logicalname" \
     --level=0 \
     -c256 \
     --raid-devices="${#devices[@]}" "${devices[@]}"
-  echo \'DEVICE "${devices[*]}"\' > /etc/mdadm.conf
+  echo "DEVICE ${devices[*]}" > /etc/mdadm.conf
   mdadm --detail --scan >> /etc/mdadm.conf
   blockdev --setra 65536 "$logicalname"
```
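The device-selection part of the patch above can be sketched as small, testable Bash functions. The function names are hypothetical (the PR's script inlines this logic), and the model string is the one the patch greps for. The sketch also shows why the `echo` change matters: `echo \'DEVICE "${devices[*]}"\'` writes literal single quotes, producing `'DEVICE /dev/nvme1n1 /dev/nvme2n1'`, so the line no longer starts with the `DEVICE` keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the patch; not the PR's actual script.

# Read `nvme list` output on stdin and print the device node (first
# column) of every instance-store volume; EBS volumes report a
# different model string and are skipped.
list_instance_store_devices() {
  awk '/Amazon EC2 NVMe Instance Storage/ { print $1 }'
}

# Build the /etc/mdadm.conf DEVICE line without stray quote characters.
mdadm_device_line() {
  echo "DEVICE $*"
}

# Describe what the script would do with the devices found:
# nothing, use a single device directly, or stripe several (RAID 0).
storage_plan() {
  case "$#" in
    0) echo "none" ;;
    1) echo "single $1" ;;
    *) echo "raid0 $#" ;;
  esac
}
```

On an instance with nvme-cli installed, `devices=($(nvme list | list_instance_store_devices))` would collect the instance-store volumes. `mdadm_device_line "${devices[@]}"` would then give e.g. `DEVICE /dev/nvme1n1 /dev/nvme2n1` for two volumes, and `storage_plan "${devices[@]}"` would report `raid0 2`.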
I also have a branch based off
We've also tried this out. It made a modest improvement to Docker image build times in our situation. It would still be good to see this merged so that consumers can experiment with it without forking the stack.
@yob @lox Any updates on this? I'd be happy to rebase/update our branch and open a new PR if that helps.
We're using something similar for Node, since the massive number of files really drags down the default configuration. We've seen NVMe instances with software RAID 0 hit over 70k IOPS, so there's definitely a boost here. If I remember and have some time, I'll post what we have (we've been using it for over a year now). I think our main difference is mounting NVMe at /media/ephemeral and then bind mounting:
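The bind-mount layout described above can be sketched as follows: the instance store is mounted at /media/ephemeral, and each directory that benefits from fast I/O is bind-mounted from there. This is a hypothetical sketch, not the commenter's setup. The helper name `ephemeral_bind_fstab` and the choice of directories are assumptions. It only prints /etc/fstab entries, mirroring each target's full path under the ephemeral mount to avoid name collisions. The source directories must exist, and the entry mounting the instance store itself must come earlier in /etc/fstab. The same binds could be applied immediately with `mount --bind SOURCE TARGET`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fstab bind-mount entries that place each
# target directory on the instance-store mount (first argument).
ephemeral_bind_fstab() {
  local base="${1%/}" dir
  shift
  for dir in "$@"; do
    # Format: <source> <target> <fstype> <options> <dump> <pass>
    printf '%s %s none bind 0 0\n' "${base}${dir}" "$dir"
  done
}
```

For example, `ephemeral_bind_fstab /media/ephemeral /var/lib/docker /var/lib/buildkite` prints one entry per directory, such as `/media/ephemeral/var/lib/docker /var/lib/docker none bind 0 0`, which also covers the earlier request to move /var/lib/buildkite onto local storage.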
@sj26 @pda Pinging you since I saw you were looking at #790, which falls into the same category. Let me know what you think of this and whether I can help get it merged 🙏
Hi all, I just saw some activity on this today and realised I never prepared the PR I offered in my last message. I spent some time last week moving another of our stacks to NVMe storage and that went quite well. @keithduncan, let me know if you need any help with this. I have a bit more time to spend on Buildkite, so I'm happy to create a PR from our fork as I offered last time (and didn't do 😅).
Force-pushed from 22bda9d to 73814fa.
Hi @ouranos, I think I’ve incorporated all the patches mentioned here and am testing this on Linux now. If you spot anything missing from this implementation compared to yours, let me know and I can look at incorporating it 😄 One area I haven’t looked into yet is adding this to our Windows AMI, which is something I’m interested in doing. Pointers in this area would be appreciated!
Great work @keithduncan! 🙌
I've compared my branch with what is now in
I have an isolated stack and set of pipelines for testing, so I'll test the new stack and report back. The branch I'm using is branched off
Unfortunately, we're only using the Linux AMI, so I won't be able to help much with this.
Since lots of instances seem to have fancy fast instance storage now, this mounts whatever is available at /ephemeral and uses that for Docker storage.
Closes #464.