Customize builds for different platforms #166
Conversation
The overall architecture LGTM, I think the simplification of the directory structure is great.
I really think we should do something about that Makefile though as it is getting out of hand.
LGTM, just a few nits.
I think it would be good to add a README section on the various gradle tasks that can be executed (Also how to list the existing tasks).
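For reference, Gradle's built-in task listing should cover the "how to list the existing tasks" part; a minimal sketch, assuming scripts/docker-run.sh forwards its arguments to gradle the same way as the other examples in this PR:

```
$ ./scripts/docker-run.sh gradle tasks --all
```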
The build system for this repository has continually gotten more complicated over time, and this change is continuing that trend. I think as a team (i.e. not necessarily you, or in this change specifically) we should take some time to write up some documentation about the details of how the build works; e.g. some sort of block diagram, or flow chart, or something along those lines would be good.
If somebody unfamiliar is tasked with debugging some failure, or with changing how the build works, it'll probably be difficult to do that without more documentation than what we currently have.
Further, I think we might want to think about the entire architecture of the build, now that our requirements have been fleshed out a bit more. I feel like we don't get a lot of benefit from using live-build, and it might only serve to make things more complicated at this point.
Additionally, I think the upgrade image generation logic is more complicated with this design, so we might have more motivation to look into generating the upgrade image first, and then generating the VM artifacts from that upgrade image (while also using it for upgrade). We've briefly discussed building the initial VM artifacts from the upgrade image, rather than building the upgrade image from the VM artifacts, but haven't looked into what's required to implement that in detail. If we could do that, I think it'd simplify things, and make initial install and upgrade more similar w.r.t. implementation logic.
With all that said, this looks reasonable to me. My only blocking concerns are about the changes to the upgrade scripts, and about ensuring the packages installed on upgrade match the packages that get installed in the VM artifacts.
On another subject, what is our plan for distributing migration images? Will we have a different migration image for each platform? That's exactly what we wanted to avoid for upgrade, but perhaps it's OK to do this for migration, since that's a one-time event?
.travis.yml
Outdated
@@ -11,12 +11,12 @@ services:
  - docker

env:
-  - TARGET=ansiblecheck
+  - TARGET=ansibleCheck
   - TARGET=shellcheck
`shellCheck` to match other targets?
Done
built by invoking the gradle task for the variant and platform desired.
The task name has the form 'build\<Variant\>\<Platform\>'. For instance,
the task to build the 'internal-minimal' variant for KVM is
'buildInternalMinimalKvm':
I guess it's not a big deal, but the inconsistency between the variant names, platform names, and the task names is a little weird to me. e.g. we have to translate `internal-minimal` to `InternalMinimal`, and `kvm` to `Kvm`.
this means, if we have a variable like `VARIANT=internal-minimal` and `PLATFORM=kvm` in a script, we can't simply do something like `gradle build$VARIANT$PLATFORM`.
if this is the way we want to go, I'm fine with it, but just something that I wanted to mention.
I chose this way because it matches the normal Gradle convention for naming tasks. It would be a bit of a pain to handle in a bash script, but the places where we currently invoke these tasks via automation are Jenkins jobs, where we have pretty simple helpers for converting between hyphens and camelCase.
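For illustration only, a shell helper along those lines isn't hard to write either; this is a hypothetical sketch (`to_camel` is not an existing helper in this repo), assuming the only translation needed is hyphenated lowercase names to CamelCase:

```bash
#!/bin/bash
#
# Hypothetical sketch: derive the Gradle task name from hyphenated
# variant/platform names, e.g. "internal-minimal" + "kvm" ->
# "buildInternalMinimalKvm".
to_camel() {
    local IFS='-'
    local out='' word
    for word in $1; do
        # Uppercase the first letter of each hyphen-separated word.
        out+="$(tr '[:lower:]' '[:upper:]' <<<"${word:0:1}")${word:1}"
    done
    echo "$out"
}

VARIANT=internal-minimal
PLATFORM=kvm
gradle "build$(to_camel "$VARIANT")$(to_camel "$PLATFORM")"
```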
README.md
Outdated
an upgrade image for the internal-minimal variant is
'buildUpgradeImageInternalMinimal':

$ PLATFORMS='kvm aws' ./scripts/docker-run.sh gradle buildUpgradeImageInternalMinimal
I haven't gotten to looking at the upgrade scripts yet, or at how the upgrade image is generated, but since an upgrade image may not support all platforms, I'm wondering if/how we can quickly detect when a given upgrade image doesn't support a given platform.
e.g. we have a `version.info` file in the upgrade image that we could potentially augment to include the platforms that upgrade image supports, then verify that list of supported platforms against the platform being used during upgrade, and throw an error early when attempting to upgrade with an image that doesn't support the platform being used.
I also wonder whether allowing upgrade images that don't support all platforms is a good thing or a bad thing. I presume that we do this to allow faster builds in some circumstances, but we'll have to weigh that convenience against the possibility that we might accidentally build images for the wrong platforms (either due to bugs, folks not knowing how to run the build correctly, etc.), and then not notice the mistake until we try to run the upgrade (and it fails).
I'm OK with this as is, but just making notes as I look through this.
I can certainly add something to `version.info`. At that point the image will be unpacked though, so maybe we could also just query to see which versions of `delphix-entire-` are in the apt repo. Either way though, I agree it would be good to have an explicit check for this failure mode so that we can give users a good error message explaining why they have hit this error and how they should proceed.

> I presume that we do this to allow faster builds in some circumstances, but we'll have to weigh that convenience against the possibility that we might accidentally build images for the wrong platforms

Faster builds and smaller upgrade images. Yes, there certainly is a trade-off, but I think it's necessary for now unless we want to more or less give up on running the build outside of the Jenkins job; running live-build five times in a row to build the upgrade image for a single variant is just too painful, IMO.
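To make the idea concrete, a rough sketch of what such an early check could look like in the upgrade script; the `SUPPORTED_PLATFORMS` field, the `$UNPACKED_IMAGE_DIR` path, and the `$platform` variable are assumptions for illustration, not anything that exists today:

```bash
# Hypothetical early check: fail fast if the unpacked upgrade image does
# not list the platform we are running on. Assumes version.info gains a
# line like: SUPPORTED_PLATFORMS="kvm aws"
source "$UNPACKED_IMAGE_DIR/version.info"

case " $SUPPORTED_PLATFORMS " in
*" $platform "*)
    : # platform is supported, continue with the upgrade
    ;;
*)
    echo "Upgrade image does not support platform '$platform'" \
        "(supported: $SUPPORTED_PLATFORMS)" >&2
    exit 1
    ;;
esac
```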
Filed #204 for this follow-up work
scripts/docker-run.sh
Outdated
@@ -56,6 +56,8 @@ $DOCKER_RUN --rm \
    --env AWS_SECRET_ACCESS_KEY \
    --env DELPHIX_SIGNATURE_URL \
    --env DELPHIX_SIGNATURE_TOKEN \
+   --env PLATFORMS \
+   --env AWS_S3_URI_LIVEBUILD_ARTIFACTS \
move this a few lines up, so it's alongside the other "AWS_S3_URI" variables?
Done
scripts/docker-run.sh
Outdated
@@ -56,6 +56,8 @@ $DOCKER_RUN --rm \
    --env AWS_SECRET_ACCESS_KEY \
    --env DELPHIX_SIGNATURE_URL \
    --env DELPHIX_SIGNATURE_TOKEN \
+   --env PLATFORMS \
not a big deal, but we could prefix this with "DELPHIX_" like we've done with two other variable names. I'm fine either way, just mentioning the slight inconsistency.
Done
# to install two packages which provide delphix-platform, which will
# fail because the two packages conflict with each other.
#
apt_get install -y "delphix-platform-$platform" ||
this is concerning...
If we leave this out, why doesn't installing `delphix-entire-$platform` install `delphix-platform-$platform`? Doesn't `delphix-entire-$platform` depend on `delphix-platform-$platform`?
What about running `apt_get install -y "delphix-platform-$platform delphix-entire-$platform"` if we absolutely have to explicitly specify `delphix-platform-$platform`?
Also, if we do have to be explicit about installing `delphix-platform-$platform`, I'm concerned that we're not specifying the version that'll be installed, whereas we do specify the version with `delphix-entire-$platform`.
I'd like to think about this some more, and try to avoid it if possible. At least, I'd like to outline the variations that we've tried, and document why we need to go this route vs. some other implementation.
@jgallag88 I know we've talked about this and I agreed on the approach, but I'd still really like to understand more about why apt is getting confused by this.
Since `delphix-entire-$platform` should depend on `delphix-platform-$platform`, and `delphix-platform-$platform` already provides `delphix-platform`, I'm really surprised that apt doesn't take that into consideration when looking for the dependencies of `delphix-virtualization`.
Let me know if you have a system in that state, I'd really love to take a look at it and understand better what's going on.
> If we leave this out, why doesn't installing delphix-entire-$platform install delphix-platform-$platform? Doesn't delphix-entire-$platform depend on delphix-platform-$platform?

So it does install delphix-platform-$platform, but it installs delphix-platform-$someOtherPlatform too. Say we are upgrading a VM on AWS. I suspect what happens is that apt first sees the `delphix-virtualization` dependency on `delphix-platform`. It also sees that there are multiple versions of `delphix-platform` available, none of which are yet installed. In this scenario, apt picks an arbitrary package that provides the virtual package to install, say `delphix-platform-kvm`. It later sees that it needs to install the concrete package `delphix-platform-aws` because that is a dependency of `delphix-entire-aws`. It isn't smart enough to go back and realize that it no longer needs `delphix-platform-kvm`, so it tries to install both, and they conflict with each other.

It's possible that `apt_get install -y "delphix-platform-$platform delphix-entire-$platform"` would work, but I don't know how we could be sure whether it's guaranteed to work or is just working accidentally unless we understood all of the details about how apt resolves transitive dependencies where virtual packages are involved (for which I haven't found any documentation).

Another way to resolve this would be to remove the explicit dependency on `delphix-platform` from `delphix-virtualization`. What I like about the current strategy though is that it makes the order in which the packages are installed more similar to the order used in live-build in appliance-build. I think this is good from a consistency perspective, since in general installing the same packages in different orders can result in different end-states. However, there could be other ways to make appliance-build more similar to a not-in-place upgrade, maybe by getting rid of live-build, as you mentioned elsewhere.

@pzakha I can get a VM set up shortly.
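For anyone following along, the package relationships described above boil down to something like the following; the field values are pieced together from this discussion purely for illustration, and the real control files may differ:

```
Package: delphix-platform-kvm
Provides: delphix-platform
Conflicts: delphix-platform

Package: delphix-platform-aws
Provides: delphix-platform
Conflicts: delphix-platform

Package: delphix-entire-aws
Depends: delphix-platform-aws, ...

Package: delphix-virtualization
Depends: delphix-platform, ...
```

With only the virtual `delphix-platform` name to go on, apt is free to satisfy `delphix-virtualization`'s dependency with `delphix-platform-kvm` before it reaches the concrete `delphix-platform-aws` dependency, at which point the two conflict; installing `delphix-platform-$platform` explicitly first removes that freedom.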
I'm not 100% against installing `delphix-platform-$platform` prior to installing `delphix-entire-$platform`, but it does set a precedent that I'd like to avoid if we can, since this means that if we ever want to create some other platform-specific package(s), upgrade will need explicit knowledge about them. e.g. if we create some new package(s) `foo-aws`, `foo-esx`, etc., upgrade will have to know how to install the correct one, right?

Ultimately, I'd like to see if we can understand this behavior prior to coming to a decision.

> What I like about the current strategy though is that it makes the order in which the packages are installed more similar to the order used in live-build in appliance-build.

Yea, I agree with this in concept, but I'd argue the build is worse about this than upgrade. I'd rather see us revamp the build, to avoid the step-by-step implementation that it has now. If we don't change how the build works, though, I think this is a good argument for taking this approach.
Based on our offline findings and discussion, I've expanded the comment here substantially to explain exactly what goes wrong if we install delphix-entire- directly.
> Additionally, I think the upgrade image generation logic is more complicated with this design, so we might have more motivation to look into generating the upgrade image first, and then generating the VM artifacts from that upgrade image (while also using it for upgrade)

That seems like an idea worth investigating. I like the idea of using the exact same process for the initial VM creation and the not-in-place upgrade because it makes it more likely that we will produce the same output in both cases.

> On another subject, what is our plan for distributing migration images? Will we have a different migration image for each platform? That's exactly what we wanted to avoid for upgrade, but perhaps it's OK to do this for migration, since that's a one-time event?

Yes, that's the plan AFAIK. Unlike with upgrade images, the size of a migration image is proportional to the number of platforms the image supports, so they would get very unwieldy if we had a single image.
echo "No live-build artifacts found for this variant" >&2 | ||
exit 1 | ||
fi | ||
for deb_tarball in "$LIVE_BUILD_OUTPUT_DIR/$APPLIANCE_VARIANT"*.debs.tar.gz; do |
> I think, due to how we generate delphix-entire-$PLATFORM, the upgrade of a kvm system will still get foo=1.0 and the aws system will get foo=1.1; is that correct?

Yeah. I haven't tested that scenario, but I think that's what will happen.
aptly publish repo -skip-signing upgrade-repository

# Include version information about this image.
VERSION=$(dpkg -f "$(find debs/ -name 'delphix-entire-*' | head -n 1)" version)
The version of delphix-entire is typically taken from delphix-virtualization, and the version of delphix-virtualization will be the same in each case. If the delphix-entire packages are all built on the same machine, then they all use the same ancillary repository. If they were built via Jenkins in parallel, the Jenkins job ensures that the same version of delphix-virtualization is passed to each sub-job.
If delphix-virtualization isn't installed, then the versions could potentially be different, but that only applies to the internal-minimal variant, which didn't seem like too much of an issue to me, since we aren't going to do any real upgrade testing with that variant.
I can add a comment to that effect if that all seems reasonable.
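If it helps, a comment plus a defensive check could live in the same place; this is only a sketch of what such a check might look like, not part of the change as posted:

```bash
# Sanity check (hypothetical): all delphix-entire-* packages going into
# this upgrade image are expected to carry the same version, for the
# reasons described above; fail loudly if they ever diverge.
versions=$(find debs/ -name 'delphix-entire-*' -exec dpkg -f {} version \; | sort -u)
if [[ $(wc -l <<<"$versions") -ne 1 ]]; then
    echo "delphix-entire packages disagree on version:" >&2
    echo "$versions" >&2
    exit 1
fi
VERSION=$(head -n 1 <<<"$versions")
```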
✌️ jgallag88 can now approve this pull request
I know there's still some more work to be done, but I'm going to proactively approve this based on our discussions; as far as I'm concerned, feel free to update and land this when you're confident with it.
bors delegate+
bors r+
166: Customize builds for different platforms r=jgallag88 a=jgallag88
Co-authored-by: John Gallagher <[email protected]>
Build succeeded
This change provides the ability to build versions of the appliance
customized for different platforms (hypervisors and cloud providers).
This is done by installing different versions of the delphix-platform
and delphix-kernel packages depending on which platform we are
building for. Since we only want to have a single upgrade image per
variant, this change also adds a second stage to the build which
combines the live-build output for multiple platform versions of the
same variant into a single upgrade tarball.

The live-build stage of the build is now run by invoking 'gradle' with
a target which is a combination of variant and platform, e.g.
`gradle buildInternalDevEsx`. The second stage of the build is run by
invoking 'gradle' with a variant as a target, e.g.
`gradle buildUpgradeImageInternalDev`. When the second stage is run, an
environment variable 'AWS_S3_URI_LIVEBUILD_ARTIFACTS' can be passed. If
it is used, previously built live-build artifacts will be downloaded
from the provided S3 URIs, and placed in `live-build/build/artifacts`
as if they had been built locally. If it is not used, live-build will
be invoked for each of the hypervisors specified in the
'DELPHIX_PLATFORMS' environment variable.

A couple notes about the implementation:

1. This change replaces the Make build with a Gradle one. The build
   logic needed for this change was difficult to express using Make
   and resulted in a Makefile which was very difficult to understand.
   The use of Gradle made handling this build logic more
   straightforward and also made it possible to add better support for
   incremental builds.

2. This change removes the idea of the 'base' live-build variant. The
   base variant contains the kernel, and because the kernel differs
   between hypervisors, it cannot be shared between different
   hypervisor builds. It would be possible to have a different version
   of the base variant per hypervisor, and share that between
   different variants built for the same hypervisor. However, this
   likely isn't worth the effort because it doesn't help in either of
   the two most common use cases:

   - Building via a Jenkins job: when building via Jenkins, each
     variant will now be built via a sub-job running on its own build
     VM, so the base would be rebuilt for each sub-job anyway.
   - Developers iterating on changes on personal build VMs: in this
     case developers are most likely to be building a single variant,
     in which case the 'base' variant would be less likely to be
     re-used.

3. We no longer do the live-build in place (that is, directly in
   `live-build/variant/<variant>/`). Now that we have multiple builds
   per variant, we need to make sure that intermediate live-build
   artifacts from one build are not incorrectly re-used in the next
   build of the same variant, which might be for a different
   hypervisor. The simplest way to accomplish this is just to do the
   live-build in a throw-away directory.

In light of these two changes, some of the current layout of the
repository no longer makes sense, so this change re-arranges a number
of files in the repo, particularly in the `live-build/` directory.
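For a concrete picture of the two build stages described above, a typical local invocation might look something like the following; the variant and platform names are chosen purely for illustration, and the environment variable is the one named in this description (the README examples in this PR show the equivalent usage via the docker wrapper):

```
$ ./scripts/docker-run.sh gradle buildInternalDevEsx
$ DELPHIX_PLATFORMS='esx aws' ./scripts/docker-run.sh gradle buildUpgradeImageInternalDev
```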