
Pulumi thinks it needs to replace ECS service and task definition even though no changes #23

Closed
kennyjwilli opened this issue Sep 27, 2018 · 22 comments
Assignees
Labels
4.x.x customer/feedback Feedback from customers kind/enhancement Improvements or new features resolution/fixed This issue was fixed
Milestone

Comments

@kennyjwilli

Every time I switch computers and run pulumi update, Pulumi thinks it needs to replace my ECS task definition and update the service even though nothing has changed. The only thing that is different in the containerDefinition is the sha256 for IMAGE_DIGEST in the environment list.

From @lukehoban on Slack:

> Yes - we currently inject the docker image ID as the IMAGE_DIGEST, and it may be that docker build will produce different IDs for the same sources on different machines (perhaps even if there are different docker versions?). We are looking into making some changes to how we lock the digest version - and this is a factor we'll want to take into consideration.

@lukehoban lukehoban self-assigned this Oct 1, 2018
@lukehoban lukehoban added this to the 0.18 milestone Oct 1, 2018
@lukehoban lukehoban assigned hausdorff and unassigned lukehoban Oct 4, 2018
@CyrusNajmabadi
Contributor

@kennyjwilli Do you have a repro for this? I'm not seeing this. Is this a multi-machine scenario?
@lukehoban can you fill me in on your thoughts about:

> We are looking into making some changes to how we lock the digest version - and this is a factor we'll want to take into consideration.

@kennyjwilli
Author

Not sure what exactly you mean by multi-machine scenario. This specific ticket is about running pulumi update on two different computers and Pulumi thinking it needs to replace the ECS task.

@CyrusNajmabadi
Contributor

> Not sure what exactly you mean by multi-machine scenario. This specific ticket is about running pulumi update on two different computers and Pulumi thinking it needs to replace the ECS task.

Yup! That's just what i wanted to verify for certain. I'll wait to hear back from Luke about his thoughts on how we can be resilient to this.

@joeduffy
Member

@CyrusNajmabadi I guess I would love to know first whether Docker builds on different machines necessarily imply different SHA hashes. I would have assumed "no": identical builds across machines, provided the contents are identical, would end up with identical hashes.

If that's true, it could be that in this specific case, there's some machine-specific info making its way into the container image somehow.

@hausdorff

IIRC the situation is thus:

  • It is nice to decouple image tag and unique containerID (which is a SHA). This allows users to write image("nginx:alpine") and then during preview see whether the SHA underneath that tag has changed. This gives you a very high degree of reproducible deployments, since you know precisely what container you're deploying. (This comes from the work we did on ksonnet.)
  • The problem is, the container ID seems to be granted by the registry. So in ksonnet we'd just ask the registry "what's the ID/SHA for this image tag"? And what we'd get back is the SHA we'd use to precisely identify the version of the container we want to run.
  • Here we're taking the SHA from the Docker daemon, which seems to differ per machine.
  • Worse, if we wait until we push to resolve the container ID, we will conservatively report everything needs to be re-done in preview, since we can't know the container ID ahead of time.
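The ksonnet-style approach described above can be sketched as follows. This is an illustration, not Pulumi's actual code: `pinToDigest` assumes the digest has already been fetched from the registry (a real implementation would call the registry API), and the naive `split(":")` ignores registry hosts with ports.

```typescript
// Sketch of the ksonnet-style flow: resolve a mutable tag to the immutable
// registry digest, then deploy by that digest reference. The digest value
// here stands in for what a registry API call would return.
function pinToDigest(imageRef: string, digest: string): string {
  const name = imageRef.split(":")[0]; // drop the mutable tag (naive: breaks on host:port)
  return `${name}@${digest}`;
}

const pinned = pinToDigest("nginx:alpine", "sha256:0123abcd");
console.log(pinned); // → nginx@sha256:0123abcd
```

Deploying by `name@sha256:...` instead of `name:tag` is what makes preview able to report precisely whether the image behind a tag has changed.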

@ericrudder
Member

ericrudder commented Oct 15, 2018 via email

@joeduffy
Member

> Here we're taking the SHA from the Docker daemon, which seems to differ per machine.

Why would it differ per machine?

If it's a timestamp, for example, then it's actually a different image...

@hausdorff

I don't think we know why it differs per machine. Timestamp seems likely, but at least I have not dug in. I do think from this information we can conclude that we have to use a different mechanism to identify whether a container is unique, especially since we are not aware of any published guarantees about what this SHA is generated from.

@joeduffy
Member

I'd love to know what it's actually produced from before assuming we can't depend on it.

@CyrusNajmabadi
Contributor

@kennyjwilli I wasn't able to actually repro this myself. I'm wondering if this is something particular to your stack. Would it be possible for you to share the docker-file+build-folder with me (and also the code you use to create the service)? I'd like to see if it's something in particular about that docker setup. Thanks!

@CyrusNajmabadi
Contributor

@kennyjwilli Also, what version of docker are you using on these boxes? Thanks!

@CyrusNajmabadi
Contributor

I've been able to repro this, and have traced it down to a docker design decision documented here: https://github.com/moby/moby/blob/master/image/spec/v1.md

Specifically, when docker produces the images for layers, it embeds a "created" timestamp in the metadata for that image. When producing the final image, this information is contained in 'json' files that are eventually tarred up. That final tar is hashed, and will therefore be different on every machine you build on.

I'm looking around to see if there's any way to avoid this - some way, perhaps, to force a specific date for docker to use here. Absent that, this may just be how docker works, and it may be the case that with or without pulumi you would be experiencing this no matter what.
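The mechanism can be demonstrated in miniature. This is an illustration, not docker's actual code: a toy config object mirroring the spec's `created` field shows how hashing the same layer contents at two different build times yields two different digests.

```typescript
import { createHash } from "crypto";

// Toy image config, mirroring the "created" field from the image spec
// linked above. Only "created" varies between the two machines.
function imageConfigDigest(layerDiffId: string, created: string): string {
  const config = JSON.stringify({
    created, // build-time dependent, so it poisons the hash
    rootfs: { type: "layers", diff_ids: [layerDiffId] },
  });
  return createHash("sha256").update(config).digest("hex");
}

// Identical layer contents, built at different times on two machines:
const layer = "sha256:da39a3ee5e6b4b0d3255bfef95601890afd80709";
const digestA = imageConfigDigest(layer, "2018-10-18T09:00:00Z");
const digestB = imageConfigDigest(layer, "2018-10-19T14:30:00Z");
console.log(digestA === digestB); // false, despite identical sources
```

This is exactly why the same `pulumi update` on two machines sees "different" images.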

@CyrusNajmabadi
Contributor

Ok. I spelunked through the docker code, and I couldn't find any way to avoid this. Furthermore, I'm virtually certain that if you were doing this manually (i.e. without pulumi), you'd be running into this.

One thing you can do to try to help avoid this is to clear your docker cache on one of your machines, then export your docker image from one and import it on the other. You can use docker save and docker load to do this.

If you end up doing this, I believe both machines will then have the same images for docker to reuse, without docker wanting to create new images with fresh timestamps embedded.

@CyrusNajmabadi
Contributor

Closing out. Note: @kennyjwilli asked if there was any way for pulumi to pull down the built images that had been published. I believe that the 'cacheFrom' property here is intended to help with that: https://github.com/pulumi/pulumi-cloud/blob/d8315b6ff7de0e76ad8aa7c4195335493b199988/api/service.ts#L161

However, I'm personally unfamiliar with how it works. @pgavlin (who is on vacation right now) may be able to help guide you through using this. For now, do you want to try passing 'true' for that value to see if it helps?
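For reference, a hedged sketch of what passing that flag might look like in a pulumi-cloud program. The service name, container name, build folder, and port here are made up for illustration, and the exact shape of the build object is an assumption based on the api/service.ts link above rather than a verified API:

```typescript
import * as cloud from "@pulumi/cloud";

// Hypothetical usage sketch: "my-service", "nginx", and "./app" are
// placeholders; cacheFrom is the property referenced above, which should
// pull the previously pushed image to seed the local build cache.
const service = new cloud.Service("my-service", {
  containers: {
    nginx: {
      build: {
        context: "./app",
        cacheFrom: true,
      },
      ports: [{ port: 80 }],
    },
  },
});
```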

@joeduffy
Member

This doesn't seem like a satisfactory outcome. I can imagine this is going to be a common issue for anybody trying to do CD of Dockerized services with Pulumi.

@CyrusNajmabadi @lukehoban @hausdorff Thoughts on what we can do here?

@joeduffy joeduffy reopened this Oct 25, 2018
@joeduffy joeduffy modified the milestones: 0.18, 0.19 Oct 25, 2018
@lukehoban
Contributor

From our last discussion on this, the path forward on several related issues like this was to do two things:

  1. Use an Archive to track changes to the sources of the Docker build folder (and the Dockerfile, if it is outside that folder) within the Pulumi resource model.
  2. Move docker.Image over to being a CustomResource that can fully participate in the Pulumi resource dependency graph so that it can know to only re-build and re-push when there are changes in the Archive hash.

For (2), this could be accomplished either with a dynamic provider, or via moving this whole package over to be a true Pulumi Provider written in Go.

Short of doing that, we could not think of any robust way to use docker itself to reliably handle these issues.

Relying on Archive hash semantics instead of Docker build caching is a little worrying, just because it's a different semantic model. But it should be a conservative additional layer of "caching", and relying on docker build caching already provides limited guarantees on if/when layers will get re-built, even if the rebuild may cause different contents to be created (timestamps in builds, different bits from npm install, etc.).

So we have what we think is a path to addressing this class of issues. But it will be a pretty significant overhaul of this library. And the right thing if we go this direction is probably to move to a real provider - which would be a complete re-write.
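The Archive-hash semantics described above can be sketched roughly as follows. This is an assumed shape, not Pulumi's implementation: hash only relative paths and file bytes, walked in sorted order, so timestamps and other machine-local metadata can never change the result the way docker's image IDs do.

```typescript
import { createHash } from "crypto";
import * as fs from "fs";
import * as path from "path";

// Content-only hash of a build context: deterministic across machines
// because it folds in relative paths and file bytes, never mtimes.
function contextHash(root: string): string {
  const hash = createHash("sha256");
  const walk = (dir: string): void => {
    for (const name of fs.readdirSync(dir).sort()) {
      const full = path.join(dir, name);
      if (fs.statSync(full).isDirectory()) {
        walk(full);
      } else {
        hash.update(path.relative(root, full));
        hash.update(fs.readFileSync(full));
      }
    }
  };
  walk(root);
  return hash.digest("hex");
}
```

Two machines with identical build-context contents then compute identical hashes regardless of file timestamps, so preview could correctly report "no changes" and skip the re-build and re-push.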

@CyrusNajmabadi
Contributor

> This doesn't seem like a satisfactory outcome. I can imagine this is going to be a common issue for anybody trying to do CD of Dockerized services with Pulumi.

@joeduffy This appears to be an issue for anyone doing CD of dockerized services, regardless of whether they're using pulumi or not.

As Luke mentions, we discussed an alternative approach here. But @hausdorff was tasked with taking that on, as he has the most context on this space and on doing a revamp to a dynamic-provider-based approach. I'm going to assign this over to him, unless there's already another bug tracking this work (@hausdorff, you mentioned you were going to create one to track the results of our convo?)

@lukehoban lukehoban added customer/feedback Feedback from customers priority/P1 labels Nov 16, 2018
@lukehoban lukehoban modified the milestones: 0.20, 0.21 Jan 10, 2019
@lukehoban
Contributor

This will require a more or less complete overhaul of this library per #23 (comment) - and we haven't started work on it - so it won't get done in M21. This remains a very high priority issue that we will need to find time to prioritize.

@lukehoban lukehoban modified the milestones: 0.21, 0.22 Mar 9, 2019
@lukehoban lukehoban modified the milestones: 0.22, 0.23 Apr 1, 2019
@hausdorff hausdorff removed this from the 0.23 milestone May 6, 2019
@ameier38

@lukehoban I think an interesting benefit, if possible, of using an Archive for the Docker context would be the ability to include files from different directories. For my use case I have a directory called protos in which I store protobuf definitions and then generate the code stubs using Uber's prototool. To build my services I first have to copy these stubs into the service directory in order to build the Docker image. It would be a nice feature to be able to track when the generated stubs have changed and automatically update the build context, keeping the service and stubs in sync. We can't currently mount a volume during the build, which would also solve this for me.

Also, having the Docker image show a change when running pulumi preview would be really nice. Sometimes I am just building an image in a stack and exporting the image name, and I can't see the change until I run a full update and view the outputs. Edit: this works in the latest version 👍

@Blitz2145

> I've been able to repro this, and have traced it down to a docker design decision documented here: https://github.com/moby/moby/blob/master/image/spec/v1.md
>
> Specifically, when docker produces the images for layers, it embeds a "created" timestamp in the metadata for that image. When producing the final image, this information is contained in 'json' files that are eventually tarred up. That final tar is hashed, and will therefore be different on every machine you build on.
>
> I'm looking around to see if there's any way to avoid this - some way, perhaps, to force a specific date for docker to use here. Absent that, this may just be how docker works, and it may be the case that with or without pulumi you would be experiencing this no matter what.

@CyrusNajmabadi Maybe buildkit, the new docker image builder, will take a PR to add reproducible timestamps (some discussion here: moby/buildkit#1058), which might be a route to tackling spurious image builds rather than going the archive route.

@AaronFriel
Contributor

In the new implementation of the Docker Image resource in v4, a new image is not built unless the provider detects a change in the build context. See our blog post for more info: https://www.pulumi.com/blog/build-images-50x-faster-docker-v4/
