Graph deletes are non-atomic, db refs deleted without deleting on-disk entities #6354
Comments
It seems that it needs a good old mutex :) |
The right solution, I believe, is support for a transaction log in the graph driver. Other databases and filesystems solve this with transaction logs and our graph driver is no different. I am prepared to begin working on this. Pinging maintainers: @shykes @vieux @crosbymichael |
Wouldn't it be easier to occasionally scan the graph for unreferenced dirs and just remove them? That was the thinking when we implemented this (ie "worst case we leave an unremoved dir after a crash. we can garbage collect it later"). |
@shykes Fair point. You're suggesting we 'fsck' rather than having a filesystem journal. History shows where we've gone with that as an industry, but we also have far fewer updates than typical filesystems. I'd accept the argument that we should have both, but that 'fsck' gives us the best bang-for-our-time at the moment. Also, I'll note that this can happen on create as well, although I haven't as thoroughly investigated how hard that is to reproduce and what the impacts would be. |
Yeah that's my argument - fsck as a pragmatic stopgap, and harden it later when it makes sense to make the investment. To my knowledge we haven't ever received a real-world issue that we could track back to this, so that's a datapoint :) Re: create, we should double-check, but from memory it's the same thing in reverse. Worst case, we crash while creating a layer (which includes populating its content from a pull), and before referencing it. The result is the same. |
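(For illustration only, a minimal Go sketch of the kind of "fsck"/garbage-collection pass being discussed above. All names here, such as sweepOrphans, graphRoot, and knownIDs, are assumptions and not the actual moby code: scan the graph directory and remove any on-disk layer directory that no longer has a corresponding reference.)

```go
package graphfsck

import (
	"log"
	"os"
	"path/filepath"
)

// sweepOrphans removes every directory under graphRoot whose name is not a
// known (referenced) layer ID. It is meant to run while no create/delete is
// in flight, e.g. at daemon startup, mirroring the "worst case we leave an
// unremoved dir after a crash; garbage collect it later" approach.
func sweepOrphans(graphRoot string, knownIDs map[string]bool) error {
	entries, err := os.ReadDir(graphRoot)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.IsDir() || knownIDs[e.Name()] {
			continue
		}
		orphan := filepath.Join(graphRoot, e.Name())
		log.Printf("removing orphaned layer dir %s", orphan)
		if err := os.RemoveAll(orphan); err != nil {
			return err
		}
	}
	return nil
}
```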
It seems all Docker systems I run eventually run out of disk space, and this problem was crushing for me when I attempted to run Docker in/for CI. I have also, on occasion, had layers that could not be removed under any circumstances.
|
I mentioned the same yesterday, when, after an hour of experiments with building a test environment (for some of my stuff), I suddenly found that I had run out of disk space. /var/lib/docker/vfs was full while I had neither containers nor images (I had removed them all with rm and rmi). |
This issue is still present, and occurs on multiple host configurations including Ubuntu 14.04 with LVM, Ubuntu 14.04 w/ ext4, boot2docker default ext4, etc. Cleaning up the leftover files requires parsing the list of valid containers and then manually invoking "/bin/rm -rf" on the stale directories. Is this bug going to be addressed in any planned work queue? |
@y3ddet could you share how exactly you go about recognizing stale / zombie layers and removing those? based on output of 'docker ps' / 'docker inspect' or so? Thinking of coming up with a watchdog script on cron for now until this issue is resolved |
@adamhadani I have been able to locate the 'dangling' contents directories with the following Ruby script (it depends on https://github.com/swipely/docker-api, which is available as the Ruby gem 'docker-api'):
|
@y3ddet got it, thanks. I created a similar Python version that achieves pretty much the same results, in case it is useful for anyone who cannot introduce Ruby dependencies into their environment:

```python
#!/usr/bin/env python
"""
Check all existing Docker containers for their mapped paths, and then purge any
zombie directories in docker's volumes directory which don't correspond to an
existing container.
"""
# Note: written against Python 2 and the docker-py API of that era
# (docker.Client, dict.iteritems).
import logging
import os
import sys
from shutil import rmtree

import docker

DOCKER_VOLUMES_DIR = "/var/lib/docker/vfs/dir"


def get_immediate_subdirectories(a_dir):
    return [os.path.join(a_dir, name) for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]


def main():
    logging.basicConfig(level=logging.INFO)
    client = docker.Client()

    # Collect the volume paths still referenced by any container (running or not).
    valid_dirs = []
    for container in client.containers(all=True):
        volumes = client.inspect_container(container['Id'])['Volumes']
        if not volumes:
            continue
        for _, real_path in volumes.iteritems():
            if real_path.startswith(DOCKER_VOLUMES_DIR):
                valid_dirs.append(real_path)

    # Anything on disk that is not referenced is considered dangling and removed.
    all_dirs = get_immediate_subdirectories(DOCKER_VOLUMES_DIR)
    invalid_dirs = set(all_dirs).difference(valid_dirs)

    logging.info("Purging %s dangling Docker volumes out of %s total volumes found.",
                 len(invalid_dirs), len(all_dirs))
    for invalid_dir in invalid_dirs:
        logging.info("Purging directory: %s", invalid_dir)
        rmtree(invalid_dir)

    logging.info("All done.")


if __name__ == "__main__":
    sys.exit(main())
```
|
@crosbymichael - thoughts on adding a fsck util similar to the above to contrib? At least until we fix this in Docker proper with a transactional graph db? |
For volumes, there's this as well: https://github.com/cpuguy83/docker-volumes and of course this PR: #8484 |
@thaJeztah thanks, that PR looks pretty great, could be very easily used in some one-liner / cron for getting rid of dangling dirs among other things. Any idea what's the status of that as far as inclusion in a future release? |
@adamhadani I have absolutely no idea, I hope soon! But in the meantime, the |
@thaJeztah got it, thanks again |
@adamhadani Thanks for the python script. Hoping a native solution to this problem lands soon... I'm frequently running out of space these days on my dev environment (which happens to be my laptop) :( |
See moby/moby#6354 (comment). All credit goes to https://github.com/adamhadani
@y3ddet Thanks for posting a solution. @adamhadani Thanks for the Python script. This rescued me from a very bad situation. |
Still isn't fixed.. (1.4.1 in Ubuntu 14.04) |
@adamhadani could you post this script to GitHub gists? It's still an issue, and while it is, this script is extremely useful |
@FelikZ sure , there you go https://gist.github.com/adamhadani/807e68452b43636c2d5f |
I just came across the exact same problem and then discovered that there is the option |
@soupdiver There is no automatic deletion of volumes at all. The only time a volume is deleted is if you explicitly |
@cpuguy83 Ah okay ... I always thought this part of the docs |
@soupdiver if that's how it's written in the docs, I agree that's confusing. Can you add a link to where in the docs you found that sentence? (Then I'll create a small PR to improve that) |
@thaJeztah It can be found in the user guide. https://docs.docker.com/userguide/dockervolumes/ |
@soupdiver thanks for pointing that out, I'll see if I can come up with a better description there! |
@thaJeztah Just for clarification: has it always been the normal behaviour that volume directories are not deleted? I always thought that they would be deleted after all references are gone, and I vaguely remember that the docs explicitly said this when I started using Docker a while ago. But I'm not 100% sure about this |
@soupdiver yes, this is normal behavior and "by design". Docker should never remove your data unless you explicitly tell it to (data is important). There are some difficulties with this (i.e., "orphaned" volumes), which are being worked on in #8484. I just created a PR that (hopefully) makes this a bit more clear; you can find it here: #10757 |
@thaJeztah ok thanks! The PR really explains the behaviour more clearly |
A comment in moby#6354 (comment) brought to light that the "Managing Data in containers" section contained an incorrect (or confusing) line: "Volumes persist until no containers use them", which implies that volumes are automatically removed if they are no longer referenced by a container. This pull request attempts to add some information explaining that volumes are never automatically removed by Docker, and adds some extra hints on working with data volumes. Signed-off-by: Sebastiaan van Stijn <[email protected]>
We've run into this problem on our Jenkins CI server as well (Docker 1.5). I think the discussion about volumes is actually a tangent from the original issue. The issue is not really about volumes; it's about image layers being left on-disk after a delete. For now we're able to identify and reclaim this lost space using this bash:

```bash
#!/bin/bash
set -eu

cd /var/lib/docker/aufs/diff
# For each layer directory (largest first, skipping the -init dirs): keep it
# if docker still knows the ID, otherwise remove it.
for image in $(du -s * 2> /dev/null | sort -nr | grep -v -- '-init' | cut -f 2); do
    echo "Inspecting or removing $image"
    docker inspect "$image" > /dev/null || rm -r "$image"
done
```
|
+kind/bug |
/cc @dalanlan |
@adamhadani very nice job, thank you |
@dnephin We had the same issue. But be careful: in Docker 1.10, these directory names no longer match container or image IDs, so your script will remove everything from the diff folder. I was able to purge everything from every Docker folder manually, but I don't know how we can do this cleanup using Docker >= 1.10. |
Is this bug outdated? Can anyone handle it? |
I think it's a duplicate of another bug from @cpuguy83 |
Given the activity level on this issue, I'm going to close it as it's either fixed, a duplicate or not a request anymore. If you think I'm mistaken, feel free to discuss it there 😉 |
In function graph.Delete, the method graph.idIndex.Delete(id) is called before graph.driver.Delete(id). It is non-atomic: if graph.driver.Delete(id) fails, the graph.idIndex.Delete has already occurred and the image is now apparently gone, but the extents may still exist on disk in whole or in part.
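(For illustration, a minimal Go sketch of the ordering argument. The interfaces and shapes below are assumptions, not the actual moby code: if the on-disk delete runs first and the reference is dropped only after it succeeds, a failure leaves a still-referenced, retryable layer, and a crash between the two steps leaves at worst a dangling reference that a later fsck/GC pass can clear, rather than an invisible on-disk orphan.)

```go
package graphsketch

// Driver and IDIndex stand in for the real graph driver and id index; the
// names and signatures are assumptions for illustration only.
type Driver interface {
	Delete(id string) error
}

type IDIndex interface {
	Delete(id string) error
}

type Graph struct {
	driver  Driver
	idIndex IDIndex
}

// Delete removes the on-disk entity first and forgets the id only once that
// has succeeded, inverting the order described in this issue.
func (g *Graph) Delete(id string) error {
	if err := g.driver.Delete(id); err != nil {
		// The id stays referenced, so the delete can simply be retried.
		return err
	}
	// Worst case after a crash here: a dangling reference to missing data,
	// which a garbage-collection pass can reap.
	return g.idIndex.Delete(id)
}
```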