
ADD: method to minify containers gently and in-place #258

Merged
merged 23 commits into from
Apr 9, 2020


kaczmarj
Collaborator

@kaczmarj kaczmarj commented Jan 28, 2019

Gently? Yes: this command lets the user choose which directories to prune, so the minified image remains usable interactively (in most cases). The fmriprep image, for example, likely contains many files, say in /opt, that no part of their software ever touches. This PR lets one minify the fmriprep container by removing only those files in /opt that reprozip does not record. The rest of the container is untouched, so it remains usable interactively and for other purposes. Of course, this does not offer the highest degree of minimization.
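The selection rule described above (delete a file only if it lies inside one of the chosen directories and the trace never touched it) can be sketched as a pure function. This is an illustration, not the PR's _prune.py: the function name `files_to_prune` and its inputs are hypothetical, and in practice the traced paths would come from reprozip's config.yml and the candidate paths from walking the prune directories.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def files_to_prune(all_files: Iterable[str], traced_files: Iterable[str],
                   dirs_to_prune: Iterable[str]) -> Set[str]:
    """Return files under any of ``dirs_to_prune`` that the trace never touched.

    Files outside ``dirs_to_prune`` are always kept, which is what keeps the
    rest of the container usable.
    """
    keep = set(traced_files)
    prune_roots = [PurePosixPath(d) for d in dirs_to_prune]
    result = set()
    for f in all_files:
        path = PurePosixPath(f)
        # Compare path components, not string prefixes, so that "/opt"
        # does not accidentally match "/optional".
        inside = any(root == path or root in path.parents for root in prune_roots)
        if inside and f not in keep:
            result.add(f)
    return result
```

Comparing path components rather than string prefixes is the main design choice here; the sketch does not resolve symlinks or normalize paths, which a real implementation would need to consider.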

Example of use:

Start up the container. Be sure to add --security-opt=seccomp:unconfined or --cap-add SYS_PTRACE to the docker run call; on the most recent Docker CE, only the latter is necessary.

$ docker run --rm -it --cap-add SYS_PTRACE --name tominify imagename
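For anyone scripting this step instead of typing it, the same container can be started with the docker Python package (docker-py), which the PR already requires. A minimal sketch, assuming a recent docker-py: `tracing_run_kwargs` and `start_tracing_container` are hypothetical helper names, while `name`, `detach`, `tty`, `stdin_open`, `auto_remove`, `cap_add` and `security_opt` are existing keyword arguments of `client.containers.run`.

```python
def tracing_run_kwargs(name, seccomp_unconfined=False):
    """Keyword arguments for docker-py's ``client.containers.run`` that mirror
    ``docker run --rm -it --cap-add SYS_PTRACE --name NAME``.
    """
    kwargs = {
        "name": name,          # --name
        "detach": True,        # return immediately instead of attaching this process
        "tty": True,           # -t
        "stdin_open": True,    # -i
        "auto_remove": True,   # --rm (the daemon removes the container once it exits)
    }
    if seccomp_unconfined:
        kwargs["security_opt"] = ["seccomp:unconfined"]  # --security-opt=seccomp:unconfined
    else:
        kwargs["cap_add"] = ["SYS_PTRACE"]               # --cap-add SYS_PTRACE
    return kwargs


def start_tracing_container(image, name="tominify"):
    """Start the container to be traced; requires the `docker` package."""
    import docker  # docker-py, installed with `pip install docker`

    client = docker.from_env()
    return client.containers.run(image, **tracing_run_kwargs(name))
```

Calling `start_tracing_container("imagename")` should leave a running container named tominify, ready for ndminify in the same way as the CLI command above.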

In another terminal window, run ndminify, passing it the name of the running container, the commands you want to trace, and the directories you want to prune.

$ cmd0="python runscript.py"
$ cmd1="python runotherscript.py"
$ ndminify --container tominify --dirs-to-prune /home /opt --commands "$cmd0" "$cmd1"
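Each command is quoted so that it reaches ndminify as a single argument and is only split into words inside the container. The sketch below shows how such a string could become a reprozip trace call. The reprozip path, trace directory and the --dont-identify-packages and --overwrite flags are copied from the log later in this thread; appending later commands with reprozip's --continue flag is an assumption about how multiple commands might be combined, not something _trace.sh is shown to do. `trace_argv` and `all_trace_argvs` are hypothetical names.

```python
import shlex

# Values taken from the reprozip command line shown in the error log below.
REPROZIP = "/tmp/reprozip-miniconda/bin/reprozip"
TRACE_DIR = "/tmp/neurodocker-reprozip-trace"


def trace_argv(command, first=True):
    """Build the argv for tracing one quoted command string such as "$cmd0"."""
    # The first run starts a fresh trace; later runs are appended (assumption).
    mode = "--overwrite" if first else "--continue"
    return [REPROZIP, "trace", "-d", TRACE_DIR, "--dont-identify-packages", mode] + shlex.split(command)


def all_trace_argvs(commands):
    """One argv per command, in the order given on the command line."""
    return [trace_argv(c, first=(i == 0)) for i, c in enumerate(commands)]
```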

This will run the commands and then display all of the files that will be deleted in the container. BEWARE that data loss is possible. If you have mounted directories onto the container and try to prune those directories, the files within those mounted directories will be irreversibly REMOVED. Please exercise extreme caution when trying out this feature, and to be safe, if you need to mount directories, mount them as read-only.

If you choose to proceed, the files will be removed from the container. Then, create a new image from that container's minimized filesystem; the last line of the command's output explains how to do that. Creating the new image this way loses the environment and other metadata, so for now, if you need environment variables set, I suggest writing a new Dockerfile that bootstraps from the minified image and sets the appropriate variables.

The ndminify command-line program is temporary, to make debugging easier. It will eventually become a sub-command of the main neurodocker command-line program.

cc: @gkiar

To try this out, install this branch of the project, and also install the docker-py Python package:

$ pip install --no-cache-dir https://github.com/kaczmarj/neurodocker/tarball/add/minify-gently
$ pip install --no-cache-dir docker
$ ndminify --help

@codecov-io

codecov-io commented Jan 28, 2019

Codecov Report

Merging #258 into master will decrease coverage by 0.02%.
The diff coverage is 74.24%.


@@            Coverage Diff             @@
##           master     #258      +/-   ##
==========================================
- Coverage   74.58%   74.55%   -0.03%     
==========================================
  Files          33       35       +2     
  Lines        1853     1985     +132     
  Branches      241      260      +19     
==========================================
+ Hits         1382     1480      +98     
- Misses        371      395      +24     
- Partials      100      110      +10     
Impacted Files                                      Coverage Δ
neurodocker/reprozip/gentle/trace.py                68.75% <68.75%> (ø)
neurodocker/reprozip/gentle/tests/test_minify.py    88.88% <88.88%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 590d3f2...8c6f556.

@gkiar

gkiar commented Jan 29, 2019

Hey @kaczmarj ! This looks awesome! Thanks so much!

Quick question/confirmation: for my container to be minified, is there any advantage to installing (or not installing) reprozip in the original Dockerfile? I imagine I can just install it in my running container before launching the minify tool, but I guess it might end up packaged in the minified container anyway? I can always remove it manually after minification, of course, as long as I know where I installed it.

Will let you know how it goes :)

@gkiar

gkiar commented Jan 29, 2019

Note 1: _trace.sh isn't currently included in the package, so I had to download it to the correct location manually.

Note 2: When running, I get the following error immediately as the tool is being launched:

[NEURODOCKER 2019-01-28 22:55:46,371 INFO]: + printf 'NEURODOCKER (in container): ERROR: reprozip trace command exited with non-zero code. Command: /tmp/reprozip-miniconda/bin/reprozip trace -d /tmp/neurodocker-reprozip-trace --dont-identify-packages --overwrite python3 preprocessing_pipeline.py /data/RocklandSample/ /data/RocklandSample/derivatives/ session --participant_label A00008326 --verbose'
+ exit 1
Traceback (most recent call last):
  File "/Users/greg/code/scratch/neurodocker/env/bin/ndminify", line 11, in <module>
    load_entry_point('neurodocker==0.4.4.dev0', 'console_scripts', 'ndminify')()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 138, in main
    trace_and_prune(container=args.container, commands=args.commands, directories_to_prune=args.dirs_to_prune)
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 74, in trace_and_prune
    raise RuntimeError("Error: {}".format(log))
RuntimeError: Error: + printf 'NEURODOCKER (in container): ERROR: reprozip trace command exited with non-zero code. Command: /tmp/reprozip-miniconda/bin/reprozip trace -d /tmp/neurodocker-reprozip-trace --dont-identify-packages --overwrite python3 preprocessing_pipeline.py /data/RocklandSample/ /data/RocklandSample/derivatives/ session --participant_label A00008326 --verbose'
+ exit 1

However, when I run the command /tmp/reprozip-miniconda/bin/reprozip trace -d /tmp/neurodocker-reprozip-trace --dont-identify-packages --overwrite python3 preprocessing_pipeline.py /data/RocklandSample/ /data/RocklandSample/derivatives/ session --participant_label A00008326 --verbose in the container directly, it runs without a problem. Is there anywhere off-hand where I can quickly add a print statement to see the specific error the tool encounters?
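One way to see the specific error is to capture the command's stdout and stderr separately and include both in the exception. The sketch below assumes a docker-py version whose `Container.exec_run` supports `demux=True` (which returns the two streams as a tuple) and `workdir`; `run_and_report` is a hypothetical helper, not the code in trace.py. Passing `workdir`, or using absolute paths, also matters because a command that works in an interactive shell can fail in an exec session that starts in a different working directory, which turned out to be the cause here.

```python
def run_and_report(container, cmd, workdir=None):
    """Run ``cmd`` in ``container``; on failure, raise with both output streams.

    ``container`` is assumed to be a docker-py ``Container`` (or any object
    with a compatible ``exec_run`` method).
    """
    # demux=True makes exec_run return (stdout, stderr) separately; either may be None.
    exit_code, (out, err) = container.exec_run(cmd, demux=True, workdir=workdir)
    out_text = (out or b"").decode("utf-8", errors="replace")
    err_text = (err or b"").decode("utf-8", errors="replace")
    if exit_code != 0:
        raise RuntimeError(
            "command {!r} exited with code {}\n--- stdout ---\n{}\n--- stderr ---\n{}".format(
                cmd, exit_code, out_text, err_text))
    return out_text
```

With a real container this would be called as `run_and_report(docker.from_env().containers.get("tominify"), cmd, workdir=...)`, where the working directory is whatever the scripts expect.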

@kaczmarj
Collaborator Author

@gkiar -

For my container to be minified, is there any advantage to installing/not installing reprozip in the original Dockerfile

The script _trace.sh will install miniconda and reprozip (not adding either to $PATH), and will then run the trace on the commands you give it. You don't have to install reprozip yourself. With my most recent commit, the miniconda installation is removed as part of the pruning process, so it won't appear in the minified image.

Note 1: _trace.sh isn't currently included in the package, so I had to download it to the correct location manually.

Thanks for catching this! I fixed this in the setup.py. Please reinstall this branch with -U/--upgrade.

Note 2: When running, I get the following error immediately as the tool is being launched

Is there more output above the trace that you pasted? The exact error could be in that output. For reference, the code that runs _trace.sh and prints all its output is here. Can you paste the full output? I'm not sure why the command would work interactively but not as part of the script...

@gkiar

gkiar commented Jan 29, 2019

1- noted; great, thanks!
2- :) it is now there
3- it didn't print anything else out, but please forgive my sleepy relative-path-using self for causing this one; it is running now and I'll check back on it in a couple hours to see if the job is finished, and let you know!

If this all works, I think it'd be really nice (and you may have already done this) to set up a .travis.yml config that lets users build their containers, run a demo analysis, minify, and then automatically push the minified container to Docker Hub. I'm happy to PR this template, if you want.

@gkiar

gkiar commented Jan 29, 2019

Hey @kaczmarj - quick update. The execution completed, but in the prune stage it seemed to collapse a bit. The following is the error I got from ndminify.

[NEURODOCKER 2019-01-29 12:27:07,535 INFO]: Uploading usage statistics is currently disabled
Please help us by providing anonymous usage statistics; you can enable this
by running:
    reprozip usage_report --enable
If you do not want to see this message again, you can run:
    reprozip usage_report --disable
Nothing will be uploaded before you opt in.
[NEURODOCKER 2019-01-29 12:27:07,535 INFO]: Configuration file written in /tmp/neurodocker-reprozip-trace/config.yml
Edit that file then run the packer -- use 'reprozip pack -h' for help
Traceback (most recent call last):
  File "/Users/greg/code/scratch/neurodocker/env/bin/ndminify", line 11, in <module>
    load_entry_point('neurodocker==0.4.4.dev0', 'console_scripts', 'ndminify')()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 144, in main
    trace_and_prune(container=args.container, commands=args.commands, directories_to_prune=args.dirs_to_prune)
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 84, in trace_and_prune
    raise RuntimeError("Failed: {}".format(result))
RuntimeError: Failed: Traceback (most recent call last):
  File "/tmp/_prune.py", line 93, in <module>
    main(yaml_file=args.config_file, directories_to_prune=args.dirs_to_prune)
  File "/tmp/_prune.py", line 51, in main
    print('\n'.join(map(str, sorted(files_to_remove))), file=f)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 15970-15971: ordinal not in range(128)

@kaczmarj
Collaborator Author

@gkiar - which version of python are you using? it could be a unicode issue if you're using python2. if you're using python2, would you mind editing neurodocker/reprozip/gentle/trace.py by putting str = unicode somewhere towards the top of that script?

@kaczmarj
Collaborator Author

@gkiar - actually I don't think that will work. i've pushed a potential fix, so could you please reinstall?

i still suspect this is a python2 vs python3 issue. i wrote this with only python3 in mind. but please do let me know if you are in fact using python 3.

@gkiar

gkiar commented Jan 29, 2019

Hey @kaczmarj - I am using Python 3.6.5, so I don't think Python 2 vs. 3 is the issue?

@kaczmarj
Collaborator Author

Hmmm... have you tried the version with my most recent commit?

@gkiar

gkiar commented Jan 29, 2019

Missing opening parenthesis in .decode('utf-8'):

$ ndminify --container tominify --dirs-to-prune /usr/share  --commands "$cmd2"
Traceback (most recent call last):
  File "/Users/greg/code/scratch/neurodocker/env/bin/ndminify", line 11, in <module>
    load_entry_point('neurodocker==0.4.4.dev0', 'console_scripts', 'ndminify')()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 121
    result = result.decode'utf-8').split()
                                ^
SyntaxError: invalid syntax

@gkiar

gkiar commented Jan 29, 2019

But after fixing the typo, I still had an error. I think the issue is actually an index issue with the list of libraries you're trying to remove?

Traceback (most recent call last):
  File "/Users/greg/code/scratch/neurodocker/env/bin/ndminify", line 11, in <module>
    load_entry_point('neurodocker==0.4.4.dev0', 'console_scripts', 'ndminify')()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 144, in main
    trace_and_prune(container=args.container, commands=args.commands, directories_to_prune=args.dirs_to_prune)
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 84, in trace_and_prune
    raise RuntimeError("Failed: {}".format(result))
RuntimeError: Failed: Traceback (most recent call last):
  File "/tmp/_prune.py", line 93, in <module>
    main(yaml_file=args.config_file, directories_to_prune=args.dirs_to_prune)
  File "/tmp/_prune.py", line 51, in main
    print('\n'.join(map(str, sorted(files_to_remove))), file=f)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 15970-15971: ordinal not in range(128)

@kaczmarj
Collaborator Author

OK thank you. sorry for all the back and forth. let me create some paths with non-ascii characters in them and debug... it seems like it's expecting ascii but gets something else. i'll get back to you soon @gkiar

@gkiar

gkiar commented Jan 29, 2019

btw the command I ran that last time was literally /bin/echo 'hello'. Could it be a problem with the fact that I'm using a Mac?

@kaczmarj
Collaborator Author

@gkiar - i think i was able to reproduce your issue, and i think i have fixed it. can you please reinstall and try again? the error before was that the file saved in the container, which lists all of the files to be deleted, was written as ascii instead of utf-8. that's what i think, at least. the commit that fixes this is 92848df.

fyi in commit 2a2126b i added some protections to prevent users from pruning mounted directories.
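A check like this could compare each directory to prune against the container's mount points and refuse any overlap in either direction (pruning /data would empty a mount at /data/x, and pruning /data/x would reach into a mount at /data). This is one possible sketch, not the code in 2a2126b; `overlaps`, `check_prune_dirs` and `mount_destinations` are hypothetical names, while `container.attrs["Mounts"]` with its `"Destination"` keys is the `docker inspect` data that docker-py exposes.

```python
from pathlib import PurePosixPath


def overlaps(a, b):
    """True if POSIX paths a and b are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def check_prune_dirs(dirs_to_prune, mount_destinations):
    """Raise ValueError if any directory to prune overlaps a mounted path."""
    for d in dirs_to_prune:
        for m in mount_destinations:
            if overlaps(d, m):
                raise ValueError(
                    "refusing to prune {}: it overlaps the mount at {}".format(d, m))


def mount_destinations(container):
    """Mount points of a docker-py Container, taken from its inspect data."""
    return [m["Destination"] for m in container.attrs.get("Mounts", [])]
```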

@gkiar

gkiar commented Jan 30, 2019

Hey @kaczmarj - unfortunately, an error persists:

[NEURODOCKER 2019-01-30 15:34:17,000 INFO]: Uploading usage statistics is currently disabled
Please help us by providing anonymous usage statistics; you can enable this
by running:
    reprozip usage_report --enable
If you do not want to see this message again, you can run:
    reprozip usage_report --disable
Nothing will be uploaded before you opt in.
Traceback (most recent call last):
  File "/Users/greg/code/scratch/neurodocker/env/bin/ndminify", line 11, in <module>
    load_entry_point('neurodocker==0.4.4.dev0', 'console_scripts', 'ndminify')()
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 169, in main
    trace_and_prune(container=args.container, commands=args.commands, directories_to_prune=args.dirs_to_prune)
  File "/Users/greg/code/scratch/neurodocker/env/lib/python3.6/site-packages/neurodocker/reprozip/gentle/trace.py", line 96, in trace_and_prune
    raise RuntimeError("Failed: {}".format(result))
RuntimeError: Failed: Traceback (most recent call last):
  File "/tmp/_prune.py", line 61, in <module>
    main(yaml_file=args.config_file, directories_to_prune=args.dirs_to_prune)
  File "/tmp/_prune.py", line 51, in main
    print('\n'.join(map(str, sorted(files_to_remove))), file=f)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 15970-15971: surrogates not allowed

@kaczmarj
Collaborator Author

ok, at least it's a different error :) is this container something you can share with me? i'd like to try to reproduce this myself

@gkiar

gkiar commented Jan 30, 2019

Yep!

Command I'm using to launch the container:

docker run --rm -it --cap-add SYS_PTRACE --name tominify gkiar/fsl_5.0.11_dwi_preprocessing

Commands I'm using to minify:

pip install --no-cache-dir https://github.com/kaczmarj/neurodocker/tarball/add/minify-gently --upgrade
cmd1="/bin/echo 'hi'"
ndminify --container tominify --dirs-to-prune /usr/share  --commands "$cmd1"

@kaczmarj
Collaborator Author

Ah! There are paths in the list of files to prune that have wonky characters, specifically in the directory:

/usr/share/ca-certificates/mozilla/

I won't paste the filenames here because I have no idea whether there are security implications... But these filenames contain characters that python refuses to encode when writing them to a file. I accounted for this in the most recent commit (4c8c177).
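One explanation consistent with both tracebacks (an assumption, not something confirmed in this thread): under a C/POSIX locale, Python 3.6 decodes filenames as ASCII with the surrogateescape error handler, so every non-ASCII byte in a name becomes a lone surrogate character. Writing such a string with the ascii codec fails, and writing it with utf-8 also fails, with "surrogates not allowed", which matches the two errors above (two positions, one two-byte character). A robust workaround, sketched here and not necessarily what 4c8c177 does, is to write the list as raw bytes via `os.fsencode`, NUL-separated because filenames cannot contain NUL. `write_file_list` and `read_file_list` are hypothetical helpers, and the file name used in the check is made up.

```python
import os


def write_file_list(paths, out_path):
    """Write paths NUL-separated, as the exact bytes they have on disk."""
    with open(out_path, "wb") as f:
        # os.fsencode reverses the filesystem decoding, including any
        # surrogate-escaped bytes, so the names never need to be valid text.
        f.write(b"\0".join(os.fsencode(p) for p in sorted(paths)))


def read_file_list(in_path):
    """Return the paths as bytes, which os.remove() accepts on POSIX."""
    with open(in_path, "rb") as f:
        data = f.read()
    return [p for p in data.split(b"\0") if p]
```

Because the reader gets bytes back, the deletion step can pass them straight to `os.remove` without decoding them again under whatever locale the container happens to use.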

@gkiar

gkiar commented Jan 31, 2019

Progress! The container was indeed minified a LOT, which is awesome! When I save it, however, two notable pieces are missing that I'm not entirely sure can be preserved (so they at least deserve a disclaimer, if not a fix): a) the entrypoint, and b) environment variables. In cases like FSL, where binaries are placed in a non-standard location and the output file type is determined by a variable, it would be very useful if these could be preserved somehow. Maybe a flag in docker export? Otherwise, I'd recommend using this in a "temp" capacity and writing a new Dockerfile that builds from the saved minified image, adding the missing pieces back in as needed.

Thanks, @kaczmarj !

@kaczmarj
Collaborator Author

Yes, docker export will not save environment variables, so I agree that this pruning process would be an intermediate step. The pruned image would be used as a base image in a new Dockerfile.
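The follow-up Dockerfile can be generated from the original image's metadata rather than written by hand. A minimal sketch, assuming `config` is the "Config" section of `docker inspect` output for the original image or container (for example `container.attrs["Config"]` in docker-py), which holds `Env`, `WorkingDir`, `Entrypoint` and `Cmd`. `dockerfile_restoring_config` is a hypothetical name, and the ENV quoting is simple: values containing `$` or backslashes may need extra care, since Docker applies variable substitution to ENV. As an alternative, `docker import` accepts `--change` (`-c`) to apply Dockerfile instructions such as ENV while importing an exported filesystem.

```python
import json


def dockerfile_restoring_config(minified_image, config):
    """Return Dockerfile text that re-applies ENV, WORKDIR, ENTRYPOINT and CMD."""
    lines = ["FROM {}".format(minified_image)]
    for entry in config.get("Env") or []:
        key, _, value = entry.partition("=")
        # Double-quote the value so spaces survive; see the caveat about "$".
        lines.append("ENV {}={}".format(key, json.dumps(value, ensure_ascii=False)))
    if config.get("WorkingDir"):
        lines.append("WORKDIR {}".format(config["WorkingDir"]))
    # JSON arrays give the exec form of ENTRYPOINT and CMD, as docker inspect stores them.
    if config.get("Entrypoint"):
        lines.append("ENTRYPOINT {}".format(json.dumps(config["Entrypoint"])))
    if config.get("Cmd"):
        lines.append("CMD {}".format(json.dumps(config["Cmd"])))
    return "\n".join(lines) + "\n"
```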

@kaczmarj kaczmarj merged commit b7772a6 into master Apr 9, 2020
@kaczmarj kaczmarj deleted the add/minify-gently branch April 9, 2020 17:57