Minimalistic way of dvc pull usage in ci/cd #1835

Closed
puhoshville opened this issue Apr 5, 2019 · 6 comments

@puhoshville
Contributor

I'm using dvc pull in a CI pipeline, and not long ago (after a new release) it broke because the container (python:3.6-slim-jessie) has no git executable. It was easy to fix by adding a stage to the Dockerfile that installs git-core, but this approach goes against the minimalistic principles of CI. I think some dvc commands, such as dvc pull, should be considered production-ready with no additional requirements.

@ghost ghost added the enhancement Enhances DVC label Apr 5, 2019
@shcheklein
Member

Related: #1487. @puhoshville, have you seen that ticket? It would be great if you could read it and let us know what you think.

@ghost

ghost commented Apr 5, 2019

It makes sense, because git is usually a build-time dependency that is hardly ever used at run time.
Also, requiring git for dvc pull breaks compatibility with the --no-scm flag.

Steps to reproduce with Docker:

FROM python

RUN apt-get remove git --yes
RUN pip install dvc
RUN mkdir -p /tmp/example \
      && mkdir -p /tmp/dvc-storage \
      && cd /tmp/example \
      && dvc init --no-scm \
      && dvc remote add -d dvc-storage /tmp/dvc-storage \
      && echo "hello" > hello \
      && dvc add hello \
      && dvc push \
      && rm -rf .dvc/cache \
      && dvc pull

@ghost

ghost commented Apr 5, 2019

Traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/git/__init__.py", line 83, in <module>
    refresh()
  File "/usr/local/lib/python3.7/site-packages/git/__init__.py", line 73, in refresh
    if not Git.refresh(path=path):
  File "/usr/local/lib/python3.7/site-packages/git/cmd.py", line 290, in refresh
    raise ImportError(err)
ImportError: Bad git executable.
The git executable must be specified in one of the following ways:
    - be included in your $PATH
    - be set via $GIT_PYTHON_GIT_EXECUTABLE
    - explicitly set via git.refresh()

All git commands will error until this is rectified.

This initial warning can be silenced or aggravated in the future by setting the
$GIT_PYTHON_REFRESH environment variable. Use one of the following values:
    - quiet|q|silence|s|none|n|0: for no warning or exception
    - warn|w|warning|1: for a printed warning
    - error|e|raise|r|2: for a raised exception

Example:
    export GIT_PYTHON_REFRESH=quiet


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/dvc", line 6, in <module>
    from dvc.main import main
  File "/usr/local/lib/python3.7/site-packages/dvc/main.py", line 6, in <module>
    from dvc.cli import parse_args
  File "/usr/local/lib/python3.7/site-packages/dvc/cli.py", line 12, in <module>
    import dvc.command.pkg as pkg
  File "/usr/local/lib/python3.7/site-packages/dvc/command/pkg.py", line 8, in <module>
    from dvc.repo.pkg import PackageParams
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/__init__.py", line 16, in <module>
    class Repo(object):
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/__init__.py", line 35, in Repo
    from dvc.repo.pkg import install_pkg
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/pkg.py", line 5, in <module>
    from git.cmd import Git
  File "/usr/local/lib/python3.7/site-packages/git/__init__.py", line 85, in <module>
    raise ImportError('Failed to initialize: {0}'.format(exc))
ImportError: Failed to initialize: Bad git executable.
The git executable must be specified in one of the following ways:
    - be included in your $PATH
    - be set via $GIT_PYTHON_GIT_EXECUTABLE
    - explicitly set via git.refresh()

All git commands will error until this is rectified.

This initial warning can be silenced or aggravated in the future by setting the
$GIT_PYTHON_REFRESH environment variable. Use one of the following values:
    - quiet|q|silence|s|none|n|0: for no warning or exception
    - warn|w|warning|1: for a printed warning
    - error|e|raise|r|2: for a raised exception

Example:
    export GIT_PYTHON_REFRESH=quiet
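
The GitPython message above names GIT_PYTHON_REFRESH=quiet as a way to silence the check. As a stopgap for CI images without git, that could be wrapped roughly as in the sketch below. The helper name git_env_for_ci is hypothetical, and whether dvc pull then runs cleanly depends on the command never actually calling git, which is an assumption here rather than something verified.

```python
import os
import shutil


def git_env_for_ci(environ=None):
    """Return a copy of the environment that silences GitPython's startup
    check when no git executable can be found on PATH.

    Hypothetical helper for illustration; it is not part of dvc or GitPython.
    """
    env = dict(os.environ if environ is None else environ)
    if shutil.which("git", path=env.get("PATH")) is None:
        # "quiet" is one of the values listed in the GitPython error message.
        # An explicit setting already present in the environment is kept.
        env.setdefault("GIT_PYTHON_REFRESH", "quiet")
    return env
```

The returned mapping can be passed as env= to subprocess.run(["dvc", "pull"], ...) in a CI script. This only hides the import-time failure; any code path that really shells out to git would still fail.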

@efiop
Contributor

efiop commented Apr 5, 2019

@MrOutis Got it! Thank you! Will send a fix ASAP.

@efiop efiop self-assigned this Apr 5, 2019
@efiop efiop added the p1-important Important, aka current backlog of things to do label Apr 5, 2019
@puhoshville
Contributor Author

@shcheklein I think this issue is not really related to the dataset storage task. I got exactly the same ImportError: Failed to initialize: Bad git executable. while trying to load a pretrained model, not a dataset, into the container.
@MrOutis thank you for such a detailed explanation! It's exactly what I was talking about :)

@efiop
Contributor

efiop commented Apr 6, 2019

@puhoshville yeah, it is a simple import bug in an unrelated part of dvc. My patch is in progress and should be ready today. Thanks for the feedback! 🙂
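
For readers wondering what this kind of import bug looks like: the traceback shows a module-level from git.cmd import Git being executed for every dvc command, including ones that never use git. One common remedy is to defer the import until a git-backed feature is actually called. The following is a minimal sketch of that pattern, not the actual patch; load_git_class and the simplified install_pkg and pull functions are hypothetical stand-ins.

```python
import importlib


def load_git_class():
    """Import GitPython's Git class on first use instead of at module import time."""
    try:
        # Importing git.cmd runs git/__init__.py, which raises ImportError
        # ("Bad git executable") when no git binary can be found.
        module = importlib.import_module("git.cmd")
    except ImportError as exc:
        raise RuntimeError("this command needs a git executable") from exc
    return module.Git


def install_pkg(url, target_dir):
    """Hypothetical git-backed command: the only place that touches GitPython."""
    git = load_git_class()()
    return git.execute(["git", "clone", url, target_dir])


def pull(targets):
    """Hypothetical stand-in for a command that never needs git."""
    return ["pulled " + target for target in targets]
```

With this layout, importing the module and calling pull works in a container without git, and the missing executable is reported only if install_pkg is invoked.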
