Remove unnecessary python packages from new workspaces #7064

Closed
kevin-ci opened this issue Dec 4, 2021 · 16 comments
Labels
meta: stale - This issue/PR is stale and will be closed soon
priority: highest (user impact) - Directly user impacting
team: workspace - Issue belongs to the Workspace team

Comments

@kevin-ci

kevin-ci commented Dec 4, 2021

Is your feature request related to a problem? Please describe

Workspaces now seem to include a number of preinstalled Python packages.

Requesting that these be removed. They bloat requirements files and cause Heroku deployments to fail.

Describe the behaviour you'd like

The old behaviour, where no pip packages were installed by default.

Describe alternatives you've considered

Using a venv, but then what's the point of using Gitpod at all? If I'm going to set up a venv within a workspace, I might as well just develop locally.

Additional context

This seems to be a very recent issue (first noticed it yesterday).

@lechien73

In addition, the package bloat means that pip3 freeze --local no longer works properly for creating requirements.txt files, and pushing those requirements to Heroku causes the app to fail to deploy.
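
One possible workaround, offered as a sketch and not something proposed in the thread: capture the output of pip3 freeze --local in a fresh workspace as a baseline, then drop every pin that also appears in the baseline before writing requirements.txt. The names parse_freeze and project_requirements are hypothetical, the versions in the demo are made up, and the approach assumes the preinstalled set does not change between the two runs.

```python
import re


def parse_freeze(text):
    """Map a normalized package name to (version, original line)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        # PEP 503 style normalization: case-insensitive, runs of -, _ and . are equal.
        key = re.sub(r"[-_.]+", "-", name).lower()
        pins[key] = (version, line)
    return pins


def project_requirements(current, baseline):
    """Return pins from `current` that are missing from, or differ from, `baseline`."""
    base = parse_freeze(baseline)
    kept = []
    for key, (version, line) in sorted(parse_freeze(current).items()):
        if key not in base or base[key][0] != version:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Illustrative data only; a real baseline would come from a fresh workspace.
    baseline = "flake8==3.9.2\npylint==2.10.2\n"
    current = "Django==3.2\nflake8==3.9.2\ngunicorn==20.1.0\npylint==2.10.2\n"
    print("\n".join(project_requirements(current, baseline)))
```

Note the limitation: a package the project genuinely needs is dropped if the baseline already contains it at the same version, so the result should be reviewed before deploying.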

@akosyakov added the priority: highest (user impact) label Dec 6, 2021
@JanKoehnlein
Contributor

/cc @ghuntley looks like a change in the workspace-full image. See also #7077

@ghuntley
Contributor

ghuntley commented Dec 7, 2021

> /cc @ghuntley looks like a change in the workspace-full image. See also #7077

> This seems to be a very recent issue (first noticed it yesterday).

Hey @kevin-ci. I have bisected backwards through 4+ months' worth of changes to workspace-full, and these dependencies have always been installed. I have not tried going back further, but even the versions of the dependencies match going back months, which almost rules out changes to workspace-full.

See https://gist.github.com/ghuntley/074a2a45f1af372d2e8572abf565419d

@ghuntley
Contributor

ghuntley commented Dec 7, 2021

re: Describe alternatives you've considered

Using a venv

It's always a good idea to pin the project's dependencies (at the operating system level) by bringing your own Dockerfile, as described over here. The workspace-full image provided by Gitpod is updated often, and our update cadence may not match that of your projects. Pinning the dependencies gives you full control of your software supply chain.

.gitpod.yml

image:
  file: .gitpod.Dockerfile

.gitpod.Dockerfile

FROM gitpod/workspace-base
USER root
RUN ... # standard linux commands to install python such as from https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-programming-environment-on-an-ubuntu-20-04-server

The custom image will be smaller than workspace-full as it contains just what you need and will likely yield faster startup times as well once the image has been initially built.

Anyway, I'm switching over to look into pip versions and to understand why these packages are installed in the first place.

@lechien73

So, is it that older versions of workspace-full were automatically activating a venv and that is the changed behaviour @ghuntley?

When I create a new workspace using an older commit of workspace-full, it starts up without showing all of those packages installed. I've also tried running source deactivate to see if a venv is running, and it tells me pyenv-virtualenv: no virtualenv has been activated.

When I stop and restart the workspace created with the older commit, then all of the installed packages are there, which is very curious behaviour. You can use this template if you want to check: https://github.com/code-institute-org/gitpod-full-template

@lechien73

So, further investigation reveals that the change took place back in June. Probably, our custom Dockerfile hadn't been updated, so the image hadn't rebuilt.

I've managed to solve it by creating a new Dockerfile, which builds from workspace-base and uses the following Python installation:

### Python ###
USER gitpod
RUN sudo install-packages python3-pip

ENV PATH=$HOME/.pyenv/bin:$HOME/.pyenv/shims:$PATH
RUN curl -fsSL https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash \
    && { echo; \
        echo 'eval "$(pyenv init -)"'; \
        echo 'eval "$(pyenv virtualenv-init -)"'; } >> /home/gitpod/.bashrc.d/60-python \
    && pyenv update \
    && pyenv install 3.8.11 \
    && pyenv global 3.8.11 \
    && python3 -m pip install --no-cache-dir --upgrade pip \
    && python3 -m pip install --no-cache-dir --upgrade \
        setuptools wheel virtualenv pipenv pylint rope flake8 \
        mypy autopep8 pep8 pylama pydocstyle bandit notebook \
        twine \
    && sudo rm -rf /tmp/*
USER gitpod
ENV PYTHONUSERBASE=/workspace/.pip-modules \
    PIP_USER=yes
ENV PATH=$PYTHONUSERBASE/bin:$PATH

This differs from the new default installation by setting PIP_USER to yes and removing the PIPENV_VENV_IN_PROJECT environment variable.

The workspace now behaves as expected - installed libraries are persisted and the other installed libraries are not visible to the user when pip freeze is run.
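
To see what the PYTHONUSERBASE line above changes, the following sketch starts a child interpreter with that variable set and asks it where its user base is. It uses only the standard library; the helper name user_base_for is hypothetical, and the path is the one from the Dockerfile above. PIP_USER is read by pip itself, not by the interpreter, so this shows only the install location and not the pip freeze filtering.

```python
import os
import subprocess
import sys


def user_base_for(path):
    """Return the user base reported by a child interpreter run with PYTHONUSERBASE=path."""
    env = dict(os.environ, PYTHONUSERBASE=path)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(user_base_for("/workspace/.pip-modules"))
```

Because user-mode installs go under the user base, pointing it at a directory inside /workspace is presumably what makes installed libraries survive a workspace restart.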

@JanKoehnlein
Contributor

I am glad you found out how to fix this. Can this issue be closed then?

@lechien73

For us, it can @JanKoehnlein; however I'd strongly suggest reverting to this behaviour by default. We can't be the only people who use Gitpod in order to escape the need to create virtual environments, and the current behaviour will cause a lot of package confusion for users. The need to manually configure pip to store its packages in persistent storage also seems a bit user-hostile. The previous behaviour, which we've now reverted to, is a lot cleaner, and a lot more intuitive and helpful for the user.

@AlexTugarev
Member

I'm trying to shed some light on this, as it might be necessary to understand what state people actually expect Python packages to be in for a fresh workspace.

See the dependency tree of the pre-installed packages, which perfectly matches the packages contributed by the workspace-full image.
gitpod /workspace/gitpod $ pipdeptree -fl
autopep8==1.5.7
  pycodestyle==2.7.0
  toml==0.10.2
awscli==1.21.9
  botocore==1.22.9
    jmespath==0.10.0
    python-dateutil==2.8.2
      six==1.16.0
    urllib3==1.26.6
  colorama==0.4.3
  docutils==0.15.2
  PyYAML==5.4.1
  rsa==4.7.2
    pyasn1==0.4.8
  s3transfer==0.5.0
    botocore==1.22.9
      jmespath==0.10.0
      python-dateutil==2.8.2
        six==1.16.0
      urllib3==1.26.6
bandit==1.7.0
  GitPython==3.1.18
    gitdb==4.0.7
      smmap==4.0.0
  PyYAML==5.4.1
  six==1.16.0
  stevedore==3.4.0
    pbr==5.6.0
crcmod==1.7
flake8==3.9.2
  mccabe==0.6.1
  pycodestyle==2.7.0
  pyflakes==2.3.1
mypy==0.910
  mypy-extensions==0.4.3
  toml==0.10.2
  typing-extensions==3.10.0.2
notebook==6.4.3
  argon2-cffi==21.1.0
    cffi==1.14.6
      pycparser==2.20
  ipykernel==6.4.1
    debugpy==1.4.3
    ipython==7.27.0
      backcall==0.2.0
      decorator==5.1.0
      jedi==0.18.0
        parso==0.8.2
      matplotlib-inline==0.1.3
        traitlets==5.1.0
      pexpect==4.8.0
        ptyprocess==0.7.0
      pickleshare==0.7.5
      prompt-toolkit==3.0.20
        wcwidth==0.2.5
      Pygments==2.10.0
      setuptools==58.0.4
      traitlets==5.1.0
    ipython-genutils==0.2.0
    jupyter-client==7.0.2
      entrypoints==0.3
      jupyter-core==4.7.1
        traitlets==5.1.0
      nest-asyncio==1.5.1
      python-dateutil==2.8.2
        six==1.16.0
      pyzmq==22.2.1
      tornado==6.1
      traitlets==5.1.0
    matplotlib-inline==0.1.3
      traitlets==5.1.0
    tornado==6.1
    traitlets==5.1.0
  ipython-genutils==0.2.0
  Jinja2==3.0.1
    MarkupSafe==2.0.1
  jupyter-client==7.0.2
    entrypoints==0.3
    jupyter-core==4.7.1
      traitlets==5.1.0
    nest-asyncio==1.5.1
    python-dateutil==2.8.2
      six==1.16.0
    pyzmq==22.2.1
    tornado==6.1
    traitlets==5.1.0
  jupyter-core==4.7.1
    traitlets==5.1.0
  nbconvert==6.1.0
    bleach==4.1.0
      packaging==21.0
        pyparsing==2.4.7
      six==1.16.0
      webencodings==0.5.1
    defusedxml==0.7.1
    entrypoints==0.3
    Jinja2==3.0.1
      MarkupSafe==2.0.1
    jupyter-core==4.7.1
      traitlets==5.1.0
    jupyterlab-pygments==0.1.2
      Pygments==2.10.0
    mistune==0.8.4
    nbclient==0.5.4
      jupyter-client==7.0.2
        entrypoints==0.3
        jupyter-core==4.7.1
          traitlets==5.1.0
        nest-asyncio==1.5.1
        python-dateutil==2.8.2
          six==1.16.0
        pyzmq==22.2.1
        tornado==6.1
        traitlets==5.1.0
      nbformat==5.1.3
        ipython-genutils==0.2.0
        jsonschema==3.2.0
          attrs==21.2.0
          pyrsistent==0.18.0
          setuptools==58.0.4
          six==1.16.0
        jupyter-core==4.7.1
          traitlets==5.1.0
        traitlets==5.1.0
      nest-asyncio==1.5.1
      traitlets==5.1.0
    nbformat==5.1.3
      ipython-genutils==0.2.0
      jsonschema==3.2.0
        attrs==21.2.0
        pyrsistent==0.18.0
        setuptools==58.0.4
        six==1.16.0
      jupyter-core==4.7.1
        traitlets==5.1.0
      traitlets==5.1.0
    pandocfilters==1.4.3
    Pygments==2.10.0
    testpath==0.5.0
    traitlets==5.1.0
  nbformat==5.1.3
    ipython-genutils==0.2.0
    jsonschema==3.2.0
      attrs==21.2.0
      pyrsistent==0.18.0
      setuptools==58.0.4
      six==1.16.0
    jupyter-core==4.7.1
      traitlets==5.1.0
    traitlets==5.1.0
  prometheus-client==0.11.0
  pyzmq==22.2.1
  Send2Trash==1.8.0
  terminado==0.12.1
    ptyprocess==0.7.0
    tornado==6.1
  tornado==6.1
  traitlets==5.1.0
pep8==1.7.1
pipdeptree==2.2.0
  pip==21.2.4
pipenv==2021.5.29
  certifi==2021.5.30
  pip==21.2.4
  setuptools==58.0.4
  virtualenv==20.7.2
    backports.entry-points-selectable==1.1.0
    distlib==0.3.2
    filelock==3.0.12
    platformdirs==2.3.0
    six==1.16.0
  virtualenv-clone==0.5.7
pylama==7.7.1
  mccabe==0.6.1
  pycodestyle==2.7.0
  pydocstyle==6.1.1
    snowballstemmer==2.1.0
  pyflakes==2.3.1
pylint==2.10.2
  astroid==2.7.3
    lazy-object-proxy==1.6.0
    setuptools==58.0.4
    wrapt==1.12.1
  isort==5.9.3
  mccabe==0.6.1
  platformdirs==2.3.0
  toml==0.10.2
rope==0.19.0
twine==3.4.2
  colorama==0.4.3
  importlib-metadata==4.8.1
    zipp==3.5.0
  keyring==23.2.1
    importlib-metadata==4.8.1
      zipp==3.5.0
    jeepney==0.7.1
    SecretStorage==3.3.1
      cryptography==3.4.8
        cffi==1.14.6
          pycparser==2.20
      jeepney==0.7.1
  pkginfo==1.7.1
  readme-renderer==29.0
    bleach==4.1.0
      packaging==21.0
        pyparsing==2.4.7
      six==1.16.0
      webencodings==0.5.1
    docutils==0.15.2
    Pygments==2.10.0
    six==1.16.0
  requests==2.26.0
    certifi==2021.5.30
    charset-normalizer==2.0.4
    idna==3.2
    urllib3==1.26.6
  requests-toolbelt==0.9.1
    requests==2.26.0
      certifi==2021.5.30
      charset-normalizer==2.0.4
      idna==3.2
      urllib3==1.26.6
  rfc3986==1.5.0
  tqdm==4.62.2
wheel==0.37.0
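
To compare the listing above with the packages an image installs explicitly, one could pull out only the unindented entries, which are the roots of the tree. A minimal sketch, assuming the two-space indentation shown above; the function name top_level_pins is hypothetical.

```python
def top_level_pins(tree_text):
    """Return the unindented 'name==version' lines of `pipdeptree -fl` style output."""
    roots = []
    for line in tree_text.splitlines():
        # Dependencies are indented; root packages start in column 0.
        if line and not line[0].isspace() and "==" in line:
            roots.append(line.strip())
    return roots


if __name__ == "__main__":
    sample = (
        "autopep8==1.5.7\n"
        "  pycodestyle==2.7.0\n"
        "  toml==0.10.2\n"
        "flake8==3.9.2\n"
        "  mccabe==0.6.1\n"
    )
    print(top_level_pins(sample))
```

The resulting list can then be checked against the pip install line of the image's Dockerfile to see which packages were requested directly and which arrived as dependencies.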

It was changes like gitpod-io/workspace-images#461 that led to a single global location that packages are installed to and loaded from. The issue linked there might not be a typical case, as the conflict it mentions arises from another virtualenv situation.

Anyway, what we need to clarify is: what do people expect from a Python environment in an empty project?
To kick off that discussion, I assume that Gitpod cannot replace the configuration of virtual environments by providing an empty site and, at the same time, pre-populate dependencies for the IDE (notebook etc.).

@lechien73

I could be way off the mark here @AlexTugarev, because I couldn't find much documentation for the PIP_USER environment variable; however, I think what was happening prior to the June change is that the required dependencies were being installed globally. The PIP_USER=yes environment variable set pip to work in user mode, so that global packages weren't displayed. When you started a new workspace and ran pip freeze, nothing was returned. The global packages were installed under /home/gitpod/.pyenv/versions/3.8.12/lib/python3.8 and the local packages were installed into /workspace/.pip-modules.

In effect, a new workspace appeared to start with an empty Python environment, although the global packages were installed and available. For us, at least, that is the ideal behaviour. Our students can start with an "empty" environment and build their requirements files without having to worry about creating another virtual environment or filtering extraneous packages. It is, in fact, the great USP of Gitpod for us - each workspace is a blank, self-contained workspace and a student can just start coding.

I'd be interested to know the rationale behind the PIP_USER=no change that happened a couple of months ago.
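
The split described above, between packages in the user site and packages installed elsewhere, can be inspected from Python. The following is a sketch using only the standard library (importlib.metadata, Python 3.8+); split_by_user_site is a hypothetical helper, and it illustrates the idea without claiming to reproduce what pip freeze does internally.

```python
import site
from importlib import metadata
from pathlib import Path


def split_by_user_site(user_site=None):
    """Return (user_pins, other_pins) for distributions visible to this interpreter."""
    root = Path(user_site or site.getusersitepackages()).resolve()
    user_pins, other_pins = [], []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if not name:
            continue  # skip entries whose metadata cannot be read
        # Directory that contains the distribution's metadata folder.
        location = Path(str(dist.locate_file(""))).resolve()
        pin = f"{name}=={meta.get('Version')}"
        in_user_site = location == root or root in location.parents
        (user_pins if in_user_site else other_pins).append(pin)
    return sorted(user_pins), sorted(other_pins)


if __name__ == "__main__":
    user_pins, other_pins = split_by_user_site()
    print("user site:", len(user_pins), "packages")
    print("elsewhere:", len(other_pins), "packages")
```

In a workspace set up as described above, one would expect the first list to hold only what was installed in that workspace and the second to hold the preinstalled tooling.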

@AlexTugarev
Member

> [...] In effect, a new workspace appeared to start with an empty Python environment, although the global packages were installed and available. For us, at least, that is the ideal behaviour.

This is a great summary and exactly the intention behind the previous setup.

> I'd be interested to know the rationale behind the PIP_USER=no change that happened a couple of months ago.

I'll pass the ball to @csweichel because of gitpod-io/workspace-images#461. It seems to have been changed in order to overcome issues with pipenv, but I have no insight into the actual issue there.

@svenefftinge
Member

Is this still an issue with the new images?

@svenefftinge added the team: workspace label Feb 9, 2022
@JanKoehnlein
Contributor

Looks like it's at least improved: I forked the repo above, removed the image section from the .gitpod.yml file, started Gitpod and ran

$ pip freeze
breezy==3.0.2
certifi==2019.11.28
chardet==3.0.4
configobj==5.0.6
dbus-python==1.2.16
dulwich==0.19.15
fastimport==0.9.8
idna==2.8
PyGObject==3.36.0
python-apt==2.0.0+ubuntu0.20.4.6
PyYAML==5.3.1
requests==2.22.0
requests-unixsocket==0.2.0
six==1.14.0
urllib3==1.25.8

@edmondop

The fact that Gitpod stopped working as soon as we enabled prebuilds, and that we had to dig through this thread, caused us a lot of headaches. A general comment on virtual envs: they are here to stay, and you want to support developer collaboration. Some people will use Gitpod, others won't. Making it necessary to change the Docker image to support virtual envs and prebuilds seems to make collaboration harder "by default".

@yenicelik

I have the same issue; the init functionality is basically not there.

@stale

stale bot commented Dec 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale bot added the meta: stale label Dec 3, 2022
@stale bot closed this as completed Dec 20, 2022

9 participants