Enable back iterative development of latest providers with old airflows #43617

@potiuk potiuk commented Nov 3, 2024

The compatibility tests in CI use providers built as packages from sources, so the tests from "providers/tests" work fine there: all providers are installed in the airflow.providers site library. However, when iterating on and debugging backwards-compatibility provider tests, we should be able to use local provider sources rather than installed packages, and we can mount both the provider sources and the tests into the image.

See contributing-docs/testing/unit_tests.rst on how to do it by using the --mount-sources providers-and-tests flag combined with --use-airflow-version.

However, this has been broken since #42505: main now relies on a "pkgutil"-style namespace package for both the airflow and airflow.providers packages (previous Airflow versions had an implicit namespace package for airflow.providers), so providers installed locally cannot be used as "another" source of providers. Previously it worked because both the "installed" and the "sources" airflow.providers packages were implicit namespace packages.

As explained in https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages

Every distribution that uses the namespace package must include such
an __init__.py. If any distribution does not, it will cause the
namespace logic to fail and the other sub-packages will not be
importable. Any additional code in __init__.py will be inaccessible.
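
To make the failure mode concrete, here is a minimal sketch in Python. It is not the Airflow layout: the package name `demo_providers_pkg`, the module names, and the helper function are all hypothetical, and the `__init__.py` is left empty (a plain regular package) to keep the example small. It only shows the general rule that when one location holds an `__init__.py` and another does not, the import system picks the regular package and the other location is no longer searched.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def both_portions_importable(sources_have_init: bool) -> bool:
    """Split one package across two sys.path roots and report whether both halves import.

    All names here are hypothetical and exist only for this illustration.
    """
    pkg = "demo_providers_pkg"
    with tempfile.TemporaryDirectory() as tmp:
        installed_root = Path(tmp) / "installed"  # stands in for the site directory
        sources_root = Path(tmp) / "sources"      # stands in for mounted sources
        (installed_root / pkg).mkdir(parents=True)
        (sources_root / pkg).mkdir(parents=True)
        (installed_root / pkg / "released.py").write_text("VALUE = 'installed'\n")
        (sources_root / pkg / "local.py").write_text("VALUE = 'sources'\n")
        if sources_have_init:
            # An __init__.py turns this directory into a regular package,
            # so it stops being one portion of an implicit namespace package.
            (sources_root / pkg / "__init__.py").write_text("")

        added = [str(installed_root), str(sources_root)]
        sys.path[:0] = added
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"{pkg}.local")
            importlib.import_module(f"{pkg}.released")
            return True
        except ModuleNotFoundError:
            return False
        finally:
            # Undo the sys.path and sys.modules changes so repeated calls are independent.
            for entry in added:
                sys.path.remove(entry)
            for name in [m for m in sys.modules if m == pkg or m.startswith(pkg + ".")]:
                del sys.modules[name]
```

With `sources_have_init=False` both directories are implicit namespace portions and the function returns `True`. With `sources_have_init=True` the sources directory wins as a regular package, `released` can no longer be found, and the function returns `False`.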

So, because old Airflow uses an implicit providers package and main Airflow from sources uses an "explicit" providers package, the only way to make the "source" providers work is to mount them, or symbolically link them, inside the installed distribution of the airflow package (in the site directory), or to dynamically remove the __init__.py from the providers source directory.

We cannot mount the provider package sources inside the installed airflow, because when --use-airflow-version is used, airflow is installed dynamically inside the container, after the container is started.

This PR solves the problem by adding an env variable that makes the initialization script remove the installed airflow.providers folder after installing airflow and link the "providers/src/airflow/providers" folder in its place. This has the added benefit that all providers (including the preinstalled ones) are used from "main" sources rather than from installed packages. That was problematic with the previous way of using providers from sources, which relied on both "airflow.providers" in the site library and the one in sources being implicit namespace packages.
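
The remove-and-link step can be sketched roughly as follows. In the PR this happens in the container's shell initialization script, guarded by an env variable whose name is not given in this description, so the Python function below, its name, and its parameters are hypothetical and only illustrate the idea.

```python
import shutil
from pathlib import Path


def link_providers_from_sources(installed_airflow_dir: Path, providers_sources_dir: Path) -> Path:
    """Replace <installed airflow>/providers with a symlink to the provider sources.

    Hypothetical helper: the real change lives in a shell script, not in Python.
    """
    installed_providers = installed_airflow_dir / "providers"
    if installed_providers.is_symlink():
        # A link left over from a previous run; rmtree refuses to remove symlinks.
        installed_providers.unlink()
    elif installed_providers.exists():
        # Drop the providers that came with the installed packages.
        shutil.rmtree(installed_providers)
    # Point the installed location at the local sources instead.
    installed_providers.symlink_to(providers_sources_dir.resolve(), target_is_directory=True)
    return installed_providers
```

After the call, anything imported through the installed `airflow` package under `providers` resolves to files in the sources directory, which is why the preinstalled providers are also taken from sources.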




potiuk commented Nov 3, 2024

Very interesting :). I've learned a bit more about namespace packages.


potiuk commented Nov 3, 2024

Found it while working on #43556


gopidesupavan commented Nov 4, 2024

Definitely this is a great catch @potiuk :) Thank you..

@potiuk potiuk merged commit 12950dd into apache:main Nov 4, 2024
82 checks passed
@potiuk potiuk deleted the fix-iterative-testing-of-compatibility-with-providers branch November 4, 2024 12:01
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
…ws (apache#43617)

* Enable back iterative development of latest providers with old airflows

* Update Dockerfile.ci

Co-authored-by: GPK <[email protected]>

* Update scripts/docker/entrypoint_ci.sh

Co-authored-by: GPK <[email protected]>

---------

Co-authored-by: GPK <[email protected]>