[python-package] add 'pandas' extra #5937

Merged
merged 3 commits into from
Jun 23, 2023

@jameslamb (Collaborator) commented Jun 23, 2023

While working on #5936, I discovered that lightgbm is incompatible with pandas<0.24.0. This PR proposes adding a [pandas] extra to the Python package to track that floor, so users can run pip install 'lightgbm[pandas]' and be guaranteed to get either a compatible version or a big, loud, clear error.

Other changes:

How I found that version floor

lightgbm can only be used with pandas if CategoricalDtype can be imported...

"""pandas"""
try:
    from pandas import DataFrame as pd_DataFrame
    from pandas import Series as pd_Series
    from pandas import concat
    try:
        from pandas import CategoricalDtype as pd_CategoricalDtype
    except ImportError:
        from pandas.api.types import CategoricalDtype as pd_CategoricalDtype
    PANDAS_INSTALLED = True
except ImportError:
    PANDAS_INSTALLED = False

... which was introduced in pandas==0.21.0: pandas-dev/pandas#16015.

lightgbm also relies on pd.DataFrame.to_numpy() and pd.Series.to_numpy()...

data = data.to_numpy(dtype=target_dtype, copy=False)

data = data.to_numpy(dtype=target_dtype, na_value=np.nan)

label = label.to_numpy(dtype=np.float32, copy=False)

label = label.to_numpy(dtype=np.float32, na_value=np.nan)

elif isinstance(other.data, pd_DataFrame):
    self.data = dt_DataTable(np.hstack((self.data.to_numpy(), other.data.values)))

... which was introduced in pandas==0.24.0: pandas-dev/pandas#23623
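
The effective floor is the larger of the two minimums found above. A minimal sketch of that reasoning, using only the standard library; `FEATURE_FLOORS`, `parse_release`, and `meets_pandas_floor` are hypothetical names for illustration and are not part of lightgbm. The parser only reads the leading numeric release segments, so it does not order pre-release suffixes properly.

```python
import re

# Floors taken from the discussion above: CategoricalDtype needs 0.21.0,
# DataFrame.to_numpy() / Series.to_numpy() need 0.24.0.
FEATURE_FLOORS = {
    "CategoricalDtype": (0, 21, 0),
    "to_numpy": (0, 24, 0),
}


def parse_release(version):
    """Return the leading numeric release segments, e.g. '0.24.0rc1' -> (0, 24, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions such as '0.24' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def meets_pandas_floor(version):
    """True if `version` satisfies every feature floor listed above."""
    floor = max(FEATURE_FLOORS.values())
    return parse_release(version) >= floor
```

A real project would more likely rely on the declared `pandas>=0.24.0` requirement and let pip enforce it; the sketch only shows why a single floor covers both features.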

How I discovered that extras were broken (and then tested that this fixes them)

# build the wheel
sh ./build-python.sh bdist_wheel

# try to install
pip install \
    --find-links=./dist \
    'lightgbm[pandas]'

On master:

WARNING: lightgbm 3.3.5.99 does not provide the extra 'pandas'

On this branch:

Looking in links: ./dist
Processing ./dist/lightgbm-3.3.5.99-py3-none-macosx_12_0_x86_64.whl
Collecting pandas>=0.24.0
  Downloading pandas-2.0.2-cp39-cp39-macosx_10_9_x86_64.whl (11.8 MB)
...
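
The same check can be made without running pip, by reading the installed distribution's metadata. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); `provided_extras` is a hypothetical helper name, not something lightgbm ships.

```python
from importlib.metadata import PackageNotFoundError, metadata


def provided_extras(distribution):
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        dist_metadata = metadata(distribution)
    except PackageNotFoundError:
        return []
    # 'Provides-Extra' appears once per declared extra; get_all() returns
    # None when the field is absent, so fall back to an empty list.
    return list(dist_metadata.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # With a wheel built from this branch installed, "pandas" should be listed.
    print("pandas" in provided_extras("lightgbm"))
```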

@@ -26,6 +26,15 @@ readme = "README.rst"
requires-python = ">=3.6"
version = "3.3.5.99"

[project.optional-dependencies]
dask = [
"dask[array,dataframe,distributed]>=2.0.0",
Collaborator Author

The format where these are split out as individual items

dask[array]>=2.0.0
dask[dataframe]>=2.0.0
dask[distributed]>=2.0.0

is confusing. It makes it look like 3 separate versions of 3 separate things that just happen to be the same, but that's not really what it means.

Those are 3 different ways to say dask>=2.0.0.

We should just state the version once, to make that clearer.
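
To illustrate the equivalence described in this comment, here is a minimal sketch that collapses requirement lines sharing a package name and version specifier into one line with combined extras. `merge_requirements` and the `REQUIREMENT` pattern are hypothetical, written for this example only; the pattern assumes simple requirement strings with at most one extras group and no environment markers.

```python
import re

# Matches simple requirement strings such as "dask[array]>=2.0.0".
# Assumption: one optional extras group, one version specifier, no markers.
REQUIREMENT = re.compile(
    r"^(?P<name>[A-Za-z0-9._-]+)(?:\[(?P<extras>[^\]]*)\])?(?P<spec>.*)$"
)


def merge_requirements(lines):
    """Collapse requirements sharing a name and specifier into one line each."""
    merged = {}
    for line in lines:
        match = REQUIREMENT.match(line.strip())
        if match is None:
            raise ValueError(f"unrecognized requirement: {line!r}")
        key = (match["name"], match["spec"].strip())
        extras = merged.setdefault(key, [])
        for extra in (match["extras"] or "").split(","):
            extra = extra.strip()
            if extra and extra not in extras:
                extras.append(extra)
    result = []
    for (name, spec), extras in merged.items():
        suffix = f"[{','.join(sorted(extras))}]" if extras else ""
        result.append(f"{name}{suffix}{spec}")
    return result


split_form = [
    "dask[array]>=2.0.0",
    "dask[dataframe]>=2.0.0",
    "dask[distributed]>=2.0.0",
]
combined_form = merge_requirements(split_form)
```

Here `combined_form` holds the single line used in the diff above, which states the dask version once.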

@jameslamb jameslamb merged commit 2e603f8 into master Jun 23, 2023
@jameslamb jameslamb deleted the pandas-extra branch June 23, 2023 20:04
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 27, 2023