Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ellipsis indexing #1696

Closed
2 of 3 tasks
nc1m opened this issue Sep 25, 2024 · 2 comments · Fixed by #1729
Closed
2 of 3 tasks

Support ellipsis indexing #1696

nc1m opened this issue Sep 25, 2024 · 2 comments · Fixed by #1729
Assignees
Milestone

Comments

@nc1m
Copy link

nc1m commented Sep 25, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

In 1.5.1 sklearn changed their _array_indexing method to use ellipsis indexing/slicing:
https://github.com/scikit-learn/scikit-learn/blob/1.5.1/sklearn/utils/_indexing.py#L33

Running train_test_split on AnnData now results in an IndexError.

Code:

import anndata
import session_info
from sklearn.model_selection import train_test_split
import scanpy as sc

session_info.show(html=False, dependencies=True)

adata = sc.datasets.ebi_expression_atlas("E-MTAB-4888")
adata_val, adata_test = train_test_split(adata, test_size=0.5, stratify=adata.obs['Sample Characteristic[cell type]'])

Traceback:

Traceback (most recent call last):
  File "/home/user/project/src/data/test.py", line 6, in <module>
    adata_val, adata_test = train_test_split(adata, test_size=0.5, stratify=adata.obs['Sample Characteristic[cell type]'])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/model_selection/_split.py", line 2805, in train_test_split
    return list(
           ^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/model_selection/_split.py", line 2807, in <genexpr>
    (_safe_indexing(a, train), _safe_indexing(a, test)) for a in arrays
     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/utils/_indexing.py", line 267, in _safe_indexing
    return _array_indexing(X, indices, indices_dtype, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/utils/_indexing.py", line 33, in _array_indexing
    return array[key, ...] if axis == 0 else array[:, key]
           ~~~~~^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/anndata.py", line 1021, in __getitem__
    oidx, vidx = self._normalize_indices(index)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/anndata.py", line 1002, in _normalize_indices
    return _normalize_indices(index, self.obs_names, self.var_names)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/index.py", line 39, in _normalize_indices
    ax1 = _normalize_index(ax1, names1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/index.py", line 109, in _normalize_index
    raise IndexError(f"Unknown indexer {indexer!r} of type {type(indexer)}")
IndexError: Unknown indexer Ellipsis of type <class 'ellipsis'>

I'm not sure if this behavior is intended.

Versions

-----
anndata             0.10.9
scanpy              1.10.2
session_info        1.0.0
sklearn             1.5.1
-----
PIL                 10.3.0
cffi                1.17.1
colorama            0.4.6
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0
h5py                3.11.0
igraph              0.11.6
joblib              1.4.2
kiwisolver          1.4.7
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.43.0
matplotlib          3.9.2
mpl_toolkits        NA
natsort             8.4.0
numba               0.60.0
numpy               1.26.4
packaging           24.1
pandas              2.2.2
pyparsing           3.1.4
pytz                2024.1
scipy               1.14.1
six                 1.16.0
texttable           1.7.0
threadpoolctl       3.5.0
torch               2.4.1
torchgen            NA
tqdm                4.66.5
typing_extensions   NA
yaml                6.0.2
-----
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Linux-6.1.0-25-amd64-x86_64-with-glibc2.36
@ilan-gold
Copy link
Contributor

@nc1m I think this is then a feature because I don't think we ever supported Ellipsis indexing. But this should be simple enough.

@ilan-gold ilan-gold added this to the 0.10.10 milestone Oct 1, 2024
@ilan-gold ilan-gold changed the title Updating sklearn from 1.1.2 to 1.5.1 breaks train_test_split for AnnData because of ellipsis indexing Support ellipsis indexing Oct 1, 2024
@flying-sheep
Copy link
Member

flying-sheep commented Oct 8, 2024

Ehrdata could benefit from allowing 3D data in .X too: theislab/ehrdata#24

Seems like we already support 3D data in .layers, so latest in the .X-.layers unification (#244) we’d need to do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants