MINOR: [Python] improve error log to return a list of duplicated columns #45013

Open
wants to merge 1 commit into base: main

adrienpacifico

Improve error log to display the list of duplicated columns instead of the full columns list.

Rationale for this change

Getting the full list of the dataframe's columns is not useful; knowing which columns are duplicated is.
I thought that listing all duplicated columns of the dataframe is better than listing only the set of duplicated column names (df.columns[df.columns.duplicated()].unique().tolist()).
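
The difference between the two options can be illustrated with a small standalone sketch. This is not the code from the PR or from pandas_compat.py: the helper names duplicated_columns and check_unique_columns, the error message wording, and the choice to keep every occurrence of a repeated name are all assumptions made for illustration, and plain Python lists stand in for df.columns.

```python
from collections import Counter


def duplicated_columns(columns, unique=False):
    """Return the column names that occur more than once (hypothetical helper).

    With unique=False every occurrence is kept, in original order;
    with unique=True each duplicated name is listed only once.
    """
    counts = Counter(columns)
    dupes = [name for name in columns if counts[name] > 1]
    if unique:
        # dict.fromkeys drops repeats while preserving first-seen order
        dupes = list(dict.fromkeys(dupes))
    return dupes


def check_unique_columns(columns):
    """Hypothetical check: report only the duplicated names, not every column."""
    dupes = duplicated_columns(columns)
    if dupes:
        # The message text is an assumption, not the wording used in pyarrow
        raise ValueError(f"Duplicate column names found: {dupes}")


columns = ["a", "b", "a", "c", "b", "a"]
all_occurrences = duplicated_columns(columns)           # every repeated entry
unique_names = duplicated_columns(columns, unique=True)  # one entry per name
```

Keeping every occurrence shows how many times each name is repeated, while the unique variant gives a shorter message; which one reads better in an error log is a judgment call.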

Are these changes tested?

No, the changes are very minor. I checked that f-strings are compatible with Python 3.9 (the oldest compatible Python version in pyproject.toml).

Are there any user-facing changes?

No


Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@adrienpacifico changed the title from "Update pandas_compat.py error log return a list of duplicated columns" to "MINOR: [Update pandas_compat.py] error log return a list of duplicated columns" on Dec 12, 2024
@adrienpacifico changed the title from "MINOR: [Update pandas_compat.py] error log return a list of duplicated columns" to "MINOR: [Python] improve error log to return a list of duplicated columns" on Dec 12, 2024
Member

AlenkaF commented Dec 13, 2024

Hi @adrienpacifico, thank you for the suggested contribution! I would very much like the test test_duplicate_column_names_does_not_crash in test_pandas to be updated to reflect this change, and an issue to be opened so that this change, even though the diff is small, can be tracked.
