[BUG] Mixed dtype not supported for pd.read_excel #168

Closed
belkmouf opened this issue Nov 30, 2020 · 4 comments
Labels
bug (Something isn't working), pandas (Issues related to Pandas)

Comments

@belkmouf

################################################################################
error Lux

C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\formatters.py:345: UserWarning:
Unexpected error in rendering Lux widget and recommendations. Falling back to Pandas display.
Please report the following issue on Github: https://github.com/lux-org/lux/issues

C:\ProgramData\Anaconda3\lib\site-packages\lux\core\frame.py:709: UserWarning:Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\core\frame.py", line 661, in _repr_html_
    self.maintain_recs()
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\core\frame.py", line 492, in maintain_recs
    custom_action_collection = custom_actions(rec_df)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\action\custom.py", line 73, in custom_actions
    recommendation = lux.actions.__getattr__(action_name).action(ldf, args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\action\univariate.py", line 84, in univariate
    vlist = VisList(intent, ldf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\vis\VisList.py", line 43, in __init__
    self.refresh_source(self._source)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\vis\VisList.py", line 313, in refresh_source
    ldf.executor.execute(self._collection, ldf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\executor\PandasExecutor.py", line 97, in execute
    PandasExecutor.execute_aggregate(vis, isFiltered=filter_executed)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lux\executor\PandasExecutor.py", line 228, in execute_aggregate
    vis._vis_data = vis.data.sort_values(by=groupby_attr.attribute, ascending=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5307, in sort_values
    indexer = nargsort(
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\sorting.py", line 301, in nargsort
    indexer = non_nan_idx[non_nans.argsort(kind=kind)]
TypeError: '<' not supported between instances of 'str' and 'int'

@jinimukh changed the title from "error" to "BUG: Mixed dtype not supported for pd.read_excel" on Dec 1, 2020
@jinimukh added the "bug" (Something isn't working) and "pandas" (Issues related to Pandas) labels on Dec 1, 2020
@jinimukh changed the title from "BUG: Mixed dtype not supported for pd.read_excel" to "[BUG] Mixed dtype not supported for pd.read_excel" on Dec 1, 2020
@dorisjlee
Member

dorisjlee commented Dec 1, 2020

Hi @belkmouf, this is likely happening because your Excel data contains one or more columns with mixed data types. I don't have access to your data, but I was able to reproduce the error you are seeing with a different Excel file.
image
For example, here the "Car Type" column contains both int and str values (i.e., mixed type) and is recognized as an object type.
image
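Since the screenshots may not be visible, here is a minimal sketch of the same situation. The DataFrame below is made-up data standing in for the Excel sheet (only the "Car Type" column name comes from the example above), and `mixed_type_columns` is a hypothetical helper, not part of Lux or Pandas. It shows how a column holding both int and str values ends up as an object column and fails on sorting.

```python
import pandas as pd

# Made-up data standing in for the Excel file; values are illustrative only.
df = pd.DataFrame({
    "Car Type": [1, "SUV", 2, "Sedan"],  # int and str mixed in one column
    "Price": [10, 20, 30, 40],
})

# A column with mixed Python types is stored with the generic object dtype.
print(df.dtypes["Car Type"])


def mixed_type_columns(frame):
    """Hypothetical helper: names of columns whose non-null values
    have more than one Python type."""
    mixed = []
    for name in frame.columns:
        n_types = frame[name].dropna().map(type).nunique()
        if n_types > 1:
            mixed.append(name)
    return mixed


mixed = mixed_type_columns(df)
print(mixed)

# Sorting the mixed column compares str with int, which raises the
# TypeError seen at the bottom of the traceback in this issue.
try:
    df.sort_values(by="Car Type", ascending=True)
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print(exc)
```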

We can fix this by converting the mixed type column to a str type, which resolves the error.
image
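As a rough sketch of that conversion (again with made-up data rather than the actual Excel file), casting the column with `astype(str)` makes every value a string, so the sort no longer has to compare str with int:

```python
import pandas as pd

# Made-up data; only the "Car Type" column name comes from the example above.
df = pd.DataFrame({"Car Type": [1, "SUV", 2, "Sedan"]})

# Convert every value in the mixed column to a string.
df["Car Type"] = df["Car Type"].astype(str)

# Sorting now compares strings with strings and succeeds.
sorted_df = df.sort_values(by="Car Type", ascending=True)
print(sorted_df)
```

One thing to watch for: depending on the Pandas version, `astype(str)` may also turn missing values into the literal string "nan", so it can be worth checking for nulls first.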
Alternatively, you can pass converters or dtype to read_excel so that the column is read with the intended type from the start. See the Pandas documentation or this post for more details.
Please let us know if this fixes the problem that you are seeing.
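For the read_excel route, a minimal sketch could look like the following. The two wrapper functions and the file name in the usage note are hypothetical; `dtype` and `converters` are existing `pd.read_excel` parameters that map column names to a type or to a conversion function.

```python
import pandas as pd


def read_excel_with_str_dtype(path, column):
    """Hypothetical wrapper: read the sheet and force `column` to str via dtype."""
    return pd.read_excel(path, dtype={column: str})


def read_excel_with_converter(path, column):
    """Hypothetical wrapper: read the sheet and run str() on each cell
    of `column` via converters."""
    return pd.read_excel(path, converters={column: str})
```

Usage would be something like `df = read_excel_with_str_dtype("cars.xlsx", "Car Type")`, where "cars.xlsx" is a placeholder for your own file. Reading .xlsx files also needs an Excel engine such as openpyxl to be installed.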

@dorisjlee
Member

@jinimukh: Could we add a clearer warning message that flags the input dataframe when it contains mixed dtypes? This would be similar to the type detection warning we show for date type columns.
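To make the suggestion concrete, here is a rough sketch of what such a check might look like. This is not Lux's actual implementation; `warn_on_mixed_dtypes` and the warning text are hypothetical, and the type check is a simple heuristic based on the Python types of the non-null values.

```python
import warnings

import pandas as pd


def warn_on_mixed_dtypes(frame):
    """Hypothetical check: warn if any column holds values of more than
    one Python type, and return the names of those columns."""
    mixed = [
        name
        for name in frame.columns
        if frame[name].dropna().map(type).nunique() > 1
    ]
    if mixed:
        warnings.warn(
            f"Columns with mixed data types detected: {mixed}. "
            "Consider converting them to a single type, "
            "for example with astype(str).",
            UserWarning,
            stacklevel=2,
        )
    return mixed


# Made-up data for illustration.
example = pd.DataFrame({"Car Type": [1, "SUV"], "Price": [10, 20]})
flagged = warn_on_mixed_dtypes(example)
```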

@dorisjlee
Member

This error also reproduces with the Kaggle DS survey data, since the first row of each column contains string values.
image

@dorisjlee
Member

Hi @belkmouf,
@jinimukh fixed this bug in our most recent release.
You can access these updated changes by upgrading to the latest version of Lux:

pip install --upgrade lux-api
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget

Please let us know if this addresses the issue mentioned above.
I'll close this issue for now, but feel free to open a new issue if there are any other questions that come up!
