Mixed dtype issue #205

jinimukh · 2021-01-08T04:02:59Z

This PR addresses #168

Previously, calling sort_values on vis.data raised an exception when the data contained mixed data types. To handle this exception, I convert the internal data of the vis to type str.

Changed:

* Exception handling for the mixed dtype issue
* Refactoring to make the code more legible
* Added a test
* Added xlrd to the dev requirements so that the Excel file in the test can load
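
As a rough illustration of the idea described above, here is a minimal plain-Python sketch. It is not the code in lux/executor/PandasExecutor.py. It shows why ordering mixed int and str values fails in Python 3, and how falling back to str lets the sort succeed. The helper name `sort_with_str_fallback` and the warning text are hypothetical, chosen only for this example. pandas raises the same kind of TypeError when sort_values is called on an object column that mixes ints and strings.

```python
import warnings


def sort_with_str_fallback(values):
    """Sort values; if they cannot be compared, sort their string forms instead.

    Hypothetical helper for illustration only, not part of Lux.
    """
    try:
        return sorted(values)
    except TypeError:
        # Python 3 cannot order an int against a str, so sorting raises TypeError.
        warnings.warn(
            "Mixed data types detected; converting values to str before sorting.",
            stacklevel=2,
        )
        return sorted(str(v) for v in values)


mixed = [3, "b", 1, "a"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sort_with_str_fallback(mixed)

print(result)       # the values, now as strings, in lexicographic order
print(len(caught))  # number of warnings emitted during the fallback
```

One consequence of this design is that the converted values sort lexicographically rather than numerically, so "10" would sort before "2". That is presumably why the commits in this PR add a warning and a suggestion for the user alongside the conversion.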

codecov-io · 2021-01-08T04:39:49Z

Codecov Report

Merging #205 (cc8e3ee) into master (e1430df) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #205      +/-   ##
==========================================
- Coverage   77.54%   77.51%   -0.04%     
==========================================
  Files          40       40              
  Lines        2841     2846       +5     
==========================================
+ Hits         2203     2206       +3     
- Misses        638      640       +2

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| lux/executor/PandasExecutor.py | `92.88% <100.00%> (+0.12%)` | ⬆️ |
| lux/vislib/altair/Heatmap.py | `93.10% <0.00%> (-3.45%)` | ⬇️ |
| lux/vislib/altair/ScatterChart.py | `93.93% <0.00%> (-3.04%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e1430df...cc8e3ee.

* Similarity as a default action (#182)
* similarity formatting fixed
* added another similarity test case; fixed bug where colored heatmap dimension is temporal (invalidate all 2 msr 1 temporal case)
* filter and similarity together
* filter and similarity together
* remove filter
* black line length
* file reorg and clean; change sim metric
Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Doris Lee <[email protected]>
* bump numpy min version for travis
* Special character issue (#184)
* rename col
* broken
* fixed period replacement bug
* add tests
* refine tests
* refine tests
* remove cols
* fix tests
* add agg
* fixed tests
* clean up PR
Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Doris Lee <[email protected]>
* Colored bar interestingness bug (#189)
* rewrote chi2 contingency with pd.crosstab
* catching KeyError issue with chi2 contingency
* padding interestingness with warning instead of error
* interestingness now reuses ndim and nmsr computed in Compiler
* bug fix for parser with int values
* improve Vis repr to better display inferred intent when data is absent but fully compiled intent (all clauses)
* Add sampling parameters as a global config (#192)
* update export tutorial to add explanation for standalone argument
* minor fixes and remove cell output in notebooks
* added contributing doc
* fix bugs and uncomment some tests
* remove raise warning
* remove unnecessary import
* split up rename test into two parts
* fix setting warning, fix data_type bugs and add relevant tests
* remove ordinal data type
* add test for small dataframe resetting index
* add loc and iloc tests
* fix attribute access directly to dataframe
* add small changes to code
* added test for qcut and cut
* add check if dtype is Interval
* added qcut test
* fix Record KeyError
* add tests
* take care of reset_index case
* small edits
* add data_model to column_group Clause
* small edits for row_group
* fixes to row group
* add config for start and cap for samples
* finish sampling config and tests
* black formatting
* add documentation for sampling config
* remove small added issues
* minor changes to docs
* implement heatmap flag and add tests
* black formatting and documentation edits
Co-authored-by: Doris Lee <[email protected]>
* Coalesce all data_type attributes of frame into one (#185)
* coalesce data_types into data_type_lookup
* black reformat
* changed to better variable names
* lux not defined error
* fixed
* black format
* Update CONTRIBUTING.md
* Bug Fix: User-provided Index causes KeyError in Pandas Execution (#191)
* Moved Executor Parameters to Global Config
* Black formatting
* Moved table_name parameter to frame.py. Removed executor_type parameter executor_type parameter no longer necessary to maintain
* Fixed reference to table_name parameter table_name is now a parameter within frame.py
* Adjusted Functions to Set SQL Connection Moved set_SQL_connection function to config. Added set_SQL_table function within frame.py to let users specify which database table will be associated with their dataframe
* Update SQLExecutor name parameter
* Fix Executor Reference Update current_vis() to reference lux.config.executor
* Update frame.py
* Moved set functions to global config
* Fixed Index Issue in Pandas Executor Issue caused when user sets an index. The Pandas Executor was not correctly renaming this new index column to Record in execute_aggregate()
* Added tests for set_index functions
* Black formatting
* Update Pandas Executor to handle NA values Readded missing dropna parameter within execute_aggregate() groupby function call
* Updated Pandas Coverage Tests Commented out set_index case which has not been addressed yet
* Black Formatting
* Update to Pandas Executor Index Handling Cleaned up how execute_aggregrate renames index columns. Now retrieves the index name from vis.data instead of filtering out non-index columns. Created separate test function for when user specifies an index in read_csv.
Co-authored-by: 19thyneb <[email protected]>
Co-authored-by: Doris Lee <[email protected]>
* Initialize Config once only during __init__ (#194)
* basic matplotlib chart example
* migrate register default action to init
* config class
* move actions
* fixed tests
* changes
* alright
* fix plot_config
* black reformat
* black reformat
Co-authored-by: Doris Lee <[email protected]>
Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Ujjaini Mukhopadhyay <[email protected]>
* Update README.md
* Series Bugfix for describe and convert_dtypes (#197)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* Update Lux Docs (#195)
* add black to travis
* reformat all code and adjust test
* remove .idea
* fix contributing doc
* small change in contributing
* update
* reformat, update command to fix version
* remove dev dependencies
* first pass -- inline comments
* _config/config.py
* delete test notebook
* action
* line length 105
* executor
* interestingness
* processor
* vislib
* tests, travis, CONTRIBUTING
* .format () changed
* replace tabs with escape chars
* update using black
* more rewrites and merges into single line
* update pyproject.toml and makefile
* coalesce data_types into data_type_lookup
* black reformat
* changed to better variable names
* lux not defined error
* fixed
* black format
* config doc updated
* fix link for executor
* more links
* fixed overview
* more links fixed
* pandas methods no longer included
* updates to some docstrings
* black reformat
* minor fixes
* minor fix
Co-authored-by: Doris Lee <[email protected]>
* Supporting dataframe with integer columns (#203)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* various fixes to support int columns
* fixed merge conflict issues. vis.data shows None DF.
* Override Pandas DataFrames created from I/O pandas operations (#207)
* update export tutorial to add explanation for standalone argument
* minor fixes and remove cell output in notebooks
* added contributing doc
* fix bugs and uncomment some tests
* remove raise warning
* remove unnecessary import
* split up rename test into two parts
* fix setting warning, fix data_type bugs and add relevant tests
* remove ordinal data type
* add test for small dataframe resetting index
* add loc and iloc tests
* fix attribute access directly to dataframe
* add small changes to code
* added test for qcut and cut
* add check if dtype is Interval
* added qcut test
* fix Record KeyError
* add tests
* take care of reset_index case
* small edits
* add data_model to column_group Clause
* small edits for row_group
* fixes to row group
* add config for start and cap for samples
* finish sampling config and tests
* black formatting
* add documentation for sampling config
* remove small added issues
* minor changes to docs
* implement heatmap flag and add tests
* black formatting and documentation edits
* add pd.io equalities for DataFrames
Co-authored-by: Doris Lee <[email protected]>
* Merge master into sql-engine + minor mergeconflict fixes
* Removing the PYNB
* Cleaning up obsolete code
* Configuration for topk and sort order (#206)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* various fixes to support int columns
* skip series vis for df.iterrows series element
* config setting for modifying top K and sorting
* note about regenerated config
* Version lock for jupyter-client (#211)
* move to single requirements-dev without lux-widget install manually
* pin jedi version
* pin jupyter-client version
* add back old travis and requirement-dev
* Mixed dtype issue (#205)
* coalesce data_types into data_type_lookup
* merge fixed
* merge conflicts
* add warning and suggestion on how to fix
* formatting for warnings version
* change to internal data
* legibility update
* test added
* update test
* test updated
* xlrd in dev reqs
* black
* update link
* changes to test logic, minor string format for warning
Co-authored-by: Doris Lee <[email protected]>
* Fixes issue where value_counts was not returning LuxSeries (#210)
* add series equality and value counts test
* black formatting
* fix old value counts test instead
* minor fix
Co-authored-by: Doris Lee <[email protected]>
* bump version
* update README
Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Doris Lee <[email protected]>
Co-authored-by: Kunal Agarwal <[email protected]>
Co-authored-by: jinimukh <[email protected]>
Co-authored-by: thyneb19 <[email protected]>
Co-authored-by: 19thyneb <[email protected]>
Co-authored-by: Ujjaini Mukhopadhyay <[email protected]>

jinimukh added 9 commits December 22, 2020 20:55

coalesce data_types into data_type_lookup

2cef000

Merge branch 'master' of https://github.com/lux-org/lux

cad3e84

merge

f197884

merge fixed

1ed9655

merge conflicts

c56f79d

merged

c0388df

Merge branch 'master' of https://github.com/jinimukh/lux

cf045de

Merge branch 'master' into foo

6ae9767

add warning and suggestion on how to fix

aed08d6

jinimukh marked this pull request as draft January 8, 2021 04:03

formatting for warnings version

ede400d

jinimukh added 2 commits January 7, 2021 20:57

change to internal data

d0a3c38

legibility update

f810dc5

jinimukh mentioned this pull request Jan 8, 2021

add new kaggle_survey_2020_responses lux-org/lux-datasets#4

Merged

jinimukh added 2 commits January 8, 2021 14:45

test added

fb1f47b

update test

001d7e7

jinimukh marked this pull request as ready for review January 9, 2021 02:10

test updated

5b95297

jinimukh mentioned this pull request Jan 9, 2021

Add car-data excel data spreadsheet lux-org/lux-datasets#5

Merged

jinimukh and others added 5 commits January 8, 2021 21:27

Merge branch 'master' of https://github.com/lux-org/lux into issues/read

cdb8ddc

xlrd in dev reqs

8425b0f

black

7665806

update link

2d7edf8

changes to test logic, minor string format for warning

cc8e3ee

dorisjlee merged commit 14c141b into lux-org:master Jan 9, 2021
