Adding support to improve temporal action to display different timescales (#262)

* Add support to improve temporal action to display different timescales

* Resolve PR comments

* Add support to improve temporal action to display different timescales

* Resolve PR comments

* Reformat files using black

* "All-column" vis when only few columns in dataframe #199 (#336)

Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Doris Lee <[email protected]>

* documentation and cleaning
* added notebook gallery
* update README
* removed scatterplot message in SQLExecutor
* fixed typo in SQL documentation

* update README and bump version

* bump version

* clear propagated vis data intent after PandasExecutor completes execute (#297)

* fix black to stable version

* Scalability: incorporate early pruning optimizations (#368)

* changes from perf branch to config
* added flag for turning on/off lazy maintain optimization

* merged in approx early pruning code

* increase overall sampling start and cap

* Adjust width and length criteria for early pruning vislist based on experiment results; Add warning message and test for early pruning

* black version update

* version lock on black

* * fixed sql tests (added approx to execute constructor)
* fixed sampling config test
* improved Executor documentation

* timescale feature
* adding weekday
* adding docs
* bugfix for y axis line chart export
* fixing temporal axis by adding timescale variable in Clause

Co-authored-by: Doris Lee <[email protected]>
Co-authored-by: Caitlyn Chen <[email protected]>
Co-authored-by: Caitlyn Chen <[email protected]>
4 people authored Apr 30, 2021
1 parent a0cb921 commit ada6173
Showing 12 changed files with 235 additions and 22 deletions.
16 changes: 16 additions & 0 deletions doc/source/advanced/date.rst
@@ -89,6 +89,22 @@ After changing the Pandas data type to datetime, we see that date field is recog
:align: center
:alt: add screenshot

Visualizing Trends across Different Timescales
----------------------------------------------

Lux automatically detects the temporal attribute and plots the visualizations across different timescales to showcase any cyclical patterns. Here, we see that the `Temporal` tab displays the yearly, monthly, and weekly trends for the number of stock records.

.. code-block:: python

    from vega_datasets import data
    df = data.stocks()
    df.recommendation["Temporal"]

.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/date-12.png?raw=true
  :width: 600
  :align: center

Advanced Date Manipulation
--------------------------

1 change: 1 addition & 0 deletions doc/source/reference/gen/lux.core.frame.LuxDataFrame.rst
@@ -41,6 +41,7 @@
~LuxDataFrame.combine
~LuxDataFrame.combine_first
~LuxDataFrame.compare
~LuxDataFrame.compute_metadata
~LuxDataFrame.convert_dtypes
~LuxDataFrame.copy
~LuxDataFrame.copy_intent
3 changes: 2 additions & 1 deletion doc/source/reference/gen/lux.executor.Executor.Executor.rst
@@ -1,4 +1,4 @@
lux.executor.Executor.Executor
lux.executor.Executor.Executor
==============================

.. currentmodule:: lux.executor.Executor
@@ -19,6 +19,7 @@ lux.executor.Executor.Executor
~Executor.compute_data_type
~Executor.compute_stats
~Executor.execute
~Executor.execute_2D_binning
~Executor.execute_aggregate
~Executor.execute_binning
~Executor.execute_filter
@@ -1,4 +1,4 @@
lux.executor.PandasExecutor.PandasExecutor
lux.executor.PandasExecutor.PandasExecutor
==========================================

.. currentmodule:: lux.executor.PandasExecutor
@@ -23,6 +23,7 @@ lux.executor.PandasExecutor.PandasExecutor
~PandasExecutor.execute
~PandasExecutor.execute_2D_binning
~PandasExecutor.execute_aggregate
~PandasExecutor.execute_approx_sample
~PandasExecutor.execute_binning
~PandasExecutor.execute_filter
~PandasExecutor.execute_sampling
3 changes: 2 additions & 1 deletion lux/action/default.py
@@ -6,6 +6,7 @@ def register_default_actions():
from lux.action.enhance import enhance
from lux.action.filter import add_filter
from lux.action.generalize import generalize
from lux.action.temporal import temporal

# display conditions for default actions
no_vis = lambda ldf: (ldf.current_vis is None) or (
@@ -18,7 +19,7 @@ def register_default_actions():
lux.config.register_action("correlation", correlation, no_vis)
lux.config.register_action("distribution", univariate, no_vis, "quantitative")
lux.config.register_action("occurrence", univariate, no_vis, "nominal")
lux.config.register_action("temporal", univariate, no_vis, "temporal")
lux.config.register_action("temporal", temporal, no_vis)
lux.config.register_action("geographical", univariate, no_vis, "geographical")

lux.config.register_action("Enhance", enhance, one_current_vis)
128 changes: 128 additions & 0 deletions lux/action/temporal.py
@@ -0,0 +1,128 @@
# Copyright 2019-2020 The Lux Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import lux
from lux.vis.VisList import VisList
from lux.vis.Vis import Vis
import pandas as pd
from lux.core.frame import LuxDataFrame
from lux.interestingness.interestingness import interestingness
from lux.utils import utils


def temporal(ldf):
    """
    Generates line charts for temporal fields at different granularities.
    Parameters
    ----------
    ldf : lux.core.frame
        LuxDataFrame with underspecified intent.
    Returns
    -------
    recommendations : Dict[str,obj]
        Object with a collection of visualizations that result from the Temporal action.
    """
    vlist = []
    recommendation = {
        "action": "Temporal",
        "description": "Show trends over <p class='highlight-descriptor'>time-related</p> attributes.",
        "long_description": "Temporal displays line charts for all attributes related to datetimes in the dataframe.",
    }
    for c in ldf.columns:
        if ldf.data_type[c] == "temporal":
            try:
                generated_vis = create_temporal_vis(ldf, c)
                vlist.extend(generated_vis)
            except:
                pass

    # If no temporal visualizations were generated via parsing datetime, fallback to default behavior.
    if len(vlist) == 0:
        intent = [lux.Clause("?", data_type="temporal")]
        intent.extend(utils.get_filter_specs(ldf._intent))
        vlist = VisList(intent, ldf)
        for vis in vlist:
            vis.score = interestingness(vis, ldf)
    else:
        vlist = VisList(vlist)
        recommendation["long_description"] += (
            " Lux displays the overall temporal trend first,"
            + " followed by trends across other timescales (e.g., year, month, week, day)."
        )

    # Doesn't make sense to generate a line chart if there is less than 3 datapoints (pre-aggregated)
    if len(ldf) < 3:
        recommendation["collection"] = []
        return recommendation
    vlist.sort()
    recommendation["collection"] = vlist
    return recommendation


def create_temporal_vis(ldf, col):
    """
    Creates and populates Vis objects for different timescales in the provided temporal column.
    Parameters
    ----------
    ldf : lux.core.frame
        LuxDataFrame with underspecified intent.
    col : str
        Name of temporal column.
    Returns
    -------
    vlist : [Vis]
        Collection of Vis objects.
    """
    formatted_date = pd.to_datetime(ldf[col], format="%Y-%m-%d")

    overall_vis = Vis([lux.Clause(col, data_type="temporal")], source=ldf, score=5)

    year_col = col + " (year)"
    year_df = LuxDataFrame({year_col: pd.to_datetime(formatted_date.dt.year, format="%Y")})
    year_vis = Vis([lux.Clause(year_col, data_type="temporal")], source=year_df, score=4)

    month_col = col + " (month)"
    month_df = LuxDataFrame({month_col: formatted_date.dt.month})
    month_vis = Vis(
        [lux.Clause(month_col, data_type="temporal", timescale="month")], source=month_df, score=3
    )

    day_col = col + " (day)"
    day_df = LuxDataFrame({day_col: formatted_date.dt.day})
    day_df.set_data_type(
        {day_col: "nominal"}
    )  # Since day is high cardinality 1-31, it can get recognized as quantitative
    day_vis = Vis([lux.Clause(day_col, data_type="temporal", timescale="day")], source=day_df, score=2)

    week_col = col + " (day of week)"
    week_df = lux.LuxDataFrame({week_col: formatted_date.dt.dayofweek})
    week_vis = Vis(
        [lux.Clause(week_col, data_type="temporal", timescale="day of week")], source=week_df, score=1
    )

    unique_year_values = len(year_df[year_col].unique())
    unique_month_values = len(month_df[month_col].unique())
    unique_week_values = len(week_df[week_col].unique())
    vlist = []

    vlist.append(overall_vis)
    if unique_year_values != 1:
        vlist.append(year_vis)
    if unique_month_values != 1:
        vlist.append(month_vis)
    if unique_week_values != 1:
        vlist.append(week_vis)
    return vlist
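
The pruning at the end of `create_temporal_vis` can be illustrated without Lux. The following is a minimal sketch using only the standard library; `timescales_to_plot` is a hypothetical helper name (not part of Lux), and it assumes ISO `YYYY-MM-DD` date strings. It derives the year, month, and day-of-week components and keeps only those with more than one distinct value, mirroring the `unique_*_values != 1` checks above.

```python
from datetime import datetime


def timescales_to_plot(date_strings):
    """Return the timescales worth plotting for a list of ISO date strings.

    Hypothetical helper for illustration; not part of the Lux API.
    """
    dates = [datetime.strptime(s, "%Y-%m-%d") for s in date_strings]
    components = {
        "year": [d.year for d in dates],
        "month": [d.month for d in dates],
        # Monday == 0 ... Sunday == 6, the same convention as pandas' dt.dayofweek
        "day of week": [d.weekday() for d in dates],
    }
    # The overall trend is always shown; a derived timescale is kept only
    # when it takes more than one distinct value.
    selected = ["overall"]
    for name, values in components.items():
        if len(set(values)) != 1:
            selected.append(name)
    return selected


same_year = ["2020-01-06", "2020-02-03", "2020-03-02"]  # three Mondays in 2020
multi_year = ["2019-01-01", "2020-02-05", "2021-03-07"]
print(timescales_to_plot(same_year))   # ['overall', 'month']
print(timescales_to_plot(multi_year))  # ['overall', 'year', 'month', 'day of week']
```

In the first input every date falls in 2020 and on a Monday, so the year and day-of-week charts would each be a single point and are dropped.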
11 changes: 0 additions & 11 deletions lux/action/univariate.py
@@ -89,17 +89,6 @@ def univariate(ldf, *args):
"description": "Show choropleth maps of <p class='highlight-descriptor'>geographic</p> attributes",
"long_description": f"Occurence displays choropleths of averages for some geographic attribute{examples}. Visualizations are ranked by diversity of the geographic attribute.",
}
elif data_type_constraint == "temporal":
intent = [lux.Clause("?", data_type="temporal")]
intent.extend(filter_specs)
recommendation = {
"action": "Temporal",
"description": "Show trends over <p class='highlight-descriptor'>time-related</p> attributes.",
"long_description": "Temporal displays line charts for all attributes related to datetimes in the dataframe.",
}
# Doesn't make sense to generate a line chart if there is less than 3 datapoints (pre-aggregated)
if len(ldf) < 3:
ignore_rec_flag = True
if ignore_rec_flag:
recommendation["collection"] = []
return recommendation
3 changes: 2 additions & 1 deletion lux/utils/date_utils.py
@@ -83,7 +83,8 @@ def compute_date_granularity(date_column: pd.core.series.Series):
A str specifying the granularity of dates for the inspected temporal column
"""
# supporting a limited set of Vega-Lite TimeUnit (https://vega.github.io/vega-lite/docs/timeunit.html)
date_fields = ["day", "month", "year"]
# corresponding to Pandas timescales
date_fields = ["day", "month", "year", "dayofweek"]
date_index = pd.DatetimeIndex(date_column)
for field in date_fields:
# can be changed to sum(getattr(date_index, field)) != 0
5 changes: 5 additions & 0 deletions lux/vis/Clause.py
@@ -33,6 +33,7 @@ def __init__(
bin_size: int = 0,
weight: float = 1,
sort: str = "",
timescale: str = "",
exclude: typing.Union[str, list] = "",
):
"""
@@ -67,6 +68,9 @@ def __init__(
Number of bins for histograms, by default 0
weight : float, optional
A number between 0 and 1 indicating the importance of this Clause, by default 1
timescale : str, optional
If data type is temporal, indicate whether temporal associated with timescale (if empty, then plot overall).
If timescale is present, the line chart axis is based on ordinal data type (non-date axis).
sort : str, optional
Specifying whether and how the bar chart should be sorted
Possible values: 'ascending', 'descending', by default ""
@@ -86,6 +90,7 @@ def __init__(
self.bin_size = bin_size
self.weight = weight
self.sort = sort
self.timescale = timescale
self.exclude = exclude

def get_attr(self):
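
How the new `timescale` field is meant to be consumed can be sketched in isolation. The snippet below is an illustration under assumptions: `AxisClause` and `resolve_axis_type` are hypothetical stand-ins, not Lux classes or functions. It mirrors the rule described in the `timescale` docstring above, where a temporal clause with a non-empty timescale is plotted on an ordinal (non-date) axis.

```python
from dataclasses import dataclass


@dataclass
class AxisClause:
    """Hypothetical stand-in for the few Clause fields used here."""

    attribute: str
    data_type: str = ""
    timescale: str = ""


def resolve_axis_type(clause: AxisClause) -> str:
    """Pick the encoding type for an axis without mutating the clause."""
    # A non-empty timescale (e.g. "month") means the values are extracted
    # components such as 1-12, not full dates, so an ordinal axis fits.
    if clause.timescale != "":
        return "ordinal"
    return clause.data_type


overall = AxisClause("date", data_type="temporal")
by_month = AxisClause("date (month)", data_type="temporal", timescale="month")
print(resolve_axis_type(overall))   # temporal
print(resolve_axis_type(by_month))  # ordinal
```

Returning the type instead of assigning to `data_type` is a choice made for this sketch only; it leaves the clause's declared type unchanged for any later consumer.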
13 changes: 9 additions & 4 deletions lux/vislib/altair/LineChart.py
@@ -43,6 +43,11 @@ def initialize_chart(self):
x_attr_abv = str(x_attr.attribute)
y_attr_abv = str(y_attr.attribute)

if x_attr.timescale != "":
x_attr.data_type = "ordinal"
if y_attr.timescale != "":
y_attr.data_type = "ordinal"

if len(x_attr_abv) > 25:
x_attr_abv = x_attr.attribute[:15] + "..." + x_attr.attribute[-10:]
if len(y_attr_abv) > 25:
@@ -72,7 +77,7 @@ def initialize_chart(self):
axis=alt.Axis(title=y_attr_abv),
)
x_attr_field_code = f"alt.X('{x_attr.attribute}', type = '{x_attr.data_type}', axis=alt.Axis(title='{x_attr_abv}'))"
y_attr_fieldCode = f"alt.Y('{y_attr.attribute}', type= '{y_attr.data_type}', title='{agg_title}', axis=alt.Axis(title='{y_attr_abv}')"
y_attr_field_code = f"alt.Y('{y_attr.attribute}', type= '{y_attr.data_type}', title='{agg_title}', axis=alt.Axis(title='{y_attr_abv}'))"
else:
agg_title = get_agg_title(x_attr)
x_attr_spec = alt.X(
@@ -84,14 +89,14 @@ def initialize_chart(self):
y_attr_spec = alt.Y(
str(y_attr.attribute), type=y_attr.data_type, axis=alt.Axis(title=y_attr_abv)
)
x_attr_field_code = f"alt.X('{x_attr.attribute}', type = '{x_attr.data_type}', title='{agg_title}', axis=alt.Axis(title='{x_attr_abv}')"
y_attr_fieldCode = f"alt.Y('{y_attr.attribute}', type= '{y_attr.data_type}', axis=alt.Axis(title='{y_attr_abv}')"
x_attr_field_code = f"alt.X('{x_attr.attribute}', type = '{x_attr.data_type}', title='{agg_title}', axis=alt.Axis(title='{x_attr_abv}'))"
y_attr_field_code = f"alt.Y('{y_attr.attribute}', type= '{y_attr.data_type}', axis=alt.Axis(title='{y_attr_abv}'))"
self.data = AltairChart.sanitize_dataframe(self.data)
chart = alt.Chart(self.data).mark_line().encode(x=x_attr_spec, y=y_attr_spec)
chart = chart.interactive() # Enable Zooming and Panning
self.code += f"""
chart = alt.Chart(visData).mark_line().encode(
y = {y_attr_fieldCode},
y = {y_attr_field_code},
x = {x_attr_field_code},
)
chart = chart.interactive() # Enable Zooming and Panning
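
The `y_attr_field_code` lines in this file gain a closing parenthesis that was missing from the exported code string. One way to catch this class of bug is to parse the generated snippet, sketched below with the standard-library `ast` module. `field_code` and `is_valid_expression` are hypothetical helpers, not Lux functions; parsing only checks syntax, so `alt` does not need to be importable.

```python
import ast


def field_code(channel: str, attribute: str, data_type: str, title: str) -> str:
    """Build an Altair-style encoding expression as source text."""
    return (
        f"alt.{channel}('{attribute}', type='{data_type}', "
        f"axis=alt.Axis(title='{title}'))"
    )


def is_valid_expression(source: str) -> bool:
    """Return True if `source` parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


good = field_code("Y", "price", "quantitative", "price")
bad = good[:-1]  # drop the final ")" to reproduce an unbalanced string
print(is_valid_expression(good))  # True
print(is_valid_expression(bad))   # False
```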
50 changes: 50 additions & 0 deletions tests/test_action.py
@@ -19,6 +19,56 @@
from lux.vis.Vis import Vis


def test_temporal_action(global_var):
    airbnb_df = pd.read_csv(
        "https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv"
    )
    flights_df = pd.read_csv(
        "https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/flights.csv"
    )
    date_df = pd.DataFrame(
        {
            "date": [
                "2019-01",
                "2014-02",
                "2020-03",
                "2013-04",
                "2012-05",
                "2019-01",
                "2020-02",
                "2013-03",
                "2020-04",
                "2000-05",
                "2000-01",
                "2000-02",
                "2004-12",
                "2004-06",
                "2020-05",
                "2020-01",
                "2020-02",
                "2020-03",
                "2020-04",
                "2020-05",
            ]
        }
    )
    test_data = [airbnb_df, flights_df, date_df, pytest.car_df, pytest.olympic]
    test_data_vis_count = [4, 4, 2, 1, 1]
    for entry in zip(test_data, test_data_vis_count):
        df, num_vis = entry[0], entry[1]
        df._repr_html_()
        assert ("Temporal" in df.recommendation, "Temporal visualizations should be generated.")
        recommended = df.recommendation["Temporal"]
        assert (len(recommended) == num_vis, "Incorrect number of temporal visualizations generated.")
        temporal_col = [c for c in df.columns if df.data_type[c] == "temporal"]
        overall_vis = [
            vis.get_attr_by_channel("x")[0].attribute
            for vis in recommended
            if vis.score == 4 or vis.score == 5
        ]
        assert temporal_col.sort() == overall_vis.sort()
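
Two Python behaviors are worth keeping in mind when reading the assertions in this test. `assert (condition, message)` asserts a two-element tuple, which is always truthy, and `list.sort()` sorts in place and returns `None`, so comparing two `.sort()` results compares `None` with `None`. The standalone sketch below demonstrates both with made-up values; `check` is a hypothetical helper, and nothing here is a proposed replacement for the test.

```python
# A non-empty tuple is truthy regardless of its contents, so an assert
# written as `assert (cond, "msg")` can never fail.
condition = 1 == 2
always_truthy = (condition, "message that is never shown")
print(bool(always_truthy))  # True

# list.sort() sorts in place and returns None.
left = ["b", "a"]
right = ["x", "y", "z"]
print(left.sort() == right.sort())  # True, because None == None

# sorted() returns new lists, so the comparison inspects the contents.
print(sorted(left) == sorted(right))  # False


def check(cond, msg):
    """Assert with the message as the second operand, not inside a tuple."""
    assert cond, msg
```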


def test_vary_filter_val(global_var):
lux.config.set_executor_type("Pandas")
df = pytest.olympic