Proposal: Aesthetic Mappings

Graham Wills edited this page Nov 12, 2015 · 4 revisions

Aesthetic value mappings are tricky and can become a slippery slope toward something overly complex. Brunel is founded on simplicity, and we'd like to avoid adding fundamental complexity to support rarely used features. However, we also want to avoid destroying potentially useful features by over-simplifying the syntax too soon. It is a common, fine line to walk.

Problems for Brunel to Solve

First we should agree on which problems Brunel should and should not solve. Generally we feel the language should support the following, where an 'aesthetic representation' is the actual appearance in the visualization (i.e. the color, size, opacity or other future aesthetic):

  • Control over the aesthetic representation of the minimum and/or maximum data values for continuous scales
    • Sometimes we need a 'mid-point' value as well (typically for color)
    • Where sensible, only a mapping for the min or the max is required
  • Control over unique aesthetic representations for each categorical value
    • An ordered list where each aesthetic representation is applied depending on the data order
    • An explicit mapping of category values to aesthetic representations

As an example, currently Brunel offers some control over the actual aesthetic representation of data values. Generally the mappings appear after the field, separated by a colon (:) in an aesthetic statement:

size(density:1000)

The above means that the element representing the largest data value will be drawn at 1000% of the size it would have had without the size() aesthetic.

Syntax Proposal

Since each aesthetic type may have its own caveats, it is helpful to review examples of each. Edge cases (such as too few mappings) are presumed to be dealt with in some reasonable way.

Color

Apply specific colors to the values of a categorical field in the order they appear in the data:

color (categorical_field: Red, Blue)

Map these colors to some specific values that are in the data:

color (categorical_field: Red @ "Coke", Blue @ "Pepsi")

Apply a color range to the values of a continuous field:

color (continuous_field: Red - Blue)

Apply a color range to the values of a continuous field including a specific mid-point (the data value of 50 will be white):

color (continuous_field: Red - White @ 50 - Blue)
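To make the mid-point behaviour concrete, here is a minimal Python sketch of how Red - White @ 50 - Blue might be evaluated. It is not Brunel's implementation. The function names, the use of plain RGB interpolation, and the clamping of out-of-range values are all assumptions made for illustration. The two ends are tied to the data minimum and maximum, and only the mid-point is tied to an explicit data value.

```python
RED, WHITE, BLUE = (255, 0, 0), (255, 255, 255), (0, 0, 255)

def lerp_rgb(c0, c1, t):
    """Blend two RGB triples; t=0 gives c0, t=1 gives c1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def diverging_color(value, data_min, data_max, low, mid, high, mid_value):
    """Hypothetical evaluation of `low - mid @ mid_value - high`.

    Assumes data_min <= mid_value <= data_max and clamps the input
    to the data range (an assumption, not documented behaviour).
    """
    value = min(max(value, data_min), data_max)
    if value <= mid_value:
        span = mid_value - data_min
        t = 1.0 if span == 0 else (value - data_min) / span
        return lerp_rgb(low, mid, t)
    # Here value > mid_value, so data_max - mid_value is positive.
    t = (value - mid_value) / (data_max - mid_value)
    return lerp_rgb(mid, high, t)
```

With a data range of 0 to 200, the value 50 maps to white even though it is not the centre of the range, which is the point of anchoring the mid-point to a data value.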

Note, data values for mid-points would be supported in output domains but values for the min/max would be supported in input domains. Is this a problem?

From the current docs (leave as is): The colors can be named colors, CSS or #RRGGBB. Asterisks appended to the end mute the strengths of the colors making them less vivid and more suitable for large areas.

Size

Size is currently a number interpreted as a percentage. The proposal here is to make that explicit in the syntax.

Apply specific sizes to the values of a categorical field in the order they appear in the data:

size (categorical_field: 50%, 100%, 150%)

Map these sizes to some specific values that are in the data:

size (categorical_field: 50% @ "Small", 100% @ "Medium", 150% @ "Large")

Apply a size range to the values of a continuous field:

size (continuous_field: 50% - 150%)

Set the maximum size only:

size (continuous_field: 200%)

Apply a size range to the values of a continuous field including a specific mid-point (the data value of 50 will be 100%):

size (continuous_field: 50% - 100% @ 50 - 150%)

Opacity

A straight list of opacity values to use for categories (unclear how useful):

opacity (categorical_field: .1, .5, 1.0)

Map these opacity values to specific data values:

opacity (categorical_field: .1 @ "Not Selected", 1.0 @ "Selected")

Apply an opacity range to the values of a continuous field:

opacity (continuous_field: .1 - 1.0)

Set the low value of the opacity only:

opacity (continuous_field: 0.6)

Question: Should these be expressed in %s like the size proposal?

Future Proofing Considerations

There are a few things we should consider now to avoid significant language changes down the road.

Color components

At some point we may need to add in color components such as hue, saturation, brightness or RGB. These could be done as separate aesthetic statements as in:

color.hue(continuous_field)

Multiple Fields

It has been suggested to support multiple fields for an aesthetic such as:

color(f1,f2)

This would result in a mapping of colors to all combinations of the unique values of the two fields. Perhaps an option for crossing vs. nesting would be needed here as well? If so, perhaps the following syntax:

color(f1*f2)

color(f1/f2)

However, it is unlikely we would support combinations of operators, so perhaps the comma notation works best, with some option for mapping missing value combinations.
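The difference between crossing and nesting can be sketched in Python as follows. This assumes the usual Grammar of Graphics reading, where crossing (f1*f2) yields every pairing of the unique values and nesting (f1/f2) yields only the pairings that occur in the data. The proposal does not define either operator, so treat this as one possible interpretation. The function names and the row format are hypothetical.

```python
from itertools import product

def crossed_keys(rows, f1, f2):
    """All pairings of the unique values of f1 and f2 (assumed meaning of f1*f2)."""
    a = sorted({r[f1] for r in rows})
    b = sorted({r[f2] for r in rows})
    return list(product(a, b))

def nested_keys(rows, f1, f2):
    """Only the pairings present in the data (assumed meaning of f1/f2)."""
    return sorted({(r[f1], r[f2]) for r in rows})

rows = [
    {"brand": "Coke", "size": "Large"},
    {"brand": "Pepsi", "size": "Small"},
]
crossed = crossed_keys(rows, "brand", "size")  # four keys, two of them unseen in the data
nested = nested_keys(rows, "brand", "size")    # the two keys that actually occur
```

Under crossing, colors are reserved for pairings such as ("Coke", "Small") that never appear, which is where an option for missing pairings would come into play.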

Input Domains

All of the above examples define an output domain for the field. We likely will need input domains. One idea would be to express this as:

color(field:input_domain:output_domain)

color(cities:["Miami","Chicago"]: Orange, Red)

Where the optional input_domain provides explicit data domains overriding what was determined using the data. Another idea might be:

color (field: input_domain -> output_domain)

color (cities: "Miami", "Chicago" -> Orange, Red)

size (troops: 0-100 -> 10% - 150%)
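As a rough illustration of what an explicit input domain would mean, the Python sketch below evaluates the two arrow examples. It is not part of Brunel. The function names, the clamping of values outside the input domain, and the None default for unlisted categories are all assumptions.

```python
def map_continuous(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly map value from [in_lo, in_hi] to [out_lo, out_hi].

    Values outside the input domain are clamped (an assumption).
    """
    if in_hi == in_lo:
        return out_lo
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(max(t, 0.0), 1.0)
    return out_lo + (out_hi - out_lo) * t

def map_categorical(value, input_domain, output_domain, default=None):
    """Pair input categories with outputs by position; unlisted values get default."""
    lookup = dict(zip(input_domain, output_domain))
    return lookup.get(value, default)

# size (troops: 0-100 -> 10% - 150%)
half = map_continuous(50, 0, 100, 10, 150)  # 80.0, i.e. 80%

# color (cities: "Miami", "Chicago" -> Orange, Red)
miami = map_categorical("Miami", ["Miami", "Chicago"], ["Orange", "Red"])
```

An explicit input domain makes the result independent of the data that happens to be loaded: 50 troops always maps to 80%, whatever the observed maximum is.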

Version 0.8 Plan

For 0.8 we will not try to consider the input domain (the data we map from) and will let that be determined automatically. We will support only syntax for the output space.

The general syntax for an aesthetic is aesthetic(field), which will set everything up automatically. The syntax for an output domain is aesthetic(field:result) where result is the output domain we will use. This output can have the following forms:

  • A named output space (like a named color scale, e.g. diverging)
  • A literal value (like a color name or size, e.g. red, #ff00ff, 100)
  • A percentage (for size, e.g. 200%)
  • A list of values surrounded by square brackets (e.g. [red,white,green], or [10%,100%,1000%])
  • As syntactic sugar for the above, we allow - to be used as a separator, e.g. red-white-green

When a single value is provided, it typically has the meaning that the scale will run from a defined low value up to that high value. When two or more values are provided, those are fixed points used for interpolation. For categorical fields, these are instead used as a cycle.
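The three cases above can be sketched in Python for a numeric output such as size. This is an illustration under stated assumptions, not the 0.8 implementation: the function names are hypothetical, the "defined low value" is represented by a default_low parameter whose real value is not given here, and the fixed points are assumed to be evenly spaced across the data range.

```python
from itertools import cycle

def interpolate_stops(t, stops):
    """Piecewise-linear interpolation of t in [0, 1] across evenly spaced stops."""
    if len(stops) == 1:
        return float(stops[0])
    t = min(max(t, 0.0), 1.0)
    scaled = t * (len(stops) - 1)
    i = min(int(scaled), len(stops) - 2)  # index of the segment containing t
    return stops[i] + (stops[i + 1] - stops[i]) * (scaled - i)

def size_for_continuous(value, data_min, data_max, result, default_low=0.0):
    """Continuous field: one value means default_low..value; more are fixed points."""
    stops = list(result)
    if len(stops) == 1:
        stops = [default_low, stops[0]]
    if data_max == data_min:
        t = 0.5
    else:
        t = (value - data_min) / (data_max - data_min)
    return interpolate_stops(t, stops)

def sizes_for_categories(categories, result):
    """Categorical field: outputs are reused as a cycle when categories outnumber them."""
    return dict(zip(categories, cycle(result)))
```

For example, with data running from 0 to 10, size_for_continuous(5, 0, 10, [10, 100, 1000]) lands exactly on the middle fixed point, and sizes_for_categories(["a", "b", "c"], [50, 100]) reuses 50 for the third category.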