Skip to content

Latest commit

 

History

History
363 lines (303 loc) · 13.3 KB

README.org

File metadata and controls

363 lines (303 loc) · 13.3 KB

structural

This is a small library that provides macros to help convert destructured code into efficient code via type hints.

One of the big problems one runs into in practice is that the expressiveness of destructuring - while elegant - typically renders inefficient code from the clojure compiler. The default is more generic, polymorphic code. If we know the types and are willing to hint to optimize for performance, we’d like to have clojure just efficiently unpack our structures.

This is particularly important for code that lives on a hot loop, where (idiomatic) forms can lead to a “death by a thousand cuts” situation.

installation

https://img.shields.io/clojars/v/structural.svg https://clojars.org/structural

Add the current version from clojars to your project.clj or deps.edn.

(ns blah 
  (:require [structural.core :as s])
  (:import [clojure.lang Indexed Counted IPersistentMap]))

(defprotocol IBlah 
 (blah [this]))

(defrecord point [^long x ^long y]
   IBlah
   (blah [_] (+ x y)))

;;with-slots acts like clojure.core/let, except it's smart
;;about hinted destructuring, and adds the :fields
;;key to map-based destructuring to allow binding to 
;;arbitrary fields/methods on the hinted object.  This
;;is usually substantially faster than e.g. keyword
;;lookup on a record.  You can mix keys and fields freeely,
;;and we'll try to find the most efficient path.


(s/with-slots [[^IPersistentMap m ^Indexed v ^point p]  [{:a 2 :b 3} [4 5 6] (->point 1 2)]
               {:fields [count] :keys [a b]}       m
               [c d e]                             v
               {:fields [x y blah]}                p]
  (+ a b c d e blah count))


;;The macros sdefn and sfn (names may change) are for 
;;structural function definitions.  They allow you to write
;;functions as you would in clojure, with the semantics of 
;;with-slots applied to the function arguments, and all 
;;`let` bindings in the body (which are macrolet bound
;;to convert to `with-slots` invocations).

(s/sdefn add-vector [[^long x ^long y]] 
  (+ x y))

;; [:with-slots.warning/using-generic :nth :ns
;; #namespace[examples.core] :fields [x y] :coll
;; arg16626 :try-hinting [clojure.lang Indexed IPersistentVector
;; java.util.List]]

(s/sdefn add-vector [^Indexed [^long x ^long y]] 
  (+ x y))

;If you were to benchmark add-vector, you would notice about a 5.85x
;;faster runtime compared to stock generic destructuring.
;;These gains are even more dramatic for records and clojure/java types
;;with direct field access/methods, where we see around 9-10x improvement
;;in access times.

Intro

Functions like:

(defn add [[x y]]
  (+ x y))

will incur a cost in destructuring, as the code will expand to something like…

(defn add [xy]
  (let [x (nth xy 0)
        y (nth xy 1)]
    (+ x y))

where clojure.core/nth is polymorphic and has to run through several tests to determine what the input is and how to coerce it to an appropriate operation for indexed lookup.

A faster route would be something like..

(defn add [^clojure.lang.Indexed xy]
  (let [x (.nth xy 0)
        y (.nth xy 1)]
    (+ x y))

wherein we know the type of the input will always be something supporting Indexed, thus supporting the .nth method, thus something we can to the compiler to emit a direct method invocation.

For smallish function optimizations, this isn’t too bad, but it can get hairy for nested destructuring. Ideally, we’d like to preserve the destructuring forms, and broaden the compiler’s knowledge on how to emit efficient code in the face of type information (to included nested types):

For an initial cut, we define the with-slots macro to establish bindings that respect type hints but act like let bindings for destructuring purposes.

structural.core/with-slots

Allows for efficient, type-based destructuring similar to the idiomatic destructuring forms of Clojure, with some limitations. Bindings are presented as the typical vector, with an even number of entries, where the preceding odd binding establishes binds for the even successor. Unlike typical forms, bindings leverage type-hinting information - both on the left hand side and the right hand side - to establish efficient operations beyond the generic destructuring forms established with maps and vectors, e.g. get and nth.

Callers may use {:fields [a b ^clojure.lang.Counted c] }, along with a type-hinted rhs, to denote establishing bindings for a, b, c, by invoking like-named direct, type-hinted field applications on the rhs, ala (.a ^some-type rhs).

Any binding var hinted on the LHS will propogate its hint throughout later bindings. This allows an expressive form of efficient destructuring for the consenting adult, which allows idiomatic expressivity without the accompanying significant loss of performance.

map destructuring for {:keys […]} follows that of :fields, except the bindings are established via either a (.valAt ..) or (.get ..) or (get …) depending on the presented type, get being the fallback. This allows usage with types supporting the java.util.Map interface. Literal maps are automatically inferred with efficient getters.

Vector or indexed destructuring is similarly supported, [^some-type x y] ^clojure.lang.Indexed coll will invoke efficient .nth indexing operations rather than the slower, more general nth. Depending on the presented type, either .nth, .get, or nth will be used, allowing operation with structures supporting the java.util.List interface. Literal vectors are automatically inferred with efficient getters. The & rest notation is currently NOT supported…

The remaining rules act identically to let semantics. If a symbol is bound to the LHS, then the binding is passed through untouched (including hints).

with-slots tries to scan the input bindings to find discrepancies (such as duplicate binds), and to re-use existing hinted information for binds. In the case that the user decides to re-hint a RHS var that has already been hinted a-priori, with-slots will allow the hint for that binding, but revert to prior hinting unless the user continues to specify new hints. This seems rare in practice.

It’s common to import the symbols for the [clojure.lang Counted Indexed] interfaces when using with-slots.

An example:

By default, structural will warn us if we’re dispatching to slow operations inside a with-slots invocation, and how to help hint stuff:

structural.core> (let [m {:a 2 :b 3}] (with-slots [{:keys [a b]} m] a))
[:with-slots.warning/using-generic 
  :get :ns #namespace[structural.core] 
  :fields {:keys [a b]} :coll m :try-hinting [clojure.lang Associative IPersistentMap java.util.Map]]
2

If we follow the directives, we can get rid of the warning:

structural.core> (let [m {:a 2 :b 3}] (with-slots [{:keys [a b]} ^clojure.lang.IPersistentMap m] a))
2

No warnings this time, and if we look at the macroexpansion:

structural.core> (use 'clojure.pprint)
nil
structural.core> (binding [*print-meta* true] 
                      (pprint (macroexpand-1 '(with-slots [{:keys [a b]} ^clojure.lang.IPersistentMap m] a))))
(clojure.core/let
 [^clojure.lang.IPersistentMap coll18242
  ^clojure.lang.IPersistentMap m
  a
  (.valAt ^clojure.lang.IPersistentMap coll18242 :a)
  b
  (.valAt ^clojure.lang.IPersistentMap coll18242 :b)]
 a)
(ns blah
 (:import [clojure.lang Indexed Counted])
;;a botmove is a pair of vectors...hints aren't explicitly
;;necessary, but we'll use them here for edification:
(defrecord botmove [^clojure.lang.IPersistentVector path
                    ^clojure.lang.IPersistentVector position])

(with-slots
;;the :fields key allows us to define type-hinted method invocations
  [{:fields [^Counted path
             ^Indexed position]} ^botmove (->botmove [] [1 2])
;;literal structures are automatically hinted; in this case
;;we efficient destructure :keys into .valAt calls, and :fields
;;into a hinted .hashCode
   {:keys [a b] :fields [hashCode]}    {:a 2 :b 3}
;;Vectors expand into (ideally) hinted calls to .nth.  Since we've
;;hinted position as ^Indexed
   [x y]          position         
   path-length   (.count path)]
 [hashCode (+ x y)])

;;[2027821082 3]

If we examine the expression’s macroexpansion, we can see that with-slots is dutifully walking the expression, resolving types, and destructuring.

structural.core> 
(def the-expression 
  '(with-slots
    [{:fields [^Counted path
               ^Indexed position]} ^botmove (->botmove [] [1 2])
     {:keys [a b] :fields [hashCode]}    {:a 2 :b 3}
     [x y]          position         
     path-length   (.count path)]
   [hashCode (+ x y)]))

structural.core> (binding [*print-meta* true] (pprint (macroexpand-1 the-expression)))
(clojure.core/let
 [^botmove coll18285
  (->botmove [] [1 2])
  ^Counted path
  (.path ^botmove coll18285)
  ^Indexed position
  (.position ^botmove coll18285)
  ^clojure.lang.IPersistentMap coll18286
  {:a 2, :b 3}
  hashCode
  (.hashCode ^clojure.lang.IPersistentMap coll18286)
  a
  (.valAt ^clojure.lang.IPersistentMap coll18286 :a)
  b
  (.valAt ^clojure.lang.IPersistentMap coll18286 :b)
  x
  (.nth ^Indexed position 0)
  y
  (.nth ^Indexed position 1)
  path-length
  (.count path)]
 [hashCode (+ x y)])
nil

This provides a way to tune performance without deviating too far from Clojure idioms, and provides warnings when the caller is entering a slow path (e.g. causing a function call to get or nth). It’s basically a poor man’s optimizing compiler for the use-case of unpacking type-hinted structures for efficient reads.

The genesis of this library was actually for performance optimizing an ICPFC competition entry. The following examples are naive, but illustrative (a more involved setup would use criterium):

structural.core> (defn add [[x y]] (+ x y))
structural.core> (time (dotimes [i 10000000] (add [1 2])))
"Elapsed time: 140.237211 msecs"
structural.core> (defn add2 [v] (with-slots [[x y]  ^Indexed v] (+ x y)))
#'structural.core/add2
structural.core> (time (dotimes [i 10000000] (add2 [1 2])))
"Elapsed time: 86.436209 msecs"
structural.core> (defn add3 [v] (with-slots [{:fields [x y]}  ^xy v] (+ x y)))
#'structural.core/add3
structural.core> (time (dotimes [i 10000000] (add3 (->xy 1 2))))
"Elapsed time: 29.117979 msecs"

structural.core/sfn, sdefn

Analagous to clojure.core/fn and defn, they are convenience wrappers to extend the semantics of with-slots to functions definitions. They also rewrite let into with-slots forms (where the default common form of with-slots is equivalent to clojure’s let exactly, except it will emit (optional) warnings about potential slow access paths.

The equivalent function example from the previous section would be:

(structural.core/sdefn add3 [^xy {:fields [x y]}] 
  (+ x y))

Clojure destructuring idioms like :as should work out of the box; behavior for :or is currently undefined....

At the time of writing, only single function bodies are supported, although it should trivial to extend to multiple function bodies.

Intended Uses

This is broadly useful for any destructuring code, but will likely be most useful and practical for highly destructured code paths that happen to fall on hot paths indicated by profiling. There’s no reason the clojure compiler (or a variant using core.analyzer) couldn’t leverage this type of performance analysis directly too. It’s probably best to go with stock destructuring, and treat this as another optimization step after testing.

One area that really benefits is the field-based destructuring. At a language level, Clojure doesn’t have this at all. Being able to flow hints and unpack fields is extremely useful when trying to manage performance, particularly when leveraging interop and direct field access from records and types.

Currently, the hinting is directly focused on interop. Thus you are somewhat tied to the whatever the platform’s implementation denotes (e.g. clojure.lang for CLJ jvm). This is a bit brittle, and will likely be extended to support a generic ^counted and ^indexed hint that will dispatch to the appropriate platform-specific backend (e.g. protocols in cljs).

I’d also like to leverage far more sophisticated analyzer support, rather than the current janky code-walker macrology. We should be able to have a much more elegant set of definitions that can flow types and hints.

Add efficient typed array destructuring akin to vector/nth stuff.

Also, provide optional replacements for =defn= =fn= =let= and any other binding forms.

License

Copyright © 2019 joinr

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.