Document timestamp encoder precision fix #1623

msosnicki · 2024-12-02T13:43:48Z

PR Checklist (not all items are relevant to all PRs)

Added unit-tests (for runtime code)
Added bootstrapped code + smoke tests (when the rendering logic is modified)
Added build-plugins integration tests (when reflection loading is required at codegen-time)
Added alloy compliance tests (when simpleRestJson protocol behaviour is expanded/updated)
Updated dynamic module to match generated-code behaviour
Added documentation
Updated changelog

msosnicki · 2024-12-02T13:55:37Z

modules/core/src/smithy4s/internals/DocumentEncoderSchemaVisitor.scala

-            DNumber(epochSecondsWithNanos)
+            val bd = (BigDecimal(ts.epochSecond) + (BigDecimal(
+              ts.nano
+            ) * BigDecimal(10).pow(-9))).underlying.stripTrailingZeros


potential issue: this is different from the current behavior in the regular encoders, because it strips all the trailing zeros, rather than setting the scale to 9 at all times. Based on this comment jsoniter will respect the scales set on the BigDecimal, so striping all trailing zeros seem like a more general solution, rather than doing that only if nano == 0.

Please let me know what you think, it definitely has to be consistent across both visitors, so I will adjust once this discussion is resolved.

I'm happy with stripping trailing zeros, as long as the adjustment on the jsoniter-based codecs is not too complicated.

Perhaps I missed something, but to me it looks like it's as straighforward as the Document fix.

kubukoz · 2024-12-06T14:41:12Z

@Baccata can you take a look?

ghostbuster91 · 2024-12-09T17:21:00Z

modules/json/src/smithy4s/json/internals/SchemaVisitorJCodec.scala

-        } else {
-          out.writeVal(BigDecimal(x.epochSecond) + x.nano / 1000000000.0)
-        }
+        val bd = BigDecimal(x.epochSecond) + (x.nano * BigDecimal(10).pow(-9))


issue: shouldn't it be BigDecimal(x.epochSecond) + (BigDecimal(x.nano) * BigDecimal(10).pow(-9))?

otherwise we might lose some information due to overflow before the conversion to BigDecimal happens.

The result of this multiplication is already a BigDecimal, so I think it is fine

Baccata · 2024-12-10T13:17:14Z

I'm happy with the change. Did you want to do anything else on this PR ?

plokhotnyuk · 2024-12-10T13:47:46Z

I stored the following script in timestamp.scala and ran it using scala-cli --power timestamp.scala:

//> using jmh
//> using jmhVersion 1.37
package bench

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.*

@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread)
@Fork(1)
class Timestamp {

  @Param(Array("123456789", "123456000", "123000000", "0"))
  var nano: Int = 123456789

  var epochSecond: Long = System.currentTimeMillis() / 1000

  @Benchmark
  def pow() = BigDecimal((BigDecimal(epochSecond) + (nano * BigDecimal(10).pow(-9))).underlying.stripTrailingZeros)

  @Benchmark
  def scale() = BigDecimal(java.math.BigDecimal.valueOf(epochSecond).add(java.math.BigDecimal.valueOf(nano.toLong, 9).stripTrailingZeros))
}

On Intel® Core™ i7-11800H I got the following result using JDK 21:

Benchmark           (nano)   Mode  Cnt          Score   Error  Units
Timestamp.pow    123456789  thrpt    2     689083.298          ops/s
Timestamp.pow    123456000  thrpt    2     753617.071          ops/s
Timestamp.pow    123000000  thrpt    2     745549.753          ops/s
Timestamp.pow            0  thrpt    2     768658.620          ops/s
Timestamp.scale  123456789  thrpt    2  184984953.592          ops/s
Timestamp.scale  123456000  thrpt    2  100797574.077          ops/s
Timestamp.scale  123000000  thrpt    2   83752128.337          ops/s
Timestamp.scale          0  thrpt    2  204584419.459          ops/s

msosnicki · 2024-12-11T15:40:26Z

@plokhotnyuk

The version you posted unfortunately breaks the bootstrappedJS/DocumentSpec (possibly something that could be reported to Scala JS)

Below you can find the results for the version that passes all tests, it behaves slightly worse, but still much better than the pow alternative:

@Benchmark 
def scalaBd() = BigDecimal(epochSecond) + BigDecimal(nano * 1/1000000000.0).underlying().stripTrailingZeros()

Benchmark             (nano)   Mode  Cnt          Score   Error  Units
Timestamp.pow      123456789  thrpt    2     657035.513          ops/s
Timestamp.pow      123456000  thrpt    2     716802.325          ops/s
Timestamp.pow      123000000  thrpt    2     714326.181          ops/s
Timestamp.pow              0  thrpt    2     719915.018          ops/s
Timestamp.scalaBd  123456789  thrpt    2   10934885.665          ops/s
Timestamp.scalaBd  123456000  thrpt    2    9444356.670          ops/s
Timestamp.scalaBd  123000000  thrpt    2    9101303.611          ops/s
Timestamp.scalaBd          0  thrpt    2   51482186.648          ops/s
Timestamp.scale    123456789  thrpt    2  164914297.116          ops/s
Timestamp.scale    123456000  thrpt    2   84470614.413          ops/s
Timestamp.scale    123000000  thrpt    2   69055873.696          ops/s
Timestamp.scale            0  thrpt    2  184479155.075          ops/s

measured on JDK 21, AMD Ryzen 7 3700X.

EDIT: unfortunately this one breaks the same test for Scala Native

plokhotnyuk · 2024-12-11T16:09:01Z

The workaround for Scala.js and Scala Native is using of yet more optimized version:

BigDecimal({
   val es = java.math.BigDecimal.valueOf(epochSecond)
   if (nano == 0) es
   else es.add(java.math.BigDecimal.valueOf(nano, 9).stripTrailingZeros)
})

Here are results of benchmarks:

Benchmark              (nano)   Mode  Cnt          Score   Error  Units
Timestamp.pow       123456789  thrpt    2     748598.854          ops/s
Timestamp.pow       123456000  thrpt    2     672352.329          ops/s
Timestamp.pow       123000000  thrpt    2     749301.419          ops/s
Timestamp.pow               0  thrpt    2     695728.197          ops/s
Timestamp.scalaBd   123456789  thrpt    2   12535456.203          ops/s
Timestamp.scalaBd   123456000  thrpt    2   12591612.068          ops/s
Timestamp.scalaBd   123000000  thrpt    2   12442059.688          ops/s
Timestamp.scalaBd           0  thrpt    2   59858539.270          ops/s
Timestamp.scaleOpt  123456789  thrpt    2  181264531.330          ops/s
Timestamp.scaleOpt  123456000  thrpt    2   92364559.709          ops/s
Timestamp.scaleOpt  123000000  thrpt    2   78387325.699          ops/s
Timestamp.scaleOpt          0  thrpt    2  203003351.828          ops/s

msosnicki · 2024-12-11T16:53:13Z

@plokhotnyuk thanks for help, all green now

plokhotnyuk · 2024-12-11T20:11:31Z

@msosnicki I've provided a serialization method for timestamps in the 2.32.0 release of jsoniter-scala, so after version upgrade you can serialize timestamps without any allocations or redundant CPU overhead as here:

def encodeValue(x: Timestamp, out: JsonWriter): Unit = out.writeTimestampVal(x.epochSecond, x.nano)

Also, DNumber creation could be improved by adding support of negative epochSecond values:

BigDecimal({
   val es = java.math.BigDecimal.valueOf(epochSecond)
   if (nano == 0) es
   else es.add(java.math.BigDecimal.valueOf({
      if (epochSeconds < 0) -nano
      else nano
   }.toLong, 9).stripTrailingZeros)
})

plokhotnyuk · 2024-12-12T10:21:04Z

@msosnicki Please read my previous comment that I updated recently, where I propose more correct and efficient solution.

msosnicki · 2024-12-12T10:22:29Z

BigDecimal({
   val es = java.math.BigDecimal.valueOf(epochSecond)
   if (nano == 0) es
   else es.add(java.math.BigDecimal.valueOf({
      if (epochSeconds < 0) -nano
      else nano
   }.toLong, 9).stripTrailingZeros)
})

Will add a comment about the new jsoniter method.

Quesion about the negative nano from your example, where is it documented? The same value before 1970 could be expressed in two ways: with flipping the sign of nano like you did, or adding to a negative epochSeconds.

plokhotnyuk · 2024-12-12T10:42:49Z

@msosnicki @Baccata I've found only following documentation about the timestamp serialization methods: https://smithy.io/2.0/spec/protocol-traits.html#timestampformat-trait

It requires truncation to milliseconds and do not mention any support of dates before the Linux epoch (Jan 1, 1970).
But from current sources of smithy4s.Timestamp I see that we support parsing and serialization up to nanoseconds (without truncation to milliseconds) and for dates before the Linux epoch.

Also, some method implementations of smith4s.Timestamp require fixes to support negative epochSecond. For example def epochMilli: Long = epochSecond * 1000 + nano / 1000000 should be replaced by

def epochMilli: Long = epochSecond * 1000 + {
  if (epochSecond < 0) -nano
  else nano
} / 1000000

msosnicki · 2024-12-13T08:37:17Z

@msosnicki @Baccata I've found only following documentation about the timestamp serialization methods: https://smithy.io/2.0/spec/protocol-traits.html#timestampformat-trait

It requires truncation to milliseconds and do not mention any support of dates before the Linux epoch (Jan 1, 1970). But from current sources of smithy4s.Timestamp I see that we support parsing and serialization up to nanoseconds (without truncation to milliseconds) and for dates before the Linux epoch.

Also, some method implementations of smith4s.Timestamp require fixes to support negative epochSecond. For example def epochMilli: Long = epochSecond * 1000 + nano / 1000000 should be replaced by
def epochMilli: Long = epochSecond * 1000 + {
  if (epochSecond < 0) -nano
  else nano
} / 1000000

Yeah, I think the logic needs to change in multiple places, to correctly handle values below epoch. Will look into that

EDIT: only changed the decoding side, nanos are expected to always be positive (as in Timestamp class, tests that I added were raising constructor exceptions before).
So during decode we're always rounding down for seconds part (also for negative), no need to flip sings of nano in other places.
It is more consistent with other parts of the code, like Timestamp constructors.
For example 1969-12-31T23:59:58.123Z (-1877 in millis) is expected to be expressed as Timestamp(-2L, 123000000), rather than Timestamp(-1, 877000000).

Fix encoder

427bc5a

msosnicki commented Dec 2, 2024

View reviewed changes

msosnicki changed the title ~~Fix encoder~~ Document timestamp encoder precision fix Dec 3, 2024

Encode epoch seconds without trailing 0s

ed4ff38

ghostbuster91 reviewed Dec 9, 2024

View reviewed changes

msosnicki added 3 commits December 11, 2024 15:33

More performant way to use BigDecimals

724cfba

Version that doesn't break JS

0caa2a1

Reformat

9b1201f

Version that works for both JS and Native

7f20f9b

plokhotnyuk mentioned this pull request Dec 11, 2024

java.math.BigDecimal.valueOf(0, 9).stripTrailingZeros.scale returns 9 instead of 0 scala-js/scala-js#5085

Closed

msosnicki marked this pull request as ready for review December 12, 2024 10:03

Add comment, improve tests and handle timestamps before linux epoch

21b49d3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document timestamp encoder precision fix #1623

Document timestamp encoder precision fix #1623

msosnicki commented Dec 2, 2024 •

edited

Loading

msosnicki Dec 2, 2024

Baccata Dec 9, 2024

msosnicki Dec 9, 2024

kubukoz commented Dec 6, 2024

ghostbuster91 Dec 9, 2024 •

edited

Loading

msosnicki Dec 10, 2024

Baccata commented Dec 10, 2024 •

edited

Loading

plokhotnyuk commented Dec 10, 2024 •

edited

Loading

msosnicki commented Dec 11, 2024

plokhotnyuk commented Dec 11, 2024 •

edited

Loading

msosnicki commented Dec 11, 2024

plokhotnyuk commented Dec 11, 2024 •

edited

Loading

plokhotnyuk commented Dec 12, 2024

msosnicki commented Dec 12, 2024

plokhotnyuk commented Dec 12, 2024 •

edited

Loading

msosnicki commented Dec 13, 2024 •

edited

Loading

Document timestamp encoder precision fix #1623

Are you sure you want to change the base?

Document timestamp encoder precision fix #1623

Conversation

msosnicki commented Dec 2, 2024 • edited Loading

PR Checklist (not all items are relevant to all PRs)

msosnicki Dec 2, 2024

Choose a reason for hiding this comment

Baccata Dec 9, 2024

Choose a reason for hiding this comment

msosnicki Dec 9, 2024

Choose a reason for hiding this comment

kubukoz commented Dec 6, 2024

ghostbuster91 Dec 9, 2024 • edited Loading

Choose a reason for hiding this comment

msosnicki Dec 10, 2024

Choose a reason for hiding this comment

Baccata commented Dec 10, 2024 • edited Loading

plokhotnyuk commented Dec 10, 2024 • edited Loading

msosnicki commented Dec 11, 2024

plokhotnyuk commented Dec 11, 2024 • edited Loading

msosnicki commented Dec 11, 2024

plokhotnyuk commented Dec 11, 2024 • edited Loading

plokhotnyuk commented Dec 12, 2024

msosnicki commented Dec 12, 2024

plokhotnyuk commented Dec 12, 2024 • edited Loading

msosnicki commented Dec 13, 2024 • edited Loading

msosnicki commented Dec 2, 2024 •

edited

Loading

ghostbuster91 Dec 9, 2024 •

edited

Loading

Baccata commented Dec 10, 2024 •

edited

Loading

plokhotnyuk commented Dec 10, 2024 •

edited

Loading

plokhotnyuk commented Dec 11, 2024 •

edited

Loading

plokhotnyuk commented Dec 11, 2024 •

edited

Loading

plokhotnyuk commented Dec 12, 2024 •

edited

Loading

msosnicki commented Dec 13, 2024 •

edited

Loading