feat: define processing time column #7209

fuyufjh · 2023-01-05T08:10:53Z

Is your feature request related to a problem? Please describe.

Provide a column with the event ingestion time, in case there is no appropriate time column in the user data.

Describe the solution you'd like

Based on #6952, we can introduce proc_time as a column in the source, the same as in Flink: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/concepts/time_attributes/#processing-time

CREATE SOURCE user_actions (
  user_name STRING,
  data STRING,
  user_action_time AS PROCTIME() -- declare an additional field as a processing time attribute
) WITH (
  ...
);


SELECT TUMBLE_START(user_action_time, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name)
FROM user_actions
GROUP BY TUMBLE(user_action_time, INTERVAL '10' MINUTE);
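
For readers less familiar with tumbling windows, here is a rough Python sketch of what the query above computes. It is only an illustration, not how the engine would implement it: the helper names (`tumble_start`, `distinct_users_per_window`) and the sample rows are made up, and windows are assumed to be aligned to the Unix epoch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
EPOCH = datetime(1970, 1, 1)  # assumption: windows are aligned to the epoch


def tumble_start(ts, size=WINDOW):
    """Return the start of the fixed-size window that contains ts."""
    return ts - (ts - EPOCH) % size


def distinct_users_per_window(rows):
    """rows: iterable of (user_name, proc_time) pairs."""
    users = defaultdict(set)
    for user_name, proc_time in rows:
        users[tumble_start(proc_time)].add(user_name)
    return {start: len(names) for start, names in sorted(users.items())}


# Hypothetical rows; proc_time stands in for the PROCTIME() column.
rows = [
    ("alice", datetime(2023, 1, 5, 8, 1)),
    ("bob", datetime(2023, 1, 5, 8, 9)),
    ("alice", datetime(2023, 1, 5, 8, 9, 59)),
    ("alice", datetime(2023, 1, 5, 8, 10)),
]
result = distinct_users_per_window(rows)
```

The first three rows fall into the 08:00 window (two distinct users) and the last one opens the 08:10 window.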

Describe alternatives you've considered

1. Hidden column.

The problem with this approach is that it doesn't explicitly tell users which source/table the column is associated with. For example:

This is clear:

select _proc_time, id, value from t where k = 2;

This is ambiguous:

select _proc_time, id, value from a join b on a.bid = b.id where k = 1
-- ERROR: _proc_time should be either a._proc_time or b._proc_time
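
The same name-resolution problem can be reproduced with Python's built-in sqlite3 module. This is only an analogy: SQLite has no hidden processing-time column, so ordinary `_proc_time` columns stand in for it, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary columns named _proc_time stand in for the proposed hidden column.
conn.executescript("""
    CREATE TABLE a (bid INTEGER, k INTEGER, _proc_time TEXT);
    CREATE TABLE b (id INTEGER, value TEXT, _proc_time TEXT);
    INSERT INTO a VALUES (1, 1, '2023-01-05 08:00:00');
    INSERT INTO b VALUES (1, 'x', '2023-01-05 08:00:05');
""")


def run(sql):
    """Return the rows, or the error text if the statement is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as err:
        return f"error: {err}"


# Unqualified: both inputs of the join carry a _proc_time column.
unqualified = run(
    "SELECT _proc_time, id, value FROM a JOIN b ON a.bid = b.id WHERE k = 1"
)
# Qualified: the user has to say which side's processing time is meant.
qualified = run(
    "SELECT a._proc_time, id, value FROM a JOIN b ON a.bid = b.id WHERE k = 1"
)
```

Here `unqualified` holds an "ambiguous column name" error, while `qualified` returns the row with the timestamp from `a`.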

2. System function

This is actually a different thing. A function is evaluated when the query is executed, rather than being attached to the data when it is ingested.

It doesn't matter for this simple query:

select proc_time(), id, value from t where k = 2;

But it becomes a problem for more complex ones:

select proc_time(), id, value from a join b on a.bid = b.id where k = 1
-- When is proc_time() evaluated? Intuitively, it should be after the two events are joined.

We had better call it now() or something else to distinguish it from our topic here.
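
To make the distinction concrete, here is a toy Python sketch under simplified assumptions. A logical counter stands in for wall-clock time, and `clock` and `ingest` are hypothetical helpers, not anything from the codebase.

```python
import itertools

# Hypothetical logical clock: every call returns the next tick.
_ticks = itertools.count(1)


def clock():
    return next(_ticks)


def ingest(payloads):
    """Stamp each record once, at the moment it enters the system."""
    return [{"payload": p, "proc_time": clock()} for p in payloads]


a = ingest(["a1", "a2"])  # stamped with ticks 1 and 2
b = ingest(["b1"])        # stamped with tick 3

# Column semantics: the stamp travels with each record through the join,
# so every output row can still see both inputs' processing times.
joined_with_column = [(x["proc_time"], y["proc_time"]) for x in a for y in b]

# Function semantics: the value is produced when the output row is computed,
# so it says nothing about when either input arrived.
joined_with_function = [clock() for _x in a for _y in b]
```

With the column, the join output carries `(1, 3)` and `(2, 3)`. With the function, the output rows get ticks 4 and 5, which depend only on when the join ran.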

Additional context

No response

st1page · 2023-01-06T08:29:16Z

I prefer the hidden column: it should be a property of the record, not an impure function.

fuyufjh · 2023-01-06T08:48:57Z

> I prefer the hidden column: it should be a property of the record, not an impure function.

My concern is that the concept of a "hidden column" is unfamiliar to users, especially Postgres users. For example, a user may be confused when they can write a.proc_time but no proc_time column shows up in describe table or show create table.

> it should be a property of the record, not an impure function

But it does look like current_user, an impure function, doesn't it? 🤣

st1page · 2023-01-08T15:14:01Z

OK, just allowing PROCTIME() when the user defines the source or table LGTM.
Another question: should we support more kinds of DDL on the source? E.g.

- add the PROCTIME column on the source
- even more... change the watermark column of a source?

c.c. @BugenZhao

liurenjie1024 · 2023-01-12T07:35:05Z

How do we handle this in a batch query? Is it the time when the row is scanned into our system?

lmatz · 2023-01-18T07:39:31Z

I am wondering if it also makes sense to add a time column when the row is materialized, instead of only when it is read from the source? I suppose proc_time() is added to the row when it is read.

fuyufjh · 2023-01-26T04:47:34Z

> I am wondering if it also makes sense to add a time column when the row is materialized, instead of only when it is read from the source? I suppose proc_time() is added to the row when it is read.

For a table (previously materialized source) - Yes, the processing time should be persisted in that table/MV as well, just like any other column.

For a source (i.e. not materialized) - No, I suppose the only thing we can do is use the read time as the processing time, as @liurenjie1024 commented above. To avoid that, users should define an event time, e.g. the Kafka timestamp.
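
A toy Python model of the two cases, just to illustrate the consequence for repeated batch reads. `Table`, `Source`, and the logical `clock` are hypothetical stand-ins and do not reflect the actual implementation.

```python
import itertools

# Hypothetical logical clock standing in for wall-clock time.
_ticks = itertools.count(100)


def clock():
    return next(_ticks)


class Table:
    """Materialized: the processing time is stored alongside the row."""

    def __init__(self):
        self.rows = []

    def insert(self, value):
        self.rows.append((value, clock()))  # stamped once, then persisted

    def scan(self):
        return list(self.rows)


class Source:
    """Not materialized: rows are stamped each time they are read."""

    def __init__(self, upstream):
        self.upstream = upstream

    def scan(self):
        return [(value, clock()) for value in self.upstream]


table = Table()
table.insert("x")
table_reads = (table.scan(), table.scan())

source = Source(["x"])
source_reads = (source.scan(), source.scan())
```

Two scans of the table return the same processing time, while two scans of the source return different ones, which is why an event time such as the Kafka timestamp is the safer choice for a plain source.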

fuyufjh added the type/feature label Jan 5, 2023

github-actions bot added this to the release-0.1.16 milestone Jan 5, 2023

fuyufjh mentioned this issue Jan 6, 2023

Discussion: Force specifying watermark for sources #6750

Closed

fuyufjh mentioned this issue Jan 26, 2023

Implement now/current_date/current_time/current_timestamp #2870

Closed

fuyufjh assigned yuhao-su Jan 30, 2023

fuyufjh modified the milestones: release-0.1.16, release-0.1.17 Jan 30, 2023

yuhao-su modified the milestones: release-0.1.17, next-release-0.1.19, release-0.1.18 Feb 19, 2023

fuyufjh modified the milestones: release-0.18, release-0.19 Mar 22, 2023

yuhao-su closed this as completed May 19, 2023
