Spec: Add missing last-column-id #7445

Merged: 2 commits, Jun 5, 2023

@Fokko Fokko commented Apr 27, 2023

I noticed that this was required:

com.fasterxml.jackson.databind.JsonMappingException: Cannot parse missing int: last-column-id (through reference chain: org.apache.iceberg.rest.requests.UpdateTableRequest["updates"]->java.util.ArrayList[0])
	at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
	at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:373)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:375)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
	at com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4730)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3690)
	at org.apache.iceberg.rest.RESTCatalogServlet$ServletRequestContext.from(RESTCatalogServlet.java:179)
	at org.apache.iceberg.rest.RESTCatalogServlet.doPost(RESTCatalogServlet.java:78)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:713)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalArgumentException: Cannot parse missing int: last-column-id
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
	at org.apache.iceberg.util.JsonUtil.getInt(JsonUtil.java:108)
	at org.apache.iceberg.MetadataUpdateParser.readAddSchema(MetadataUpdateParser.java:400)
	at org.apache.iceberg.MetadataUpdateParser.fromJson(MetadataUpdateParser.java:245)
	at org.apache.iceberg.rest.RESTSerializers$MetadataUpdateDeserializer.deserialize(RESTSerializers.java:130)
	at org.apache.iceberg.rest.RESTSerializers$MetadataUpdateDeserializer.deserialize(RESTSerializers.java:125)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:359)
	... 41 more

Java code:

private static MetadataUpdate readAddSchema(JsonNode node) {
  JsonNode schemaNode = JsonUtil.get(SCHEMA, node);
  Schema schema = SchemaParser.fromJson(schemaNode);
  int lastColumnId = JsonUtil.getInt(LAST_COLUMN_ID, node);
  return new MetadataUpdate.AddSchema(schema, lastColumnId);
}
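
To make the failure mode concrete, here is a minimal stdlib-only sketch of a strict integer lookup that rejects a missing property, in the spirit of the `JsonUtil.getInt` call in the trace. The class `StrictIntLookup`, the `Map`-based stand-in for a JSON node, and the wording of the second error message are assumptions for illustration, not Iceberg's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class StrictIntLookup {
  // Hypothetical stand-in for a strict JSON integer lookup:
  // the property must be present and must hold an integer.
  public static int getInt(String property, Map<String, Object> node) {
    if (!node.containsKey(property)) {
      throw new IllegalArgumentException("Cannot parse missing int: " + property);
    }
    Object value = node.get(property);
    if (!(value instanceof Integer)) {
      throw new IllegalArgumentException("Not an integer value: " + property + ": " + value);
    }
    return (Integer) value;
  }

  public static void main(String[] args) {
    Map<String, Object> update = new HashMap<>();
    update.put("action", "add-schema");
    // "last-column-id" is deliberately left out, as in the failing request.
    try {
      getInt("last-column-id", update);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints "Cannot parse missing int: last-column-id"
    }
  }
}
```

With a check like this on the server side, any client that follows a spec omitting `last-column-id` would be rejected, which is why the field has to be documented.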

nastra commented Apr 27, 2023

Just an FYI: there's also #6701, which addresses the same issue.

@github-actions github-actions bot added the core label Apr 30, 2023
@nastra nastra left a comment

change itself LGTM, but it would be good to add a test to TestMetadataUpdateParser where we don't specify last-column-id:

@Test
public void testAddSchemaFromJsonWithoutLastColumnId() {
  String action = MetadataUpdateParser.ADD_SCHEMA;
  Schema schema = ID_DATA_SCHEMA;
  // The JSON below omits last-column-id, so the expected value comes from the schema itself.
  int lastColumnId = schema.highestFieldId();
  String json =
      String.format("{\"action\":\"add-schema\",\"schema\":%s}", SchemaParser.toJson(schema));
  MetadataUpdate expectedUpdate = new MetadataUpdate.AddSchema(schema, lastColumnId);
  assertEquals(action, expectedUpdate, MetadataUpdateParser.fromJson(json));
}
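
The behaviour this test expects can be sketched as a fallback: use `last-column-id` when the update carries it, otherwise derive it from the schema's highest field ID. The sketch below is stdlib-only and hypothetical. `AddSchemaFallback`, `resolveLastColumnId`, the `Map`-based update, and the flat list of field IDs (1 and 2) are illustrative assumptions, not the parser's real code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AddSchemaFallback {
  // Highest ID in a flat list of field IDs (nested types are ignored in this sketch).
  public static int highestFieldId(List<Integer> fieldIds) {
    int highest = 0;
    for (int id : fieldIds) {
      highest = Math.max(highest, id);
    }
    return highest;
  }

  // Prefer an explicit last-column-id; otherwise fall back to the schema's highest field ID.
  public static int resolveLastColumnId(Map<String, Object> update, List<Integer> fieldIds) {
    Object explicit = update.get("last-column-id");
    if (explicit instanceof Integer) {
      return (Integer) explicit;
    }
    return highestFieldId(fieldIds);
  }

  public static void main(String[] args) {
    List<Integer> fieldIds = Arrays.asList(1, 2); // assumed field IDs for illustration

    Map<String, Object> withoutId = new HashMap<>();
    withoutId.put("action", "add-schema");
    System.out.println(resolveLastColumnId(withoutId, fieldIds)); // prints 2

    Map<String, Object> withId = new HashMap<>(withoutId);
    withId.put("last-column-id", 5);
    System.out.println(resolveLastColumnId(withId, fieldIds)); // prints 5
  }
}
```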

Fokko commented May 1, 2023

@nastra Excellent idea, added a test! 👍🏻

@Fokko Fokko force-pushed the fd-add-missing-last-column-id branch from 07e8839 to 10a1af9 Compare May 1, 2023 21:43
@Fokko Fokko force-pushed the fd-add-missing-last-column-id branch from 10a1af9 to 443b797 Compare May 1, 2023 21:47
@Fokko Fokko requested a review from rdblue June 1, 2023 10:42
@Fokko Fokko merged commit 26af4c7 into master Jun 5, 2023
@Fokko Fokko deleted the fd-add-missing-last-column-id branch June 5, 2023 21:54

Fokko commented Jun 5, 2023

Thanks @nastra, @dramaticlly, @singhpk234 and @rdblue for the review! 🙌🏻

nastra pushed a commit to nastra/iceberg that referenced this pull request Aug 15, 2023
* Spec: Add missing last-column-id to open-api spec

* Add description
Fokko added a commit to Fokko/iceberg that referenced this pull request Nov 11, 2024
I added this to the spec a while ago:

apache#7445

But I think this was a mistake, and we should not expose this
in the public APIs, as it is much better to track it internally.

I noticed this while reviewing apache/iceberg-rust#587

Removing this from the Java APIs and the Open-API
update makes it much more resilient, and no longer requires
clients to compute this value.
Fokko added a commit to Fokko/iceberg that referenced this pull request Nov 11, 2024
I added this to the spec a while ago:

apache#7445

But I think this was a mistake, and we should not expose this
in the public APIs, as it is much better to track it internally.

I noticed this while reviewing apache/iceberg-rust#587

Removing this from the Java APIs and the Open-API
update makes it much more resilient, and no longer requires
clients to compute this value. For example, when there are two conflicting
schema changes, the last-column-id must be recomputed correctly when
retrying the operation.
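
As a rough illustration of the retry scenario in the commit message above, the sketch below keeps `last-column-id` on the server side and advances it from each incoming schema, so a client never has to send or recompute it. `LastColumnIdTracker` and its methods are hypothetical names, and the rule `max(current, highest field ID of the new schema)` is an assumption about how such tracking could work, not a statement of what the Java implementation does.

```java
public class LastColumnIdTracker {
  private int lastColumnId;

  public LastColumnIdTracker(int initialLastColumnId) {
    this.lastColumnId = initialLastColumnId;
  }

  // Applies an add-schema update; the tracked ID only ever moves forward.
  public int addSchema(int highestFieldIdInNewSchema) {
    lastColumnId = Math.max(lastColumnId, highestFieldIdInNewSchema);
    return lastColumnId;
  }

  public int lastColumnId() {
    return lastColumnId;
  }

  public static void main(String[] args) {
    // The table starts with two columns, so the last assigned ID is 2.
    LastColumnIdTracker table = new LastColumnIdTracker(2);

    // Writer B commits first with a schema whose highest field ID is 4.
    System.out.println(table.addSchema(4)); // prints 4

    // Writer A retries with a schema whose highest field ID is 3.
    // A value computed by A against the old state would now be stale;
    // the tracked value stays at 4 instead.
    System.out.println(table.addSchema(3)); // prints 4
  }
}
```

The design point is that the component that serializes commits is also the only one that can know the current value, so deriving it there avoids stale client-supplied numbers on retry.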
Fokko added a commit to Fokko/iceberg that referenced this pull request Nov 15, 2024
Fokko added a commit that referenced this pull request Nov 25, 2024
* Core,Open-API: Don't expose the `last-column-id`

I added this to the spec a while ago:

#7445

But I think this was a mistake, and we should not expose this
in the public APIs, as it is much better to track it internally.

I noticed this while reviewing apache/iceberg-rust#587

Removing this from the Java APIs and the Open-API
update makes it much more resilient, and no longer requires
clients to compute this value. For example, when there are two conflicting
schema changes, the last-column-id must be recomputed correctly when
retrying the operation.

* Update the tests as well

* Add `deprecation` flag

* Wording

Co-authored-by: Eduard Tudenhoefner

* Wording

Co-authored-by: Eduard Tudenhoefner

* Wording

* Thanks Ryan!

* Remove `LOG`

---------

Co-authored-by: Eduard Tudenhoefner
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024