Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RowStringInterpolator utility #32367

Merged
merged 6 commits into from
Sep 25, 2024
Merged

Conversation

ahmedabu98
Copy link
Contributor

Utility to substitute values from Beam Rows into a template String.
Depends on #32366

Part of #32365
See #31807 for how this can work for portable dynamic destinations

@github-actions github-actions bot added the java label Aug 29, 2024
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@ahmedabu98
Copy link
Contributor Author

assign set of reviewers

@ahmedabu98
Copy link
Contributor Author

tests are red because #32366 hasn't been merged yet, but this PR is still ready for review

Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damondouglas for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link
Contributor

github-actions bot commented Sep 7, 2024

Reminder, please take a look at this pr: @damondouglas

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@damondouglas damondouglas self-requested a review September 11, 2024 18:07
Comment on lines 93 to 103
fieldsToReplace = new HashSet<>();
while (m.find()) {
fieldsToReplace.add(StringUtils.strip(m.group(), "{}"));
}

List<String> rowFields =
fieldsToReplace.stream()
.filter(f -> !WINDOWING_METADATA.contains(f))
.collect(Collectors.toList());

RowFilter.validateSchemaContainsFields(rowSchema, rowFields, "string interpolation");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following results from my concern for doing work inside a constructor. I don't want to block this PR. I will move on if you resolve this comment. However, brainstorming out loud, may we consider a constructor, possibly package private, that accepts the template and fieldsToReplace. Then provide a static method, perhaps called compile, that performs the work of this constructor. The benefits of this is that testing the interpolate could be independent of testing the compile phase. Also, this separation may make changes in the future possibly easier as they seem like separate but related phases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if it was not possible for one to instantiate an invalid interpolator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to not being able to instantiate an invalid interpolator.

}

/** Like {@link #interpolate(Row)} but also potentially include windowing information. */
public String interpolate(Row row, BoundedWindow window, PaneInfo paneInfo, Instant timestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking but I was concerned about the possibility for future new sources of interpolation. For example, what if someone wants to add a new Thing but include the existing sources of interpolate, row, window, etc? Then in another future, someone else wants to add a new Thing2. This may require some thought. However, what I was envisioning was something like this:

interface Interpolator {
    String interpolate();
}
// instantiated with a helper method maybe
class RowInterpolator {
   String interpolate() { }
}
class RowStringInterpolator 
    public String interpolate(Interpolator ...interpolators) { ... }
}
String result = interpolator.interpolate(rowInterpolator, windowInterpolator, paneInfoInterpolator, timestampInterpolator);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I could see this being something like

interface Interpolator<T> {
  String interpolate(T);
  ...
}

RowStringInterpolator implements Interpolator<Row> {}

With that said though, I'd prefer we leave it off until it is warranted. Most of this work is just for cross-language and portability, where Beam Row is used as a standard. I haven't come across a case where we would benefit with String interpolation using another type, so modularizing this may not actually make a difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that we would want to support "row-compatible" objects, not just Rows, but I don't think that needs to be done from the start.

@ahmedabu98
Copy link
Contributor Author

Thanks @damondouglas, PTAL

Comment on lines 93 to 103
fieldsToReplace = new HashSet<>();
while (m.find()) {
fieldsToReplace.add(StringUtils.strip(m.group(), "{}"));
}

List<String> rowFields =
fieldsToReplace.stream()
.filter(f -> !WINDOWING_METADATA.contains(f))
.collect(Collectors.toList());

RowFilter.validateSchemaContainsFields(rowSchema, rowFields, "string interpolation");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to not being able to instantiate an invalid interpolator.

public static final String DD = "$DD";
private static final Set<String> WINDOWING_METADATA =
Sets.newHashSet(WINDOW, PANE_INDEX, YYYY, MM, DD);
private static final Pattern TEMPLATE_PATTERN = Pattern.compile("\\{(.+?)}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not care about being able to escape a {? (Maybe this is fine as future work.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It complains if we don't escape it in the pattern:

java.util.regex.PatternSyntaxException: Illegal repetition {(.+?)}

Normally, curly brackets signify how many times a character should be repeated

}

/** Like {@link #interpolate(Row)} but also potentially include windowing information. */
public String interpolate(Row row, BoundedWindow window, PaneInfo paneInfo, Instant timestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that we would want to support "row-compatible" objects, not just Rows, but I don't think that needs to be done from the start.

}

/** Performs string interpolation on the template using values from the input {@link Row}. */
public String interpolate(Row row) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation to have a separate no-windowing implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured why not, but on a second thought will remove it to keep windowing properties as first-class citizens

@ahmedabu98 ahmedabu98 merged commit e3c6f47 into apache:master Sep 25, 2024
28 checks passed
reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
* RowStringInterpolator

* spotless

* address comments

* address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants