Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipping the timestamp check for permission failures #877

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pallavia7
Copy link
Collaborator

@pallavia7 pallavia7 commented Nov 18, 2024

Summary

Moved the permission check before timestamp check. If permission fails, skipped the timestamp check

Why / Goal

User reported an issue in analyzer: the job failed open when users don't have permission to certain Hive tables. The correct behavior should be that any table permission issues should be caught in the analyzer step.
The root cause is that the timestamp check is enabled by default, and it runs before the table permission check. Since timestamp check requires accessing data, it failed open.

Test Plan

[+ ] Added Unit Tests

Checklist

  • Documentation update

Reviewers


@pengyu-hou
Copy link
Collaborator

@pallavia7 you could run sbt scalafmt for the CI failure.

@pallavia7 pallavia7 force-pushed the pallavi--analyzer-permvalidation branch from 9eb30a1 to b9e9f72 Compare November 18, 2024 21:51
if (!skipTimestampCheck) {
val gbTables = ListBuffer[String]()
joinConf.joinParts.toScala.foreach { part =>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change


// join parts
val joinPart = Builders.GroupBy(
sources = Seq(getTestGBSourceWithTs(namespace=namespace)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure how access denial is replayed here. Could you explain? Thanks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Mockito.spy on tableUtils objects. when(tableUtils.checkTablePermission(any(), any())).thenReturn(false) modifies checkPermission behavior. So when tableUtils.checkTablePermission is invoked, the method returns false which happens when there is no access to table.

@pallavia7 pallavia7 force-pushed the pallavi--analyzer-permvalidation branch from b9e9f72 to d582acf Compare November 19, 2024 20:57
runTablePermissionValidation((gbTables.toList ++ List(joinConf.left.table)).toSet)
} else Set()

if (!skipTimestampCheck && noAccessTables.isEmpty) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we print a warning error if we are skipping the timestamp check because there are tables with permission issues?

def testJoinAnalyzerInvalidTablePermissions(): Unit = {
val spark: SparkSession = SparkSessionBuilder.build("AnalyzerTest" + "_" + Random.alphanumeric.take(6).mkString, local = true)
val tableUtils = spy(TableUtils(spark))
when(tableUtils.checkTablePermission(any(), any())).thenReturn(false)
Copy link
Collaborator

@hzding621 hzding621 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider mocking the tableUtils.sql to throw a runtime exception to mimic table permission issue, since the timestamp check logic will try to access the table data using tableUtils.sql, and we want to ensure that this is not triggered and gated by the permission check first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants