Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #5594][AUTHZ] BuildQuery should respect normal node's input #5627

Closed
wants to merge 27 commits into from

Conversation

AngersZhuuuu
Copy link
Contributor

Why are the changes needed?

To close #5594
For case

def filter_func(iterator):
                for pdf in iterator:
                    yield pdf[pdf.id == 1]

df = spark.read.table("test_mapinpandas")
execute_result = df.mapInPandas(filter_func, df.schema).show()

The logical plan is

GlobalLimit 21
+- LocalLimit 21
   +- Project [cast(id#5 as string) AS id#11, name#6]
      +- MapInPandas filter_func(id#0, name#1), [id#5, name#6]
         +- HiveTableRelation [`default`.`test_mapinpandas`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#0, name#1], Partition Cols: []]

When handle MapInPandas, we didn't match its input with HiveTableRelation, cause we miss input table's columns. This pr fix this

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

With this patch, it can extract the correct privilege object now
截屏2023-11-02 上午10 25 38

Was this patch authored or co-authored using generative AI tooling?

s"does not have [select] privilege on [$db1/$table1/id,$db1/$table1/scope]")
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a UT, the plan is


GlobalLimit 1
+- LocalLimit 1
   +- Project [cast(id#22 as string) AS id#30, cast(scope#23 as string) AS scope#31]
      +- MapInPandas mapInPandasUDF(id#17, scope#18)#21, [id#22, scope#23]
         +- HiveTableRelation [`spark_catalog`.`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#17, scope#18], Partition Cols: []]

Without this pr, this query only extract table, miss column privilege

"Permission denied: user [someone] does not have [select] privilege on [default/table1]" did not contain "does not have [select] privilege on [default/table1/id,default/table1/scope]"
ScalaTestFailureLocation: org.apache.kyuubi.plugin.spark.authz.ranger.HiveCatalogRangerSparkExtensionSuite at (RangerSparkExtensionSuite.scala:1125)
org.scalatest.exceptions.TestFailedException: "Permission denied: user [someone] does not have [select] privilege on [default/table1]" did not contain "does not have [select] privilege on [default/table1/id,default/table1/scope]"
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)

@AngersZhuuuu
Copy link
Contributor Author

ping @yaooqinn

@codecov-commenter
Copy link

codecov-commenter commented Nov 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c1685c6) 61.35% compared to head (d0b022e) 61.38%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5627      +/-   ##
============================================
+ Coverage     61.35%   61.38%   +0.02%     
  Complexity       23       23              
============================================
  Files           607      607              
  Lines         35935    35946      +11     
  Branches       4936     4937       +1     
============================================
+ Hits          22047    22064      +17     
  Misses        11498    11498              
+ Partials       2390     2384       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AngersZhuuuu
Copy link
Contributor Author

ping @yaooqinn Could you take a review?

@yaooqinn
Copy link
Member

we have test for aggs, why do they not fail after this change

@AngersZhuuuu
Copy link
Contributor Author

why do they not fail after this change

This PR is just a supplement to some special plan cases. Make sure we get all the projection list.

For below plan

GlobalLimit 1
+- LocalLimit 1
   +- Project [cast(id#22 as string) AS id#30, cast(scope#23 as string) AS scope#31]
      +- MapInPandas mapInPandasUDF(id#17, scope#18)#21, [id#22, scope#23]
         +- HiveTableRelation [`spark_catalog`.`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#17, scope#18], Partition Cols: []]

if we only extract the Projection's projectionList, it's clear that we miss some column since Project's projectionList's exprId is not the same as the HiveRelation, we better collect all node's input to make avoid miss connection.

About below change
截屏2023-11-24 下午2 56 25

We can see plan


Aggregate [count(id#1483) AS count(id)#1486L]
+- Filter (id#1483 > 1)
   +- PermanentViewMarker `spark_catalog`.`default`.`view1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id, scope], false
      +- Project [id#1483]
         +- HiveTableRelation [`spark_catalog`.`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1483, scope#1484], Partition Cols: []]

Aggregate and Filter's attribute all add to conditionList, but we judge if (projectionList.isEmpty) in mergeProjection.
So without this change will cause extracting wrong column.

@AngersZhuuuu
Copy link
Contributor Author

Change some code to make it more simple cc @yaooqinn

Copy link

github-actions bot commented Mar 9, 2024

Thanks for the PR! This PR is being closed due to inactivity. This isn't a judgement on the merit of the PR in any way. If this is still an issue with the latest version of Kyuubi, please reopen it and ask a committer to remove the Stale tag!

Thank you for using Kyuubi!

@github-actions github-actions bot added the Stale label Mar 9, 2024
@github-actions github-actions bot closed this Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] BuildQuery should respect normal node's input
3 participants