[SPARK-20172][Core] Add file permission check when listing files in FsHistoryProvider #17495
Conversation
Change-Id: I7c7014ed83dbba786d90ca530380e94c086b497b
```scala
.filter { entry =>
  try {
    val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    fs.access(entry.getPath, FsAction.READ)
```
This API is tagged with @LimitedPrivate({"HDFS", "Hive"}), but I think it should be fine to call it here.
So, this API actually calls fs.getFileStatus() which is a remote call and completely unnecessary in this context, since you already have the FileStatus.
You could instead directly do the check against the FsPermission object (same way the fs.access() does after it performs the remote call).
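In sketch form (assuming `entry` is the `FileStatus` already returned by the directory listing):

```scala
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

// fs.access(path, FsAction.READ) re-fetches the FileStatus over RPC and then
// checks its FsPermission bits; with the listing's FileStatus already in hand,
// that second step can be done locally:
val perm: FsPermission = entry.getPermission
// e.g. the "other" read bit:
val othersCanRead = perm.getOtherAction.implies(FsAction.READ)
```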
I see, let me change it. Thanks.
Change-Id: Ie2bd075561e481266dc8047e2af604f4d6c83810
Test build #75431 has finished for PR 17495 at commit
@tgravescs @vanzin, would you please help review this PR? Thanks a lot.

Ping @vanzin @tgravescs again. Sorry to bother you, and I really appreciate your time.

Test build #75594 has finished for PR 17495 at commit
Sorry @jerryshao, I know you have a few up but I'm swamped and probably won't get to them until next week.
| test("log without read permission should be filtered out before actual reading") { | ||
| class TestFsHistoryProvider extends FsHistoryProvider(createTestConf()) { |
I think you'll need this from another test in this same file:

```scala
// setReadable(...) does not work on Windows. Please refer JDK-6728842.
assume(!Utils.isWindows)
```

In fact, shouldn't this test be merged with the test for SPARK-3697?
The difference is that this UT checks whether the file is filtered out during the check, whereas the SPARK-3697 UT only checks the final result, so the file could have been filtered out during the read. Let me merge these two UTs.
Change-Id: I95fe3f765c8b66cc14bc888899f99dc8a0466e91
Test build #75684 has finished for PR 17495 at commit
```scala
try {
  val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
  // ...
  def canAccess = {
```
Return type? Also I'd make this a top-level method (with the entry and action as parameters), maybe even in SparkHadoopUtil... just to avoid the deeply-nested method declaration. That allows you to easily write a unit test for it (yay!).
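As a sketch of that suggestion (later snippets in this thread show the method did land in SparkHadoopUtil under the name `checkAccessPermission`; the body is left unimplemented here):

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.fs.permission.FsAction

// A top-level, directly unit-testable home for the check: it takes the
// listing entry and the requested action, and reports whether the current
// user may perform that action.
def checkAccessPermission(status: FileStatus, mode: FsAction): Boolean = ???
```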
```scala
val perm = entry.getPermission
val ugi = UserGroupInformation.getCurrentUser
val user = ugi.getShortUserName
val groups = ugi.getGroupNames
```
nit: this is only used in one place, just inline the call.
```scala
} else if (perm.getOtherAction.implies(FsAction.READ)) {
  true
} else {
  throw new AccessControlException(s"Permission denied: user=$user, " +
```
Why throw an exception instead of returning false? That makes the function's interface weird.
You could just log the access issue here if you want, and then you can even remove the try..catch.
```scala
val ugi = UserGroupInformation.getCurrentUser
val user = ugi.getShortUserName
val groups = ugi.getGroupNames
if (user == entry.getOwner && perm.getUserAction.implies(FsAction.READ)) {
```
This code is not correct. The code in Hadoop's FileSystem.checkAccessPermissions is slightly different, and actually correct.

For example, if you have a file with permissions 066 and your user matches the owner, you do not have permission to read the file, even if you belong to a group that has permission to read it. Your code allows that user to read that file. e.g.

```
$ sudo ls -la /tmp/test
total 132
d---rwxrwx   2 vanzin vanzin   4096 Abr 12 10:30 .
drwxrwxrwt  78 root   root   126976 Abr 12 10:30 ..
$ ls /tmp/test
ls: cannot open directory '/tmp/test': Permission denied
```

There's also an issue with superusers (they should always have permissions), but the Hadoop library has that problem too, so maybe we can ignore that one.
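For reference, a simplified sketch of the evaluation order Hadoop's FileSystem.checkAccessPermissions uses, here returning false instead of throwing (per the earlier comment about the function's interface); this is an illustration, not the exact code that landed:

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.UserGroupInformation

def checkAccessPermission(status: FileStatus, mode: FsAction): Boolean = {
  val perm = status.getPermission
  val ugi = UserGroupInformation.getCurrentUser
  if (ugi.getShortUserName == status.getOwner) {
    // Owner short-circuits: for mode 066 the owner is denied read access
    // even though the group and "other" bits would allow it.
    perm.getUserAction.implies(mode)
  } else if (ugi.getGroupNames.contains(status.getGroup)) {
    perm.getGroupAction.implies(mode)
  } else {
    perm.getOtherAction.implies(mode)
  }
}
```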
Sorry about that. Let me fix it.
Change-Id: I38a5491e555496004204bf99634f7147dac6c642
Test build #75755 has finished for PR 17495 at commit

Jenkins, retest this please.

Test build #75758 has started for PR 17495 at commit

Jenkins, test this please.

Test build #75774 has finished for PR 17495 at commit
```scala
val path = new Path(logFile2.toURI)
val fs = path.getFileSystem(SparkHadoopUtil.get.conf)
val status = fs.getFileStatus(path)
SparkHadoopUtil.get.checkAccessPermission(status, FsAction.READ) should be (true)
```
Why are these tests here? There's no SparkHadoopUtilSuite currently but it makes a lot more sense to create one, since that's where the code lives.
You also don't need to reference real files here; you can create custom FileStatus objects to test the behavior you want (the constructors are public and provide all the parameters you need).
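For instance, a synthetic FileStatus readable only by its owner might be built like this (the owner, group, and path are made-up values):

```scala
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

val status = new FileStatus(
  0L,     // length
  false,  // isDirectory
  0,      // block replication
  0L,     // block size
  0L,     // modification time
  0L,     // access time
  new FsPermission(FsAction.READ, FsAction.NONE, FsAction.NONE), // mode 0400
  "test-owner",           // hypothetical owner
  "test-group",           // hypothetical group
  new Path("/dummy/log")) // hypothetical path
```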
Thanks @vanzin, I will update the code.
Change-Id: Ib9ac4be0a896531a529c260197431df1c3adf77a
Test build #75884 has started for PR 17495 at commit

Jenkins, retest this please.

Test build #75890 has finished for PR 17495 at commit

Jenkins, retest this please.

Test build #75897 has finished for PR 17495 at commit
```scala
class SparkHadoopUtilSuite extends SparkFunSuite with Matchers {
  test("check file permission") {
    import FsAction._
    val user = UserGroupInformation.getCurrentUser.getShortUserName
```
Instead of using getCurrentUser, I think it would be safer to use createUserForTesting + ugi.doAs. That lets you define the user name and the groups and thus avoids possible clashes with the OS configuration.
Yeah, that would be better, thanks for the suggestion.
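A sketch of that setup (the user and group names are arbitrary):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// A fabricated user with a known name and group, independent of the OS.
val ugi = UserGroupInformation.createUserForTesting("test-user", Array("test-group"))
ugi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // Inside doAs, UserGroupInformation.getCurrentUser reports
    // "test-user" / "test-group", so checkAccessPermission evaluates the
    // FileStatus permission bits as that user.
  }
})
```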
Change-Id: Iafc17472e44a511402a3faa5e1889fa445b3c386
Test build #75924 has finished for PR 17495 at commit
```scala
val groups = UserGroupInformation.getCurrentUser.getGroupNames
require(!groups.isEmpty)
// ...
val testUser = user + "-" + Random.nextInt(100)
```
You don't need to get the current user just for this... just create a random user name.
Change-Id: Ic9851b569da3458895e7c7ea8a5474941c466585
Test build #75965 has finished for PR 17495 at commit

LGTM, merging to master / 2.2.
What changes were proposed in this pull request?
In the current Spark HistoryServer we expect to get an `AccessControlException` while listing all the files, but unfortunately that does not work, because we never actually check the access permission and no other call throws such an exception. What is worse, the check is deferred until the files are read, which is unnecessary and quite verbose, since the exception is printed out every 10 seconds when the files are checked.

So with this fix, we actually check the read permission while listing the files, which avoids unnecessary file reads later on and suppresses the verbose log.
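A sketch of the resulting listing filter, simplified from the diff above (`statusList` stands in for the result of listing the log directory; the real code applies further conditions):

```scala
// Entries the current user cannot read are dropped during listing,
// before any attempt to open and replay the event log.
val readableLogs = statusList.filter { entry =>
  !entry.isDirectory &&
    SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ)
}
```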
How was this patch tested?
Added a unit test to verify.