Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#772] fix(kerberos): cache proxy user ugi to avoid memory leak #773

Merged
merged 7 commits into from
Mar 29, 2023

Conversation

zuston
Copy link
Member

@zuston zuston commented Mar 28, 2023

What changes were proposed in this pull request?

  1. To avoid memory leak by caching of proxy user UGI.

Why are the changes needed?

Fix: #772

The Hadoop filesystem instance will be created too many time in cache,
which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI.
The scheme and authority are not changed every time. But for UGI, if we invoke the
createProxyUser, it will always create a new one, that means the every invoking Filesystem.get(),
it will be cached due to different key.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  1. Existing UTs
  2. Added tests

@zuston
Copy link
Member Author

zuston commented Mar 28, 2023

PTAL @leixm

@codecov-commenter
Copy link

codecov-commenter commented Mar 28, 2023

Codecov Report

Merging #773 (3702631) into master (e8d9909) will increase coverage by 0.21%.
The diff coverage is 33.33%.

@@             Coverage Diff              @@
##             master     #773      +/-   ##
============================================
+ Coverage     63.11%   63.33%   +0.21%     
- Complexity     1955     1986      +31     
============================================
  Files           230      231       +1     
  Lines         11346    11462     +116     
  Branches       1119     1126       +7     
============================================
+ Hits           7161     7259      +98     
- Misses         3789     3800      +11     
- Partials        396      403       +7     
Impacted Files Coverage Δ
...uniffle/common/security/HadoopSecurityContext.java 78.57% <33.33%> (-2.51%) ⬇️

... and 20 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@leixm leixm self-requested a review March 28, 2023 06:41
@jerqi
Copy link
Contributor

jerqi commented Mar 28, 2023

What changes were proposed in this pull request?

(Please outline the changes and how this PR fixes the issue.)

Why are the changes needed?

Fix: #772

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  1. Existing UTs

Could we modify the description about What changes?

@@ -75,6 +78,7 @@ public HadoopSecurityContext(
refreshIntervalSec,
refreshIntervalSec,
TimeUnit.SECONDS);
proxyUserUgiPool = Maps.newConcurrentMap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use JavaUtils here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes.

@jerqi
Copy link
Contributor

jerqi commented Mar 28, 2023

It seems that we need some extra tests .

@advancedxy
Copy link
Contributor

It seems that we need some extra tests .

+1. At least, you should test the filesystem instance should be the same for the same proxy users.

@zuston zuston requested a review from jerqi March 28, 2023 08:56

FileSystem fileSystem1 = context.runSecured("alex", () -> FileSystem.get(kerberizedHdfs.getConf()));
FileSystem fileSystem2 = context.runSecured("alex", () -> FileSystem.get(kerberizedHdfs.getConf()));
assertEquals(fileSystem1, fileSystem2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertTrue(filesystem1 == fileSystem2)?

I believe it's more about they are the same instance.

Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except one minor comment

Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending CI passes

@zuston zuston merged commit 33e0d23 into apache:master Mar 29, 2023
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this pull request Apr 5, 2023
…apache#773)

### What changes were proposed in this pull request?

1. To avoid memory leak by caching of proxy user UGI.

### Why are the changes needed?

Fix: apache#772 

The Hadoop filesystem instance will be created too many time in cache, 
which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI. 
The scheme and authority are not changed every time. But for UGI, if we invoke the 
createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`,
it will be cached due to different key.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
1. Existing UTs
2. Added tests
jerqi pushed a commit that referenced this pull request Apr 13, 2023
### What changes were proposed in this pull request?

1. To avoid memory leak by caching of proxy user UGI.

### Why are the changes needed?

Fix: #772 

The Hadoop filesystem instance will be created too many time in cache, 
which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI. 
The scheme and authority are not changed every time. But for UGI, if we invoke the 
createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`,
it will be cached due to different key.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
1. Existing UTs
2. Added tests
jerqi added a commit that referenced this pull request Apr 13, 2023
@jerqi
Copy link
Contributor

jerqi commented Apr 13, 2023

This is a fix. It should be merged branch 0.7. But there is conflict with branch 0.7. Could you raise a pr to branch 0.7?

zuston added a commit that referenced this pull request Apr 17, 2023
### What changes were proposed in this pull request?

1. To avoid memory leak by caching of proxy user UGI.

### Why are the changes needed?

Fix: #772 

The Hadoop filesystem instance will be created too many time in cache, 
which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI. 
The scheme and authority are not changed every time. But for UGI, if we invoke the 
createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`,
it will be cached due to different key.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
1. Existing UTs
2. Added tests
jerqi added a commit that referenced this pull request Apr 17, 2023
zuston added a commit to zuston/incubator-uniffle that referenced this pull request Apr 17, 2023
…apache#773)

1. To avoid memory leak by caching of proxy user UGI.

Fix: apache#772

The Hadoop filesystem instance will be created too many time in cache,
which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI.
The scheme and authority are not changed every time. But for UGI, if we invoke the
createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`,
it will be cached due to different key.

No.

1. Existing UTs
2. Added tests
zuston added a commit that referenced this pull request Apr 17, 2023
…773) (#824)

### What changes were proposed in this pull request?

1. To avoid memory leak by caching of proxy user UGI.

### Why are the changes needed?

Fix: #772

The Hadoop filesystem instance will be created too many time in cache, which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI. The scheme and authority are not changed every time. But for UGI, if we invoke the createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`, it will be cached due to different key.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

1. Existing UTs
2. Added tests
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this pull request Jun 20, 2023
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this pull request Jun 20, 2023
…apache#773)

1. To avoid memory leak by caching of proxy user UGI.

Fix: apache#772

The Hadoop filesystem instance will be created too many time in cache,
which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI.
The scheme and authority are not changed every time. But for UGI, if we invoke the
createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`,
it will be cached due to different key.

No.

1. Existing UTs
2. Added tests
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this pull request Jun 20, 2023
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this pull request Jun 20, 2023
… leak (apache#773) (apache#824)

1. To avoid memory leak by caching of proxy user UGI.

Fix: apache#772

The Hadoop filesystem instance will be created too many time in cache, which will cause the shuffle server memory leak.

As we know, the filesystem cache's key is built by the scheme、authority and UGI. The scheme and authority are not changed every time. But for UGI, if we invoke the createProxyUser, it will always create a new one, that means the every invoking `Filesystem.get()`, it will be cached due to different key.

No.

1. Existing UTs
2. Added tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Memory leak caused by UserGroupInformation.createProxyUser
6 participants