When several processes read the same file, reads are slow using JNI FUSE #14007

Closed
lilyzhoupeijie opened this issue Aug 31, 2021 · 6 comments
Labels
priority-medium type-feature This issue is a feature request

Comments

@lilyzhoupeijie
Contributor

lilyzhoupeijie commented Aug 31, 2021

When several processes read the same file, reads are slow using JNI FUSE.

There is a synchronized block that makes reads very slow when several readers read the same file.

alluxio/integration/fuse/src/main/java/alluxio/fuse/AlluxioJniFuseFileSystem.java

      // FileInStream is not thread safe
      synchronized (is) {
        if (!mOpenFileEntries.containsKey(fd)) {
          LOG.error("Cannot find fd {} for {}", fd, path);
          return -ErrorCodes.EBADFD();
        }
        if (offset - is.getPos() < is.remaining()) {
          is.seek(offset);
          while (rd >= 0 && nread < sz) {
            rd = is.read(buf, nread, sz - nread);
            if (rd >= 0) {
              nread += rd;
            }
          }
        }
      }
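To illustrate why this block serializes readers, here is a minimal, self-contained sketch. It is not the Alluxio code: `SeekableBytes` is a hypothetical in-memory stand-in for FileInStream with a single cursor, and `readAt` mirrors the seek-then-read loop above. Because seek and read share one position, both have to stay under the same lock, so threads that share one stream take turns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedStreamDemo {
  /** Hypothetical stand-in for FileInStream: one cursor, not thread safe. */
  static final class SeekableBytes {
    private final byte[] data;
    private int pos;

    SeekableBytes(byte[] data) {
      this.data = data;
    }

    void seek(int newPos) {
      pos = newPos;
    }

    int read(byte[] buf, int off, int len) {
      if (pos >= data.length) {
        return -1; // end of data
      }
      int n = Math.min(len, data.length - pos);
      System.arraycopy(data, pos, buf, off, n);
      pos += n;
      return n;
    }
  }

  /** Reads up to sz bytes starting at offset; seek and read share one lock. */
  static int readAt(SeekableBytes is, byte[] buf, int sz, int offset) {
    int nread = 0;
    int rd = 0;
    synchronized (is) {
      is.seek(offset);
      while (rd >= 0 && nread < sz) {
        rd = is.read(buf, nread, sz - nread);
        if (rd >= 0) {
          nread += rd;
        }
      }
    }
    return nread;
  }

  /** Starts one reader per chunk on a shared stream; true if every chunk matched. */
  static boolean runConcurrentReads(int threads, int chunk) {
    byte[] data = new byte[threads * chunk];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) (i / chunk); // chunk k is filled with the value k
    }
    SeekableBytes is = new SeekableBytes(data);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        results.add(pool.submit(() -> {
          byte[] buf = new byte[chunk];
          if (readAt(is, buf, chunk, id * chunk) != chunk) {
            return false;
          }
          for (byte b : buf) {
            if (b != (byte) id) {
              return false;
            }
          }
          return true;
        }));
      }
      boolean ok = true;
      for (Future<Boolean> f : results) {
        ok &= f.get();
      }
      return ok;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("all chunks correct: " + runConcurrentReads(8, 4096));
  }
}
```

Removing the `synchronized` in this sketch would let one thread's seek move the cursor between another thread's seek and read, which is why the lock covers both calls.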
lilyzhoupeijie added the type-feature label Aug 31, 2021
@apc999
Contributor

apc999 commented Sep 2, 2021

@lilyzhoupeijie yes, this is a known limitation. @LuQQiu will take a look at potential optimizations later.

@LuQQiu
Contributor

LuQQiu commented Sep 15, 2021

When multiple processes read the same file,
FUSE does not use one open() and one release() per process.
Instead, it uses one open() and one release() per file, no matter how many processes access it.

The current approach:

  1. open(PATH): assign an FID to the PATH, get the FileInStream, and store it in Map<FID, FileInStream> mOpenFileEntries.
  2. read(PATH, FID, offset, len): get the FileInStream for the FID and read the file.
  3. release(PATH, FID): remove the FID from mOpenFileEntries.
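The three steps above can be sketched roughly as follows. This is only an illustration, not the Alluxio implementation: `FdTableSketch`, `Stream`, and `lookup` are hypothetical names, and `Stream` stands in for FileInStream so the example runs without any Alluxio dependency.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class FdTableSketch {
  /** Hypothetical stand-in for an open FileInStream. */
  static final class Stream {
    final String path;

    Stream(String path) {
      this.path = path;
    }
  }

  // Plays the role of Map<FID, FileInStream> mOpenFileEntries.
  private final Map<Long, Stream> openFileEntries = new ConcurrentHashMap<>();
  private final AtomicLong nextFd = new AtomicLong(0);

  /** Step 1: every open() gets a fresh FID and its own stream. */
  long open(String path) {
    long fd = nextFd.getAndIncrement();
    openFileEntries.put(fd, new Stream(path));
    return fd;
  }

  /** Step 2: read() looks the stream up by FID; null means unknown or released. */
  Stream lookup(long fd) {
    return openFileEntries.get(fd);
  }

  /** Step 3: release() drops the FID; returns false if it was not open. */
  boolean release(long fd) {
    return openFileEntries.remove(fd) != null;
  }

  int openCount() {
    return openFileEntries.size();
  }

  public static void main(String[] args) {
    FdTableSketch table = new FdTableSketch();
    long fd = table.open("/data/file");
    System.out.println("stream path for fd: " + table.lookup(fd).path);
    table.release(fd);
    System.out.println("open entries after release: " + table.openCount());
  }
}
```

In this layout the lookup on each read is a single map access keyed by a number, which is what makes it cheap for many small reads.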

To achieve one input stream per process per file, we would need a structure like Map<PID + PATH, FileInStream> mOpenFileEntries.

  1. open(PATH): get the FUSE context PID, get an input stream for this PID + PATH, and store it in the structure.
  2. read(PATH, FID, offset, len): get the FUSE context PID and check whether mOpenFileEntries contains the PID + PATH. If yes, use it; if not, create a new one. Then read the file.
  3. release(PATH, FID): remove all the values with the given PATH.
    This solution is much less efficient than the current one, because each read is small and has to do the map key comparison.
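A rough sketch of the PID + PATH structure, to make the extra per-read cost concrete. All names here (`PerProcessStreams`, `Key`, `Stream`, `streamFor`) are hypothetical, and `Stream` again stands in for FileInStream; how the PID would be obtained from the FUSE context is left out.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

class PerProcessStreams {
  /** Composite key: every lookup hashes and compares both the PID and the path. */
  static final class Key {
    final long pid;
    final String path;

    Key(long pid, String path) {
      this.pid = pid;
      this.path = path;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return pid == other.pid && path.equals(other.path);
    }

    @Override
    public int hashCode() {
      return Objects.hash(pid, path);
    }
  }

  /** Hypothetical stand-in for FileInStream. */
  static final class Stream {
    final Key owner;

    Stream(Key owner) {
      this.owner = owner;
    }
  }

  private final Map<Key, Stream> entries = new ConcurrentHashMap<>();

  /** open()/read(): reuse the stream for this PID + PATH, or create one. */
  Stream streamFor(long pid, String path) {
    return entries.computeIfAbsent(new Key(pid, path), Stream::new);
  }

  /** release(): drop every stream opened on the path, for all PIDs. */
  void release(String path) {
    entries.keySet().removeIf(k -> k.path.equals(path));
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    PerProcessStreams streams = new PerProcessStreams();
    streams.streamFor(100, "/data/file");
    streams.streamFor(200, "/data/file");
    System.out.println("entries before release: " + streams.size());
    streams.release("/data/file");
    System.out.println("entries after release: " + streams.size());
  }
}
```

Compared with a lookup by FID, each read here builds a key and compares a path string, and release has to scan all keys, which matches the efficiency concern raised above.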

I could not come up with a solution that truly improves performance given the current libfuse structure.

@apc999
Contributor

apc999 commented Sep 16, 2021

Hi @lilyzhoupeijie, one quick clarification:

When multiple processes are reading the same file, they are not synchronized on the same input stream, because each process gets a unique file descriptor (FD) with a dedicated input stream. The only case where different FUSE threads contend is when libfuse dispatches multiple 128-KB reads that all come from the same process reading the same file. Do you observe slowness when only one process is reading the file?

@apc999
Contributor

apc999 commented Sep 16, 2021

Discussed further offline with Lu. @lilyzhoupeijie, I think there might be a slow read going on that prevents other libfuse worker threads competing for the same FD from proceeding. We want to reproduce this issue and identify the cause of the slowness.

@LuQQiu
Contributor

LuQQiu commented Oct 27, 2021

From the jstack output, there are 31 libfuse threads: 19 are running fine and 8 are waiting to get data.

stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006f6453700> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at alluxio.client.block.stream.GrpcBlockingStream.receive(GrpcBlockingStream.java:195)
at alluxio.client.block.stream.GrpcDataMessageBlockingStream.receiveDataMessage(GrpcDataMessageBlockingStream.java:84)
at alluxio.client.block.stream.GrpcDataMessageBlockingStream.waitForComplete(GrpcDataMessageBlockingStream.java:115)
at alluxio.client.block.stream.GrpcDataReader.close(GrpcDataReader.java:186)
at alluxio.client.block.stream.BlockInStream.closeDataReader(BlockInStream.java:510)
at alluxio.client.block.stream.BlockInStream.seek(BlockInStream.java:402)
at alluxio.client.file.AlluxioFileInStream.seek(AlluxioFileInStream.java:330)
at alluxio.fuse.AlluxioJniFuseFileSystem.readInternal(AlluxioJniFuseFileSystem.java:375)
- locked <0x00000007ba28fe98> (a alluxio.client.file.AlluxioFileInStream)
at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$read$5(AlluxioJniFuseFileSystem.java:352)

4 are waiting to get the lock

stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006f6453700> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at alluxio.client.block.stream.GrpcBlockingStream.receive(GrpcBlockingStream.java:195)
at alluxio.client.block.stream.GrpcDataMessageBlockingStream.receiveDataMessage(GrpcDataMessageBlockingStream.java:84)
at alluxio.client.block.stream.GrpcDataMessageBlockingStream.waitForComplete(GrpcDataMessageBlockingStream.java:115)
at alluxio.client.block.stream.GrpcDataReader.close(GrpcDataReader.java:186)
at alluxio.client.block.stream.BlockInStream.closeDataReader(BlockInStream.java:510)
at alluxio.client.block.stream.BlockInStream.seek(BlockInStream.java:402)
at alluxio.client.file.AlluxioFileInStream.seek(AlluxioFileInStream.java:330)
at alluxio.fuse.AlluxioJniFuseFileSystem.readInternal(AlluxioJniFuseFileSystem.java:375)
- locked <0x00000007ba28fe98> (a alluxio.client.file.AlluxioFileInStream)
at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$read$5(AlluxioJniFuseFileSystem.java:352)
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at alluxio.fuse.AlluxioJniFuseFileSystem.readInternal(AlluxioJniFuseFileSystem.java:369)

There are 8 threads, in 4 pairs, that share the same FileInStream.
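For anyone who wants the same per-state breakdown without reading a jstack dump by hand, a small sketch using the standard `java.lang.management.ThreadMXBean` API is shown below. It has to run inside the JVM being inspected. The class name, the `demo` helper, and the `demo-` thread-name prefix are made up for illustration; the real libfuse thread names would need to be substituted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

class ThreadStateSummary {
  /** Counts live threads whose name starts with namePrefix, grouped by state. */
  static Map<Thread.State, Integer> countByState(String namePrefix) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
      if (info != null && info.getThreadName().startsWith(namePrefix)) {
        counts.merge(info.getThreadState(), 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Builds one lock holder and one blocked waiter, then summarizes them. */
  static Map<Thread.State, Integer> demo() {
    final Object lock = new Object();
    final CountDownLatch held = new CountDownLatch(1);
    final CountDownLatch done = new CountDownLatch(1);
    Thread holder = new Thread(() -> {
      synchronized (lock) {
        held.countDown();
        try {
          done.await(); // keep the monitor until the summary has been taken
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "demo-holder");
    Thread waiter = new Thread(() -> {
      synchronized (lock) {
        // nothing to do; this thread only needs to contend for the monitor
      }
    }, "demo-waiter");
    try {
      holder.start();
      held.await();
      waiter.start();
      while (waiter.getState() != Thread.State.BLOCKED) {
        Thread.sleep(1);
      }
      return countByState("demo-");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      done.countDown(); // let both demo threads finish
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A thread reported as BLOCKED here corresponds to the "BLOCKED (on object monitor)" entries in the dump above.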

@lilyzhoupeijie we do find some contention during the FileInStream read, but the lock contention is not severe. The more important problem is that the read itself is slow. If we change the code so that each thread has its own FileInStream, we may still see slow reads. If the slowness comes from resource contention, such as CPU or network I/O, changing the FileInStream synchronization may not help.
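One way to separate the two effects is to time the wait for the monitor and the read itself independently. The sketch below is an assumption about how that could be measured, not code from Alluxio: `timedRead`, `Timing`, and `slowRead` are hypothetical names, and the `Runnable` stands in for the seek and read calls.

```java
class ReadTimingSketch {
  /** Nanoseconds spent waiting for the lock and spent inside the read. */
  static final class Timing {
    final long lockWaitNanos;
    final long readNanos;

    Timing(long lockWaitNanos, long readNanos) {
      this.lockWaitNanos = lockWaitNanos;
      this.readNanos = readNanos;
    }
  }

  /** Runs read under lock, timing monitor acquisition and the read separately. */
  static Timing timedRead(Object lock, Runnable read) {
    long beforeLock = System.nanoTime();
    synchronized (lock) {
      long afterLock = System.nanoTime();
      read.run();
      long afterRead = System.nanoTime();
      return new Timing(afterLock - beforeLock, afterRead - afterLock);
    }
  }

  /** Hypothetical slow read, simulated with a sleep. */
  static Runnable slowRead(long millis) {
    return () -> {
      try {
        Thread.sleep(millis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
  }

  public static void main(String[] args) {
    Timing t = timedRead(new Object(), slowRead(20));
    System.out.println("lock wait ns: " + t.lockWaitNanos);
    System.out.println("read ns: " + t.readNanos);
  }
}
```

If the read time dominates even when the lock is uncontended, giving each thread its own stream is unlikely to help, which is the concern described above.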

Can you share more information on how to reproduce the issue?

@LuQQiu
Contributor

LuQQiu commented Oct 28, 2021

Closing this issue. The more important issue is that some files are slow to read.
