Fuse IO stressbench hardening #14540

Closed
LuQQiu opened this issue Nov 18, 2021 · 9 comments
Labels: priority-medium, stale, type-feature

Comments

@LuQQiu
Contributor

LuQQiu commented Nov 18, 2021

Is your feature request related to a problem? Please describe.
According to users, including @maobaolong, the Fuse IO stressbench has some limitations:

  • It can only test data written by the Fuse IO stressbench itself; it cannot test existing files.
  • It is unstable: there is no retry logic for read/write/ls, so a single error fails the whole test.
  • It requires a fixed cluster setup. In cluster testing, each worker node needs one worker, one Fuse process, and one job worker. If any of these processes goes down, the test fails.

Describe the solution you'd like
Harden the Fuse IO stressbench to make it more user-friendly.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Urgency
Explain why the feature is important

Additional context
Add any other context or screenshots about the feature request here.

LuQQiu added the type-feature and priority-medium labels Nov 18, 2021
@LuQQiu
Contributor Author

LuQQiu commented Nov 18, 2021

Thoughts from @ssz1997

  1. If an exception is thrown during read/write/ls, retry the operation on that file 3 times before actually throwing the exception.
  2. During reading, if an exception occurs in one thread, still report the throughput, but note in the result that one thread was not working.
  3. During cluster reading, if an exception occurs in a job worker so that it did not read at all, the other job workers should still do their job. Still calculate the cluster throughput and report the failure of that job worker.
  4. Correct the logic so that the --cluster-limit parameter works in cluster mode. Currently all worker nodes must participate in the benchmark.
  5. Use a 0-based index for the directory name of each job worker, instead of using -.
  6. Support data deletion.
  7. If the Alluxio path is not empty, the cluster test currently fails because the testing directory contains more folders/files than there are job workers. Create an empty directory for the stressbench.
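
Items 1 and 2 above could be sketched roughly as follows. This is a minimal illustration, not the FuseIOBench implementation: the names `StressBenchRetrySketch`, `IoOperation`, `ThreadResult`, `retry`, `throughput`, and `failedCount` are all hypothetical, and "3 times" is interpreted here as three attempts in total.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Alluxio code base.
public class StressBenchRetrySketch {
  /** An I/O operation that may fail, such as a read, write, or list call. */
  @FunctionalInterface
  public interface IoOperation<T> {
    T run() throws IOException;
  }

  /** Outcome of one benchmark thread: bytes processed, or a failure marker. */
  public static final class ThreadResult {
    public final long bytes;
    public final boolean failed;

    public ThreadResult(long bytes, boolean failed) {
      this.bytes = bytes;
      this.failed = failed;
    }
  }

  /**
   * Runs op up to maxAttempts times and rethrows the last IOException
   * only after every attempt has failed.
   */
  public static <T> T retry(IoOperation<T> op, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.run();
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last;
  }

  /** Throughput in bytes per second, counting only threads that did not fail. */
  public static double throughput(List<ThreadResult> results, double seconds) {
    long total = 0;
    for (ThreadResult r : results) {
      if (!r.failed) {
        total += r.bytes;
      }
    }
    return total / seconds;
  }

  /** Number of threads that failed, to be reported next to the throughput. */
  public static long failedCount(List<ThreadResult> results) {
    long failed = 0;
    for (ThreadResult r : results) {
      if (r.failed) {
        failed++;
      }
    }
    return failed;
  }

  public static void main(String[] args) throws IOException {
    // Simulated operation that fails twice and succeeds on the third attempt.
    int[] calls = {0};
    String value = retry(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated transient error");
      }
      return "ok";
    }, 3);
    System.out.println(value + " after " + calls[0] + " attempts");

    // One healthy thread and one failed thread over a 10 second run.
    List<ThreadResult> results = new ArrayList<>();
    results.add(new ThreadResult(100L << 20, false));
    results.add(new ThreadResult(0, true));
    System.out.printf("%.1f MiB/s, %d failed thread(s)%n",
        throughput(results, 10.0) / (1 << 20), failedCount(results));
  }
}
```

The same shape would extend to item 3 by treating each job worker as one result and reporting failed workers alongside the cluster throughput.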

@kailiu6

kailiu6 commented Dec 3, 2021

@LuQQiu Hi, when I use the Alluxio memory cache, the speed is the same as with disk. Why? Can you help me find what is wrong?

My commands are (using Alluxio Fuse to write/read):

  1. bin/alluxio-mount /mnt/disk

  2. Config below:
    alluxio.worker.tieredstore.levels=1
    alluxio.worker.tieredstore.level0.alias=MEM
    alluxio.worker.tieredstore.level0.dirs.path=/data01/alluxio_mem
    alluxio.worker.tieredstore.level0.dirs.mediumtype=MEM
    alluxio.worker.tieredstore.level0.dirs.quota=100GB

  3. /integration/fuse/bin/alluxio-fuse mount -o kernel_cache,entry_timeout=7200,attr_timeout=7200,max_read=524288 /data01/alluxio/alluxio_storage/disk1 /test_alluxio/alluxio_storage_new/disk1

With disk, the query takes 3.3s.

With the memory cache, it takes 3.31s.

Why?

@LuQQiu
Contributor Author

LuQQiu commented Dec 3, 2021

[ec2-user@GrpcRemoteRead-worker-0 alluxio]$ bin/alluxio runClass alluxio.stress.cli.fuse.FuseIOBench --local-path /mnt/alluxio-standalone-fuse-write/ --bench-timeout 3h --buffer-size 30KB --cluster-limit 1 --file-size 15KB --num-dirs 36 --num-files-per-dir 100 --operation Write --threads 36 --warmup 10
java.nio.file.FileSystemException: /mnt/alluxio-standalone-fuse-write/local-task-0: Input/output error
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
	at java.nio.file.Files.createDirectory(Files.java:674)
	at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
	at java.nio.file.Files.createDirectories(Files.java:767)
	at alluxio.stress.cli.fuse.FuseIOBench.prepare(FuseIOBench.java:145)
	at alluxio.stress.cli.Benchmark.run(Benchmark.java:150)
	at alluxio.stress.cli.Benchmark.mainInternal(Benchmark.java:91)
	at alluxio.stress.cli.fuse.FuseIOBench.main(FuseIOBench.java:81)
[ec2-user@GrpcRemoteRead-worker-0 alluxio]$ bin/alluxio runClass alluxio.stress.cli.fuse.FuseIOBench --local-path /mnt/alluxio-standalone-fuse-write/3k_15kb/ --bench-timeout 3h --buffer-size 30KB --cluster-limit 1 --file-size 15KB --num-dirs 36 --num-files-per-dir 100 --operation Write --threads 36 --warmup 10
java.nio.file.FileSystemException: /mnt/alluxio-standalone-fuse-write/3k_15kb: Input/output error
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
	at java.nio.file.Files.createDirectory(Files.java:674)
	at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
	at java.nio.file.Files.createDirectories(Files.java:767)
	at alluxio.stress.cli.fuse.FuseIOBench.prepare(FuseIOBench.java:145)
	at alluxio.stress.cli.Benchmark.run(Benchmark.java:150)
	at alluxio.stress.cli.Benchmark.mainInternal(Benchmark.java:91)
	at alluxio.stress.cli.fuse.FuseIOBench.main(FuseIOBench.java:81)
2021-12-03 21:32:25,569 ERROR AlluxioJniFuseFileSystem - Failed to mkdir /local-task-0:
alluxio.exception.FileDoesNotExistException: File /fuseIOStressBench/local-task-0 creation failed. Component 1(fuseIOStressBench) does not exist
        at alluxio.client.file.BaseFileSystem.rpc(BaseFileSystem.java:575)
        at alluxio.client.file.BaseFileSystem.createDirectory(BaseFileSystem.java:151)
        at alluxio.client.file.MetadataCachingBaseFileSystem.createDirectory(MetadataCachingBaseFileSystem.java:92)
        at alluxio.fuse.AlluxioJniFuseFileSystem.mkdirInternal(AlluxioJniFuseFileSystem.java:536)
        at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$mkdir$9(AlluxioJniFuseFileSystem.java:524)
        at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:278)
        at alluxio.fuse.AlluxioJniFuseFileSystem.mkdir(AlluxioJniFuseFileSystem.java:524)
        at alluxio.jnifuse.AbstractFuseFileSystem.mkdirCallback(AbstractFuseFileSystem.java:262)
2021-12-03 21:33:06,971 ERROR AlluxioJniFuseFileSystem - Failed to mkdir /3k_15kb:
alluxio.exception.FileDoesNotExistException: File /fuseIOStressBench/3k_15kb creation failed. Component 1(fuseIOStressBench) does not exist
        at alluxio.client.file.BaseFileSystem.rpc(BaseFileSystem.java:575)
        at alluxio.client.file.BaseFileSystem.createDirectory(BaseFileSystem.java:151)
        at alluxio.client.file.MetadataCachingBaseFileSystem.createDirectory(MetadataCachingBaseFileSystem.java:92)
        at alluxio.fuse.AlluxioJniFuseFileSystem.mkdirInternal(AlluxioJniFuseFileSystem.java:536)
        at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$mkdir$9(AlluxioJniFuseFileSystem.java:524)
        at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:278)
        at alluxio.fuse.AlluxioJniFuseFileSystem.mkdir(AlluxioJniFuseFileSystem.java:524)
        at alluxio.jnifuse.AbstractFuseFileSystem.mkdirCallback(AbstractFuseFileSystem.java:262)

Should not error out.

FYI @ssz1997

@LuQQiu
Contributor Author

LuQQiu commented Dec 9, 2021

That error is strange. The IOException should be caught by Files.createDirectories:

    public static Path createDirectories(Path dir, FileAttribute<?>... attrs)
        throws IOException
    {
        // attempt to create the directory
        try {
            createAndCheckIsDirectory(dir, attrs); // <-- throws here
            return dir;
        } catch (FileAlreadyExistsException x) {
            // file exists and is not a directory
            throw x;
        } catch (IOException x) {
            // parent may not exist or other reason
            // <-- should be caught here instead of being thrown to users
        }
        // ... (rest of the method omitted)
    }
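
For comparison, here is a small sketch of how `Files.createDirectories` behaves on an ordinary local file system. It assumes a writable temp directory and does not touch a FUSE mount, so it does not reproduce the Input/output error above; the class name `CreateDirectoriesSketch` and the helper `tryCreate` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch for a local file system, not the FuseIOBench code.
public class CreateDirectoriesSketch {
  /**
   * Returns true if createDirectories succeeds for the path, and false if it
   * reports that something other than a directory already exists there.
   * Any other IOException is passed on to the caller.
   */
  public static boolean tryCreate(Path path) throws IOException {
    try {
      Files.createDirectories(path);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("create-dirs-sketch");
    Path nested = root.resolve("a").resolve("b");

    // Missing parent directories are created along the way.
    System.out.println(tryCreate(nested));
    // Calling it again on an existing directory is not an error.
    System.out.println(tryCreate(nested));

    // A regular file at the target path is reported as FileAlreadyExistsException.
    Path file = Files.createFile(root.resolve("plain-file"));
    System.out.println(tryCreate(file));
  }
}
```

On a local file system the first two calls succeed and only the third reports a conflict, which is the behavior the comment above expects from the FUSE mount as well.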

@LuQQiu
Contributor Author

LuQQiu commented Dec 10, 2021

time bin/alluxio runClass alluxio.stress.cli.fuse.FuseIOBench --local-path /mnt/alluxio-standalone-fuse-write/bigger/ --bench-timeout 3h --buffer-size 15KB --cluster-limit 1 --file-size 15KB --num-dirs 36 --num-files-per-dir 1000 --operation Write --threads 36 --warmup 30 --duration 300
java.nio.file.FileSystemException: /mnt/alluxio-standalone-fuse-write/bigger: Input/output error
alluxio.exception.FileDoesNotExistException: File /fuseIOStressBench/bigger creation failed. Component 1(fuseIOStressBench) does not exist
        at alluxio.client.file.BaseFileSystem.rpc(BaseFileSystem.java:575)
        at alluxio.client.file.BaseFileSystem.createDirectory(BaseFileSystem.java:151)
        at alluxio.client.file.MetadataCachingBaseFileSystem.createDirectory(MetadataCachingBaseFileSystem.java:92)
        at alluxio.fuse.AlluxioJniFuseFileSystem.mkdirInternal(AlluxioJniFuseFileSystem.java:536)
        at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$mkdir$9(AlluxioJniFuseFileSystem.java:524)
        at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:278)
        at alluxio.fuse.AlluxioJniFuseFileSystem.mkdir(AlluxioJniFuseFileSystem.java:524)
        at alluxio.jnifuse.AbstractFuseFileSystem.mkdirCallback(AbstractFuseFileSystem.java:262)

@ssz1997
Contributor

ssz1997 commented Dec 21, 2021

@LuQQiu Is there another way other than using mount or alluxio-fuse stat commands to find the Alluxio path in the Fuse process? I'm asking because, to check if one Alluxio path exists or create one, we need to know the path beforehand, and this Alluxio path would be the Alluxio path in the Fuse process.

@LuQQiu
Contributor Author

LuQQiu commented Dec 30, 2021

The error is fixed by #14742

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale label Jan 27, 2023
@jja725
Contributor

jja725 commented Jan 27, 2023

It seems FIO is a more popular way of testing, so closing this for now.

@jja725 jja725 closed this as completed Jan 27, 2023
4 participants