Jepio - ARM Support + CI Changes #1577
The current version of azcopy dies on ARM systems with a SIGBUS and the following stacktrace:

    INFO: Scanning...
    INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
    unexpected fault address 0xffff72640a5f
    fatal error: fault
    [signal SIGBUS: bus error code=0x1 addr=0xffff72640a5f pc=0x733d8c]

    goroutine 25 [running]:
    runtime.throw(0x9de596, 0x5)
        /usr/lib/go-1.16/src/runtime/panic.go:1117 +0x54 fp=0x4000d982f0 sp=0x4000d982c0 pc=0x43914
    runtime.sigpanic()
        /usr/lib/go-1.16/src/runtime/signal_unix.go:731 +0x284 fp=0x4000d98330 sp=0x4000d982f0 pc=0x5b214
    github.com/Azure/azure-storage-azcopy/v10/common.(*TransferStatus).AtomicLoad(...)
        /home/jeremi/github/azure-storage-azcopy-2/common/fe-ste-models.go:690
    github.com/Azure/azure-storage-azcopy/v10/ste.(*JobPartPlanTransfer).TransferStatus(...)
        /home/jeremi/github/azure-storage-azcopy-2/ste/JobPartPlan.go:377
    github.com/Azure/azure-storage-azcopy/v10/ste.(*jobPartMgr).ScheduleTransfers(0x4000adc000, 0xb54930, 0x4000ad2000)
        /home/jeremi/github/azure-storage-azcopy-2/ste/mgr-JobPartMgr.go:396 +0x42c fp=0x4000d99f20 sp=0x4000d98340 pc=0x733d8c
    github.com/Azure/azure-storage-azcopy/v10/ste.(*jobsAdmin).scheduleJobParts(0x4000496000)
        /home/jeremi/github/azure-storage-azcopy-2/ste/JobsAdmin.go:287 +0x44 fp=0x4000d99fd0 sp=0x4000d99f20 pc=0x724ee4
    runtime.goexit()
        /usr/lib/go-1.16/src/runtime/asm_arm64.s:1130 +0x4 fp=0x4000d99fd0 sp=0x4000d99fd0 pc=0x78a44
    created by github.com/Azure/azure-storage-azcopy/v10/ste.initJobsAdmin
        /home/jeremi/github/azure-storage-azcopy-2/ste/JobsAdmin.go:212 +0x5ec

What happens is that a JobPartPlanTransfer struct is created inside a memory mapped 'plan' file and there are several members of that struct that are accessed atomically. This requires that those members have the natural alignment for their type, which is guaranteed at the struct level, but the struct itself is not properly aligned inside the plan file.
JobPartPlanTransfer is located after the fixed-size JobPartPlanHeader struct and a variable-size 'CommandString'. It is that variable-size string that breaks the alignment (most of the time). x86 tolerates the misaligned access, but ARM does not and raises SIGBUS.

To remedy this, ensure that the start of the first JobPartPlanTransfer is 8-byte aligned both when writing the file and when accessing it. This makes azcopy work on the ARM64 systems that I have tested.

Signed-off-by: Jeremi Piotrowski <[email protected]>
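The alignment fix described above comes down to rounding an offset up to the next multiple of 8. The following is a minimal sketch of that arithmetic, not the code from this PR: the `alignUp` helper, the `transferAlignment` constant, and the header size of 320 bytes are all hypothetical names and values chosen for illustration.

```go
package main

import "fmt"

// transferAlignment is the alignment, in bytes, that the commit message
// asks for at the start of the first transfer entry. (Hypothetical name.)
const transferAlignment = 8

// alignUp rounds offset up to the next multiple of align.
// align must be a power of two. (Hypothetical helper, not azcopy's API.)
func alignUp(offset, align int64) int64 {
	return (offset + align - 1) &^ (align - 1)
}

func main() {
	// Assumed fixed header size, for illustration only.
	const headerSize = 320

	// The variable-length command string is what pushes the following
	// struct off an 8-byte boundary.
	for _, cmdLen := range []int64{0, 5, 8, 13} {
		raw := headerSize + cmdLen
		fmt.Printf("command length %2d: raw offset %d, aligned offset %d\n",
			cmdLen, raw, alignUp(raw, transferAlignment))
	}
}
```

The same rounding has to be applied on both sides, by the writer when it lays out the plan file and by the reader when it computes where the first transfer entry starts, otherwise the two disagree about the offset.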
Force-pushed from 9c4099f to 1e6a46e.
If you install qemu-user on the VM, then you should be able to run the tests using
This is fascinating: I think the testing is working correctly, but due to the emulation you're managing to reliably reproduce this issue (race condition): #1555 👍
@mohsha-msft sorry it took me a bit of time to do this test on a real ARM device, it worked great. @jepio thank you so much for this contribution!
Great. I'll remove the ARM E2E tests from CI and leave just the executables. We'll add the tests back once Microsoft DevOps starts supporting ARM64 agents.
@mohsha-msft ok let's do that and get this merged for 10.13
lgtm
Implementation
N/A
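How to run ARM64-compiled tests on an AMD64 processor
Set the target architecture to arm64 when building (in Go, through the GOARCH environment variable). This tells the compiler to build for the arm64 architecture rather than for the amd64 host.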
To emulate ARM64, we can use the QEMU emulator. Note that in user mode (qemu-user) it doesn't emulate an entire system. It just translates syscalls and other details so that the program can run using the host kernel.
Keep in mind that the emulator is slow, so it cannot keep up with parallel execution and can trigger race conditions. I unknowingly reproduced issue #1555 this way. So remove the race detector (the -race flag) from the test command.
Note: we've also moved these dependencies away from a personal repository:
github.com/jiacfan/keychain -> github.com/wastore/keychain
github.com/jiacfan/keyctl -> github.com/wastore/keyctl