Jepio - ARM Support + CI Changes #1577

Merged
merged 5 commits into from
Oct 13, 2021


mohsha-msft
Contributor

@mohsha-msft mohsha-msft commented Sep 30, 2021

Implementation

N/A

How to run ARM64 compiled tests on AMD64 arch processor

GOARCH=arm64 GOOS=linux go build -o azcopy_linux_arm64

This tells the Go toolchain to cross-compile for linux/arm64 instead of the host's amd64 architecture.

To run ARM64 binaries on an AMD64 host, we can use QEMU in user mode. Note that user-mode QEMU does not emulate a full system: it translates the ARM64 instructions and syscalls so that the program runs on the host kernel.

GOARCH=arm64 GOOS=linux go test -c -o test.arm64
./test.arm64

Keep in mind that the emulator is slow, so parallel execution can expose race conditions. That is how I unknowingly reproduced issue #1555. Also remove the race detector (-race flag) from the test command when running under the emulator.

Note:

We've also moved two dependencies off a personal repository:

  1. github.com/jiacfan/keychain -> github.com/wastore/keychain
  2. github.com/jiacfan/keyctl -> github.com/wastore/keyctl

jepio and others added 3 commits September 23, 2021 10:48
The current version of azcopy dies on ARM systems with a SIGBUS and the following stacktrace:

  INFO: Scanning...
  INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
  unexpected fault address 0xffff72640a5f
  fatal error: fault
  [signal SIGBUS: bus error code=0x1 addr=0xffff72640a5f pc=0x733d8c]

  goroutine 25 [running]:
  runtime.throw(0x9de596, 0x5)
          /usr/lib/go-1.16/src/runtime/panic.go:1117 +0x54 fp=0x4000d982f0 sp=0x4000d982c0 pc=0x43914
  runtime.sigpanic()
          /usr/lib/go-1.16/src/runtime/signal_unix.go:731 +0x284 fp=0x4000d98330 sp=0x4000d982f0 pc=0x5b214
  github.com/Azure/azure-storage-azcopy/v10/common.(*TransferStatus).AtomicLoad(...)
          /home/jeremi/github/azure-storage-azcopy-2/common/fe-ste-models.go:690
  github.com/Azure/azure-storage-azcopy/v10/ste.(*JobPartPlanTransfer).TransferStatus(...)
          /home/jeremi/github/azure-storage-azcopy-2/ste/JobPartPlan.go:377
  github.com/Azure/azure-storage-azcopy/v10/ste.(*jobPartMgr).ScheduleTransfers(0x4000adc000, 0xb54930, 0x4000ad2000)
          /home/jeremi/github/azure-storage-azcopy-2/ste/mgr-JobPartMgr.go:396 +0x42c fp=0x4000d99f20 sp=0x4000d98340 pc=0x733d8c
  github.com/Azure/azure-storage-azcopy/v10/ste.(*jobsAdmin).scheduleJobParts(0x4000496000)
          /home/jeremi/github/azure-storage-azcopy-2/ste/JobsAdmin.go:287 +0x44 fp=0x4000d99fd0 sp=0x4000d99f20 pc=0x724ee4
  runtime.goexit()
          /usr/lib/go-1.16/src/runtime/asm_arm64.s:1130 +0x4 fp=0x4000d99fd0 sp=0x4000d99fd0 pc=0x78a44
  created by github.com/Azure/azure-storage-azcopy/v10/ste.initJobsAdmin
          /home/jeremi/github/azure-storage-azcopy-2/ste/JobsAdmin.go:212 +0x5ec

What happens is that a JobPartPlanTransfer struct is created inside a memory
mapped 'plan' file and there are several members of that struct that are
accessed atomically. This requires that those members have the natural
alignment for their type, which is guaranteed at the struct level, but the
struct itself is not properly aligned inside the plan file. JobPartPlanTransfer
is located after the fixed size JobPartPlanHeader struct and a variable size
'CommandString'. It is that variable sized string that breaks the whole
alignment (most of the time). x86 tolerates the misaligned access, but ARM
does not and raises SIGBUS.

To remedy this, ensure that the start of the first JobPartPlanTransfer is 8
byte aligned both when writing the file and when accessing it. This makes
azcopy work on ARM64 systems that I have tested.

Signed-off-by: Jeremi Piotrowski <[email protected]>
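
The alignment fix described in the commit message can be sketched as rounding the offset of the first transfer entry up to a multiple of 8. This is a minimal illustration and not the actual azcopy code: `alignUp`, `firstTransferOffset`, and the header size of 1024 bytes are assumptions made for the example.

```go
package main

import "fmt"

// planAlignment is the alignment for the first transfer entry
// (8 bytes, as stated in the commit message).
const planAlignment = 8

// alignUp rounds offset up to the next multiple of align.
// align must be a power of two.
func alignUp(offset, align int64) int64 {
	return (offset + align - 1) &^ (align - 1)
}

// isAligned reports whether offset is a multiple of align.
func isAligned(offset, align int64) bool {
	return offset%align == 0
}

// firstTransferOffset returns where the first transfer entry would start,
// given a fixed header size and a variable-length command string.
// Hypothetical helper; the real plan file layout may differ.
func firstTransferOffset(headerSize, commandLen int64) int64 {
	return alignUp(headerSize+commandLen, planAlignment)
}

func main() {
	const headerSize = 1024 // assumed value, not the real JobPartPlanHeader size
	for _, cmdLen := range []int64{0, 5, 8, 13} {
		raw := headerSize + cmdLen
		fixed := firstTransferOffset(headerSize, cmdLen)
		fmt.Printf("commandLen=%d raw=%d (aligned=%t) padded=%d (aligned=%t)\n",
			cmdLen, raw, isAligned(raw, planAlignment),
			fixed, isAligned(fixed, planAlignment))
	}
}
```

The writer and the reader both have to apply the same rounding. Otherwise the reader would look for the first entry at a different offset than the one the writer used.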
@jepio
Member

jepio commented Oct 4, 2021

If you install qemu-user on the VM, then you should be able to run the tests using qemu-aarch64 on linux, but you'll have to disable -race. This article is relevant: https://ctrl-c.us/posts/test-goarch.html.

@jepio
Member

jepio commented Oct 4, 2021

This is fascinating: I think the testing is working correctly, but due to the emulation you're managing to reliably reproduce this issue (race condition): #1555 👍

@mohsha-msft
Contributor Author

Hey @jepio ,

Thanks for the help with QEMU. I was able to test ARM64 with it but, as you might have seen, I hit a race condition.
I unknowingly reproduced issue #1555. I'll talk to my team about it.

@zezha-msft
Contributor

@mohsha-msft sorry it took me a bit of time to do this test on a real ARM device, it worked great.

@jepio thank you so much for this contribution!

@mohsha-msft
Contributor Author

@zezha-msft ,

Great. I'll remove the ARM E2E tests from CI and leave just the executables. We'll add the tests back once Microsoft DevOps starts supporting ARM64 agents.

@zezha-msft
Contributor

@mohsha-msft ok let's do that and get this merged for 10.13

@zezha-msft zezha-msft added this to the 10.13.0 milestone Oct 12, 2021
Contributor

@zezha-msft zezha-msft left a comment


lgtm

@mohsha-msft mohsha-msft merged commit 0d6165c into dev Oct 13, 2021
@mohsha-msft mohsha-msft deleted the mohsha-msft/branch-from-jepio-arm branch October 13, 2021 08:15
3 participants