Memory issue with TCP and IPv6 #211

Closed
janvanbouwel opened this issue Mar 20, 2023 · 3 comments

janvanbouwel commented Mar 20, 2023

Using Java on Linux with the libzt dev branch.

I switched from a normal network (using IPv4) to an ad-hoc network, which uses IPv6. UDP continued to work (although I'm using a custom wrapper around ZeroTierNative, so I'm not sure whether ZeroTierDatagramSocket supports IPv6). TCP, however, is broken with a memory issue.

In my actual application (where a lot more is going on, for example UDP traffic), I get one of the following errors within at most 4 TCP connections, usually on the client side.

For testing, I adapted the Java example slightly to use an ad-hoc network and put both the client and server sockets in a loop with nothing else going on (see the code at the bottom; I can share the full code if necessary). With that, the error is a lot more sporadic: sometimes it occurs on the first connection, sometimes after 20 or more. It happens both multi-threaded and single-threaded, where every connection should be properly closed before a new one is opened.

I have not yet tried this in other languages, or with IPv6 on a managed network. Reading the log it created didn't help, and the core dump it mentions is seemingly not generated (I'm not very familiar with this kind of debugging).

Any ideas, any more info you need or tips on debugging this?

Errors

On client side.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f93b114786a, pid=128500, tid=128539
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libzt.so+0x4f86a]  sys_sem_signal+0xa
[thread 128572 also had an error]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/jan/git/adhoc-problem/core.128500)
#
# An error report file with more information is saved as:
# /home/jan/git/adhoc-problem/hs_err_pid128500.log
int_mallinfo(): unaligned fastbin chunk detected

An error I got on the server side.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9c98564f79, pid=129183, tid=129184
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libzt.so+0x6cf79]  tcp_output+0x199
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/jan/git/adhoc-problem/core.129183)
#
# An error report file with more information is saved as:
# /home/jan/git/adhoc-problem/hs_err_pid129183.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
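
Since both crash headers point at a native frame inside libzt, it can help to pull that frame out of each hs_err_pid*.log automatically when collecting many crashes. The following is a minimal sketch, not part of libzt or its example; the class name `ProblematicFrame` and its output format are made up for illustration, and it assumes the log uses the same "# Problematic frame:" header shown above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the native "Problematic frame" from a JVM crash header.
public class ProblematicFrame {
    // Matches a native frame line such as:
    //   # C  [libzt.so+0x6cf79]  tcp_output+0x199
    // The symbol part is optional because it can be missing or garbled.
    private static final Pattern FRAME = Pattern.compile(
            "^#\\s+C\\s+\\[([^\\]+]+)\\+(0x[0-9a-fA-F]+)\\](?:\\s+(\\S+))?");

    /** Returns "library offset symbol" for the first native frame after the marker line. */
    public static Optional<String> find(List<String> lines) {
        boolean seenMarker = false;
        for (String line : lines) {
            if (line.startsWith("# Problematic frame:")) {
                seenMarker = true;
                continue;
            }
            if (seenMarker) {
                Matcher m = FRAME.matcher(line);
                if (m.find()) {
                    String symbol = m.group(3) == null ? "<unknown>" : m.group(3);
                    return Optional.of(m.group(1) + " " + m.group(2) + " " + symbol);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Pass the path to an hs_err_pid*.log file as the first argument.
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println(find(lines).orElse("no native problematic frame found"));
    }
}
```

Grouping crashes by library and symbol this way makes it easier to see whether they cluster in a few functions (here `sys_sem_signal` and `tcp_output`) or are spread out.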

Client and Server

The only changes to the libzt Java example are the loops shown here, plus using network ID ff0000ffff000000 and the server's IPv6 address.

Server

// Accept connections forever; each one carries a single UTF message.
ZeroTierServerSocket listener = new ZeroTierServerSocket(port);
while (true) {
    ZeroTierSocket conn = listener.accept();
    ZeroTierInputStream inputStream = conn.getInputStream();
    DataInputStream dataInputStream = new DataInputStream(inputStream);
    String message = dataInputStream.readUTF();
    System.out.println("recv: " + message);
    // Close the connection before accepting the next one.
    conn.close();
}

Client (delay set to 500ms or so when using threads)

while (true) {
    // Uncomment the two Thread lines to run each connection on its own thread.
    // new Thread(() -> {
    try {
        ZeroTierSocket socket = new ZeroTierSocket(remoteAddr, port);
        ZeroTierOutputStream outputStream = socket.getOutputStream();
        DataOutputStream dataOutputStream = new DataOutputStream(outputStream);
        dataOutputStream.writeUTF("Hello from java!");
        socket.close();
        System.out.println(1);
    } catch (Exception e) {
        System.out.println(e);
    }
    // }).start();

    // Pause between connections (milliseconds).
    ZeroTierNative.zts_util_delay(20);
}
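
One way to rule out the test logic itself is to run the same connect, write, close loop over plain `java.net` sockets. The following is a rough control sketch under that assumption, not part of the libzt example: the class name `LoopbackControl`, the bounded connection count, and the use of the local loopback address are all illustrative choices. It mirrors the server and client loops above but stops after a fixed number of connections.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical control test: the same loop as above, using standard sockets instead of libzt.
public class LoopbackControl {
    /** Opens and closes `connections` TCP connections to `addr`; returns how many messages the server read. */
    public static int run(InetAddress addr, int connections) throws Exception {
        AtomicInteger received = new AtomicInteger();
        // Port 0 lets the OS pick a free port; backlog of 50 is the JDK default.
        try (ServerSocket listener = new ServerSocket(0, 50, addr)) {
            Thread server = new Thread(() -> {
                for (int i = 0; i < connections; i++) {
                    try (Socket conn = listener.accept();
                         DataInputStream in = new DataInputStream(conn.getInputStream())) {
                        if ("Hello from java!".equals(in.readUTF())) {
                            received.incrementAndGet();
                        }
                    } catch (Exception e) {
                        System.out.println(e);
                        return;
                    }
                }
            });
            server.start();
            for (int i = 0; i < connections; i++) {
                // Each connection is fully closed before the next one is opened.
                try (Socket socket = new Socket(addr, listener.getLocalPort());
                     DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
                    out.writeUTF("Hello from java!");
                }
            }
            server.join();
        }
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        // "::1" is the IPv6 loopback address; this requires IPv6 to be enabled on the host.
        System.out.println(run(InetAddress.getByName("::1"), 20));
    }
}
```

If this control runs cleanly while the libzt version crashes, that points away from the Java loop and toward the native library, which matches the native frames in the crash headers.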
@janvanbouwel

@joseph-henry Any ideas? Maybe just on how to approach debugging this? This is quite the showstopper for me.

@janvanbouwel

Instead of an ad-hoc network, I tried a normal network set up with both IPv4 and IPv6 addresses. Using the assigned IPv6 address gives me the same error; I did not encounter it with IPv4.
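
Because the crash depends on the address family, it may be worth logging which family each test run actually uses. The following is a small sketch using only the standard library; the class name `AddressFamily` and the addresses in the usage note are hypothetical placeholders, not addresses from this network.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: reports whether an IP literal is IPv4 or IPv6.
public class AddressFamily {
    public static String of(String literal) throws UnknownHostException {
        // For an IP literal, getByName only parses the text and does no DNS lookup.
        InetAddress addr = InetAddress.getByName(literal);
        return (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            System.out.println(arg + " -> " + of(arg));
        }
    }
}
```

For example, `AddressFamily.of("fd00::1")` reports IPv6 and `AddressFamily.of("10.0.0.1")` reports IPv4. Note that Java treats IPv4-mapped IPv6 literals as IPv4 addresses.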

bostick commented Aug 4, 2023

FYI, I am looking at this and can reproduce with some work:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000126d4d950, pid=72164, tid=28931
#
# JRE version: OpenJDK Runtime Environment Homebrew (20.0.1) (build 20.0.1)
# Java VM: OpenJDK 64-Bit Server VM Homebrew (20.0.1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [libzt.dylib+0x41950]  tcp_output+0x4a0
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/brenton/development/projects/memory-issue/hs_err_pid72164.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/Homebrew/homebrew-core/issues
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

bostick added a commit to zerotier/ZeroTierOne that referenced this issue Aug 21, 2023
joseph-henry added a commit that referenced this issue Aug 21, 2023
Fix #211: Use tcpip_input for IPv6 instead of ethernet_input