Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pub/sub example creates unkillable process with 100% core usage and no output when switched to use string #436

Closed
Cypher1 opened this issue Oct 6, 2024 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@Cypher1
Copy link

Cypher1 commented Oct 6, 2024

r.e. FAQ

I've seen the 'dynamically sized' portion of the FAQ. This is not about the ability to transmit variably sized data types. This issue is about how badly iceoryx handles dynamically sized data.
Ideally this would be prevented at compile time, but at minimum the program should respond to SIGKILL/SIGTERM and should probably crash when failing to handle dynamically sized data, rather than spinning at 100% core usage.

Required information

Operating system:

  • OS name, version: Ubuntu 24.04.1 LTS
  • Additionally, on Linux, Mac Os, Unix, output of: uname -a: Linux HOSTNAME 6.8.0-45-generic # 45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Rust version:
rustc 1.80.1 (3f5fd8dd4 2024-08-06)

Cargo version:
cargo 1.80.1 (376290515 2024-07-16)

iceoryx2 version:
v0.4.1

Detailed log output:
I think this is
a) easy to reproduce and
b) is worth reproducing, so I have not collected log data myself.

Observed result or behaviour:
./sub runs at 100% core usage and does not respond to attempts to kill it (e.g. SIGKILL, SIGTERM). It has to be SIGSTOP'd and abandoned.

pub.rs

use std::process::ExitCode;
use core::time::Duration;
use iceoryx2::prelude::*;

const CYCLE_TIME: Duration = Duration::from_secs(1);

fn main() -> ExitCode {
    let mut count = 0;
    let node = NodeBuilder::new().create::<ipc::Service>().expect("1");

    // create our port factory by creating or opening the service
    let service = node.service_builder(&"My/Funk/ServiceName4".try_into().expect("2"))
        .publish_subscribe::<String>()
        .open_or_create().expect("3");

    let publisher = service.publisher_builder().create().expect("4");

    while let NodeEvent::Tick = node.wait(CYCLE_TIME) {
        let sample = publisher.loan_uninit().expect("5");
        let sample = sample.write_payload(format!("{:?}", count));
        count += 1;
        sample.send().expect("6");
    }
    ExitCode::SUCCESS
}

sub.rs

use std::process::ExitCode;
use core::time::Duration;
use iceoryx2::prelude::*;

const CYCLE_TIME: Duration = Duration::from_secs(1);

fn main() -> ExitCode {
    let node = NodeBuilder::new().create::<ipc::Service>().expect("1");

    // create our port factory by creating or opening the service
    let service = node.service_builder(&"My/Funk/ServiceName4".try_into().expect("2"))
        .publish_subscribe::<String>()
        .open_or_create().expect("3");

    let subscriber = service.subscriber_builder().create().expect("4");

    while let NodeEvent::Tick = node.wait(CYCLE_TIME) {
        while let Some(sample) = subscriber.receive().expect("5") {
            println!("received: {:?}", *sample);
        }
    }
    ExitCode::SUCCESS
}

Expected result or behaviour:
The following alternatives would be acceptable.

  • pub/sub fails to build at compile time due to indirection/dynamically sized data
  • pub/sub fails to run due to indirection/dynamically sized data
  • pub/sub fails at runtime but can be killed

Conditions where it occurred / Performed steps:
Build the above files and attempt to run them in any order.

@Cypher1 Cypher1 added the bug Something isn't working label Oct 6, 2024
@Cypher1
Copy link
Author

Cypher1 commented Oct 6, 2024

Also, I suspect that the sub process is performing a buffer overflow and reading from memory not associated with the sender. I've not spent the time to actually prove this but it would explain the high CPU usage.

@elfenpiff
Copy link
Contributor

@Cypher1 Thanks for the bug report!

You cannot send a String with iceoryx2. The reason is that a String uses heap memory and contains a pointer. iceoryx2 supports only self-contained data types.
We already provide some containers that are shared memory compatible, in your case, take a look at the complex data types example: https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples/rust/complex_data_types

Here, we demonstrate how to use a FixedSizeByteString and a FixedSizeByteString as payload data.

Ideally this would be prevented at compile time,

With v0.5 we will introduce a macro that prevents users from using incompatible types. Also, we will add this to the documentation so that no one else will run into this issue before v0.5!

but at minimum the program should respond to SIGKILL/SIGTERM and should probably crash when failing to handle dynamically sized data, rather than spinning at 100% core usage.

You uncovered a bug in our signal handler caused by a SEGFAULT by using the string. This should definitely not happen and I completely agree. We will look into it and fix it.

@elfenpiff elfenpiff self-assigned this Oct 6, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Oct 6, 2024
elfenpiff added a commit that referenced this issue Oct 6, 2024
…tions

[#436] Document payload type restrictions
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Oct 6, 2024
elfenpiff added a commit that referenced this issue Oct 7, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Oct 7, 2024
… type restriction warnings to all examples that handle with types
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Oct 7, 2024
… type restriction warnings to all examples that handle with types
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Oct 7, 2024
@Cypher1
Copy link
Author

Cypher1 commented Oct 7, 2024

Great to hear! Thanks

Just btw, on early reflection (and I haven't spent much time on it yet) I don't think a macro should be needed for the safety I'm describing.
I think it would be simpler (and more 'Rust') to provide a trait like SelfContained and implement it for all primitives and maybe even Copy types. Whatever is happening during the serialization / copying into shared memory step is not working for heap data and this is likely unsafe.

Anyway, sounds like you have it in hand, thanks for the speedy response

elfenpiff added a commit that referenced this issue Oct 7, 2024
…ignals-in-sighandler

[#436] Split up the sig handler into non fatal and fatal signals
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Oct 7, 2024
elfenpiff added a commit that referenced this issue Oct 7, 2024
…tions

[#436] Add to the FAQ how to define a custom type; Add type restricti…
@elfenpiff
Copy link
Contributor

@Cypher1

I added the type restrictions and how to define payload types for iceoryx to all example documentations and the FAQ. Furthermore, with #448 the signal handling should now be correct - so if you let your String example run again, the subscriber should terminate when the SIGSEGV is received.

Just btw, on early reflection (and I haven't spent much time on it yet) I don't think a macro should be needed for the safety I'm describing.
I think it would be simpler (and more 'Rust') to provide a trait like SelfContained and implement it for all primitives and maybe even Copy types.

Yes, this was our idea. To have a proc macro with a corresponding trait like SelfContained or ShmSend which is implemented for all primitive types. So you just have to annotate your struct with #[ShmSend] and if one member does not implement the trait you get a compile failure.

Whatever is happening during the serialization / copying into shared memory step is not working for heap data and this is likely unsafe.

One of the benefits of zero copy communication is that you do not need to serialize the data at all. You can use the API so that you do not need to perform a single copy. The workflow should be like acquire memory with Publisher::loan() provide this memory to your thing that produces data - for instance V4L2 API has a call so that the camera writes the frame directly into the provided memory - and then you send it out to the next process.
So you do not have a single copy in this pipeline and then iceoryx2 becomes incredibly fast.

@elfenpiff
Copy link
Contributor

#448 and #447 are merged and solving the issue. @Cypher1 if there is still an open problem please feel free to reopen this issue or create a new one.

@Cypher1
Copy link
Author

Cypher1 commented Oct 8, 2024

Thanks! That should do it (aside from the ongoing questions around detecting safe vs unsafe types [relocatable I think would be the requirement] types at compile time). I think I'm going to have to use flatbuffer or cap'n proto (I have dynamic message size requirements and want to support communications between machines, so I can't get the perf improvements from iceoryx) but I'm really glad you all are working on it.

@elfenpiff
Copy link
Contributor

@Cypher1 Until the end of this year, we will focus on making the allocation in the publisher as dynamic as possible so that the user will no longer need to set the worst-case size. When this is done (and request-response, a lot of users are waiting for it) we will look into relocatable types then you shouldn't need to serialize data anymore which should give you another performance boost.

@Cypher1
Copy link
Author

Cypher1 commented Oct 8, 2024

Exciting to hear, Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants