Message latency depends on the workload of the process owning the data writer #5504

MMarcus95 · 2024-12-17T19:46:26Z

Is there an already existing issue for this?

I have searched the existing issues

Expected behavior

The time between message publication and message reception does not depend on the workload of the process owning the data writer.

Current behavior

The time between message publication and message reception depends on the workload of the process owning the data writer.

Steps to reproduce

I'm testing the message latency between a data writer and a data reader. They run in separate processes, and I'm using the discovery server, which is launched in a third process. To publish the messages I'm using the following while loop:

RoundTripTimeMsg msg;
while (true)
{
    msg.id() += 1;
    msg.timestamp() = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
    writer->write(&msg);
    
    for (int i = 0; i < 10000000; i++) {
        // Simulating workload
    }
}

The for loop simulates a workload. RoundTripTimeMsg is a custom message defined as follows:

struct RoundTripTimeMsg{
    unsigned long long id;
    long double timestamp;
};

The data reader receives the message and prints the time elapsed between the current time and the value saved in the timestamp field of the received message. Its callback basically does the following:

auto received_msg = (RoundTripTimeMsg *)msg;
auto now = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
std::cout << "Received message with id: " << received_msg->id() << " and delta time: " << (now - received_msg->timestamp())/1000000.0 << " ms" << std::endl;

Both the data writer and the data reader use their default QoS.
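Printing one line per sample makes it hard to compare runs, so the per-message delta computed in the callback above could also be accumulated into summary statistics. The following is only a minimal sketch of that idea: `LatencyStats` and its members are hypothetical names, not part of Fast DDS, and it assumes both timestamps are nanosecond counts taken from the same clock on the same machine.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: accumulates publication-to-reception deltas in milliseconds.
struct LatencyStats
{
    std::size_t count = 0;
    double min_ms = std::numeric_limits<double>::infinity();
    double max_ms = 0.0;
    double sum_ms = 0.0;

    // sent_ns: timestamp carried in the message; received_ns: time read in the callback.
    void add(std::int64_t sent_ns, std::int64_t received_ns)
    {
        const double delta_ms = static_cast<double>(received_ns - sent_ns) / 1e6;
        min_ms = std::min(min_ms, delta_ms);
        max_ms = std::max(max_ms, delta_ms);
        sum_ms += delta_ms;
        ++count;
    }

    // Mean delta over all samples added so far (0 if none were added).
    double mean_ms() const
    {
        return count == 0 ? 0.0 : sum_ms / static_cast<double>(count);
    }
};
```

In the reader callback, one would call `add()` with the received timestamp and the current time, then print `min_ms`, `mean_ms()` and `max_ms` once per run instead of once per message.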

I have noticed that if I reduce the workload inside the while loop (using a lower bound in the for loop condition), the data reader receives each message sooner. If I increase the workload, the message is received later. What is happening?

Fast DDS version/commit

v3.1.0

Platform/Architecture

Other. Please specify in Additional context section.

Transport layer

UDPv4

Additional context

Platform: Ubuntu Jammy Jellyfish 22.04 amd64

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response


EugenioCollado · 2024-12-19T10:48:08Z

Hi @MMarcus95 ,

We have replicated this same context you are describing and have not encountered any latency. Based on your description, the observed latency behavior is likely due to CPU resource contention and not an actual DDS-related issue. The workload simulation inside the DataWriter's loop can monopolize the CPU core it is running on, potentially delaying the other Fast DDS middleware threads if they are running on the same core.

To better understand the cause of the issue, please provide additional details about your system’s configuration, including the number of CPU cores, and observe the CPU core usage during your tests.

Thank you!

MMarcus95 · 2024-12-19T14:11:44Z

Hi @EugenioCollado,

thanks for the feedback.

I'm running the test I described inside a Docker image with Ubuntu 22.04. The CPU is a 13th Gen Intel i7-13700H. Below is a screenshot of the lscpu output

This is the CPU core usage with the same workload I mentioned before (the for loop bound is 10000000)

In this other example I reduce the workload, setting the for loop bound to 100000

In both cases, the sender thread is indeed using around 100% of CPU.

However, I see that the message latency is much lower in the second case, as you can see in this plot

Please let me know if you need any other information.

Thanks!

EugenioCollado · 2024-12-24T15:45:27Z

Hi @MMarcus95 ,

After further investigation, we have replicated the behavior you described. It appears that the issue is not directly related to CPU workload but rather to the publication frequency itself. In your case, the workload in the loop was consuming time, effectively reducing the publication frequency. This can be checked by replacing the workload simulation with sleep calls and observing that the issue persists.
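The sleep-based check mentioned above could look roughly like the sketch below. It is not the code used in the maintainers' tests: `publish_with_period` and the `publish` callback are hypothetical names, with the callback standing in for filling the sample and calling `writer->write(&msg)`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Calls `publish` `count` times, sleeping for `period` after each call instead of
// spinning in a busy loop. Returns the send timestamps in nanoseconds.
// `publish` is a hypothetical stand-in for writing the sample with the data writer.
inline std::vector<std::int64_t> publish_with_period(
        const std::function<void(std::int64_t)>& publish,
        std::chrono::milliseconds period,
        int count)
{
    assert(count >= 0);
    std::vector<std::int64_t> sent_ns;
    sent_ns.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i)
    {
        // Take the timestamp immediately before publishing, as in the original loop.
        const std::int64_t now = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::system_clock::now().time_since_epoch()).count();
        publish(now);
        sent_ns.push_back(now);
        // The thread is idle here, so CPU load no longer varies with the period.
        std::this_thread::sleep_for(period);
    }
    return sent_ns;
}
```

Running this with different `period` values while the CPU stays mostly idle separates the effect of publication frequency from the effect of CPU load.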

We believe the observed "latency" is a result of how the kernel and its scheduler manage processes. Specifically, the process might be removed from the core's cache and subsequently reloaded, introducing delays.

Our tests show that actions such as isolating a core, forcing the process to run on that core, increasing its priority, and similar mechanisms to prevent the CPU from reloading the process when the publisher is called significantly improve the situation. We would appreciate it if you could test these actions on your side and share your feedback on the results. This would help confirm our conclusions and ensure that these approaches effectively mitigate the issue in your environment.
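On Linux, pinning to a core and raising the priority can be tried from inside the publisher process with the standard `sched_setaffinity` and `sched_setscheduler` calls. The sketch below is an assumption about how one might do this, not the setup used in the tests described above. The helper names are hypothetical, and isolating the core itself (for example with the `isolcpus` kernel parameter) has to be configured outside the process.

```cpp
#include <sched.h>  // sched_setaffinity, sched_getaffinity, sched_setscheduler (Linux)

// Returns the lowest-numbered CPU the calling thread may run on, or -1 on error.
inline int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
    {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
        if (CPU_ISSET(cpu, &set))
        {
            return cpu;
        }
    }
    return -1;
}

// Restricts the calling thread to a single CPU. Returns true on success.
inline bool pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Requests SCHED_FIFO real-time scheduling for the calling thread.
// This usually fails without root or CAP_SYS_NICE, so the result must be checked.
inline bool request_fifo_priority(int priority)
{
    sched_param param{};
    param.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
}
```

Threads created with `pthread_create` inherit the affinity mask of their creator, so calling `pin_to_cpu()` before creating the DomainParticipant should also confine the middleware threads to that core. Inside a Docker container, the real-time priority request may additionally require extra capabilities to be granted to the container.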

That said, since this behavior stems from kernel-level process management and not from DDS itself, it is not currently on our roadmap to implement changes addressing this issue in Fast DDS. However, if you'd like, you are welcome to reach out to us directly, and we can explore potential solutions tailored to your specific use case.

Thank you!

MMarcus95 added the "triage" label (Issue pending classification) on Dec 17, 2024

EugenioCollado added the "need more info" label (Issue that requires more info from contributor) and removed the "triage" label (Issue pending classification) on Dec 19, 2024
