
Allow sharing DomainParticipant with C++ applications #25

Merged (3 commits) on Apr 27, 2021

Conversation

@asorbini (Collaborator) commented Apr 7, 2021

This PR adjusts an internal Connext QoS policy so that the DomainParticipant created by rmw_connextdds through the C API can be shared with ROS 2 applications that use the Modern C++ API.

With these changes, an application can access the DomainParticipant and reuse it, e.g. to create endpoints, instead of creating a dedicated participant:

#include <cassert>
#include <chrono>
#include <cstdio>
#include <memory>
#include <utility>

#include "rclcpp/rclcpp.hpp"

// Include RTI Connext DDS "modern C++" API
#include <dds/dds.hpp>
// Include type support code generated by rtiddsgen
#include "std_msgs/msg/String.hpp"

using namespace std::chrono_literals;
using namespace dds::core;

class DdsTalker : public rclcpp::Node
{
public:
  DdsTalker()
  : Node("dds_talker"),
    count_(0)
  {
    // Disable stdout buffering so log messages appear immediately.
    setvbuf(stdout, NULL, _IONBF, BUFSIZ);
    // Create a function to send messages periodically.
    auto publish_message =
      [this]() -> void
      {
        std_msgs::msg::String msg("Hello World: " + std::to_string(count_++));
        RCLCPP_INFO(this->get_logger(), "Publishing: '%s'", msg.data().c_str());
        writer_.write(msg);
      };
    // Create a DataWriter for topic "rt/chatter"
    auto participant = dds::domain::find(0);
    assert(dds::core::null != participant);
    dds::pub::Publisher publisher(participant);
    dds::topic::Topic<std_msgs::msg::String> topic(participant,
      "rt/chatter", "std_msgs::msg::dds_::String_");
    rti::core::policy::Property props;
    // Required to properly support unbounded types: size of the pre-allocated
    // sample memory pool. Samples beyond this size will be dynamically allocated.
    props.set({"dds.data_writer.history.memory_manager.fast_pool.pool_buffer_max_size", "1024"}, false);
    auto writer_qos = publisher.default_datawriter_qos(); 
    writer_qos << props;
    writer_qos << policy::History(policy::HistoryKind::KEEP_LAST, 7);
    writer_ = dds::pub::DataWriter<std_msgs::msg::String>(publisher, topic, writer_qos);

    // Use a timer to schedule periodic message publishing.
    timer_ = this->create_wall_timer(1s, publish_message);
  }

private:
  size_t count_ = 0;
  dds::pub::DataWriter<std_msgs::msg::String> writer_{nullptr};
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<DdsTalker>());
  rclcpp::shutdown();
  return 0;
}

The application must be linked with -rdynamic in order to share the DDS DomainParticipantFactory with rmw_connextdds. This can be achieved in CMake (where executable is the name of the application's target) with:

set_target_properties(executable PROPERTIES ENABLE_EXPORTS true)
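
If the property is missing, the application and rmw_connextdds may end up with separate DomainParticipantFactory instances, in which case dds::domain::find(0) would come back null. A minimal defensive sketch (my own illustration, not code from this PR) that falls back to a dedicated participant in that case:

// Try to reuse the participant created by rmw_connextdds; fall back to a
// dedicated one if the factory is not shared (e.g. the executable was not
// built with ENABLE_EXPORTS / -rdynamic).
dds::domain::DomainParticipant participant = dds::domain::find(0);
if (dds::core::null == participant) {
  participant = dds::domain::DomainParticipant(0);
}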

It would be great to include this PR in Galactic, since it's such a trivial change and it enables access to some features which are not yet available in the RMW (e.g. zero copy).

The modified QoS policy has no effect on the C core and is only used by other language bindings built around it. I can run CI for this set of changes, but I don't expect any effect on "normal operations".

@ivanpauno (Member)

Full CI:

  • Linux Build Status
  • macOS Build Status
  • Windows Build Status

@asorbini (Collaborator, Author) commented Apr 8, 2021

For some reason there were a bunch of failures which, from a quick inspection, all seem to be related to that "dreaded" (because unnecessary) exception thrown by rosidl_typesupport_cpp:

'Handle's typesupport identifier (rosidl_typesupport_cpp) is not supported by this library, at /Users/osrf/jenkins-agent/workspace/ci_osx/ws/src/ros2/rosidl_typesupport/rosidl_typesupport_cpp/src/type_support_dispatch.hpp:113'

Almost had a heart attack when I opened the Test Results, but I think we can avoid any coronary events if rosidl_typesupport#102 is completed. I had opened a PR about this myself, but closed it after noticing that one; also, just removing/downgrading the statement didn't seem to be enough, because some tests depended on it.

Anyway, I have no idea why things would start failing only here and now. I don't see how these small changes might have such catastrophic effects.

@ivanpauno (Member) commented Apr 8, 2021

I'm not sure if ros2/rosidl_typesupport#102 is the solution or not.
Maybe @clalancette can chime in.

I can also take a closer look tomorrow, but I need some hours to understand what's happening.


I can easily reproduce the problems locally.
Go to <your_ws_root>/build/rclcpp/test/rclcpp and with the workspace sourced run:

RMW_IMPLEMENTATION=rmw_connextdds ./test_allocator_memory_strategy

You will see the same failures as in the last CI job (with or without this PR).

This didn't happen before because all the CI_USE_CONNEXTDDS, CI_USE_FASTRTPS_STATIC and CI_USE_CYCLONEDDS checkboxes were selected.
In this case I disabled cyclonedds to reduce the time the CI jobs were going to take (I kept the fastrtps checkbox because I'm almost sure it will fail to build without it).
Most tests are only run with the "default rmw" (cyclonedds). If that's not present, a default rmw is chosen in alphabetical order (in this case rmw_connextdds).
So we never saw these tests failing before because they were never actually run 😄.
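
A quick way to double-check which implementation a test process actually loaded is to query the rmw identifier directly. A minimal sketch (assuming the executable links against the usual rclcpp/rmw_implementation libraries, as the test binaries do):

#include <cstdio>
#include "rmw/rmw.h"

// Print the RMW implementation loaded by this process (e.g. "rmw_connextdds"),
// which reflects RMW_IMPLEMENTATION or the default selection described above.
int main()
{
  std::printf("active rmw: %s\n", rmw_get_implementation_identifier());
  return 0;
}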

The purpose of the "single rmw" jobs in build.ros2.org is to catch this kind of bug (e.g. https://build.ros2.org/view/Rci/job/Rci__nightly-connext_ubuntu_focal_amd64/), but I think that job isn't working for the new rmw_connextdds implementation.

(edit) rephrased some parts, corrected spelling mistakes

@ivanpauno (Member)

In this case I disabled cyclonedds to reduce the time the CI jobs were going to take

I also typically select only one of the rmw checkboxes when testing a PR in an rmw implementation, to actually run all the tests with that implementation.
In this case I didn't think about alphabetical order 😂, so that happened by chance.
I was only trying to reduce the time the jobs take to run.

@asorbini (Collaborator, Author) commented Apr 8, 2021

I can easily reproduce the problems locally.

A'ight, spongebobgettingupfromchair.jpg

I'll take a look and report back once I have some idea of what's going on. I'll test with #22, because I expect that to be merged soon.

@asorbini (Collaborator, Author) commented Apr 8, 2021

Re: the failure in test_allocator_memory_strategy, I think this should have been changed to rmw_connextdds. Instead, I mistakenly thought the test was passing without that exception (when in fact I wasn't actually running it), and we ended up removing the exception altogether. I still don't quite understand its implications, but I think restoring the exception for rmw_connextdds would make the test pass. I'm in the process of re-building and checking (same for all the other failures).

@clalancette (Contributor)

I'm not sure if ros2/rosidl_typesupport#102 is the solution or not.

I don't think we are going to put that PR in for Galactic. While I think it is an improvement, it does change the contract. Given where we are in the release cycle, that would break API (at least in spirit).

In general, this looks like a feature, not a bugfix, so I'll say that we shouldn't put it in at this time. What I will say, though, is that it looks like it is ABI and API backwards-compatible, so it is something we can put in for a Galactic patch release.

That being said, I'd be OK with reviewing and merging the PRs that restore the test exceptions so we can prepare for this in a Galactic patch release. So I'll go review some of that.

@asorbini (Collaborator, Author) commented Apr 9, 2021

In general, this looks like a feature, not a bugfix, so I'll say that we shouldn't put it in at this time. What I will say, though, is that it looks like it is ABI and API backwards-compatible, so it is something we can put in for a Galactic patch release.

I personally consider it a bugfix, because I expected this feature to work out of the box. None of the failures are related to this PR; they were already present, and I had failed to detect them because they only occur when colcon test is run with rmw_connextdds as the default implementation, which I had not done so far.

That's why several tests appear to be "passing in CI" but actually need some fixes to pass (because they are not even being run in CI).

@asorbini (Collaborator, Author) commented Apr 9, 2021

About rosidl_typesupport#102, I think it's OK if it's not merged. I took a page out of rmw_cyclonedds_cpp's book and applied the same "technique" of resetting the error string after calling get_message_type_support_handle() (see #22). This should get rid of those spurious error messages (and it does seem to resolve a bunch of failures caused by the unexpected error string).
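
For reference, a rough sketch of that pattern (an illustration of the approach, not the exact code from #22; the identifier strings and helper name here are only examples): after a lookup fails for one typesupport identifier, clear the error state before trying the next one, so the spurious error string never reaches the caller.

#include "rcutils/error_handling.h"
#include "rosidl_runtime_c/message_type_support_struct.h"

// Look up a typesupport by identifier and reset the error state left behind
// by an unsuccessful attempt, so callers don't observe a spurious error.
static const rosidl_message_type_support_t *
lookup_type_support(const rosidl_message_type_support_t * type_supports)
{
  const rosidl_message_type_support_t * ts =
    get_message_typesupport_handle(
      type_supports, "rosidl_typesupport_introspection_c");
  if (ts == nullptr) {
    // The failed lookup sets an error string; clear it before the next attempt.
    rcutils_reset_error();
    ts = get_message_typesupport_handle(
      type_supports, "rosidl_typesupport_introspection_cpp");
    if (ts == nullptr) {
      rcutils_reset_error();
    }
  }
  return ts;
}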

@ivanpauno (Member)

e.g. https://build.ros2.org/view/Rci/job/Rci__nightly-connext_ubuntu_focal_amd64/), but I think that job isn't working for the new rmw_connextdds implementation.

FYI, that job is actually working; I saw yesterday's failure and got confused.
The test failures look pretty similar to the ones we got here.
Note: Tests in build.ros2.org are run with the last released artifacts (in the testing repository).

@ivanpauno (Member)

Running CI under the same conditions it's run in the nightlies, just in case:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette (Contributor)

I personally consider it a bugfix, because I expected this feature to work out of the box. None of the failures are related to this PR; they were already present, and I had failed to detect them because they only occur when colcon test is run with rmw_connextdds as the default implementation, which I had not done so far.

Right, but it is still a feature. We can't continually put new features into Galactic; otherwise the freeze is meaningless. I still think this should be put in in the first Galactic patch release.

@ivanpauno (Member)

Right, but it is still a feature. We can't continually put new features into Galactic; otherwise the freeze is meaningless. I still think this should be put in in the first Galactic patch release.

I didn't see the previous discussion, sorry.
I agree this looks more like a feature than a bug fix, so merging after the freeze is finished sounds reasonable to me.

This can definitely be included in the first patch release.

@asorbini (Collaborator, Author)

Right, but it is still a feature. We can't continually put new features into Galactic; otherwise the freeze is meaningless. I still think this should be put in in the first Galactic patch release.

I understand your hesitancy, but I don't believe we have been "continually" adding new features. Everything so far has been trying to address some pretty major problems that unfortunately were discovered only in the last 10 days.

It would have been nice to catch #21 a week earlier, but I did my best to file #22 and resolve everything in the most timely fashion to account for the impending (and now active) freeze.

It's also unfortunate that I didn't realize sooner that several tests had not actually been run with rmw_connextdds, because they are only run with the default implementation. I tried to resolve those issues very quickly too, and as of now they should all be fixed by #22 and #26 (pending confirmation of a concurrency bug in rclcpp).

In the case of this PR, it's a very minor change which opens up a wide range of new use cases for ROS 2/Connext users, and I would really like rmw_connextdds to be able to support them in Galactic "out of the box".

If it's still "a bridge too far", then I have no problem waiting for the first patch release, which I'm assuming will be in a few months, but I would appreciate a minor exception if possible.

@clalancette (Contributor)

I understand your hesitancy, but I don't believe we have been "continually" adding new features.

Apologies, that comment wasn't meant particularly for you. It's more a general statement of the state of the Galactic release. We're in a pretty good place now, and we want to minimize the disruption to testing from here to the release date on May 23rd. Given that this change is a feature, and does not break an API or ABI, I think it makes sense to wait for the first patch release. Thanks for your understanding.

@asorbini (Collaborator, Author)

No worries, I'm happy #22 made it in, and I think things look pretty good as they are!

I have one release-related question to help me better understand the ROS 2 release process (and the workflow for this feature).

Once the galactic branch is created, we will only be committing bug fixes based on (failed) testing. At the same time, after branching, we can merge this PR to master. Then once Galactic has been tagged/released, we can open another PR to backport the feature to Galactic for the next patch release.

Is my understanding correct?

@clalancette (Contributor)

Once the galactic branch is created, we will only be committing bug fixes based on (failed) testing. At the same time, after branching, we can merge this PR to master. Then once Galactic has been tagged/released, we can open another PR to backport the feature to Galactic for the next patch release.

Is my understanding correct?

Yep, you have it exactly right.

@ivanpauno (Member)

Once the galactic branch is created, we will only be committing bug fixes based on (failed) testing. At the same time, after branching, we can merge this PR to master. Then once Galactic has been tagged/released, we can open another PR to backport the feature to Galactic for the next patch release.

Is my understanding correct?

Yes, that's exactly the pattern we follow in all the "core" repos.


I remember that in the past some rmw implementations maintained versions for different ROS distros in the same branch, using some version macros that rmw exports.
That's also an option, but it differs from other repos and sometimes causes confusion.
e.g., people typically forget about that and create a new branch for the next release (though that can be corrected afterwards).

@asorbini (Collaborator, Author)

I remember that in the past some rmw implementations maintained versions for different ROS distros in the same branch, using some version macros that rmw exports.
That's also an option, but it differs from other repos and sometimes causes confusion.
e.g., people typically forget about that and create a new branch for the next release (though that can be corrected afterwards).

rmw_connextdds used to do exactly that until recently, when I got rid of all "feature flags" and split everything into dedicated branches (see #15, #16, #17, and #18). I liked the idea of having a single branch to keep everything in one place, because I've gotten pretty accustomed to mentally "preprocessing" heavily #if/#else'd code, but I felt it was a pretty big overhead for other contributors once PRs started coming in recently.

I'm happy to revisit the approach in the future, but I think it would have to be introduced as a more general practice across the developer community (e.g. by also adopting it in other core repositories), to avoid raising the "learning curve" for effective contributions to this repository alone.

@asorbini added the galactic-backports (PR should be backported to Galactic) and humble (PR scheduled for the H-turtle) labels on Apr 13, 2021
@asorbini force-pushed the asorbini/share-participant branch from 0f8390e to cb90c7b on April 14, 2021 21:15
@asorbini (Collaborator, Author)

Rebased on top of master (08e456d)

@asorbini added the dashing-backports (PR should be backported to Dashing), eloquent-backports (PR should be backported to Eloquent), and foxy-backports (PR should be backported to Foxy) labels on Apr 14, 2021
asorbini added a commit that referenced this pull request Apr 14, 2021
* Allow sharing DomainParticipant with C++ applications

Signed-off-by: Andrea Sorbini <[email protected]>
asorbini added a commit that referenced this pull request Apr 14, 2021
* Allow sharing DomainParticipant with C++ applications

Signed-off-by: Andrea Sorbini <[email protected]>
asorbini added a commit that referenced this pull request Apr 14, 2021
* Allow sharing DomainParticipant with C++ applications

Signed-off-by: Andrea Sorbini <[email protected]>
@asorbini (Collaborator, Author)

@clalancette @ivanpauno since we are out of galactic freeze, I'm going to merge this PR to master, unless there are any objections. Same for #41.

I can run a "CI sanity check" (with only Connext enabled) for both PRs before merging, just to err on the (extremely) safe side, but both PRs should be good to go already.

Thoughts?

@clalancette (Contributor)

@clalancette @ivanpauno since we are out of galactic freeze, I'm going to merge this PR to master, unless there are any objections. Same for #41.

I don't see a CI run for #41, but maybe it was run elsewhere. I would suggest doing another CI run with this and #41 just to make sure everything is green, but otherwise we are back to our normal "one approval and green CI to merge to master" rules.

@asorbini (Collaborator, Author)

Started CI for #41. I'll run it again for this PR too, but I'll wait until some plans finish (plans are running for #43 too).

@asorbini (Collaborator, Author)

CI validation:

  • Linux Build Status
  • macOS Build Status
  • Windows Build Status

@asorbini (Collaborator, Author)

Failures in rclcpp.TestExecutor.add_remove_node_thread_safe (see this comment) and interactive_markers.TestInteractiveMarkerClient.states (see #40) are expected, so I'm going to merge this PR.

@asorbini merged commit 92234d1 into master on Apr 27, 2021
The delete-merged-branch bot deleted the asorbini/share-participant branch on April 27, 2021 23:22
asorbini added a commit that referenced this pull request Apr 27, 2021
* Allow sharing DomainParticipant with C++ applications
* Add comment about UserObjectQosPolicy
* Resolve uncrustify errors

Signed-off-by: Andrea Sorbini <[email protected]>
asorbini added a commit that referenced this pull request Oct 26, 2021