Doubt about the implementation of the emitter-receiver scheme #119
@shmily326 Thank you for opening an issue! I will look into it and get back to you.
Thanks @shmily326! You seem to be correct in your comment; we have had several issues regarding emitters and receivers since the beginning, and there were some bugs in Webots too (see cyberbotics/webots#1384, where multiple issues were fixed). As @KelvinYang0320 mentioned, we will look into it and incorporate the required changes to make it work as close as possible to what is expected. Meanwhile, I would suggest using the RobotSupervisor scheme, which uses the same controller both to control the robot and to act as the supervisor. It is much more efficient and straightforward in cases where you don't specifically require separation between the robot and the supervisor. If you want, you can share additional information about your use case so we can discuss it further.
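For reference, a minimal sketch of what the combined scheme roughly looks like is shown below. The class name `RobotSupervisorEnv`, the import path, and the `apply_action()` hook are assumptions about the deepbots API and may differ between versions; the key point is that a single controller both reads the simulation state and drives the robot, so no emitter/receiver message delay is involved.

```python
# Minimal sketch of the combined robot/supervisor scheme. The class name,
# the import path and the apply_action() hook are assumptions and may
# differ between deepbots versions; other abstract methods may also be
# required depending on the version you use.
from deepbots.supervisor.controllers.robot_supervisor_env import RobotSupervisorEnv


class MyRobotSupervisor(RobotSupervisorEnv):
    def get_observations(self):
        # State is read directly from the scene tree, no receiver involved.
        return [self.getFromDef("ROBOT").getPosition()[0]]

    def apply_action(self, action):
        # The action is applied directly to the robot's devices, no emitter.
        pass

    def get_reward(self, action):
        return 1.0

    def is_done(self):
        return False

    def get_info(self):
        return {}
```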
Hi @shmily326, with some prints added, the supervisor's `step()` method looks like this:

```python
def step(self, action):
    """
    The basic step method that steps the controller,
    calls the method that sends the action through the emitter
    and returns the (observations, reward, done, info) object.

    :param action: Whatever the use-case uses as an action, e.g.
        an integer representing discrete actions
    :type action: Defined by the implementation of handle_emitter
    :return: (observations, reward, done, info) as provided by the
        corresponding methods as implemented for the use-case
    """
    print(self.getFromDef("ROBOT").getPosition()[0], "step-1")
    if super(Supervisor, self).step(self.timestep) == -1:
        exit()
    print(self.getFromDef("ROBOT").getPosition()[0], "step-2")
    self.handle_emitter(action)
    print(self.getFromDef("ROBOT").getPosition()[0], "step-3")
    return (
        self.get_observations(),
        self.get_reward(action),
        self.is_done(),
        self.get_info(),
    )
```

and the robot controller's `handle_emitter()` and `handle_receiver()` look like this:

```python
def handle_emitter(self):
    """
    This emitter uses the user-implemented create_message() method to get
    whatever data the robot gathered, convert it to a string if needed and
    then use the emitter to send the data in a string utf-8 encoding to the
    supervisor.
    """
    print("handle_emitter")
    data = self.create_message()
    ...

def handle_receiver(self):
    """
    This receiver uses the basic Webots receiver-handling code. The
    use_message_data() method should be implemented to actually use the
    data received from the supervisor.
    """
    print("handle_receiver")
    if self.receiver.getQueueLength() > 0:
        ...
```

Running cartPoleWorldEmitterReceiver on Webots R2023a, you will get output in the time ranges 0:00:00:032~0:00:00:064 and 0:00:00:064~0:00:00:096. From my perspective, you will not get the next state in …
@shmily326 I have opened a PR to address that.
@KelvinYang0320 Thank you for all of your time. I'm working on multi-agent RL (specifically a multi-UAV navigation scenario with Actor-Critic algorithms), so I think the emitter-receiver scheme would be more appropriate, and I will check the …
@shmily326 You can take a look at this PR for a multi-robot example.
That sounds great! For multi-agent scenarios it can indeed be better to have a centralized supervisor that communicates with multiple robots, so you need to use the emitter-receiver scheme. When it is completed, if you want, we would be happy to include your scenario as an example in our deepworlds repository! 😄
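As a rough illustration of that centralized layout, here is a sketch of a supervisor controller addressing several robots over separate channels. This is plain Webots controller code, not deepbots API; the device names "emitter" and "receiver", the channel numbering, and the message format are assumptions for the sketch, and each robot controller would need a matching emitter/receiver configured on its own channel.

```python
# Sketch of a centralized supervisor talking to several robots over
# dedicated Webots channels. Device names and channel numbers are
# illustrative assumptions, not deepbots API.
from controller import Supervisor

NUM_ROBOTS = 3

supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

emitter = supervisor.getDevice("emitter")
receiver = supervisor.getDevice("receiver")
receiver.enable(timestep)

while supervisor.step(timestep) != -1:
    # Send each robot its own action on that robot's channel.
    for robot_id in range(NUM_ROBOTS):
        emitter.setChannel(robot_id + 1)
        emitter.send(f"action,for,robot,{robot_id}".encode("utf-8"))

    # Drain all observations that have arrived; due to the one-timestep
    # transmission delay, these were emitted in an earlier timestep.
    while receiver.getQueueLength() > 0:
        message = receiver.getData().decode("utf-8")
        print("received:", message)
        receiver.nextPacket()
```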
@shmily326 You can get the updated deepbots by … We have merged the PR.
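If it helps, one likely way to pick up a merged but unreleased fix is to install directly from the repository, assuming the package is hosted at aidudezzz/deepbots and is pip-installable from source: `pip install --upgrade git+https://github.com/aidudezzz/deepbots.git`. Otherwise, upgrading the released package from PyPI once it includes the fix should also work.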
@shmily326 Just a reminder, you can …
Hi there, I'm deeply confused by the concrete communication process (timing) in the emitter-receiver scheme implemented in deepbots. In Webots it takes one basic timestep to transmit and deliver a message from an emitter to a receiver, which means the action $a_t$ adopted by the supervisor according to state $s_t$ is delivered to the robot in timeslot $t+1$, the new state (observation) caused by $a_t$ is updated and emitted to the supervisor in timeslot $t+2$, and it is finally available to the supervisor as $s^{\prime}$ in timeslot $t+3$.

On the basis of the above, I find that the transitions saved for RL training in the deepbots tutorials look like $(s_t, a_t, r_t, s_{t+1})$, but in fact the action that acted on state $s_t$ (i.e., the action the robot actually executed) is more like $a_{t-3}$; there is a difference between $a_{t-3}$ and $a_t$, even though the timestep is on the scale of milliseconds.

To be honest, my question may not be entirely clear. I would appreciate it if someone could correct me or clear up my doubt. Thanks a lot!
My doubt is somewhat related to this issue.
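To make the delay concrete, below is a toy sketch in plain Python (not deepbots or Webots code; the channels are hypothetical one-step-delay queues) showing how the observation the supervisor pairs with its chosen action differs from the state the robot is actually in when it executes an action:

```python
# Toy model of the one-timestep transmission delay described above; this is
# only an illustration, not deepbots or Webots code.
from collections import deque


def make_channel():
    """A FIFO pre-filled with None so reads lag writes by one step."""
    return deque([None])


def simulate(num_steps=4):
    supervisor_to_robot = make_channel()
    robot_to_supervisor = make_channel()
    robot_state = 0

    for t in range(num_steps):
        # Supervisor side: observe whatever the robot emitted earlier and
        # emit an action chosen from that (already stale) observation.
        observed_state = robot_to_supervisor.popleft()
        chosen_action = t  # stand-in for policy(observed_state)
        supervisor_to_robot.append(chosen_action)

        # Robot side: receive the action that was emitted one step ago,
        # apply it, and emit the resulting state back to the supervisor.
        executed_action = supervisor_to_robot.popleft()
        if executed_action is not None:
            robot_state += 1  # stand-in for the effect of the action
        robot_to_supervisor.append(robot_state)

        print(f"t={t}: supervisor paired observation {observed_state} with "
              f"action {chosen_action}, but the robot executed {executed_action}")


simulate()
```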