doubt about the implement of the emitter-receiver scheme #119

Closed
shmily326 opened this issue Dec 7, 2022 · 9 comments · Fixed by #120
Labels: bug, question

@shmily326

shmily326 commented Dec 7, 2022

Hi there, I'm confused by the concrete communication timing in the emitter-receiver scheme implemented in deepbots. In Webots it takes one basic timestep to deliver a message from an emitter to a receiver. This means the action $a_{t}$ chosen by the supervisor from state $s_{t}$ is delivered to the robot in timeslot $t+1$, the new state (observation) caused by $a_{t}$ is updated and emitted to the supervisor in timeslot $t+2$, and it finally appears at the supervisor as $s^{\prime}$ in timeslot $t+3$.

Based on this, I find that the transitions saved for RL training in the deepbots tutorials look like $(s_{t}, a_{t}, r_{t}, s_{t+1})$, but the action that actually led to state $s_{t}$ (that is, the action the robot really executed) is something like $a_{t-3}$. There is a difference between $a_{t-3}$ and $a_{t}$, even though the timestep is on the scale of milliseconds.

To be honest, my question may not be very clear. I would appreciate it if someone could correct me or explain this. Thanks a lot!

My doubt is somewhat related to this issue
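The timing described in this comment can be traced with a small toy model. This is only a sketch of the stated assumption (one timestep of delay per emitter-to-receiver hop), not a measurement of Webots itself. `DelayedChannel` and `trace_round_trip` are hypothetical names made up for illustration and are not part of deepbots or Webots.

```python
from collections import deque


class DelayedChannel:
    """Toy emitter/receiver pair: a message sent in step k is readable in step k + delay."""

    def __init__(self, delay=1):
        self.delay = delay
        self.queue = deque()  # holds (ready_step, payload) in send order

    def send(self, step, payload):
        self.queue.append((step + self.delay, payload))

    def receive(self, step):
        ready = []
        while self.queue and self.queue[0][0] <= step:
            ready.append(self.queue.popleft()[1])
        return ready


def trace_round_trip(delay=1, horizon=10):
    """Record the step at which each stage of one action/observation round trip happens."""
    to_robot = DelayedChannel(delay)
    to_supervisor = DelayedChannel(delay)
    events = {"action_sent": 0}
    to_robot.send(0, "a_0")  # supervisor chooses a_0 in step 0
    applied_at = None
    for step in range(1, horizon):
        # Robot: one step after applying the action, emit the resulting observation.
        if applied_at is not None and step > applied_at and "obs_sent" not in events:
            to_supervisor.send(step, "s_prime")
            events["obs_sent"] = step
        # Robot: pick up the action once the channel delivers it.
        if to_robot.receive(step):
            applied_at = step
            events["action_received"] = step
        # Supervisor: stop as soon as the new observation arrives.
        if to_supervisor.receive(step):
            events["obs_received"] = step
            break
    return events
```

Under these assumptions, `trace_round_trip()` reports the action received in step 1, the observation emitted in step 2, and the observation received in step 3, which matches the $t+1$, $t+2$, $t+3$ sequence above. Whether real Webots controllers behave this way depends on the order of calls inside each controller's step.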

@tsampazk tsampazk transferred this issue from aidudezzz/deepbots-tutorials Dec 7, 2022
@tsampazk tsampazk added the question label Dec 7, 2022
@KelvinYang0320
Member

@shmily326 Thank you for opening an issue! I will look into it and get back to you.

@tsampazk
Member

tsampazk commented Dec 8, 2022

Thanks @shmily326! You seem to be correct. We have had several issues with emitters and receivers since the beginning, and there were some bugs in Webots too (see cyberbotics/webots#1384, where multiple issues were fixed).

As @KelvinYang0320 mentioned, we will look into it and incorporate the required changes to make it work as closely as possible to what is expected.

Meanwhile, I would suggest using the RobotSupervisor scheme, which uses a single controller both to control the robot and to act as the supervisor. It is much more efficient and straightforward in cases where you don't specifically need the robot and the supervisor to be separate. If you want, you can share additional information about your use case so we can discuss it further.

@KelvinYang0320
Member

Hi @shmily326
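With the following modifications (debug prints added to the supervisor's step() and to the robot's handle_emitter() and handle_receiver()),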

def step(self, action):
        """
        The basic step method that steps the controller,
        calls the method that sends the action through the emitter
        and returns the (observations, reward, done, info) object.

        :param action: Whatever the use-case uses as an action, e.g.
            an integer representing discrete actions
        :type action: Defined by the implementation of handle_emitter
        :return: (observations, reward, done, info) as provided by the
            corresponding methods as implemented for the use-case
        """
        print(self.getFromDef("ROBOT").getPosition()[0], "step-1")
        if super(Supervisor, self).step(self.timestep) == -1:
            exit()
        print(self.getFromDef("ROBOT").getPosition()[0], "step-2")
        self.handle_emitter(action)
        print(self.getFromDef("ROBOT").getPosition()[0], "step-3")
        return (
            self.get_observations(),
            self.get_reward(action),
            self.is_done(),
            self.get_info(),
        )
# Robot controller: sends data to the supervisor
def handle_emitter(self):
        """
        This emitter uses the user-implemented create_message() method to get
        whatever data the robot gathered, convert it to a string if needed and
        then use the emitter to send the data in a string utf-8 encoding to the
        supervisor.
        """
        print("handle_emitter")
        data = self.create_message()
        ...
# Robot controller: reads data sent by the supervisor
def handle_receiver(self):
        """
        This receiver uses the basic Webots receiver-handling code. The
        use_message_data() method should be implemented to actually use the
        data received from the supervisor.
        """
        print("handle_receiver")
        if self.receiver.getQueueLength() > 0:
            ...

you will get the following in cartPoleWorldEmitterReceiver on Webots 2023a:
0:00:00:000~0:00:00:032
RESET
0.0 step-1
handle_receiver
handle_emitter

0:00:00:032~0:00:00:064
-1.546550598149922e-22 step-2
-1.546550598149922e-22 step-3
-1.546550598149922e-22 step-1
handle_receiver
handle_emitter

0:00:00:064~0:00:00:096
1.1115304692030285e-08 step-2
1.1115304692030285e-08 step-3
1.1115304692030285e-08 step-1
handle_receiver
handle_emitter

From my perspective, you will not get the next state in $t+3$. However, we do need to address this issue.

@KelvinYang0320
Member

@shmily326 I have opened a PR to address that.
Could you check if the problem is solved?

git clone https://github.com/aidudezzz/deepbots.git
cd ./deepbots
git checkout step_function
pip install -e .

@shmily326
Author

@KelvinYang0320 Thank you for all of your time. I'm working on multi-agent RL (specifically a multi-UAV navigation scenario with Actor-Critic algorithms), so I think the emitter-receiver scheme would be more appropriate. I will check the "step the controller after applying the action" method and get back to you as soon as possible.
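For readers following along, the ordering change named here ("step the controller after applying the action") can be sketched with a stand-in class. `ToySupervisor`, `sim_step`, `step_before`, and `step_after` are hypothetical names used only for illustration; they are not the deepbots classes, and the actual PR may differ in detail. `step_before` follows the order shown in the debug snippet earlier in this thread (step the simulation, then emit the action), while `step_after` emits the action first.

```python
class ToySupervisor:
    """Stand-in for a supervisor controller; records the order of calls."""

    def __init__(self):
        self.log = []

    def sim_step(self):
        # Placeholder for stepping the simulation; -1 would mean it has ended.
        self.log.append("sim_step")
        return 0

    def handle_emitter(self, action):
        # Placeholder for sending the action to the robot.
        self.log.append(f"emit:{action}")

    def get_observations(self):
        # Placeholder for reading the latest observation.
        self.log.append("observe")
        return [0.0]

    def step_before(self, action):
        # Original order: the simulation advances before the action is emitted.
        if self.sim_step() == -1:
            raise SystemExit
        self.handle_emitter(action)
        return self.get_observations()

    def step_after(self, action):
        # Reordered: emit the action first, then advance the simulation,
        # so the observation is read after a step that follows the emit.
        self.handle_emitter(action)
        if self.sim_step() == -1:
            raise SystemExit
        return self.get_observations()
```

Calling `step_before(1)` on a fresh instance logs `sim_step`, `emit:1`, `observe`, while `step_after(1)` logs `emit:1`, `sim_step`, `observe`. The sketch only shows call order; it does not model the message delay between emitter and receiver.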

@KelvinYang0320
Member

KelvinYang0320 commented Dec 9, 2022

@shmily326 You can take a look at this PR for a multi-robot example.
Also, we have several examples in deepworlds.

@tsampazk
Member

tsampazk commented Dec 9, 2022

> I'm working on multi-agent RL

That sounds great! For multi-agent scenarios it can indeed be better to have a centralized supervisor that communicates with multiple robots, so you need the emitter-receiver scheme. When it is completed, we would be happy to include your scenario as an example in our deepworlds repository, if you want! 😄

@tsampazk tsampazk added the refactor and bug labels and removed the refactor label Dec 9, 2022
@tsampazk tsampazk added this to the Release 0.2.0 milestone Dec 9, 2022
@KelvinYang0320 KelvinYang0320 self-assigned this Dec 9, 2022
@KelvinYang0320 KelvinYang0320 reopened this Dec 9, 2022
@KelvinYang0320
Member

@shmily326 You can get the updated deepbots with:

git clone https://github.com/aidudezzz/deepbots.git
cd ./deepbots
pip install -e .

We have merged the PR.

@KelvinYang0320
Member

@shmily326 Just a reminder, you can pip install git+https://github.com/aidudezzz/deepbots.git for general use before we publish the next version of deepbots on PyPI.
We would like to close this issue. Feel free to open another issue or reopen it if needed. Also, we will be glad if you share your work or experience with us. 😄
