Doubt about the implementation of the emitter-receiver scheme #119
@shmily326 Thank you for opening an issue! I will look into it and get back to you.
Thanks @shmily326! You seem to be correct in your comment; we have had several issues regarding emitters and receivers since the beginning, and there were some bugs in Webots too (see cyberbotics/webots#1384, where multiple issues were fixed). As @KelvinYang0320 mentioned, we will look into it and incorporate the required changes to make it work as close as possible to what is expected. Meanwhile, I would suggest using the RobotSupervisor scheme, which uses the same controller both to control the robot and to act as the supervisor. It is much more efficient and straightforward in cases where you don't specifically require separation between the robot and the supervisor. If you want, you can share additional information about your use case so we can discuss it further.
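For reference, a minimal sketch of what the combined scheme roughly looks like is shown below. The class name `RobotSupervisorEnv`, the import path, and the `apply_action()` hook are assumptions about the deepbots API and may differ between versions; the key point is that a single controller both reads the simulation state and drives the robot, so no emitter/receiver message delay is involved.

```python
# Minimal sketch of the combined robot/supervisor scheme. The class name,
# the import path and the apply_action() hook are assumptions and may
# differ between deepbots versions; other abstract methods may also be
# required depending on the version you use.
from deepbots.supervisor.controllers.robot_supervisor_env import RobotSupervisorEnv


class MyRobotSupervisor(RobotSupervisorEnv):
    def get_observations(self):
        # State is read directly from the scene tree, no receiver involved.
        return [self.getFromDef("ROBOT").getPosition()[0]]

    def apply_action(self, action):
        # The action is applied directly to the robot's devices, no emitter.
        pass

    def get_reward(self, action):
        return 1.0

    def is_done(self):
        return False

    def get_info(self):
        return {}
```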
Hi @shmily326, with some prints added, the supervisor's `step()` method looks like this:

```python
def step(self, action):
    """
    The basic step method that steps the controller,
    calls the method that sends the action through the emitter
    and returns the (observations, reward, done, info) object.

    :param action: Whatever the use-case uses as an action, e.g.
        an integer representing discrete actions
    :type action: Defined by the implementation of handle_emitter
    :return: (observations, reward, done, info) as provided by the
        corresponding methods as implemented for the use-case
    """
    print(self.getFromDef("ROBOT").getPosition()[0], "step-1")
    if super(Supervisor, self).step(self.timestep) == -1:
        exit()
    print(self.getFromDef("ROBOT").getPosition()[0], "step-2")
    self.handle_emitter(action)
    print(self.getFromDef("ROBOT").getPosition()[0], "step-3")
    return (
        self.get_observations(),
        self.get_reward(action),
        self.is_done(),
        self.get_info(),
    )
```

and the robot controller's `handle_emitter()` and `handle_receiver()` look like this:

```python
def handle_emitter(self):
    """
    This emitter uses the user-implemented create_message() method to get
    whatever data the robot gathered, convert it to a string if needed and
    then use the emitter to send the data in a string utf-8 encoding to the
    supervisor.
    """
    print("handle_emitter")
    data = self.create_message()
    ...

def handle_receiver(self):
    """
    This receiver uses the basic Webots receiver-handling code. The
    use_message_data() method should be implemented to actually use the
    data received from the supervisor.
    """
    print("handle_receiver")
    if self.receiver.getQueueLength() > 0:
        ...
```

Running cartPoleWorldEmitterReceiver on Webots R2023a, you will get output in the time ranges 0:00:00:032~0:00:00:064 and 0:00:00:064~0:00:00:096. From my perspective, you will not get the next state in …
@shmily326 I have opened a PR to address that.
@KelvinYang0320 Thank you for all of your time. I'm working on multi-agent RL (specifically a multi-UAV navigation scenario with Actor-Critic algorithms), so I think the emitter-receiver scheme would be more appropriate, and I will check the …
@shmily326 You can take a look at this PR for a multi-robot example.
That sounds great! For multi-agent scenarios it can indeed be better to have a centralized supervisor that communicates with multiple robots, so you need to use the emitter-receiver scheme. When it is completed, if you want, we would be happy to include your scenario as an example in our deepworlds repository! 😄
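As a rough illustration of that centralized layout, here is a sketch of a supervisor controller addressing several robots over separate channels. This is plain Webots controller code, not deepbots API; the device names "emitter" and "receiver", the channel numbering, and the message format are assumptions for the sketch, and each robot controller would need a matching emitter/receiver configured on its own channel.

```python
# Sketch of a centralized supervisor talking to several robots over
# dedicated Webots channels. Device names and channel numbers are
# illustrative assumptions, not deepbots API.
from controller import Supervisor

NUM_ROBOTS = 3

supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

emitter = supervisor.getDevice("emitter")
receiver = supervisor.getDevice("receiver")
receiver.enable(timestep)

while supervisor.step(timestep) != -1:
    # Send each robot its own action on that robot's channel.
    for robot_id in range(NUM_ROBOTS):
        emitter.setChannel(robot_id + 1)
        emitter.send(f"action,for,robot,{robot_id}".encode("utf-8"))

    # Drain all observations that have arrived; due to the one-timestep
    # transmission delay, these were emitted in an earlier timestep.
    while receiver.getQueueLength() > 0:
        message = receiver.getData().decode("utf-8")
        print("received:", message)
        receiver.nextPacket()
```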
@shmily326 You can get the updated deepbots by … We have merged the PR.
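If it helps, one likely way to pick up a merged but unreleased fix is to install directly from the repository, assuming the package is hosted at aidudezzz/deepbots and is pip-installable from source: `pip install --upgrade git+https://github.com/aidudezzz/deepbots.git`. Otherwise, upgrading the released package from PyPI once it includes the fix should also work.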
@shmily326 Just a reminder, you can …
Hi there, I'm deeply confused by the concrete communication process (timing) in the emitter-receiver scheme implemented in deepbots. In Webots it takes one basic timestep to transmit and deliver a message from an emitter to a receiver, which means the action $a_t$ adopted by the supervisor according to state $s_t$ is delivered to the robot in timeslot $t+1$, the new state (observation) caused by $a_t$ is updated and emitted to the supervisor in timeslot $t+2$, and it is finally available to the supervisor as $s^{\prime}$ in timeslot $t+3$.

On the basis of the above, I find that the transitions saved for RL training in the deepbots tutorials look like $(s_t, a_t, r_t, s_{t+1})$, but in fact the action that acted on state $s_t$ (i.e., the action the robot actually executed) is more like $a_{t-3}$; there is a difference between $a_{t-3}$ and $a_t$, even though the timestep is on the scale of milliseconds.

To be honest, my question may not be entirely clear. I would appreciate it if someone could correct me or clear up my doubt. Thanks a lot!
My doubt is somewhat related to this issue.
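To make the delay concrete, below is a toy sketch in plain Python (not deepbots or Webots code; the channels are hypothetical one-step-delay queues) showing how the observation the supervisor pairs with its chosen action differs from the state the robot is actually in when it executes an action:

```python
# Toy model of the one-timestep transmission delay described above; this is
# only an illustration, not deepbots or Webots code.
from collections import deque


def make_channel():
    """A FIFO pre-filled with None so reads lag writes by one step."""
    return deque([None])


def simulate(num_steps=4):
    supervisor_to_robot = make_channel()
    robot_to_supervisor = make_channel()
    robot_state = 0

    for t in range(num_steps):
        # Supervisor side: observe whatever the robot emitted earlier and
        # emit an action chosen from that (already stale) observation.
        observed_state = robot_to_supervisor.popleft()
        chosen_action = t  # stand-in for policy(observed_state)
        supervisor_to_robot.append(chosen_action)

        # Robot side: receive the action that was emitted one step ago,
        # apply it, and emit the resulting state back to the supervisor.
        executed_action = supervisor_to_robot.popleft()
        if executed_action is not None:
            robot_state += 1  # stand-in for the effect of the action
        robot_to_supervisor.append(robot_state)

        print(f"t={t}: supervisor paired observation {observed_state} with "
              f"action {chosen_action}, but the robot executed {executed_action}")


simulate()
```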