
rpc : copy tensors across servers #8032

Draft: rgerganov wants to merge 2 commits into master

Conversation

rgerganov (Collaborator)

This is an attempt to make copying tensors across servers more efficient. It introduces 2 new RPC commands:

  • HELLO - sent right after establishing a connection, to identify the remote party (client or server)
  • REMOTE_COPY_TENSOR - sent to the host holding the source tensor, along with the destination tensor and the destination endpoint
sequenceDiagram
    Note over Scheduler: Copy X on Server A to Y on Server B
    Scheduler->>Server A: REMOTE_COPY_TENSOR
    Server A->>Server B: HELLO
    Server A->>Server B: SET_TENSOR
    Server B-->>Server A: 
    Server A-->>Scheduler: 
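To make the flow concrete, here is a rough sketch of what the source-server side of REMOTE_COPY_TENSOR could do. This is illustrative only: connect_to_endpoint, send_hello and send_set_tensor are hypothetical helper names, not code from this PR.

// Illustrative sketch (not the PR's actual code): Server A reads the source
// tensor from its local backend, connects to Server B, identifies itself with
// HELLO and pushes the bytes with the existing SET_TENSOR command.
static bool remote_copy_on_server(ggml_tensor * src_tensor,
                                  const rpc_tensor & dst,
                                  const std::string & dst_endpoint) {
    std::vector<uint8_t> data(ggml_nbytes(src_tensor));
    ggml_backend_tensor_get(src_tensor, data.data(), 0, data.size()); // local read on Server A

    auto sock = connect_to_endpoint(dst_endpoint); // hypothetical: open a socket to Server B
    send_hello(sock);                              // hypothetical: the new HELLO handshake
    return send_set_tensor(sock, dst, data);       // hypothetical: wrap the SET_TENSOR command
}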

Commits in this PR:

  • Start a dedicated backend thread in the rpc-server and use a message-passing interface for submitting work to it. This will enable async backend operations and cross-server communication.
  • Add a new cmd, REMOTE_COPY_TENSOR, for copying a tensor from one server to another.
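For reference, the "dedicated backend thread + message passing" pattern from the first commit could look roughly like the sketch below; this is a generic illustration, not the rpc-server's actual code.

// Generic sketch of a dedicated worker thread fed through a message queue,
// as described in the first commit; the real implementation differs in detail.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

struct backend_worker {
    std::queue<std::function<void()>> q;
    std::mutex mtx;
    std::condition_variable cv;
    bool stop = false;
    std::thread th{[this] { run(); }};

    // called from connection-handling threads to submit work to the backend thread
    void submit(std::function<void()> work) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            q.push(std::move(work));
        }
        cv.notify_one();
    }

    // runs on the dedicated backend thread; all backend calls happen here
    void run() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stop || !q.empty(); });
                if (stop && q.empty()) {
                    return;
                }
                work = std::move(q.front());
                q.pop();
            }
            work(); // e.g. graph compute, tensor set/get, or a remote copy
        }
    }

    ~backend_worker() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stop = true;
        }
        cv.notify_one();
        th.join();
    }
};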
GGML_CALL static bool ggml_backend_rpc_buffer_cpy_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * src, ggml_tensor * dst) {
    // check if src and dst are on the same server
    ggml_backend_buffer_t src_buffer = src->buffer;
    ggml_backend_rpc_buffer_context * src_ctx = (ggml_backend_rpc_buffer_context *)src_buffer->context;
    ggml_backend_buffer_t dst_buffer = dst->buffer;
    ggml_backend_rpc_buffer_context * dst_ctx = (ggml_backend_rpc_buffer_context *)dst_buffer->context;
    if (src_ctx->sock != dst_ctx->sock) {
-        return false;
+        return remote_copy_tensor(src, dst);
    }
slaren (Collaborator) commented on Jun 21, 2024:

In cpy_tensor you can only assume that the dst tensor is allocated in buffer. The src tensor may be allocated in any other buffer, including in a different buffer type from a different backend. You cannot assume that the type of src_buffer->context is ggml_backend_rpc_buffer_context because it may be a different buffer type, so you need to check for that.
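For illustration, such a guard could look roughly like the sketch below; comparing the public buffer name against "RPC" is an assumption about how the check might be written, not code from the PR or the review.

// Hypothetical guard for ggml_backend_rpc_buffer_cpy_tensor: bail out unless
// the source tensor is also allocated in an RPC buffer. The name comparison
// is an assumption; any equivalent buffer-type check would do.
static bool buffer_is_rpc(ggml_backend_buffer_t buffer) {
    return buffer != nullptr && strcmp(ggml_backend_buffer_name(buffer), "RPC") == 0;
}

// inside ggml_backend_rpc_buffer_cpy_tensor:
//     if (!buffer_is_rpc(src->buffer)) {
//         return false; // src lives in another backend's buffer; let the scheduler fall back
//     }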

Comment on lines +458 to +462
memcpy(input.data(), &rpc_src, sizeof(rpc_src));
memcpy(input.data() + sizeof(rpc_src), &rpc_dst, sizeof(rpc_dst));
uint32_t dst_endpoint_size = dst_ctx->endpoint.size();
memcpy(input.data() + 2*sizeof(rpc_tensor), &dst_endpoint_size, sizeof(dst_endpoint_size));
memcpy(input.data() + 2*sizeof(rpc_tensor) + sizeof(dst_endpoint_size), dst_ctx->endpoint.c_str(), dst_endpoint_size);
slaren (Collaborator) commented on Jun 21, 2024:

This kind of pattern is very quickly becoming unreadable, which makes the code very hard to review. My suggestion is to make structs for all the messages/commands.

chraac commented on Jun 22, 2024:

Agreed. Maybe we can have a common RPC command base structure that includes the size and a command enum; each command can then derive from it and append its own parameters, like in this example:

enum rpc_cmd {
    ...
};

struct rpc_cmd_base {
    uint32_t size;
    uint32_t version;  // version number for the rpc command
    rpc_cmd  cmd;
};

struct rpc_cmd_xx : rpc_cmd_base {
    uint32_t param1;
    ...
};
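
Applied to the hunk quoted above, the REMOTE_COPY_TENSOR request could then be a single struct instead of hand-rolled memcpy offsets. The field names below are hypothetical, chosen only to mirror the serialized fields.

// Hypothetical derived command for REMOTE_COPY_TENSOR, mirroring the fields
// serialized with memcpy above (source tensor, destination tensor, endpoint).
struct rpc_cmd_remote_copy_tensor : rpc_cmd_base {
    rpc_tensor src;                // serialized source tensor
    rpc_tensor dst;                // serialized destination tensor
    uint32_t   dst_endpoint_size;  // length of the endpoint string
    char       dst_endpoint[128];  // destination "host:port" (fixed size keeps the struct trivially copyable)
};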

@mofosyne added the "Review Complexity : Medium" label on Jun 21, 2024