rpc : copy tensors across servers #8032

rgerganov · 2024-06-20T10:37:34Z

This is an attempt to make copying tensors across servers more efficient. It introduces two new RPC commands:

HELLO - sent after establishing a connection to identify the remote party (client or server)
REMOTE_COPY_TENSOR - sent to the host that holds the source tensor, along with the destination tensor and the destination endpoint

sequenceDiagram
    Note over Scheduler: Copy X on Server A to Y on Server B
    Scheduler->>Server A: REMOTE_COPY_TENSOR
    Server A->>Server B: HELLO
    Server A->>Server B: SET_TENSOR
    Server B-->>Server A: 
    Server A-->>Scheduler:

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Start a dedicated backend thread in the rpc-server and use message passing interface for submitting work to it. This will enable backend async operations and cross-server communication.
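
As a rough illustration of the "dedicated backend thread plus message passing" idea, here is a minimal sketch of a single worker thread that drains a task queue. The class name `backend_worker` and its `submit` method are hypothetical and are not taken from this PR; the real rpc-server may structure its queue and messages quite differently.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical work queue: connection threads submit tasks and a single
// backend thread executes them in submission order.
class backend_worker {
public:
    backend_worker() : thread_([this] { run(); }) {}

    ~backend_worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cond_.notify_one();
        thread_.join(); // pending tasks are drained before the thread exits
    }

    // Enqueue a task; the returned future becomes ready once the backend
    // thread has executed it.
    std::future<int> submit(std::function<int()> task) {
        std::packaged_task<int()> pt(std::move(task));
        std::future<int> result = pt.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(pt));
        }
        cond_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cond_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return; // stop requested and nothing left to do
                }
                task = std::move(queue_.front());
                queue_.pop();
            }
            task(); // executed outside the lock so submitters are not blocked
        }
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<std::packaged_task<int()>> queue_;
    bool stop_ = false;
    std::thread thread_; // declared last so the other members exist before it starts
};
```

A connection handler could call `worker.submit(...)` and either wait on the future immediately (synchronous command) or keep it around and reply later (async command).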

Add new cmd REMOTE_COPY_TENSOR for copying a tensor from one server to another.

slaren · 2024-06-21T12:18:53Z

ggml-rpc.cpp

 GGML_CALL static bool ggml_backend_rpc_buffer_cpy_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * src, ggml_tensor * dst) {
    // check if src and dst are on the same server
    ggml_backend_buffer_t src_buffer = src->buffer;
    ggml_backend_rpc_buffer_context * src_ctx = (ggml_backend_rpc_buffer_context *)src_buffer->context;
    ggml_backend_buffer_t dst_buffer = dst->buffer;
    ggml_backend_rpc_buffer_context * dst_ctx = (ggml_backend_rpc_buffer_context *)dst_buffer->context;
    if (src_ctx->sock != dst_ctx->sock) {
-        return false;
+        return remote_copy_tensor(src, dst);


In cpy_tensor you can only assume that the dst tensor is allocated in buffer. The src tensor may be allocated in any other buffer, including in a different buffer type from a different backend. You cannot assume that the type of src_buffer->context is ggml_backend_rpc_buffer_context because it may be a different buffer type, so you need to check for that.
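
To make the point concrete, here is a minimal, self-contained sketch of checking the source buffer's type before casting its context. All types and names below (`buffer`, `buffer_type`, `buffer_is_rpc`, `classify_copy`) are simplified stand-ins invented for illustration, not the ggml API; in particular, comparing against a single `RPC_BUFFER_TYPE` object and comparing endpoints instead of sockets are assumptions of the sketch.

```cpp
#include <string>

// Hypothetical stand-ins for the ggml types; only what the check needs.
struct buffer_type { const char * name; };

struct buffer {
    const buffer_type * buft; // which buffer type owns this buffer
    void * context;           // meaning depends entirely on buft
};

struct tensor { buffer * buf; };

struct rpc_buffer_context { std::string endpoint; };

static const buffer_type RPC_BUFFER_TYPE = { "RPC" };
static const buffer_type CPU_BUFFER_TYPE = { "CPU" };

static bool buffer_is_rpc(const buffer * b) {
    return b != nullptr && b->buft == &RPC_BUFFER_TYPE;
}

enum class copy_path { unsupported, same_server, cross_server };

// dst is known to live in an RPC buffer; src may come from any backend,
// so its context is only cast after the buffer type has been verified.
static copy_path classify_copy(const tensor * src, const tensor * dst) {
    if (!buffer_is_rpc(src->buf)) {
        return copy_path::unsupported; // caller falls back to a generic copy
    }
    const auto * src_ctx = static_cast<const rpc_buffer_context *>(src->buf->context);
    const auto * dst_ctx = static_cast<const rpc_buffer_context *>(dst->buf->context);
    return src_ctx->endpoint == dst_ctx->endpoint ? copy_path::same_server
                                                  : copy_path::cross_server;
}
```

The key property is that the `static_cast` of `src->buf->context` is unreachable unless the type check has passed.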

slaren · 2024-06-21T12:21:31Z

ggml-rpc.cpp

+    memcpy(input.data(), &rpc_src, sizeof(rpc_src));
+    memcpy(input.data() + sizeof(rpc_src), &rpc_dst, sizeof(rpc_dst));
+    uint32_t dst_endpoint_size = dst_ctx->endpoint.size();
+    memcpy(input.data() + 2*sizeof(rpc_tensor), &dst_endpoint_size, sizeof(dst_endpoint_size));
+    memcpy(input.data() + 2*sizeof(rpc_tensor) + sizeof(dst_endpoint_size), dst_ctx->endpoint.c_str(), dst_endpoint_size);


This kind of pattern is very quickly becoming unreadable, which makes the code very hard to review. My suggestion is to make structs for all the messages/commands.
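
One possible shape for such a struct, sketched under assumptions: `rpc_tensor_stub` is a cut-down placeholder for the real `rpc_tensor`, and `remote_copy_header`, `serialize_remote_copy` and `deserialize_remote_copy` are hypothetical names. The fixed-size fields live in one packed struct and the variable-length endpoint string follows it, so the offsets are computed by the compiler rather than by hand. Both sides are assumed to share the same byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

#pragma pack(push, 1)
// Placeholder for the real rpc_tensor; the fields are illustrative only.
struct rpc_tensor_stub {
    uint64_t id;
    uint64_t data;
    uint32_t type;
};

// Fixed-size part of a hypothetical REMOTE_COPY_TENSOR message.
// The endpoint string (endpoint_size bytes) follows immediately after it.
struct remote_copy_header {
    rpc_tensor_stub src;
    rpc_tensor_stub dst;
    uint32_t        endpoint_size;
};
#pragma pack(pop)

static std::vector<uint8_t> serialize_remote_copy(const rpc_tensor_stub & src,
                                                  const rpc_tensor_stub & dst,
                                                  const std::string & endpoint) {
    remote_copy_header hdr;
    hdr.src = src;
    hdr.dst = dst;
    hdr.endpoint_size = static_cast<uint32_t>(endpoint.size());

    std::vector<uint8_t> out(sizeof(hdr) + endpoint.size());
    std::memcpy(out.data(), &hdr, sizeof(hdr));
    std::memcpy(out.data() + sizeof(hdr), endpoint.data(), endpoint.size());
    return out;
}

// Returns false if the buffer is too short or its length does not match
// the endpoint size announced in the header.
static bool deserialize_remote_copy(const std::vector<uint8_t> & in,
                                    remote_copy_header & hdr,
                                    std::string & endpoint) {
    if (in.size() < sizeof(hdr)) {
        return false;
    }
    std::memcpy(&hdr, in.data(), sizeof(hdr));
    if (in.size() != sizeof(hdr) + hdr.endpoint_size) {
        return false;
    }
    endpoint.assign(reinterpret_cast<const char *>(in.data() + sizeof(hdr)),
                    hdr.endpoint_size);
    return true;
}
```

With this layout a reviewer only has to check the struct definition once, and the receiving side gets a length check for free.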

chraac · 2024-06-22

Agreed. Maybe we can have a common RPC command base structure that includes the size, a version, and the command enum. Each command can then derive from it and append its own parameters, as in this example:

enum rpc_cmd {
    ...
};

struct rpc_cmd_base {
    uint32_t size;
    uint32_t version; // version number for the rpc command
    rpc_cmd cmd;
};

struct rpc_cmd_xx : rpc_cmd_base {
    uint32_t param1;
    ...
};
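
A hedged sketch of how a common header could be used for encoding and validating messages follows. It deliberately swaps inheritance for composition (the header is the first member of each message) so that every message stays standard-layout and the header is guaranteed to sit at offset 0. The enum values, `rpc_msg_hello`, its `is_server` field, and the `encode`/`decode` helpers are all hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical command ids with a fixed underlying type.
enum rpc_cmd : uint32_t {
    RPC_CMD_HELLO              = 0,
    RPC_CMD_REMOTE_COPY_TENSOR = 1,
};

struct rpc_cmd_base {
    uint32_t size;    // total message size in bytes, header included
    uint32_t version; // version number for the rpc command
    rpc_cmd  cmd;
};

// Composition instead of inheritance: the header is the first member.
struct rpc_msg_hello {
    rpc_cmd_base hdr;
    uint32_t     is_server; // hypothetical parameter
};

static_assert(std::is_standard_layout<rpc_msg_hello>::value, "header must be at offset 0");
static_assert(std::is_trivially_copyable<rpc_msg_hello>::value, "must be safe to memcpy");

// Fill in the header and copy the whole message into a byte buffer.
template <typename T>
static std::vector<uint8_t> encode(T msg, rpc_cmd cmd) {
    msg.hdr.size    = sizeof(T);
    msg.hdr.version = 1;
    msg.hdr.cmd     = cmd;
    std::vector<uint8_t> out(sizeof(T));
    std::memcpy(out.data(), &msg, sizeof(T));
    return out;
}

// Validate the header first, then copy the full message out.
template <typename T>
static bool decode(const std::vector<uint8_t> & in, rpc_cmd expected, T & msg) {
    rpc_cmd_base hdr;
    if (in.size() < sizeof(hdr)) {
        return false;
    }
    std::memcpy(&hdr, in.data(), sizeof(hdr));
    if (hdr.cmd != expected || hdr.size != sizeof(T) || in.size() != sizeof(T)) {
        return false;
    }
    std::memcpy(&msg, in.data(), sizeof(T));
    return true;
}
```

A receiver can read `sizeof(rpc_cmd_base)` bytes, dispatch on `cmd`, and reject messages whose `size` or `version` it does not understand before touching any command-specific fields.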

rgerganov added 2 commits June 20, 2024 12:53

rpc : enable async operations

d47e137

Start a dedicated backend thread in the rpc-server and use message passing interface for submitting work to it. This will enable backend async operations and cross-server communication.

rpc : copy tensors across servers

005cf2e

Add new cmd REMOTE_COPY_TENSOR for copying a tensor from one server to another.

rgerganov mentioned this pull request Jun 20, 2024

rpc : enable async operations #7915

slaren reviewed Jun 21, 2024

mofosyne added the "Review Complexity : Medium" label (generally requires more time to grok but manageable by beginner to medium expertise level) Jun 21, 2024
