-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpc : copy tensors across servers #8032
base: master
Are you sure you want to change the base?
Conversation
Start a dedicated backend thread in the rpc-server and use message passing interface for submitting work to it. This will enable backend async operations and cross-server communication.
Add new cmd REMOTE_COPY_TENSOR for copying a tensor from one server to another.
GGML_CALL static bool ggml_backend_rpc_buffer_cpy_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * src, ggml_tensor * dst) { | ||
// check if src and dst are on the same server | ||
ggml_backend_buffer_t src_buffer = src->buffer; | ||
ggml_backend_rpc_buffer_context * src_ctx = (ggml_backend_rpc_buffer_context *)src_buffer->context; | ||
ggml_backend_buffer_t dst_buffer = dst->buffer; | ||
ggml_backend_rpc_buffer_context * dst_ctx = (ggml_backend_rpc_buffer_context *)dst_buffer->context; | ||
if (src_ctx->sock != dst_ctx->sock) { | ||
return false; | ||
return remote_copy_tensor(src, dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cpy_tensor
you can only assume that the dst
tensor is allocated in buffer
. The src
tensor may be allocated in any other buffer, including in a different buffer type from a different backend. You cannot assume that the type of src_buffer->context
is ggml_backend_rpc_buffer_context
because it may be a different buffer type, so you need to check for that.
memcpy(input.data(), &rpc_src, sizeof(rpc_src)); | ||
memcpy(input.data() + sizeof(rpc_src), &rpc_dst, sizeof(rpc_dst)); | ||
uint32_t dst_endpoint_size = dst_ctx->endpoint.size(); | ||
memcpy(input.data() + 2*sizeof(rpc_tensor), &dst_endpoint_size, sizeof(dst_endpoint_size)); | ||
memcpy(input.data() + 2*sizeof(rpc_tensor) + sizeof(dst_endpoint_size), dst_ctx->endpoint.c_str(), dst_endpoint_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of pattern is very quickly becoming unreadable, which makes the code very hard to review. My suggestion is to make structs for all the messages/commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, may be we can have a common rpc command base structure, include size, command enum, and then each command can derived from it, append its own parameters, just like this example:
enum rpc_cmd {
...
}
struct rpc_cmd_base {
uint32_t size:
uint32_t version; // version number for the rpc command
rpc_cmd cmd;
};
struct rpc_cmd_xx: rpc_cmd_base {
uint32_t param1;
...
};
This is an attempt to make copying tensors across servers more efficient. It introduces 2 new RPC commands:
HELLO
- send after establishing connection to identify the remote party (client or server)REMOTE_COPY_TENSOR
- send to the host which contains the source tensor along with the destination tensor and destination endpoint