IO Binding general questions #20432

CarlosNacher · 2024-04-23T15:08:44Z

Hi,

I am reading the section of the API docs that discusses "data on CPU" / "data on device": https://onnxruntime.ai/docs/api/python/api_summary.html#data-inputs-and-outputs

I have two questions:

1. Does IO Binding offer any advantage over the general session.run when the data is loaded on the CPU and the graph runs on the GPU (the first example here)? If I have to bind/transfer my inputs and outputs between CPU and GPU with io_binding.bind_input and io_binding.bind_output respectively, isn't that the same as the general session.run, which also performs this transfer? If there is a difference, what is it?
2. Is it correct to use the IO Binding form even when providers = ["CPUExecutionProvider"]? I would like to write my code in IO Binding form regardless of whether my provider is CPU or GPU, for the sake of simplicity and generality.

Thank you so much in advance!

tianleiwu · 2024-05-06T05:48:26Z

Collaborator

I/O binding is helpful for both the CPU provider and the GPU provider, especially when the input/output tensors are large (for example, hundreds of MB of key-value cache for an LLM). This is because I/O binding can avoid data copies when used properly.

If inputs/outputs are small (like a few KBs), you might not notice much difference.
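
A minimal sketch of the provider-agnostic pattern asked about above, assuming `session` is an `onnxruntime.InferenceSession` and `feeds` maps input names to contiguous numpy arrays. The helper names `pick_device` and `run_with_io_binding` are hypothetical; the methods called on the session and binding objects (`get_providers`, `get_outputs`, `io_binding`, `bind_cpu_input`, `bind_output`, `run_with_iobinding`, `copy_outputs_to_cpu`) belong to the ONNX Runtime Python API. Distinguishing only CUDA from CPU is an assumption made for illustration.

```python
def pick_device(session):
    """Return the device type to bind outputs on: "cuda" or "cpu".

    Assumption: only the CUDA and CPU execution providers are considered.
    """
    return "cuda" if "CUDAExecutionProvider" in session.get_providers() else "cpu"


def run_with_io_binding(session, feeds, device_id=0):
    """Run `session` through I/O binding, whatever provider it uses.

    `feeds` maps each input name to a numpy array that lives on the CPU.
    Returns the outputs as a list of numpy arrays on the CPU.
    """
    device_type = pick_device(session)
    binding = session.io_binding()

    # Inputs live on the CPU; ORT moves them to the device if one is used.
    for name, array in feeds.items():
        binding.bind_cpu_input(name, array)

    # Let ORT allocate every output on the device the session runs on.
    for output in session.get_outputs():
        binding.bind_output(output.name, device_type, device_id)

    session.run_with_iobinding(binding)

    # Copies device outputs back to host memory as numpy arrays.
    return binding.copy_outputs_to_cpu()
```

When every input starts on the CPU and every output is copied back to the CPU, this form may not save anything over a plain session.run; the benefit described above should show up when large tensors stay bound between runs instead of being copied on each call.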

2 replies

CarlosNacher May 6, 2024
Author

Mmm okay, I understand… could you explain, or point me to any docs that explain, this point about "avoiding data copy" so that I can study it in more detail?
On the other hand, about my first question: why is a normal run (which loads the data on the CPU and has to transfer it to the GPU and vice versa) not the same as using IO binding? Is it also about the data copy? Again, an explanation or docs about it would be really appreciated. Thank you very much for helping me understand!

tianleiwu May 6, 2024
Collaborator

If you do not use I/O binding, the input numpy array might need to be copied into an ORT tensor. There are no docs about it, but you can set a breakpoint with a debugger at the following function and compare the cases with and without I/O binding:

onnxruntime/onnxruntime/python/onnxruntime_pybind_mlvalue.cc

Lines 537 to 595 in addcc4c

    
    static void CopyDataToTensor(PyArrayObject* darray, int npy_type, Tensor& tensor,
                                 MemCpyFunc mem_cpy_to_device = CpuToCpuMemCpy) {
      const auto total_items = tensor.Shape().Size();
      if (npy_type == NPY_UNICODE) {
        // Copy string data which needs to be done after Tensor is allocated.
        // Strings are Python strings or numpy.unicode string.
        std::string* dst = tensor.MutableData<std::string>();
        const auto item_size = PyArray_ITEMSIZE(darray);
        const auto num_chars = item_size / PyUnicode_4BYTE_KIND;
        const char* src = reinterpret_cast<const char*>(PyArray_DATA(darray));
        for (int i = 0; i < total_items; i++, src += item_size) {
          // Python unicode strings are assumed to be USC-4. Strings are stored as UTF-8.
          PyObject* pStr = PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, src, num_chars);
          UniqueDecRefPtr<PyObject> strGuard(pStr, DecRefFn<PyObject>());
          const char* str = PyUnicode_AsUTF8(pStr);
          if (str == NULL) {
            dst[i].clear();
          } else {
            // Size is equal to the longest string size, numpy stores
            // strings in a single array.
            dst[i] = str;
          }
        }
      } else if (npy_type == NPY_STRING || npy_type == NPY_VOID) {
        // Copy string data which needs to be done after Tensor is allocated.
        // Strings are given as bytes (encoded strings).
        // NPY_VOID does not trim final 0.
        // NPY_STRING assumes bytes string ends with a final 0.
        std::string* dst = tensor.MutableData<std::string>();
        const auto item_size = PyArray_ITEMSIZE(darray);
        const char* src = reinterpret_cast<const char*>(PyArray_DATA(darray));
        for (int i = 0; i < total_items; i++, src += item_size) {
          if (npy_type == NPY_STRING) {
            dst[i] = src;
          } else {
            dst[i].assign(src, item_size);
          }
        }
      } else if (npy_type == NPY_OBJECT) {
        // Converts object into string.
        std::string* dst = tensor.MutableData<std::string>();
        const auto item_size = PyArray_ITEMSIZE(darray);
        const char* src = reinterpret_cast<const char*>(PyArray_DATA(darray));
        for (int i = 0; i < total_items; ++i, src += item_size) {
          // Python unicode strings are assumed to be USC-4. Strings are stored as UTF-8.
          PyObject* item = PyArray_GETITEM(darray, src);
          PyObject* pStr = PyObject_Str(item);
          UniqueDecRefPtr<PyObject> strGuard(pStr, DecRefFn<PyObject>());
          dst[i] = py::reinterpret_borrow<py::str>(pStr);
        }
      } else {
        void* buffer = tensor.MutableDataRaw();
        size_t len;
        if (!IAllocator::CalcMemSizeForArray(tensor.DataType()->Size(), tensor.Shape().Size(), &len)) {
          throw std::runtime_error("length overflow");
        }
        mem_cpy_to_device(buffer, PyArray_DATA(darray), len);
      }
    }
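
For numeric inputs, the final `else` branch of `CopyDataToTensor` is a plain memory copy from the numpy buffer into the ORT tensor. One way to sidestep a per-call copy is to create the tensors once and bind them for every run. The sketch below assumes the caller has already created `OrtValue` objects (for example with `onnxruntime.OrtValue.ortvalue_from_numpy` or `ortvalue_from_shape_and_type`) on the device the session uses, and that the output shapes are known in advance. The helper name `run_repeatedly` is hypothetical; `bind_ortvalue_input`, `bind_ortvalue_output`, and `run_with_iobinding` are ONNX Runtime Python API methods.

```python
def run_repeatedly(session, input_values, output_values, steps):
    """Run `session` `steps` times against pre-allocated, bound tensors.

    `input_values` and `output_values` map tensor names to OrtValue
    objects created once by the caller. Each run reads from and writes
    into those same buffers, so no numpy array is copied per call.
    """
    binding = session.io_binding()

    # Bind once; the binding keeps referring to the same buffers.
    for name, value in input_values.items():
        binding.bind_ortvalue_input(name, value)
    for name, value in output_values.items():
        binding.bind_ortvalue_output(name, value)

    for _ in range(steps):
        session.run_with_iobinding(binding)

    # The results of the last run are in the caller's own OrtValues.
    return output_values
```

Comparing this against a loop of plain session.run calls, with a breakpoint in the function above, is one way to observe whether the copy happens in each case.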
