IO Binding general questions #20432
Unanswered
CarlosNacher asked this question in Performance Q&A
Replies: 1 comment 2 replies
-
I/O binding is helpful for both the CPU provider and the GPU provider, especially when the input/output tensors are large (for example, hundreds of MBs of key-value cache for an LLM), because IO binding can avoid data copies when used properly. If the inputs/outputs are small (a few KBs), you might not notice much difference.
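As a rough illustration of the pattern described above, here is a minimal sketch that runs a session through an IOBinding instead of session.run. The helper name run_with_io_binding is hypothetical; it assumes that session is an onnxruntime.InferenceSession and that feeds maps input names to NumPy arrays held in CPU memory. It only uses calls from the ONNX Runtime Python API (io_binding, bind_cpu_input, bind_output, run_with_iobinding, copy_outputs_to_cpu) and does not depend on which execution provider the session was created with.

```python
def run_with_io_binding(session, feeds):
    """Run `session` via an IOBinding and return the outputs as CPU arrays.

    Assumptions (not from the original post): `session` is an
    onnxruntime.InferenceSession and `feeds` is a dict that maps input
    names to NumPy arrays living in CPU memory.
    """
    binding = session.io_binding()

    # bind_cpu_input takes a CPU array; ONNX Runtime copies it to the
    # device if the session runs on one.
    for name, array in feeds.items():
        binding.bind_cpu_input(name, array)

    # Without a device argument, bind_output places the output on the CPU.
    for output in session.get_outputs():
        binding.bind_output(output.name)

    session.run_with_iobinding(binding)

    # Returns a list of arrays, in the order the outputs were bound.
    return binding.copy_outputs_to_cpu()
```

With a real model this would be called along the lines of run_with_io_binding(session, {"input": array}), where "input" is a placeholder input name. In this particular form the inputs still start on the CPU, so the copy savings mentioned above would mainly show up once inputs or outputs are kept on the device between runs.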
-
Hi,
I am looking at the API docs section that talks about "data on CPU" / "data on device": https://onnxruntime.ai/docs/api/python/api_summary.html#data-inputs-and-outputs
I have two questions:
1. What is the advantage of IO binding over session.run when you load your data to the CPU and the graph to the GPU (the first example in the linked docs)? If you have to bind/transfer your inputs and outputs between CPU and GPU with io_binding.bind_input and io_binding.bind_output respectively, is this not the same as using the "general" session.run, which also performs this transfer? If there is a difference, what is it?
2. Can IO binding also be used with providers = ["CPUExecutionProvider"]? I would like to write my code in IO binding form regardless of whether my provider is CPU or GPU, for the sake of simplicity and generality.
Thank you so much in advance!
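To make the second question concrete, this is a sketch of the kind of provider-independent code I have in mind. The helper names output_device and bind_for_session are hypothetical, and the sketch assumes that only the CUDA and CPU providers matter and that inputs start as NumPy arrays on the CPU. It relies on session.get_providers() and on bind_output accepting a device type string such as "cuda" or "cpu".

```python
def output_device(providers):
    """Map a provider list to the device_type string passed to bind_output.

    Assumption: only CUDAExecutionProvider and CPUExecutionProvider are
    considered; any other provider falls back to "cpu".
    """
    return "cuda" if "CUDAExecutionProvider" in providers else "cpu"


def bind_for_session(session, feeds):
    """Build an IOBinding whose outputs stay on the session's device.

    `session` is assumed to be an onnxruntime.InferenceSession and `feeds`
    a dict of input names to NumPy arrays in CPU memory.
    """
    binding = session.io_binding()

    # CPU arrays are bound as-is; ONNX Runtime copies them when needed.
    for name, array in feeds.items():
        binding.bind_cpu_input(name, array)

    # Ask ONNX Runtime to allocate each output on the chosen device.
    device = output_device(session.get_providers())
    for output in session.get_outputs():
        binding.bind_output(output.name, device)

    return binding
```

After session.run_with_iobinding(binding), the results could be read with binding.get_outputs() (OrtValues on the chosen device) or binding.copy_outputs_to_cpu(). Whether this works unchanged with the CPU provider is exactly what I am asking.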