CUDA error: invalid device function when compiling and running for AMD gfx1032 #4762
Comments
I get a similar error.
I also had a similar error when running on my iGPU. What solved the problem for me was also setting the environment variable HSA_OVERRIDE_GFX_VERSION. So for me, the build command was:
HSA_OVERRIDE_GFX_VERSION=9.0.0 make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx900
I honestly didn't think that this would work at all, but it certainly did! For me, though, since my iGPU lacks INT8 operators, performance was worse than just using the CPU, but it did run on the iGPU (checked with …). Hope that this works for you too! My guess on why this hasn't been reported much: …
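A minimal sketch of the whole build-and-run sequence on such an APU (the -ngl value and model path are placeholders, not taken from the comment above):

    # Build with the override so the ROCm runtime treats the iGPU as plain gfx900;
    # LLAMA_HIP_UMA enables unified-memory allocation on APUs with shared RAM.
    make clean
    HSA_OVERRIDE_GFX_VERSION=9.0.0 make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx900
    # Run with the same override so the runtime reports the ISA the binary was built for.
    HSA_OVERRIDE_GFX_VERSION=9.0.0 ./main -ngl 33 -m ./models/model.gguf -p "hello"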
Thank you! This hint finally allowed me to run all 33 layers of Mixtral Q5_K_M on the iGPU. Since it's an APU with shared RAM, it can't compete with dGPUs, but the speedup is close to 70% nonetheless.
CPU (7840U): …
GPU (780M): …
Strangely, prompt processing is slower on GPU.
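For a 780M specifically, the analogous trick would use an RDNA3 override rather than 9.0.0. A sketch, where the gfx1100 target and the 11.0.0 override value are assumptions extrapolated from the pattern above, not taken from the comment:

    # The 780M is an RDNA3 iGPU; overriding to 11.0.0 lets it run code built for gfx1100.
    HSA_OVERRIDE_GFX_VERSION=11.0.0 make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1100
    HSA_OVERRIDE_GFX_VERSION=11.0.0 ./main -ngl 33 -m ./models/mixtral-8x7b.Q5_K_M.gguf -p "hello"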
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
I have a 6700S AMD GPU with 8 GB VRAM. I got oobabooga (text-generation-webui) to work on this computer, but I can't get llama.cpp to work. I compiled with
make clean && make -j16 LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gxf1032
And everything went fine. However, when I try to run, I do
export HSA_OVERRIDE_GFX_VERSION=10.3.0
then
HIP_VISIBLE_DEVICES=0 ./main -ngl 50 -m /home/lenovoubuntu/Downloads/text-generation-webui-main/models/dolphin-2.6-mistral-7b-dpo.Q4_K_M.gguf -p "Write a function in TypeScript that sums numbers"
(I set HIP_VISIBLE_DEVICES since my machine has an iGPU as well.)
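Aside: the build line above spells the target gxf1032 rather than gfx1032, and HSA_OVERRIDE_GFX_VERSION=10.3.0 makes the runtime report gfx1030, so the binary likely contains no code object matching the ISA the GPU reports, which is exactly what "invalid device function" means. A sketch of the combination commonly suggested for gfx1032 cards, reusing the model path from the command above; treat the gfx1030 build target as an assumption:

    # gfx1032 is not an officially supported ROCm target; build for gfx1030
    # and override the reported ISA so the runtime matches the binary.
    make clean && make -j16 LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1030
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    HIP_VISIBLE_DEVICES=0 ./main -ngl 50 -m /home/lenovoubuntu/Downloads/text-generation-webui-main/models/dolphin-2.6-mistral-7b-dpo.Q4_K_M.gguf -p "Write a function in TypeScript that sums numbers"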
It returns:
.................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: VRAM scratch buffer: 73.00 MiB
llama_new_context_with_model: total VRAM used: 4232.06 MiB (model: 4095.06 MiB, context: 137.00 MiB)
CUDA error: invalid device function
current device: 0, in function ggml_cuda_op_flatten at ggml-cuda.cu:7971
hipGetLastError()
GGML_ASSERT: ggml-cuda.cu:226: !"CUDA error"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
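As an aside, the ptrace warning in the trace above can be addressed without running the whole program as root; kernel.yama.ptrace_scope is the standard Yama sysctl the message refers to:

    # Check the current setting (1 blocks attaching to non-child processes).
    cat /proc/sys/kernel/yama/ptrace_scope
    # Temporarily relax it so the crash handler can attach gdb and print a backtrace.
    sudo sysctl -w kernel.yama.ptrace_scope=0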
So I ran it as sudo, as it suggested, using this command:
sudo LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH HSA_OVERRIDE_GFX_VERSION=10.3.0 HIP_VISIBLE_DEVICES=0 ./main -ngl 50 -m /home/lenovoubuntu/Downloads/text-generation-webui-main/models/dolphin-2.6-mistral-7b-dpo.Q4_K_M.gguf -p "Write a function in TypeScript that sums numbers"
I used all of those environment variables since oobabooga required them, and I was hoping they would fix things here too.
However, that just returns this after seemingly loading the model.
CUDA error: invalid device function
current device: 0, in function ggml_cuda_op_flatten at ggml-cuda.cu:7971
hipGetLastError()
GGML_ASSERT: ggml-cuda.cu:226: !"CUDA error"
[New LWP 23593]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f34398ea42f in __GI___wait4 (pid=23599, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0 0x00007f34398ea42f in __GI___wait4 (pid=23599, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000055fb56cca7fb in ggml_print_backtrace ()
#2 0x000055fb56d90f95 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#3 0x000055fb56d9da1e in ggml_cuda_op_flatten(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, float const*, float const*, float*, ihipStream_t*)) ()
#4 0x000055fb56d92df3 in ggml_cuda_compute_forward ()
#5 0x000055fb56cf8898 in ggml_graph_compute_thread ()
#6 0x000055fb56cfca98 in ggml_graph_compute ()
#7 0x000055fb56dbc41e in ggml_backend_cpu_graph_compute ()
#8 0x000055fb56dbcf0b in ggml_backend_graph_compute ()
#9 0x000055fb56d2b046 in llama_decode_internal(llama_context&, llama_batch) ()
#10 0x000055fb56d2bb63 in llama_decode ()
#11 0x000055fb56d66316 in llama_init_from_gpt_params(gpt_params&) ()
#12 0x000055fb56cbc31a in main ()
[Inferior 1 (process 23582) detached]
Aborted
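The "invalid device function" assert fires when the binary contains no GPU code object for the ISA the runtime reports, so a useful check is to compare the two directly. A sketch: rocminfo ships with ROCm, while roc-obj-ls availability varies by ROCm version, so treat the second command as an assumption:

    # ISA the runtime will look for (the HSA override changes what is reported).
    HSA_OVERRIDE_GFX_VERSION=10.3.0 rocminfo | grep -m1 gfx
    # Code objects actually embedded in the binary; the two must match.
    roc-obj-ls ./main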