Undefined identifiers in all_reduce.cu #9

apaszke · 2016-01-24T13:54:49Z

Hello!

I'm trying to compile nccl, but I'm getting the following errors:

Compiling src/all_reduce.cu         > build/obj/all_reduce.o   
src/reduce_kernel.h(199): error: identifier "__half22float2" is undefined

src/reduce_kernel.h(203): error: identifier "__float22half2_rn" is undefined

src/reduce_kernel.h(214): error: identifier "__half22float2" is undefined

src/reduce_kernel.h(218): error: identifier "__float22half2_rn" is undefined

src/reduce_kernel.h(229): error: identifier "__half22float2" is undefined

src/reduce_kernel.h(233): error: identifier "__float22half2_rn" is undefined

src/reduce_kernel.h(248): error: identifier "__half22float2" is undefined

src/reduce_kernel.h(252): error: identifier "__float22half2_rn" is undefined

8 errors detected in the compilation of "/tmp/tmpxft_000004bb_00000000-13_all_reduce.compute_52.cpp1.ii".
make: *** [build/obj/all_reduce.o] Error 2

I have a TITAN X and cuda-7.5 installed. I ran make CUDA_HOME=/usr/local/cuda-7.5 test to build the library.

Do you have any idea why it fails? I've seen that these identifiers are defined in /usr/local/cuda-7.5/includes/cuda_fp16.h, but that header is not included in reduce_kernel.h. Also, they are guarded by a check __CUDA_ARCH__ >= 530, but my GPU has compute capability 5.2. Since the TITAN X is a Maxwell card, it should be supported, right?


nluehr · 2016-01-24T22:14:23Z

Your build line is correct. As for cuda_fp16.h, it gets included from nccl.h (which is in turn included in core.h) whenever CUDART_VERSION >= 7050.
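As a rough illustration of that version check: CUDART_VERSION is encoded as major * 1000 + minor * 10, so 7050 corresponds to CUDA 7.5. The sketch below models the comparison in plain host C++; the struct and function names are hypothetical and are not part of CUDA or NCCL.

```cpp
// Sketch only: mirrors the CUDART_VERSION encoding (major * 1000 + minor * 10).
// RuntimeVersion, decode_runtime_version and includes_fp16_header are
// hypothetical names used for illustration; they do not exist in CUDA or NCCL.
struct RuntimeVersion {
    int major_ver;
    int minor_ver;
};

// Split an encoded runtime version such as 7050 into {7, 5}.
constexpr RuntimeVersion decode_runtime_version(int encoded) {
    return RuntimeVersion{encoded / 1000, (encoded % 1000) / 10};
}

// The condition described above: the fp16 header is pulled in for 7.5 and later.
constexpr bool includes_fp16_header(int encoded) {
    return encoded >= 7050;
}

// Compile-time checks of the sketch itself.
static_assert(decode_runtime_version(7050).major_ver == 7, "7050 decodes to major 7");
static_assert(decode_runtime_version(7050).minor_ver == 5, "7050 decodes to minor 5");
static_assert(includes_fp16_header(7050), "CUDA 7.5 satisfies the check");
static_assert(!includes_fp16_header(7000), "CUDA 7.0 does not");
```

Note that this encoding has no field for a build number, so the check by itself cannot tell two builds of the same major.minor release apart.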

I'm a bit confused about the include guards you reference in reduce_kernel.h. These should select either Maxwell (__CUDA_ARCH__ >= 500) or Kepler (__CUDA_ARCH__ >= 300 && __CUDA_ARCH__ < 500). There shouldn't be any reference to a __CUDA_ARCH__ of 530.
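To make the two-way selection concrete, here is a minimal sketch that expresses the same thresholds as an ordinary host function instead of preprocessor guards. The enum, the function name, and the Unsupported case for architectures below 300 are assumptions for illustration; the actual code in reduce_kernel.h is not shown in this thread.

```cpp
// Sketch only: models the Maxwell/Kepler selection described above.
// KernelPath and select_kernel_path are hypothetical names, and the
// Unsupported case is an assumption for values below 300.
enum class KernelPath { Unsupported, Kepler, Maxwell };

constexpr KernelPath select_kernel_path(int cuda_arch) {
    return cuda_arch >= 500 ? KernelPath::Maxwell
         : cuda_arch >= 300 ? KernelPath::Kepler
                            : KernelPath::Unsupported;
}

// Compute capability 5.2 (arch value 520) lands on the Maxwell path.
static_assert(select_kernel_path(520) == KernelPath::Maxwell, "520 selects Maxwell");
static_assert(select_kernel_path(350) == KernelPath::Kepler, "350 selects Kepler");
```

Under these thresholds a compute capability 5.2 card is treated as Maxwell, so nothing on the NCCL side of the guards would exclude it.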

cliffwoolley · 2016-01-24T22:36:55Z

Do you have more than one CUDA Toolkit version installed? Any chance you
have an RC rather than production release version of CUDA 7.5?

apaszke · 2016-01-25T11:51:26Z

That's weird. I think I should have a production release. When I print the contents of /usr/local/cuda/version.txt and /usr/local/cuda-7.5/version, they both say CUDA Version 7.5.7.

This is what I've found in cuda_fp16.h:

#if __CUDA_ARCH__ >= 530 || !defined(__CUDA_ARCH__)                                 
__CUDA_FP16_DECL__ __half2 __heq2(const __half2 a, const __half2 b);                
__CUDA_FP16_DECL__ __half2 __hne2(const __half2 a, const __half2 b);                
__CUDA_FP16_DECL__ __half2 __hle2(const __half2 a, const __half2 b);                
__CUDA_FP16_DECL__ __half2 __hge2(const __half2 a, const __half2 b);                
__CUDA_FP16_DECL__ __half2 __hlt2(const __half2 a, const __half2 b);                
__CUDA_FP16_DECL__ __half2 __hgt2(const __half2 a, const __half2 b);                
__CUDA_FP16_DECL__ __half2 __hequ2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hneu2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hleu2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hgeu2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hltu2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hgtu2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hadd2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hsub2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hmul2(const __half2 a, const __half2 b);               
__CUDA_FP16_DECL__ __half2 __hadd2_sat(const __half2 a, const __half2 b);           
__CUDA_FP16_DECL__ __half2 __hsub2_sat(const __half2 a, const __half2 b);           
__CUDA_FP16_DECL__ __half2 __hmul2_sat(const __half2 a, const __half2 b);           
__CUDA_FP16_DECL__ __half2 __hfma2(const __half2 a, const __half2 b, const __half2 c); 
__CUDA_FP16_DECL__ __half2 __hfma2_sat(const __half2 a, const __half2 b, const __half2 c); 
__CUDA_FP16_DECL__ __half __hadd(const __half a, const __half b);                   
__CUDA_FP16_DECL__ __half __hsub(const __half a, const __half b);                   
__CUDA_FP16_DECL__ __half __hmul(const __half a, const __half b);                   
__CUDA_FP16_DECL__ __half __hadd_sat(const __half a, const __half b);               
__CUDA_FP16_DECL__ __half __hsub_sat(const __half a, const __half b);               
__CUDA_FP16_DECL__ __half __hmul_sat(const __half a, const __half b);               
__CUDA_FP16_DECL__ __half __hfma(const __half a, const __half b, const __half c); 
__CUDA_FP16_DECL__ __half __hfma_sat(const __half a, const __half b, const __half c); 
__CUDA_FP16_DECL__ float __low2float(const __half2 l);                              
__CUDA_FP16_DECL__ float __high2float(const __half2 l);                             
__CUDA_FP16_DECL__ float2 __half22float2(const __half2 l);

You can see that one of the missing identifiers is on the last line of this snippet. In that file it appears only in its forward declaration and its definition.
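A small sketch of how that guard evaluates, assuming the #if at the top of the snippet also covers its last line (the closing #endif is not shown here). The function name and its parameters are hypothetical; they only model the preprocessor condition as a host-side predicate.

```cpp
// Sketch only: models `#if __CUDA_ARCH__ >= 530 || !defined(__CUDA_ARCH__)`.
// declaration_visible is a hypothetical helper. arch_defined stands for
// whether __CUDA_ARCH__ is defined in the current compilation pass, and
// arch_value for its value when it is.
constexpr bool declaration_visible(bool arch_defined, int arch_value) {
    return (arch_defined && arch_value >= 530) || !arch_defined;
}

// Host pass: __CUDA_ARCH__ is not defined, so the declarations are visible.
static_assert(declaration_visible(false, 0), "host pass sees the declarations");
// Device pass for compute_52: __CUDA_ARCH__ is 520, so the guard hides them.
static_assert(!declaration_visible(true, 520), "compute_52 device pass does not");
// Device pass for compute_53: the guard is satisfied.
static_assert(declaration_visible(true, 530), "compute_53 device pass does");
```

With this header, a device pass for compute_52 would not see __half22float2 at all, which matches the "identifier is undefined" errors reported for the compute_52 intermediate file.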

cliffwoolley · 2016-01-25T13:56:00Z

7.5.7 was the release candidate build, actually. The production release
build was numbered 7.5.18.

Please redownload and reinstall CUDA 7.5; it should work with the 7.5.18
build.
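For anyone checking their own install, the comparison has to be numeric per component, since comparing "7.5.7" and "7.5.18" as strings would order them the wrong way. A minimal sketch follows; the type and function names are hypothetical, and the values are the ones quoted in this thread.

```cpp
// Sketch only: compares toolkit versions component by component.
// ToolkitVersion, at_least and kProduction75 are hypothetical names.
struct ToolkitVersion {
    int major_ver;
    int minor_ver;
    int build;
};

// True when `have` is the same as or newer than `want`.
constexpr bool at_least(ToolkitVersion have, ToolkitVersion want) {
    return have.major_ver != want.major_ver ? have.major_ver > want.major_ver
         : have.minor_ver != want.minor_ver ? have.minor_ver > want.minor_ver
                                            : have.build >= want.build;
}

// Production build number for CUDA 7.5 as stated above.
constexpr ToolkitVersion kProduction75{7, 5, 18};

// 7.5.7 (the release candidate) is older than 7.5.18.
static_assert(!at_least(ToolkitVersion{7, 5, 7}, kProduction75), "7.5.7 predates 7.5.18");
static_assert(at_least(ToolkitVersion{7, 5, 18}, kProduction75), "7.5.18 is the production build");
```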

Thanks!

apaszke · 2016-01-25T13:57:17Z

I'll try it. Thanks for the help! 😊

apaszke · 2016-01-25T22:57:22Z

Yes, that worked. Thanks a lot!

apaszke closed this as completed Jan 25, 2016

