Compilation Error. #3

Open
sashank-tirumala opened this issue May 12, 2023 · 18 comments
@sashank-tirumala

sashank-tirumala commented May 12, 2023

I am on an Ubuntu 22.04 system with an NVIDIA 3090 and have successfully compiled PyFlex multiple times in the past. When running compile.sh, I get the following error:

-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found suitable version "9.2", minimum required is "9.0") 
-- Found PythonInterp: /home/sashank/miniconda3/envs/cloth-funnels/bin/python (found suitable version "3.9.6", minimum required is "3.9") 
-- Found PythonLibs: /home/sashank/miniconda3/envs/cloth-funnels/lib/libpython3.9.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /home/sashank/miniconda3/envs/cloth-funnels/include (found version "2.7.1" )
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/cloth-funnels/PyFlex/bindings/build
[  5%] Building CXX object CMakeFiles/pyflex.dir/pyflex.cpp.o
[ 10%] Building CXX object CMakeFiles/pyflex.dir/shadersDemoContext.cpp.o
[ 15%] Building CXX object CMakeFiles/pyflex.dir/imgui.cpp.o
[ 21%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/aabbtree.cpp.o
[ 36%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/maths.cpp.o
[ 36%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/extrude.cpp.o
[ 36%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/core.cpp.o
[ 42%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/mesh.cpp.o
make[2]: *** No rule to make target '/workspace/cloth-funnels/PyFlex/lib/linux64/NvFlexExtReleaseCUDA_x64.a', needed by 'pyflex.cpython-39-x86_64-linux-gnu.so'.  Stop.
make[2]: *** Waiting for unfinished jobs....
[ 47%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/perlin.cpp.o
[ 52%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/pfm.cpp.o
[ 57%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/platform.cpp.o
[ 63%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/png.cpp.o
[ 68%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/sdf.cpp.o
[ 73%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/tga.cpp.o
[ 78%] Building CXX object CMakeFiles/pyflex.dir/workspace/cloth-funnels/PyFlex/core/voxelize.cpp.o
[ 84%] Building CXX object CMakeFiles/pyflex.dir/opengl/imguiRenderGL.cpp.o
[ 94%] Building CXX object CMakeFiles/pyflex.dir/opengl/shader.cpp.o
[ 94%] Building CXX object CMakeFiles/pyflex.dir/opengl/shadersGL.cpp.o
In file included from /workspace/cloth-funnels/PyFlex/bindings/opengl/imguiRenderGL.cpp:38:0:
/workspace/cloth-funnels/PyFlex/bindings/opengl/../stb_truetype.h:422:33: warning: multi-line comment [-Wcomment]
 #ifndef stbtt_vertex            // you can predefine this to use different values \
                                 ^
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp: In function 'GLuint LoadTexture(const char*)':
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp:217:10: warning: converting to non-pointer type 'GLuint {aka unsigned int}' from NULL [-Wconversion-null]
   return NULL;
          ^~~~
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp: In function 'void InitRenderHeadless(const RenderInitOptions&, int, int)':
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp:3408:25: warning: invalid conversion from 'EGLConfig {aka void*}' to 'void**' [-fpermissive]
  g_eglConfig = configs[0];
                ~~~~~~~~~^
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp:3411:33: warning: invalid conversion from 'EGLContext {aka void*}' to 'void**' [-fpermissive]
  g_eglContext = eglCreateContext(
                 ~~~~~~~~~~~~~~~~^
   g_eglDisplay,
   ~~~~~~~~~~~~~                  
   g_eglConfig,
   ~~~~~~~~~~~~                   
   EGL_NO_CONTEXT,
   ~~~~~~~~~~~~~~~                
   NULL);
   ~~~~~                          
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp:3419:40: warning: invalid conversion from 'EGLSurface {aka void*}' to 'void**' [-fpermissive]
  g_eglSurface = eglCreatePbufferSurface(g_eglDisplay, g_eglConfig,
                 ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
              eglPBAttribs);
              ~~~~~~~~~~~~~              
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp: At global scope:
/workspace/cloth-funnels/PyFlex/bindings/opengl/shadersGL.cpp:253:14: warning: '{anonymous}::g_eglDisplay' defined but not used [-Wunused-variable]
  EGLDisplay *g_eglDisplay;
              ^~~~~~~~~~~~
In file included from /workspace/cloth-funnels/PyFlex/bindings/pyflex.cpp:1:0:
/workspace/cloth-funnels/PyFlex/bindings/main.cpp: In function 'int GetKeyFromGameControllerButton(SDL_GameControllerButton)':
/workspace/cloth-funnels/PyFlex/bindings/main.cpp:97:5: warning: case value '17' not in enumerated type 'SDL_GameControllerButton' [-Wswitch]
     case SDL_CONTROLLER_BUTTON_RIGHT_TRIGGER:
     ^~~~
In file included from /workspace/cloth-funnels/PyFlex/bindings/scenes.h:30:0,
                 from /workspace/cloth-funnels/PyFlex/bindings/main.cpp:608,
                 from /workspace/cloth-funnels/PyFlex/bindings/pyflex.cpp:1:
/workspace/cloth-funnels/PyFlex/bindings/softgym_scenes/softgym_cloth.h: In member function 'virtual void SoftgymCloth::Initialize(pybind11::dict)':
/workspace/cloth-funnels/PyFlex/bindings/softgym_scenes/softgym_cloth.h:95:17: warning: unused variable 'size' [-Wunused-variable]
             int size = g_buffers->triangles.size();
                 ^~~~
/workspace/cloth-funnels/PyFlex/bindings/pyflex.cpp: In function 'std::tuple<pybind11::array_t<unsigned char, 16>, pybind11::array_t<float, 16>, pybind11::array_t<float, 16> > pyflex_render(bool)':
/workspace/cloth-funnels/PyFlex/bindings/pyflex.cpp:980:9: warning: unused variable 'newScene' [-Wunused-variable]
     int newScene = DoUI();
         ^~~~~~~~
CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/pyflex.dir/all' failed
make[1]: *** [CMakeFiles/pyflex.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

This is inside the Docker container you provided. I think the relevant line to look at is:
make[2]: *** No rule to make target '/workspace/cloth-funnels/PyFlex/lib/linux64/NvFlexExtReleaseCUDA_x64.a', needed by 'pyflex.cpython-39-x86_64-linux-gnu.so'. Stop.
Seems like I am missing a makefile. Any idea how to fix this?
Thanks
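
A note on reading this error: `No rule to make target ... .a` generally means that make cannot find the file and has no recipe to build it, which points to a missing prebuilt library rather than a missing makefile. Below is a minimal sketch for checking this before running compile.sh. It assumes the PyFlex root is /workspace/cloth-funnels/PyFlex, as in the log; the helper name `find_missing` is hypothetical, and only the one library named in the error is listed.

```python
from pathlib import Path

# The relative path below is taken from the make error above.
# Other required files may exist; this list is only an illustration.
REQUIRED = ["lib/linux64/NvFlexExtReleaseCUDA_x64.a"]


def find_missing(pyflex_root, required=REQUIRED):
    """Return the required paths that do not exist under pyflex_root."""
    root = Path(pyflex_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed location, matching the paths printed in the build log.
    missing = find_missing("/workspace/cloth-funnels/PyFlex")
    for rel in missing:
        print(f"missing: {rel}")
    if not missing:
        print("all listed files are present")
```

If the library is reported missing, the build cannot link until that file is put in place.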

@jillhan1219

Same problem. I am on WSL with Ubuntu 18.04 and an NVIDIA 4090, and I get the same error under the same conditions.

have successfully compiled PyFlex multiple times in the past. When running compile.sh, I get the following error

Also looking for ideas to fix it, thanks a lot.

@tlpss

tlpss commented Jun 19, 2023

I encountered the same issue.

When comparing against the flingbot codebase (on which this codebase is based) I found that Pyflex/lib is missing and Pyflex/external is incomplete. My guess is this is a bug caused by the default .gitignore file that ignores all lib/ folders (see here)

Manually copying the lib/ and external/ folders into this codebase resolved the build error, although I am not 100% sure it was okay to just copy them, in case any modifications were made. Maybe @alpercanberk could confirm this.
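
The manual copy described above can be sketched as follows. This is only an illustration under assumptions: the function name `copy_missing_dirs` is hypothetical, and the two directory arguments stand for wherever the flingbot and cloth-funnels PyFlex folders live on your machine.

```python
import shutil
from pathlib import Path


def copy_missing_dirs(src_pyflex, dst_pyflex, names=("lib", "external")):
    """Copy the named folders from one PyFlex tree into another.

    Existing files in the destination are overwritten by the copies.
    Returns the list of folder names that were actually copied.
    """
    src_root, dst_root = Path(src_pyflex), Path(dst_pyflex)
    copied = []
    for name in names:
        src = src_root / name
        if not src.is_dir():
            print(f"skipping {name}: not found in {src_root}")
            continue
        # dirs_exist_ok (Python 3.8+) merges into an existing folder,
        # which matters because external/ is present but incomplete.
        shutil.copytree(src, dst_root / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

A call would look like `copy_missing_dirs("flingbot/PyFlex", "cloth-funnels/PyFlex")`, with both paths adjusted to your own checkouts.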

@alpercanberk
Collaborator

@tlpss no major modifications were made, so copying them over should be fine. let me know if you run into any related issues.

@tlpss

tlpss commented Jun 26, 2023

Can confirm that copying the files worked fine for building the simulator.

I encountered a few other issues during the installation of the codebase, but got the task generation to work in the end. Haven't tried training/running the models.

@zcswdt

zcswdt commented Aug 16, 2023

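I encountered the same issue.

When comparing against the flingbot codebase (on which this codebase is based), I found that Pyflex/lib is missing and Pyflex/external is incomplete. My guess is that this is a bug caused by the default .gitignore file, which ignores all lib/ folders (see here).

Manually copying the lib/ and external/ folders into this codebase resolved the build error, although I am not 100% sure it was okay to just copy them, in case any modifications were made. Maybe @alpercanberk could confirm this.

I also encountered the same problem. My system is Ubuntu 18.04. Can you tell me where to copy the lib and external folders from?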

@zcswdt

zcswdt commented Aug 16, 2023

Also looking for ideas to fix it, thanks a lot.

Have you solved this problem yet?

@alpercanberk
Collaborator

I think it should be from Flingbot https://github.com/columbia-ai-robotics/flingbot, but just to make sure @sashank-tirumala could you point us to where you copied the files from?

@tlpss

tlpss commented Aug 16, 2023

You should take a look at the PyFlex folder from the flingbot codebase here.

That worked for me to compile pyflex and make some modifications to the bindings.

@zcswdt

zcswdt commented Aug 17, 2023

Thank you very much for your reply. I have successfully compiled according to your guidance, but there are still some errors reported during the evaluation and training. Have you run through the author's training and evaluation code?

@zcswdt

zcswdt commented Aug 17, 2023

Thank you for your guidance

@tlpss

tlpss commented Aug 18, 2023

@zcswdt I did encounter a few issues when I tried to run training, but did not look into them as I was mostly interested in the data generation part.

@zcswdt

zcswdt commented Nov 7, 2023

May I ask whether you have successfully run the author's training code? When I followed the training instructions provided by the author, I found that memory use grew with the number of training steps until all memory was consumed, and then the training process was killed (Process finished with exit code 137 (interrupted by signal 9: SIGKILL)). I don't know what causes this and I am stuck. Please help me, thank you!
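
Exit code 137 is 128 + 9, meaning the process was terminated by SIGKILL; on Linux this is commonly what the kernel's out-of-memory killer sends. To confirm that memory really grows step by step, the sketch below logs the process's resident memory every few steps. It assumes Linux (it reads /proc/self/status), and `train_step` is a hypothetical stand-in for one iteration of the actual training loop, which is not shown in this thread.

```python
import signal


def signal_from_exit_code(code):
    """Map a shell-style exit code above 128 to a signal name, else None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # not a known signal number on this platform


def rss_mib():
    """Resident set size of this process in MiB (Linux only, else NaN)."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024  # value is in kB
    except OSError:
        pass
    return float("nan")


def run_with_memory_log(train_step, num_steps, every=10):
    """Call train_step(i) num_steps times and record RSS periodically."""
    history = []
    for i in range(num_steps):
        train_step(i)
        if i % every == 0:
            history.append((i, rss_mib()))
            print(f"step {i}: {history[-1][1]:.1f} MiB")
    return history
```

A steadily rising value in the log suggests something is being retained across steps; a flat value suggests the growth comes from elsewhere, for example worker processes.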

@tlpss

tlpss commented Nov 8, 2023

@zcswdt I'm afraid I won't be able to help. As mentioned before, I also had some issues with the training script but have not looked into them properly, as I was focused on the data generation.

@alpercanberk
Collaborator

Sorry @zcswdt, I don't have access to my original setup anymore. If you're having memory issues, have you been able to try using fewer processes / shrinking the network?

@zcswdt

zcswdt commented Nov 9, 2023

Thank you for your reply. Setting the parameter num_processes to 8 still consumes all the memory. Today I will set it to 1 and see what happens. How do I shrink the network? I don't quite understand this.

@zcswdt

zcswdt commented Nov 9, 2023

Thank you very much for your reply. I have actually completed the training, but it consumes all of my memory.

@alpercanberk
Collaborator

If the memory issue is due to the neural network being trained simultaneously with the simulation, then you may be able to set the network to have fewer parameters, though I doubt it will change things by much.
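
As a rough illustration of what "fewer parameters" means, the sketch below counts the weights and biases of a plain fully connected network for a given list of layer widths. The widths are made up, and the actual network in this codebase may be structured quite differently; the point is only that narrowing layers reduces the parameter count and the memory held by weights. Activations and optimizer state also use memory and are not counted here.

```python
def mlp_param_count(widths):
    """Number of weights plus biases in a fully connected network.

    widths lists the layer sizes from input to output,
    e.g. [64, 128, 128, 8].
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def param_mib(count, bytes_per_param=4):
    """Approximate size of the parameters in MiB (float32 by default)."""
    return count * bytes_per_param / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical layer widths: the second network halves the hidden size.
    for widths in ([64, 128, 128, 8], [64, 64, 64, 8]):
        n = mlp_param_count(widths)
        print(widths, n, f"{param_mib(n):.3f} MiB")
```

Halving the hidden widths here cuts the parameter count from 25864 to 8840, which is consistent with the comment above: parameters of this scale are small next to the memory a simulator and its worker processes can hold.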

@zcswdt

zcswdt commented Nov 11, 2023

Thank you very much for your reply. Yesterday I ran the evaluation code again and found that memory is consumed not only during training: when evaluating with the model you provided, once it reaches a data_size of approximately 400, memory is also exhausted and the program is killed. During evaluation I also set num_processes to 1. I really don't know what to do. Your code is very important to me. Could you help me check it? My environment is built strictly according to the requirements in your README. If possible, you could also access my computer remotely to check the problem. Thank you very much; I really hope for your help. I have been working on this project for three months. Looking forward to your reply.
