viewer ERROR: mj_stackAlloc: out of memory, stack overflow #2305

Open
2 tasks done
junqingqiao opened this issue Dec 21, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@junqingqiao

junqingqiao commented Dec 21, 2024

Intro

Hi!

I am a graduate student at MIT, and I use MJX for my exoskeleton research.

My setup

I'm using my custom MJX (with a muscle model with flexible tendons), most of which is the same as the default MJX. Python 3.11, Ubuntu 24.04, NVIDIA RTX 4070.

What's happening? What did you expect?

I just merged my code with the newest main branch of the MuJoCo repo. It compiled and passed testing, but when I run the viewer I get this error:

ERROR: mj_stackAlloc: out of memory, stack overflow
max = 0, available = 0, requested = 24
nefc = 0, ncon = 0

Press Enter to exit ...

There was a similar issue, #1280, but no solution was mentioned there.

Steps for reproduction

The following is similar to the standard code for running the passive viewer:

import time

import jax.numpy as jp
import mujoco
import mujoco.viewer
from mujoco import mjx

# mj_model, mj_data, mjx_model, mjx_data and jit_multiple_steps are created
# earlier in the script (not shown). biomtu belongs to the custom MJX fork.
previous_frame_time = time.time()

with mujoco.viewer.launch_passive(mj_model, mj_data) as viewer:
    while viewer.is_running():
        # Update mjx_data from mj_data. The mj_data was modified by the viewer
        # mjx_data = mjx_data.replace(ctrl=mj_data.ctrl, xfrc_applied=mj_data.xfrc_applied)
        # Use the neural network to generate a ctrl signal

        mjx_data = mjx_data.replace(xfrc_applied=jp.array(mj_data.xfrc_applied*10, dtype=jp.float32))

        # Generate key
        # key = jax.random.split(key,1)[0]
        # xfrc = jax.random.uniform(key,(mjx_model.nbody, 6), minval=-10, maxval=10)
        # mjx_data = mjx_data.replace(xfrc_applied=xfrc)
        mjx_data = mjx_data.replace(
            qpos=jp.array(mj_data.qpos, dtype=jp.float32),
            qvel=jp.array(mj_data.qvel, dtype=jp.float32),
            time=jp.array(mj_data.time, dtype=jp.float32))

        # Update mjx_model from mj_model
        mjx_model = mjx_model.tree_replace({
            'opt.gravity': jp.array(mj_model.opt.gravity, dtype=jp.float32),
            'opt.tolerance': jp.array(mj_model.opt.tolerance, dtype=jp.float32),
            'opt.ls_tolerance': jp.array(mj_model.opt.ls_tolerance, dtype=jp.float32),
            'opt.timestep': jp.array(mj_model.opt.timestep, dtype=jp.float32),
        })

        # Control Muscle
        mjx_data = mjx_data.replace(biomtu=mjx_data.biomtu.replace(act=jp.ones(mjx_model.nbiomtu)*0.05))

        # mjx_data = mjx_step(mjx_model, mjx_data)
        mjx_data = jit_multiple_steps(mjx_model, mjx_data)
        # mjx_data, loss, exps = jit_nn_multi_steps(controller_params, mjx_model, mjx_data, key)
        # mjx_data, key, act = jit_nn_mjx_one_step_no_random(controller_params, mjx_model, mjx_data, key)

        mjx.get_data_into(mj_data, mj_model, mjx_data)

        # Record the current time at the start of this frame
        current_frame_time = time.time()

        # Calculate the difference in time from the last frame
        time_between_frames = current_frame_time - previous_frame_time

        # Print the time between frames
        print(f"Time between frames: {time_between_frames} seconds")
        previous_frame_time = current_frame_time

        # print("ACT:", mjx_data.biomtu.act)
        # print(mjx_data.qpos)
        # print(mj_data.sen)
        # print(mjx_data.sensordata[3:6])
        # print(mjx_data.biomtu.act)
        # print(mjx_data.qfrc_inverse[6], mjx_data.qfrc_inverse[15] )
        # print(mjx_data.qfrc_constraint[6], mjx_data.qfrc_constraint[15])
        # print(len(mjx_data.qvel))

        viewer.sync()

Please ignore the code related to the muscle.


junqingqiao added the bug label on Dec 21, 2024
@yuvaltassa
Collaborator

To clarify, is this happening with 3.2.6 or at HEAD?

@junqingqiao
Author

My MuJoCo version shows 3.2.7, so maybe I updated too fast.

@yuvaltassa
Collaborator

I think MJX should only be used from official releases rather than from HEAD. @btaba, is that right?

@btaba
Collaborator

btaba commented Dec 22, 2024

The MJX and MuJoCo versions should match. Since you are using the latest version of MuJoCo, you'll want to check out the MJX commit that corresponds to that MuJoCo version and run pip install -e . from it.

I'm not sure this will fix the mj_stackAlloc issue, but it's worth a try. Do you have the full traceback? Indeed, this looks similar to #1280.
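Whether the two versions actually agree can be checked from Python. A minimal sketch, assuming MuJoCo and MJX were installed as the pip distributions mujoco and mujoco-mjx (installed_version and versions_match are hypothetical helper names). With a symlinked MJX checkout there is no mujoco-mjx distribution, so the helper reports None for it:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def versions_match(mujoco_version: Optional[str], mjx_version: Optional[str]) -> bool:
    """MJX and MuJoCo are expected to share the same version string."""
    if mujoco_version is None or mjx_version is None:
        return False
    return mujoco_version == mjx_version


if __name__ == "__main__":
    mj = installed_version("mujoco")
    mjx = installed_version("mujoco-mjx")
    print(f"mujoco={mj}, mujoco-mjx={mjx}, match={versions_match(mj, mjx)}")
```

A None for either side means that package was not installed through pip, so the match has to be checked by comparing source commits instead.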

@junqingqiao
Author

junqingqiao commented Dec 23, 2024

Hey btaba, thanks for the quick reply. I will try to get the full traceback today. Since I modified both MuJoCo and MJX to support my muscle model, I have to build from scratch. Here is my build script.

Instructions

  • Configure CMake
mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF -DCMAKE_INSTALL_PREFIX:STRING=/home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install -DMUJOCO_BUILD_EXAMPLES:BOOL=OFF -G Ninja -DCMAKE_C_COMPILER:STRING=gcc-13 -DCMAKE_CXX_COMPILER:STRING=g++-13 -DCMAKE_EXE_LINKER_FLAGS:STRING=-Wl,--no-as-needed
  • Build with CMake
cmake --build . --config=Release
  • Install with CMake
cmake --install .
  • Copy plugins
mkdir -p /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libactuator.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libelasticity.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libsensor.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libsdf.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin
  • Prepare Python
conda activate biomujoco
./make_sdist.sh
  • Build the Python bindings
MUJOCO_PATH="/home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install" MUJOCO_PLUGIN_PATH="/home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin" MUJOCO_CMAKE_ARGS="-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF -G Ninja" pip wheel -v --no-deps mujoco-*.tar.gz
  • Install the Python bindings
pip install --upgrade --no-deps --force-reinstall mujoco-*.whl
  • Link MJX into the MuJoCo Python package
ln -s /home/bugman/Currentwork/mjx_muscle_tendon/mjx/mujoco/mjx /home/bugman/anaconda3/envs/biomujoco/lib/python3.11/site-packages/mujoco/mjx
  • Fix missing libstdc++
conda install -c conda-forge libstdcxx-ng
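Because this script links MJX into site-packages instead of installing it, a stale or dangling link is one possible way for the Python side to drift out of sync with the rebuilt C library. A small sketch for checking where the link actually resolves, assuming the paths from the build script above (resolve_mjx_link and points_to are hypothetical names):

```python
import os
from typing import Optional


def resolve_mjx_link(link_path: str) -> Optional[str]:
    """Return the resolved target of link_path if it is a symlink, else None."""
    if not os.path.islink(link_path):
        return None
    return os.path.realpath(link_path)


def points_to(link_path: str, expected_dir: str) -> bool:
    """True if link_path is a symlink that resolves to expected_dir."""
    target = resolve_mjx_link(link_path)
    return target is not None and target == os.path.realpath(expected_dir)


if __name__ == "__main__":
    # Paths taken from the build script; adjust for your environment.
    link = "/home/bugman/anaconda3/envs/biomujoco/lib/python3.11/site-packages/mujoco/mjx"
    src = "/home/bugman/Currentwork/mjx_muscle_tendon/mjx/mujoco/mjx"
    print(f"{link} -> {resolve_mjx_link(link)}; as expected: {points_to(link, src)}")
```

Note that pip install --force-reinstall of the mujoco wheel may replace the mujoco package directory, so it is worth re-running the check after each reinstall.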

This script has worked for the last 8 months; I only started getting the viewer error 2 days ago.
Thanks.

@junqingqiao
Author

I can only trace it back to line 428 of viewer.py (simulate.render_loop()). Beyond that is C++ code, which I can't trace from the Python environment.

@yuvaltassa
Collaborator

I think I know what the issue is. There was a memory leak that was fixed 4 days ago in 3f855f3; if you sync to the current HEAD, this should go away.
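Why a stack-allocation leak eventually overflows can be illustrated with a toy model. This is a hypothetical sketch, not MuJoCo's mj_markStack/mj_freeStack implementation: it assumes a simple LIFO scratch arena where each call records a mark, allocates, and frees back to the mark. If a call allocates but skips the free, every call leaves scratch memory behind until a request no longer fits (ToyStack, scratch_call and calls_until_overflow are illustrative names):

```python
from typing import Optional


class ToyStack:
    """Toy LIFO scratch arena (illustration only, not MuJoCo's allocator)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.top = 0  # bytes currently in use

    def mark(self) -> int:
        # Remember the current top so it can be restored later.
        return self.top

    def alloc(self, nbytes: int) -> int:
        available = self.capacity - self.top
        if nbytes > available:
            # Same fields as the viewer error message.
            raise MemoryError(
                f"stack overflow: max = {self.capacity}, "
                f"available = {available}, requested = {nbytes}"
            )
        offset = self.top
        self.top += nbytes
        return offset

    def free_to(self, mark: int) -> None:
        # Release everything allocated since mark.
        self.top = mark


def scratch_call(stack: ToyStack, ntendon: int, leak: bool) -> None:
    """One hypothetical call that needs ntendon 4-byte ints of scratch space."""
    mark = stack.mark()
    stack.alloc(4 * ntendon)
    if not leak:
        stack.free_to(mark)  # skipping this free is what makes it a leak


def calls_until_overflow(capacity: int, ntendon: int, leak: bool,
                         limit: int = 1000) -> Optional[int]:
    """Index of the first call that overflows, or None within limit calls."""
    stack = ToyStack(capacity)
    for i in range(limit):
        try:
            scratch_call(stack, ntendon, leak)
        except MemoryError:
            return i
    return None


print(calls_until_overflow(100, 6, leak=True))   # 4
print(calls_until_overflow(100, 6, leak=False))  # None
```

With a 100-byte arena and 24-byte requests, the leaking version fails on the fifth call, while the balanced version never fails.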

@junqingqiao
Author

junqingqiao commented Dec 24, 2024

Yuval, thanks for the reply, and Merry Christmas! I synced to the current HEAD and the error persists. If I go back to a commit from two weeks ago, there is no error, as the screenshot shows.
[Screenshot from 2024-12-24 12-45-43]

@yuvaltassa
Collaborator

Sorry for the faff, I know this is a pain, can you do a binary search to see which commit is responsible? The only one that changed stack allocation is "Disable tendon caternary..." which introduced a memory leak and then that was fixed by "Fix stack allocation leak...".

Stack allocation was also changed by 2691887, which is where the new message comes from. However, stack misallocations are now supposed to also print a line number and function name, and you're not getting those. Why not? It's very strange...
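The binary search suggested above is what git bisect automates (git bisect start, then git bisect good or git bisect bad after each test). The same search as a Python sketch, assuming the commits are listed oldest to newest with the oldest known good and the newest known bad, and a hypothetical is_bad callback that rebuilds at a commit and reports whether the viewer error appears:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad is True.

    commits is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each test halves the remaining range, so a span of 16 commits needs at most 4 rebuilds.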

@yuvaltassa
Collaborator

Also, how many tendons do you have in your model? Could it be 6? The memory leak came from
int* tendon_actuated = mjSTACKALLOC(d, m->ntendon, int);, which for 6 tendons would be 24 bytes (6 × 4-byte int)...

If that's the case, it would imply either that the fix is bad (I don't think it is) or that you somehow didn't actually sync to HEAD. Perhaps you just need to delete your build directory and try again?
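The 24-byte figure is just ntendon × sizeof(int). A small sketch of that arithmetic, assuming mjSTACKALLOC requests exactly count × sizeof(type) bytes with no padding (stack_request_bytes and ntendon_from_request are hypothetical names). In Python, the tendon count of a loaded model is available as model.ntendon, which can be compared against the result:

```python
import ctypes


def stack_request_bytes(ntendon: int) -> int:
    """Bytes for mjSTACKALLOC(d, m->ntendon, int), assuming no padding."""
    return ntendon * ctypes.sizeof(ctypes.c_int)


def ntendon_from_request(requested: int) -> int:
    """Invert the above: guess ntendon from the 'requested = N' field."""
    size = ctypes.sizeof(ctypes.c_int)
    if requested % size:
        raise ValueError(f"{requested} is not a multiple of sizeof(int) = {size}")
    return requested // size
```

On platforms with a 4-byte int, ntendon_from_request(24) gives 6, matching the guess of six tendons.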
