viewer ERROR: mj_stackAlloc: out of memory, stack overflow #2305

junqingqiao · 2024-12-21T21:09:27Z

Intro

Hi!

I am a graduate student at MIT, and I use MJX for my exoskeleton research.

My setup

I'm using a custom MJX build (with a muscle model with flexible tendons); most of it is the same as the default MJX. Python 3.11, Ubuntu 24.04, NVIDIA RTX 4070.

What's happening? What did you expect?

I just merged my code with the latest main branch of the MuJoCo repo. It compiles and the tests pass, but when I run the viewer I get this error:

ERROR: mj_stackAlloc: out of memory, stack overflow
max = 0, available = 0, requested = 24
nefc = 0, ncon = 0

Press Enter to exit ...

There is a similar issue, #1280, but no solution was mentioned there.

Steps for reproduction

The following is similar to the standard code for running the passive viewer:

import time

import jax.numpy as jp
import mujoco
import mujoco.viewer
from mujoco import mjx

# mj_model, mj_data, mjx_model, mjx_data and jit_multiple_steps are created earlier in the script.
previous_frame_time = time.time()

with mujoco.viewer.launch_passive(mj_model, mj_data) as viewer:
    while viewer.is_running():
        # Update mjx_data from mj_data. The mj_data was modified by the viewer
        # mjx_data = mjx_data.replace(ctrl=mj_data.ctrl, xfrc_applied=mj_data.xfrc_applied)
        # Use the neural network to generate a ctrl signal

        mjx_data = mjx_data.replace(xfrc_applied=jp.array(mj_data.xfrc_applied * 10, dtype=jp.float32))

        # Generate key
        # key = jax.random.split(key, 1)[0]
        # xfrc = jax.random.uniform(key, (mjx_model.nbody, 6), minval=-10, maxval=10)
        # mjx_data = mjx_data.replace(xfrc_applied=xfrc)
        mjx_data = mjx_data.replace(
            qpos=jp.array(mj_data.qpos, dtype=jp.float32),
            qvel=jp.array(mj_data.qvel, dtype=jp.float32),
            time=jp.array(mj_data.time, dtype=jp.float32))

        # Update mjx_model from mj_model
        mjx_model = mjx_model.tree_replace({
            'opt.gravity': jp.array(mj_model.opt.gravity, dtype=jp.float32),
            'opt.tolerance': jp.array(mj_model.opt.tolerance, dtype=jp.float32),
            'opt.ls_tolerance': jp.array(mj_model.opt.ls_tolerance, dtype=jp.float32),
            'opt.timestep': jp.array(mj_model.opt.timestep, dtype=jp.float32),
        })

        # Control Muscle (biomtu belongs to the custom muscle model, not stock MJX)
        mjx_data = mjx_data.replace(biomtu=mjx_data.biomtu.replace(act=jp.ones(mjx_model.nbiomtu) * 0.05))

        # mjx_data = mjx_step(mjx_model, mjx_data)
        mjx_data = jit_multiple_steps(mjx_model, mjx_data)
        # mjx_data, loss, exps = jit_nn_multi_steps(controller_params, mjx_model, mjx_data, key)
        # mjx_data, key, act = jit_nn_mjx_one_step_no_random(controller_params, mjx_model, mjx_data, key)

        # Copy the MJX state back into mj_data so the viewer can render it
        mjx.get_data_into(mj_data, mj_model, mjx_data)

        # Record the current time at the start of this frame
        current_frame_time = time.time()

        # Calculate the difference in time from the last frame
        time_between_frames = current_frame_time - previous_frame_time

        # Print the time between frames
        print(f"Time between frames: {time_between_frames} seconds")
        previous_frame_time = current_frame_time

        # print("ACT:", mjx_data.biomtu.act)
        # print(mjx_data.qpos)
        # print(mj_data.sen)
        # print(mjx_data.sensordata[3:6])
        # print(mjx_data.biomtu.act)
        # print(mjx_data.qfrc_inverse[6], mjx_data.qfrc_inverse[15])
        # print(mjx_data.qfrc_constraint[6], mjx_data.qfrc_constraint[15])
        # print(len(mjx_data.qvel))

        viewer.sync()

Please ignore the code related to the muscle.

Confirmations

I searched the latest documentation thoroughly before posting.
I searched previous Issues and Discussions, I am certain this has not been raised before.


yuvaltassa · 2024-12-21T21:40:26Z

To clarify, is this happening with 3.2.6 or at HEAD?

junqingqiao · 2024-12-21T22:56:42Z

My MuJoCo version shows 3.2.7. Maybe I updated too fast.

yuvaltassa · 2024-12-22T09:07:02Z

I think MJX should only be used from official releases, rather than from head? @btaba is that right?

btaba · 2024-12-22T11:24:50Z

The MJX and MuJoCo versions should match. Since you are using the latest version of MuJoCo, you'll want to pip install -e . at the MJX commit that corresponds to that MuJoCo version.

I'm not sure this will fix the mj_stackAlloc issue, but it's worth a try. Do you have the full traceback? Indeed this looks similar to #1280
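btaba's point that the MJX and MuJoCo versions should match can be checked from Python. This is a minimal sketch with hypothetical helper names; it assumes MJX is installed as the separate mujoco-mjx distribution, so in a setup where MJX is symlinked into the mujoco package instead, the mujoco-mjx entry simply comes back as None.

```python
from __future__ import annotations

import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Numeric release part of a version string, e.g. '3.2.7.dev0' -> (3, 2, 7)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def versions_match(mujoco_version: str, mjx_version: str) -> bool:
    """True when MuJoCo and MJX report the same release number."""
    return release_tuple(mujoco_version) == release_tuple(mjx_version)


def installed_versions() -> dict[str, str | None]:
    """Installed versions of both distributions; None means not installed separately."""
    found: dict[str, str | None] = {}
    for dist in ("mujoco", "mujoco-mjx"):
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[dist] = None
    return found
```

For example, versions_match("3.2.7", "3.2.6") is False, which is the kind of mismatch a HEAD build of MuJoCo paired with a released MJX would produce.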

junqingqiao · 2024-12-23T18:52:12Z

Hey Btaba, thanks for the quick reply. I will try to get the full traceback today. Since I modified both MuJoCo and MJX to support my muscle model, I have to build from scratch. Here is my build script.

Instructions

Configure CMake

mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF -DCMAKE_INSTALL_PREFIX:STRING=/home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install -DMUJOCO_BUILD_EXAMPLES:BOOL=OFF -G Ninja -DCMAKE_C_COMPILER:STRING=gcc-13 -DCMAKE_CXX_COMPILER:STRING=g++-13 -DCMAKE_EXE_LINKER_FLAGS:STRING=-Wl,--no-as-needed

Build with CMake

cmake --build . --config=Release

Install with CMake

cmake --install .

Copy plugins

mkdir -p /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libactuator.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libelasticity.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libsensor.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin && cp lib/libsdf.* /home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin
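The plugin-copy step chains four cp commands with &&. A pathlib/shutil sketch of the same step, which fails loudly if one of the plugin libraries is missing from the build tree; copy_plugins and PLUGIN_PREFIXES are hypothetical names, and install_prefix would be the same mujoco_install directory used above.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The four plugin library prefixes copied by the shell one-liner.
PLUGIN_PREFIXES = ("libactuator", "libelasticity", "libsensor", "libsdf")


def copy_plugins(build_dir: Path, install_prefix: Path) -> list[Path]:
    """Copy build_dir/lib/<prefix>.* into install_prefix/mujoco_plugin; return the copies."""
    plugin_dir = install_prefix / "mujoco_plugin"
    plugin_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    copied: list[Path] = []
    for prefix in PLUGIN_PREFIXES:
        matches = sorted((build_dir / "lib").glob(f"{prefix}.*"))
        if not matches:
            # cp would also fail on an unmatched glob; raise with a clearer message.
            raise FileNotFoundError(f"no {prefix}.* found in {build_dir / 'lib'}")
        for src in matches:
            # copy2 follows symlinks by default, like cp without -P.
            copied.append(Path(shutil.copy2(src, plugin_dir)))
    return copied
```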

Prepare the Python sdist

conda activate biomujoco
./make_sdist.sh

Build the Python wheel

MUJOCO_PATH="/home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install" MUJOCO_PLUGIN_PATH="/home/bugman/Currentwork/mjx_muscle_tendon/build/mujoco_install/mujoco_plugin" MUJOCO_CMAKE_ARGS="-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF -G Ninja" pip wheel -v --no-deps mujoco-*.tar.gz
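If the wheel build is driven from Python rather than the shell, the environment variables and the sdist glob must be handled explicitly, because subprocess does not expand mujoco-*.tar.gz. A sketch under that assumption; wheel_build_command and build_wheel are hypothetical helpers, and pip is invoked as sys.executable -m pip so it belongs to the same interpreter as the running script.

```python
from __future__ import annotations

import glob
import os
import subprocess
import sys


def wheel_build_command(install_prefix: str, sdist_dir: str = ".") -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) equivalent to the shell pip wheel line."""
    # Without a shell, the mujoco-*.tar.gz glob has to be expanded here.
    sdists = sorted(glob.glob(os.path.join(sdist_dir, "mujoco-*.tar.gz")))
    if len(sdists) != 1:
        # More than one match may mean a stale sdist from an earlier build is still present.
        raise RuntimeError(f"expected exactly one sdist, found {sdists}")
    env = dict(os.environ)
    env["MUJOCO_PATH"] = install_prefix
    env["MUJOCO_PLUGIN_PATH"] = os.path.join(install_prefix, "mujoco_plugin")
    env["MUJOCO_CMAKE_ARGS"] = "-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF -G Ninja"
    argv = [sys.executable, "-m", "pip", "wheel", "-v", "--no-deps", os.path.abspath(sdists[0])]
    return argv, env


def build_wheel(install_prefix: str, sdist_dir: str = ".") -> None:
    argv, env = wheel_build_command(install_prefix, sdist_dir)
    # pip wheel writes the .whl into the working directory, next to the sdist.
    subprocess.run(argv, env=env, cwd=sdist_dir, check=True)
```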

Install the Python wheel

pip install --upgrade --no-deps --force-reinstall mujoco-*.whl

Link MJX into the MuJoCo Python package

ln -s /home/bugman/Currentwork/mjx_muscle_tendon/mjx/mujoco/mjx /home/bugman/anaconda3/envs/biomujoco/lib/python3.11/site-packages/mujoco/mjx
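After a rebuild it can be worth confirming that site-packages/mujoco/mjx still resolves to the modified MJX checkout, since Python imports whatever is (or is not) at that path. A hypothetical, stdlib-only check:

```python
import os
from pathlib import Path


def check_mjx_link(link: Path, expected_target: Path) -> str:
    """Describe whether link is a symlink that resolves to expected_target."""
    if not link.is_symlink():
        # Either nothing is there, or a real directory has replaced the link.
        return "missing" if not link.exists() else "not a symlink"
    if not link.exists():
        return "dangling"  # the link itself exists but its target does not
    if link.resolve() == expected_target.resolve():
        return "ok"
    return f"points elsewhere: {os.readlink(link)}"
```

Usage would be check_mjx_link(Path("/home/bugman/anaconda3/envs/biomujoco/lib/python3.11/site-packages/mujoco/mjx"), Path("/home/bugman/Currentwork/mjx_muscle_tendon/mjx/mujoco/mjx")), with anything other than "ok" meaning the import is not coming from the checkout.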

Fix missing libstdc++

conda install -c conda-forge libstdcxx-ng

The script has worked for the last 8 months; I only started getting the viewer error 2 days ago.
Thanks.

junqingqiao · 2024-12-24T02:17:10Z

I can only trace it back to line 428, simulate.render_loop(), in viewer.py. Beyond that it is C++ code that I can't trace from the Python environment.

yuvaltassa · 2024-12-24T08:05:27Z

I think I know what the issue is. There was a memory leak that was fixed 4 days ago in 3f855f3; if you sync to current HEAD, this should go away.

junqingqiao · 2024-12-24T17:48:27Z

Yuval, thanks for the reply, and Merry Christmas! I synced to the current HEAD and the error persists. If I go back to a commit from two weeks ago, there is no error, as the image shows.

yuvaltassa · 2024-12-24T20:28:12Z

Sorry for the faff, I know this is a pain, can you do a binary search to see which commit is responsible? The only one that changed stack allocation is "Disable tendon caternary..." which introduced a memory leak and then that was fixed by "Fix stack allocation leak...".

Stack allocation was also changed by 2691887, and I can see the new message from here. Stack mis-allocations are now supposed to print a line number and function, as you can see here, but you're not getting those. Why not? It's very strange...
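The binary search yuvaltassa asks for is what git bisect automates (git bisect run can also drive it with a test script). The sketch below shows the underlying search, which needs only about log2(N) rebuilds for N commits; is_bad is a hypothetical callback that would check out a commit, rebuild with the build script, launch the viewer, and report whether the stack error appears.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes commits run oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and once a commit is bad every later one is too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```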

yuvaltassa · 2024-12-24T20:33:15Z

Also, how many tendons do you have in your model? Could it be 6? Because the memory leak was from
int* tendon_actuated = mjSTACKALLOC(d, m->ntendon, int);, which would be 24 bytes...

If that's the case then it would imply that either the fix is bad (I don't think it is), or you somehow didn't really sync to HEAD... perhaps you just need to delete your build directory and try again?
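yuvaltassa's arithmetic (6 tendons times a 4-byte int gives the 24 bytes in the error) and the idea of a leaked stack allocation can be illustrated with a toy arena. This is not MuJoCo's allocator: ToyStack, simulated_step and steps_until_overflow are hypothetical, and the error text only mimics the shape of the message in this issue. An allocation made outside a mark/free pair is never released, so free space shrinks on every step until a request no longer fits.

```python
from __future__ import annotations

import ctypes

# Bytes for mjSTACKALLOC(d, m->ntendon, int) with 6 tendons, ignoring any alignment padding.
NTENDON = 6
LEAK_BYTES = NTENDON * ctypes.sizeof(ctypes.c_int)  # 24 where int is 4 bytes


class ToyStack:
    """Fixed-size arena with a bump pointer; mark()/free() save and restore the top."""

    def __init__(self, size: int):
        self.size = size
        self.top = 0
        self.frames: list[int] = []

    def mark(self) -> None:
        self.frames.append(self.top)

    def free(self) -> None:
        # Release everything allocated since the matching mark().
        self.top = self.frames.pop()

    def alloc(self, nbytes: int) -> int:
        available = self.size - self.top
        if nbytes > available:
            raise MemoryError(
                "out of memory, stack overflow: "
                f"available = {available}, requested = {nbytes}")
        offset = self.top
        self.top += nbytes
        return offset


def simulated_step(stack: ToyStack, leak: bool) -> None:
    if leak:
        stack.alloc(LEAK_BYTES)  # outside the mark/free pair, so never released
    stack.mark()
    stack.alloc(64)  # scratch space released at the end of the step
    stack.free()


def steps_until_overflow(stack: ToyStack, leak: bool, max_steps: int) -> int:
    """Return how many steps complete before the arena overflows (max_steps if never)."""
    for completed in range(max_steps):
        try:
            simulated_step(stack, leak)
        except MemoryError:
            return completed
    return max_steps
```

With a 1000-byte arena the non-leaking loop runs indefinitely, while the leaking one overflows after 39 steps. An arena of size 0 fails on the very first 24-byte request with available = 0; the toy model cannot explain why MuJoCo's arena reported max = 0.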

junqingqiao added the bug Something isn't working label Dec 21, 2024
