MJX collision discrepancy between Linux and Mac #2127

Open
2 tasks done
carlosferrazza opened this issue Oct 8, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@carlosferrazza

carlosferrazza commented Oct 8, 2024

Intro

Hi!

I am running a simple test script that simulates an H1 humanoid robot in MJX, with (box) collision geometries only at the feet.

My setup

Mujoco 3.1.6, Mujoco-MJX 3.1.6.

What's happening? What did you expect?

I am seeing different behavior on Linux (on both CPU and GPU) and on Mac. Running the same attached code gives the results below. Note the collisions between the feet and the ground: the feet partially penetrate the ground, but only in the Linux version.

Linux:
https://github.com/user-attachments/assets/93f311d2-5e87-4fbf-9052-8f1bc02a4861

Mac:
https://github.com/user-attachments/assets/6672f320-d795-40ab-91d2-e7bca84d90c9

Steps for reproduction

To reproduce, run test.py, attached in the zip file below.

Minimal model for reproduction

mjx_mwe.zip

Code required for reproduction

See test.py in the zip file.

Confirmations

@carlosferrazza added the bug (Something isn't working) label Oct 8, 2024
@btaba
Collaborator

btaba commented Oct 8, 2024

Hi @carlosferrazza ,

Since these are running on different platforms, let's try to rule out that we're just looking at numerical differences. Can you run both setups on CPU with either of:

jax.config.update("jax_enable_x64", True)
jax.config.update('jax_default_matmul_precision', jax.lax.Precision.HIGH)

to see if this resolves the behavior you're looking at?
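
For anyone repeating this check, here is a minimal sketch of running the attached test.py once per setting without editing the script. It assumes the environment-variable forms of the JAX config flags (JAX_ENABLE_X64, JAX_DEFAULT_MATMUL_PRECISION, JAX_PLATFORMS) are picked up by the installed JAX version. The helper names and the VARIANTS table are hypothetical. The value "float32" is JAX's string for its highest matmul precision, which is stricter than the Precision.HIGH quoted above, so swap it if you want to match that line exactly.

```python
import os
import subprocess
import sys

# Assumed environment-variable equivalents of the two jax.config.update calls.
# "float32" is an assumption: it requests the highest matmul precision, not HIGH.
VARIANTS = {
    "baseline": {},
    "x64": {"JAX_ENABLE_X64": "1"},
    "matmul_precision": {"JAX_DEFAULT_MATMUL_PRECISION": "float32"},
}


def variant_env(name, base=None):
    """Build the environment for one variant, forcing JAX onto the CPU backend."""
    env = dict(os.environ if base is None else base)
    env["JAX_PLATFORMS"] = "cpu"  # keep GPU differences out of the comparison
    env.update(VARIANTS[name])
    return env


def run_variants(script="test.py"):
    """Run the script once per variant and return each process exit code."""
    codes = {}
    for name in VARIANTS:
        result = subprocess.run([sys.executable, script], env=variant_env(name))
        codes[name] = result.returncode
    return codes
```

Calling run_variants() from the directory that contains test.py runs the three cases back to back, so the rendered results can be compared per setting on each machine.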

@carlosferrazza
Author

carlosferrazza commented Oct 8, 2024

Thanks for looking so quickly into it! Just tried adding either line, and still results in the original behaviors on both ends.

@btaba
Collaborator

btaba commented Oct 8, 2024

Hi @carlosferrazza , I ran the script on Mac and Linux, and can't reproduce the issue. Out of curiosity, can you run the MJX unit tests with pytest on your Linux machine and see if they pass?

@carlosferrazza
Author

Hi @btaba, that is interesting. I was able to run this on another Linux machine and it works fine there. So the issue seems to be specific to my first Linux machine, which is strange given that the Conda envs are identical. I ran the unit tests on both Linux machines: they pass with some warnings, and the warnings are the same on both machines.

Any ideas/suggestions on what I could try next to solve the issue on the original Linux machine?

@btaba
Collaborator

btaba commented Oct 8, 2024

Hi @carlosferrazza , good to hear at least one of your Linux machines is WAI (working as intended). I'm assuming you ran on CPU across all machines at this point, and Linux machine A observes unintended behavior, while Linux machine B is WAI.

Nothing obvious comes to mind. We've controlled for JAX numerical precision config and the OS, so perhaps it's an issue with the actual CPU device or your environment. What are the differences between the CPU devices across machines A and B? Are there any differences in your env variables?
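
One way to answer both questions systematically is to dump a small fingerprint on each machine and diff the two. The sketch below uses only the Python standard library. The helper names, the package list, and the environment-variable prefixes are assumptions chosen for illustration, and the CPU model lookup reads /proc/cpuinfo, so it only works on Linux.

```python
import os
import platform
import sys
from importlib import metadata

# Assumed lists: adjust to whatever is relevant in your setup.
PACKAGES = ("mujoco", "mujoco-mjx", "jax", "jaxlib", "numpy")
ENV_PREFIXES = ("JAX_", "XLA_", "OMP_", "MKL_", "OPENBLAS_", "CUDA_")


def cpu_model():
    """Return the CPU model name from /proc/cpuinfo, or a platform fallback."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return platform.processor() or None


def collect_fingerprint(environ=None):
    """Collect CPU, OS, package versions, and selected env variables."""
    environ = os.environ if environ is None else environ
    versions = {}
    for name in PACKAGES:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this env
    return {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_model": cpu_model(),
        "python": sys.version.split()[0],
        "packages": versions,
        "env": {k: v for k, v in sorted(environ.items())
                if k.startswith(ENV_PREFIXES)},
    }


def diff_fingerprints(a, b, prefix=""):
    """Return {dotted_key: (value_a, value_b)} for every entry that differs."""
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        path = f"{prefix}{key}"
        if isinstance(va, dict) and isinstance(vb, dict):
            out.update(diff_fingerprints(va, vb, prefix=path + "."))
        elif va != vb:
            out[path] = (va, vb)
    return out
```

On each machine, write collect_fingerprint() to a file with json.dump, copy both files to one place, load them, and pass the two dicts to diff_fingerprints. Only the differing entries come back, which keeps the comparison between machines A and B short.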

@erikfrey, if you have any ideas

@carlosferrazza
Author

Hi @btaba, that's right, I have tried both CPU and GPU, and:

  • Linux machine A observes unintended behavior on either CPU or GPU
  • Linux machine B is WAI on either CPU or GPU

The CPU devices are both x86_64. The only relevant difference I could find is that machine A has an AMD CPU, while machine B has an Intel CPU, but I am not sure whether that could change anything. There are also no major differences in the env variables.
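
A possible next step to narrow this down is to log the state at every step on both machines and find the first step where the two runs disagree. The sketch below assumes each machine saved its trajectory as a list of per-step joint position lists (for example from something like np.asarray(data.qpos).tolist() inside test.py, which is a hypothetical addition to the script). The function name and the tolerance are assumptions as well.

```python
def first_divergence(traj_a, traj_b, tol=1e-6):
    """Return (step, max_abs_diff) for the first step that differs by more
    than tol, or None if the common prefix of both runs agrees."""
    for step, (qa, qb) in enumerate(zip(traj_a, traj_b)):
        if len(qa) != len(qb):
            raise ValueError(f"state size mismatch at step {step}")
        # Largest per-coordinate difference between the two machines.
        err = max((abs(x - y) for x, y in zip(qa, qb)), default=0.0)
        if err > tol:
            return step, err
    return None
```

If the runs already differ at step 0, the problem is likely in model loading or initialization. If they agree until the feet first touch the ground, that would point at the collision or contact solve on machine A. Small differences that grow slowly over many steps are more consistent with ordinary floating-point drift.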
