Add support for Metal.jl #76

Merged on Jul 24, 2024 (3 commits)

Conversation

zhenwu0728
Member

The model can now run on the GPU of M-series Macs through Metal.jl.

@zhenwu0728
Member Author

Benchmark:

macOS 14.5.0, Darwin 23.5.0

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

Julia packages: 
- Metal.jl: 1.2.0
- LLVMDowngrader_jll: 0.3.0+1

1 device:
- Apple M3 Max (13.840 GiB allocated)

| Arch | N | min | median | mean | max | memory | allocs | samples |
|------|---|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 683.167 μs | 701.167 μs | 739.829 μs | 945.500 μs | 308.56 KiB | 3075 | 10 |
| CPU | 1048576 | 586.618 ms | 593.765 ms | 606.757 ms | 714.189 ms | 307.33 KiB | 2996 | 9 |
| CPU | 33554432 | 19.917 s | 19.917 s | 19.917 s | 19.917 s | 307.33 KiB | 2996 | 1 |
| GPU | 1024 | 260.176 ms | 277.004 ms | 281.977 ms | 309.268 ms | 5.04 MiB | 171640 | 10 |
| GPU | 1048576 | 221.820 ms | 256.475 ms | 248.758 ms | 274.850 ms | 5.04 MiB | 171920 | 10 |
| GPU | 33554432 | 599.450 ms | 626.963 ms | 673.170 ms | 1.036 s | 5.04 MiB | 171914 | 8 |
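
To make the CPU/GPU comparison in the first table easier to read, the median times can be turned into speedup ratios. The snippet below is only an illustrative sketch: it is plain arithmetic on the medians reported above, and the names `median_s` and `gpu_speedup` are hypothetical helpers, not part of this PR or of any package.

```python
# Median wall-clock times copied from the first table, converted to seconds.
median_s = {
    "CPU": {1024: 701.167e-6, 1048576: 593.765e-3, 33554432: 19.917},
    "GPU": {1024: 277.004e-3, 1048576: 256.475e-3, 33554432: 626.963e-3},
}


def gpu_speedup(n):
    """CPU median divided by GPU median; values above 1 mean the GPU is faster."""
    return median_s["CPU"][n] / median_s["GPU"][n]


# One ratio per problem size N (hypothetical helper dict for illustration).
speedups = {n: gpu_speedup(n) for n in median_s["CPU"]}

for n, s in speedups.items():
    print(f"N = {n:>8}: GPU speedup = {s:.3g}x")
```

On these medians the GPU is roughly 2.3x faster at N = 1048576 and roughly 32x faster at N = 33554432, while at N = 1024 the CPU is faster by a factor of about 400. The CPU run at N = 33554432 has a single sample, so that ratio should be read as indicative only.
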

| Arch | N | Ns | min | median | mean | max | memory | allocs | samples |
|------|---|----|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 32 | 1.834 ms | 1.972 ms | 1.997 ms | 2.231 ms | 1.80 MiB | 3695 | 10 |
| CPU | 1024 | 64 | 5.022 ms | 5.266 ms | 5.223 ms | 5.355 ms | 5.65 MiB | 4486 | 10 |
| CPU | 1024 | 128 | 16.857 ms | 17.008 ms | 17.039 ms | 17.416 ms | 20.50 MiB | 5689 | 10 |
| CPU | 1048576 | 32 | 563.261 ms | 563.884 ms | 576.715 ms | 678.273 ms | 1.80 MiB | 3616 | 9 |
| CPU | 1048576 | 64 | 629.369 ms | 631.995 ms | 659.017 ms | 849.875 ms | 5.65 MiB | 4407 | 8 |
| CPU | 1048576 | 128 | 654.082 ms | 654.977 ms | 682.798 ms | 876.487 ms | 20.50 MiB | 5689 | 8 |
| CPU | 33554432 | 32 | 18.848 s | 18.848 s | 18.848 s | 18.848 s | 1.80 MiB | 3616 | 1 |
| CPU | 33554432 | 64 | 20.795 s | 20.795 s | 20.795 s | 20.795 s | 5.65 MiB | 4407 | 1 |
| CPU | 33554432 | 128 | 21.136 s | 21.136 s | 21.136 s | 21.136 s | 20.50 MiB | 5689 | 1 |
| GPU | 1024 | 32 | 234.488 ms | 252.981 ms | 264.093 ms | 316.517 ms | 5.38 MiB | 180665 | 10 |
| GPU | 1024 | 64 | 265.827 ms | 298.026 ms | 295.401 ms | 326.257 ms | 5.91 MiB | 199244 | 10 |
| GPU | 1024 | 128 | 309.454 ms | 346.847 ms | 344.838 ms | 382.297 ms | 7.19 MiB | 241456 | 10 |
| GPU | 1048576 | 32 | 214.440 ms | 248.960 ms | 249.994 ms | 268.858 ms | 5.39 MiB | 180913 | 10 |
| GPU | 1048576 | 64 | 263.491 ms | 277.978 ms | 285.221 ms | 323.347 ms | 5.92 MiB | 199491 | 10 |
| GPU | 1048576 | 128 | 297.155 ms | 331.759 ms | 325.221 ms | 351.346 ms | 7.22 MiB | 241840 | 10 |
| GPU | 33554432 | 32 | 546.911 ms | 581.667 ms | 613.843 ms | 899.804 ms | 5.39 MiB | 181009 | 9 |
| GPU | 33554432 | 64 | 524.873 ms | 589.794 ms | 614.841 ms | 871.087 ms | 5.92 MiB | 199572 | 9 |
| GPU | 33554432 | 128 | 597.942 ms | 659.205 ms | 694.217 ms | 929.509 ms | 7.20 MiB | 241749 | 8 |

| Arch | N | Ns | min | median | mean | max | memory | allocs | samples |
|------|---|----|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 32 | 12.877 ms | 13.145 ms | 13.384 ms | 15.230 ms | 933.91 KiB | 3984 | 10 |
| CPU | 1024 | 64 | 92.317 ms | 94.492 ms | 94.585 ms | 96.147 ms | 5.40 MiB | 7697 | 10 |
| CPU | 1048576 | 32 | 657.237 ms | 660.457 ms | 688.958 ms | 887.481 ms | 933.91 KiB | 3984 | 8 |
| CPU | 1048576 | 64 | 737.114 ms | 748.572 ms | 745.994 ms | 756.733 ms | 5.40 MiB | 7697 | 7 |
| CPU | 33554432 | 32 | 21.287 s | 21.287 s | 21.287 s | 21.287 s | 933.91 KiB | 3984 | 1 |
| CPU | 33554432 | 64 | 20.937 s | 20.937 s | 20.937 s | 20.937 s | 5.40 MiB | 7697 | 1 |
| GPU | 1024 | 32 | 226.726 ms | 257.110 ms | 258.882 ms | 327.908 ms | 6.25 MiB | 200722 | 10 |
| GPU | 1024 | 64 | 228.963 ms | 261.200 ms | 262.250 ms | 297.629 ms | 15.52 MiB | 450049 | 10 |
| GPU | 1048576 | 32 | 184.881 ms | 251.116 ms | 252.394 ms | 315.713 ms | 6.24 MiB | 200717 | 10 |
| GPU | 1048576 | 64 | 250.513 ms | 300.799 ms | 300.384 ms | 343.170 ms | 15.53 MiB | 450547 | 10 |
| GPU | 33554432 | 32 | 546.412 ms | 585.858 ms | 635.806 ms | 829.269 ms | 6.25 MiB | 200850 | 8 |
| GPU | 33554432 | 64 | 537.751 ms | 560.650 ms | 611.406 ms | 830.820 ms | 15.53 MiB | 450770 | 9 |

@zhenwu0728 zhenwu0728 merged commit ff7af43 into master Jul 24, 2024
3 checks passed
@zhenwu0728 zhenwu0728 deleted the metal branch July 24, 2024 21:08