{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":130725814,"defaultBranch":"master","name":"apex","ownerLogin":"NVIDIA","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2018-04-23T16:28:52.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1728152?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1723873312.0","currentOid":""},"activityList":{"items":[{"before":"c3e4adf49377ceefc73eb8fea2f05f2570d8c031","after":"b7a4acc1c8599f9306b519c9a88c044f1b280a07","ref":"refs/heads/master","pushedAt":"2024-08-30T11:22:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Add Unittest For Distributed Adam With CUDA Graph (#1836)\n\n* Add unittest for distributed adam with cuda graph.\r\n\r\n* Fix the distributed adam issue if user passes float LR.\r\n\r\n* skip if world_size < 8\r\n\r\n---------\r\n\r\nCo-authored-by: Masaki Kozuki ","shortMessageHtmlLink":"Add Unittest For Distributed Adam With CUDA Graph (#1836)"}},{"before":"7d5ecf18bfc4bd01a4de9a9f9152f3751da023e3","after":"c3e4adf49377ceefc73eb8fea2f05f2570d8c031","ref":"refs/heads/master","pushedAt":"2024-08-30T10:20:53.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Reformat grad_output if it's not channels last. (#1837)","shortMessageHtmlLink":"Reformat grad_output if it's not channels last. (#1837)"}},{"before":"2d2db12fdda2821bc74d1efa83a22f6323d86a97","after":"7d5ecf18bfc4bd01a4de9a9f9152f3751da023e3","ref":"refs/heads/master","pushedAt":"2024-08-30T06:08:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Enhance Distributed Fused Adam (#1832)\n\n* Support NHWC for distributed fused adam.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Fix the gradient clipping bug with distributed adam.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Support CUDA graph for distributed fused adam.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Make sure key pointers are valid.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Better repr for distributed adam.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Warn if capturable is set but deprecated fused adam is not found.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Preserve memory format in parameter buffer of distributed adam.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Preserve memory format during parameter copy.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Fix the bug that process group is not set.\r\n\r\nSigned-off-by: Wil Kong \r\n\r\n* Add grad_scaler arg to distopt unscale_grads function\r\n\r\nCall unscale_grads within step if grad scaler is provided. Revert grad clipping logic.\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Fix typo\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Update apex/contrib/optimizers/distributed_fused_adam.py\r\n\r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>\r\n\r\n* Update apex/contrib/optimizers/distributed_fused_adam.py\r\n\r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>\r\n\r\n* Format dist adam code.\r\n\r\n* Revert \"Update apex/contrib/optimizers/distributed_fused_adam.py\"\r\n\r\nThis reverts commit 857e8f4c12817f4b40151811603892b9dcaa0275.\r\n\r\n* Fix the bug of LR tensor dimension which breaks LR scheduler.\r\n\r\n---------\r\n\r\nSigned-off-by: Wil Kong \r\nSigned-off-by: Tim Moon \r\nCo-authored-by: Tim Moon \r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>","shortMessageHtmlLink":"Enhance Distributed Fused Adam (#1832)"}},{"before":"70018365c8add3e46574b897149db5d6dd21ef5c","after":"2d2db12fdda2821bc74d1efa83a22f6323d86a97","ref":"refs/heads/master","pushedAt":"2024-08-30T05:15:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Traceable GroupNorm (#1835)\n\n* Change GroupNorm set to frozenset for Inductor.\r\n\r\n* Enable Inductor for GroupNorm.\r\n\r\n* Switch GroupNorm from autograd function to torch library.\r\n\r\n* Switch default value of act from '' to None to avoid graph break in legacy Inductor.\r\n\r\n* Add unittest for GroupNorm with Inductor.\r\n\r\n* Update apex/contrib/group_norm/group_norm.py\r\n\r\nCo-authored-by: Masaki Kozuki \r\n\r\n---------\r\n\r\nCo-authored-by: Masaki Kozuki ","shortMessageHtmlLink":"Traceable GroupNorm (#1835)"}},{"before":"79e3dc48856b8786f95b1bdd219cc6a7d9bddd58","after":"70018365c8add3e46574b897149db5d6dd21ef5c","ref":"refs/heads/master","pushedAt":"2024-08-19T16:14:53.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"add **kwargs to DistributedTestBase._run (#1829)","shortMessageHtmlLink":"add **kwargs to DistributedTestBase._run (#1829)"}},{"before":null,"after":"d9beb668be411b77430a0849636539cf8ca7b73c","ref":"refs/heads/crcrpar-patch-1","pushedAt":"2024-08-17T05:41:52.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"remove `run_transformer` from default lists","shortMessageHtmlLink":"remove run_transformer from default lists"}},{"before":"59b80ee8df79cec125794949327f29913c328746","after":"79e3dc48856b8786f95b1bdd219cc6a7d9bddd58","ref":"refs/heads/master","pushedAt":"2024-08-17T05:39:52.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Fix illegal memory access with multi_tensor_apply size above INT_MAX (#1825)\n\nCurrently, multi_tensor_apply causes an illegal memory access due to\r\nan overflow in the `size` field of `TensorListMetadata`. This can be\r\nreproduced using the following standalone script:\r\n\r\n```python\r\nimport torch, amp_C\r\nfrom apex.multi_tensor_apply import multi_tensor_applier\r\nmulti_tensor_adam = amp_C.multi_tensor_adam\r\n\r\nsize = 2**32+1\r\ng_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]\r\np_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]\r\nm_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]\r\nv_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]\r\n_dummy_overflow_buf = torch.zeros(1, dtype=torch.int32, device='cuda')\r\n\r\nmulti_tensor_applier(multi_tensor_adam, _dummy_overflow_buf, [g_32, p_32, m_32, v_32], 0.0, 0.9, 0.95, 1e-08, 1, 1, 1, 0.1)\r\nprint(g_32)\r\n```","shortMessageHtmlLink":"Fix illegal memory access with multi_tensor_apply size above INT_MAX (#…"}},{"before":"f9f19d5cf08ad14d47608bfebd0bf3089d6d029a","after":"59b80ee8df79cec125794949327f29913c328746","ref":"refs/heads/master","pushedAt":"2024-07-25T12:28:44.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Allow Configurable Cache Directory (#1821)\n\n* Allow Configurable Cache Directory\r\n\r\n* Allow Configurable Cache Directory\r\n\r\n* Change ASP Cache Dir Environment Variable Name","shortMessageHtmlLink":"Allow Configurable Cache Directory (#1821)"}},{"before":"23c1f86520e22b505e8fdfcf6298273dff2d93d8","after":"f9f19d5cf08ad14d47608bfebd0bf3089d6d029a","ref":"refs/heads/master","pushedAt":"2024-07-24T04:36:00.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Do not monkey-patch class methods when registering distopt pre-forward hooks (#1820)\n\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Do not monkey-patch class methods when registering distopt pre-forwar…"}},{"before":"f8e60c47c5c3034ddf8181e33910f3da5b289f25","after":"23c1f86520e22b505e8fdfcf6298273dff2d93d8","ref":"refs/heads/master","pushedAt":"2024-07-02T09:10:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"release gil (#1816)\n\nSigned-off-by: Masaki Kozuki ","shortMessageHtmlLink":"release gil (#1816)"}},{"before":"7b73b12361068a10b0f44844534613f252a5ea75","after":"f8e60c47c5c3034ddf8181e33910f3da5b289f25","ref":"refs/heads/master","pushedAt":"2024-06-29T04:10:41.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"deprecate uses of torch.cuda.amp (#1813)\n\n* deprecate uses of torch.cuda.amp\r\n\r\n* fix typo","shortMessageHtmlLink":"deprecate uses of torch.cuda.amp (#1813)"}},{"before":"a7de60e57f0534266841e1733262601ad76aaa74","after":"7b73b12361068a10b0f44844534613f252a5ea75","ref":"refs/heads/master","pushedAt":"2024-06-08T01:43:27.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Use torch.testing.all_close instead of get_max_diff in test_lamb.py (#1806)","shortMessageHtmlLink":"Use torch.testing.all_close instead of get_max_diff in test_lamb.py (#…"}},{"before":"a7de60e57f0534266841e1733262601ad76aaa74","after":"4138d31ff0acf4071d1dc001ccb7cd6e00800324","ref":"refs/heads/24.04.01-devel","pushedAt":"2024-04-28T04:50:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"Aidyn-A","name":null,"path":"/Aidyn-A","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/31858918?s=80&v=4"},"commit":{"message":"Enhance Distributed Fused Adam (#1794)","shortMessageHtmlLink":"Enhance Distributed Fused Adam (#1794)"}},{"before":null,"after":"a7de60e57f0534266841e1733262601ad76aaa74","ref":"refs/heads/24.04.01-devel","pushedAt":"2024-04-28T04:49:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Aidyn-A","name":null,"path":"/Aidyn-A","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/31858918?s=80&v=4"},"commit":{"message":"Fix reduce_blocks_into_lanes race condition (#1798)\n\n* move __sync_threads() outside if branch\r\n\r\n* add clarifying comment","shortMessageHtmlLink":"Fix reduce_blocks_into_lanes race condition (#1798)"}},{"before":"f3f049246e5bdf6fdddf251ebe6b65dd4ca1ee29","after":"a7de60e57f0534266841e1733262601ad76aaa74","ref":"refs/heads/master","pushedAt":"2024-04-26T06:29:23.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Fix reduce_blocks_into_lanes race condition (#1798)\n\n* move __sync_threads() outside if branch\r\n\r\n* add clarifying comment","shortMessageHtmlLink":"Fix reduce_blocks_into_lanes race condition (#1798)"}},{"before":"6038fc1a364256c52d58fddb4bb0695cf4bbf60e","after":"f3f049246e5bdf6fdddf251ebe6b65dd4ca1ee29","ref":"refs/heads/master","pushedAt":"2024-04-24T19:35:07.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"Aidyn-A","name":null,"path":"/Aidyn-A","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/31858918?s=80&v=4"},"commit":{"message":"NCCL userbuffer for DP RS in DistOpt (#1797)\n\n* NCCL userbuffer for AG/RS in DistOpt\r\n\r\nSigned-off-by: qiyuw \r\n\r\n* remove empty line\r\n\r\nSigned-off-by: qiyuw \r\n\r\n* Add test case\r\n\r\nSigned-off-by: qiyuw \r\n\r\n* fix an issue\r\n\r\nSigned-off-by: Qiyu Wan \r\n\r\n---------\r\n\r\nSigned-off-by: qiyuw \r\nSigned-off-by: Qiyu Wan \r\nCo-authored-by: qiyuw \r\nCo-authored-by: Qiyu Wan ","shortMessageHtmlLink":"NCCL userbuffer for DP RS in DistOpt (#1797)"}},{"before":"b5df1ccf89d8013556b1d1d823fc34268cae8e9c","after":"6038fc1a364256c52d58fddb4bb0695cf4bbf60e","ref":"refs/heads/master","pushedAt":"2024-04-24T19:34:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"Aidyn-A","name":null,"path":"/Aidyn-A","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/31858918?s=80&v=4"},"commit":{"message":"Add nccl_allocator for zero-copy user buffer (#1796)\n\n* add nccl_allocator for zero-copy user buffer\r\n\r\n* review comments","shortMessageHtmlLink":"Add nccl_allocator for zero-copy user buffer (#1796)"}},{"before":"c5f6b7958922d5fb730ea7172309a0dbd43033c1","after":"b5df1ccf89d8013556b1d1d823fc34268cae8e9c","ref":"refs/heads/master","pushedAt":"2024-04-19T05:13:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Add 2D Fused RoPE (#1784)\n\n* add 2D fused RoPE\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* Update fused_rotary_positional_embedding.h\r\n\r\n---------\r\n\r\nSigned-off-by: Xin Yao ","shortMessageHtmlLink":"Add 2D Fused RoPE (#1784)"}},{"before":"810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c","after":"c5f6b7958922d5fb730ea7172309a0dbd43033c1","ref":"refs/heads/master","pushedAt":"2024-04-19T04:17:24.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"move to correct device for v1 state (#1783)","shortMessageHtmlLink":"move to correct device for v1 state (#1783)"}},{"before":"b496d85fb88a801d8e680872a12822de310951fd","after":"810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c","ref":"refs/heads/master","pushedAt":"2024-03-12T04:38:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Update test_fused_softmax.py (#1782)","shortMessageHtmlLink":"Update test_fused_softmax.py (#1782)"}},{"before":"5b67cd5f6b5174ef21a7190fc24583ce52e7187e","after":"b496d85fb88a801d8e680872a12822de310951fd","ref":"refs/heads/master","pushedAt":"2024-02-08T01:28:57.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Support scaled optimizer state in distributed Adam optimizer (#1771)\n\n* Add distopt support for scaled states\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Debug distopt checkpointing with scaled optimizer state\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Debug inconsistent variable name\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Debug checkpointing\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Complain if scaling fp32 states\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Make sure state scaling is done in fp32\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Change from per-parameter scaling factors to per-fragment\r\n\r\nCall _check_params_shard_dtypes within _local_step. Fuse scaling factor computation.\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Support overlapping first bucket AG with scaled state\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Correctly load in per-param-group settings from checkpoint\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Handle with contiguous param buffer and int param sync dtype\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Tweak docs\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Fix excessive memory usage with scaled optim state\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Silence warning about autograd through broadcast\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Debug tests with multiple models\r\n\r\nShows up in PyTorch builds starting 20240118.\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n---------\r\n\r\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Support scaled optimizer state in distributed Adam optimizer (#1771)"}},{"before":"7e239f7534562c88dd03e2d3919ed1ec8a872a1f","after":"5b67cd5f6b5174ef21a7190fc24583ce52e7187e","ref":"refs/heads/master","pushedAt":"2024-02-07T20:52:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"Aidyn-A","name":null,"path":"/Aidyn-A","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/31858918?s=80&v=4"},"commit":{"message":"Add GPUDirect Storage (#1774)\n\n* add gpu_direct_storage\r\n\r\n* apply suggested changes\r\n\r\n* use OOP API","shortMessageHtmlLink":"Add GPUDirect Storage (#1774)"}},{"before":"141bbf1cf362d4ca4d94f4284393e91dda5105a5","after":"7e239f7534562c88dd03e2d3919ed1ec8a872a1f","ref":"refs/heads/master","pushedAt":"2024-02-07T07:25:31.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Skip the p2p test on single GPU platforms (#1775)","shortMessageHtmlLink":"Skip the p2p test on single GPU platforms (#1775)"}},{"before":"6c8f384b40a596bbed960f5e8d9a808ebd0e93d8","after":"141bbf1cf362d4ca4d94f4284393e91dda5105a5","ref":"refs/heads/master","pushedAt":"2024-01-25T04:40:36.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Update test_adam.py (#1772)","shortMessageHtmlLink":"Update test_adam.py (#1772)"}},{"before":"48c4894c4b38b2b77cd7a0473ca665e89c9c148b","after":"6c8f384b40a596bbed960f5e8d9a808ebd0e93d8","ref":"refs/heads/master","pushedAt":"2024-01-18T16:11:01.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Update test_bottleneck_module.py - Skip BottleNeck Peer Memory Test if Not Supported (#1769)\n\nIf hw configuration disabled peer memory access, skip the bottleneck tests.","shortMessageHtmlLink":"Update test_bottleneck_module.py - Skip BottleNeck Peer Memory Test i…"}},{"before":"f058162b215791b15507bb542f22ccfde49c872d","after":"48c4894c4b38b2b77cd7a0473ca665e89c9c148b","ref":"refs/heads/master","pushedAt":"2024-01-12T17:25:39.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Update test_transducer_joint.py (#1767)\n\nIncrease tolerance to workaround unit test failures \r\n\r\n torch.testing.assert_close(f_grad_ref, f_grad_tst, atol=1e-5, rtol=1e-5)\r\nMismatched elements: 1 / 205636 (0.0%)\r\nGreatest absolute difference: 3.0517578125e-05 at index (3, 27, 390) (up to 1e-05 allowed)\r\nGreatest relative difference: 0.000492095947265625 at index (3, 27, 390) (up to 1e-05 allowed)\r\n\r\n torch.testing.assert_close(g_grad_ref, g_grad_tst, atol=1e-4, rtol=1e-4)\r\nMismatched elements: 1 / 51200 (0.0%)\r\nGreatest absolute difference: 0.0009765625 at index (0, 15, 280) (up to 0.0001 allowed)\r\nGreatest relative difference: 0.0008397102355957031 at index (0, 15, 280) (up to 0.0001 allowed)","shortMessageHtmlLink":"Update test_transducer_joint.py (#1767)"}},{"before":"e9789cc46c3189c9652df3e5752aa3c56909767e","after":"f058162b215791b15507bb542f22ccfde49c872d","ref":"refs/heads/master","pushedAt":"2024-01-12T04:40:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Fused RoPE for `thd` format (#1756)\n\n* fused rope for thd format\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* update the test\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* update test\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* remove redudant arguments\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* add comments & simplify code\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n---------\r\n\r\nSigned-off-by: Xin Yao ","shortMessageHtmlLink":"Fused RoPE for thd format (#1756)"}},{"before":"87c4debde8000636ab60b0fc477f324af789c1f7","after":"e9789cc46c3189c9652df3e5752aa3c56909767e","ref":"refs/heads/master","pushedAt":"2024-01-10T15:03:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Increase tolerance to workaround tolerance issues on A100 (#1766)\n\nfailures happen with absolute difference of ~0.001190185546875 and relative diff of ~0.0306854248046875.","shortMessageHtmlLink":"Increase tolerance to workaround tolerance issues on A100 (#1766)"}},{"before":"c07a4cf67102b9cd3f97d1ba36690f985bae4227","after":"87c4debde8000636ab60b0fc477f324af789c1f7","ref":"refs/heads/master","pushedAt":"2024-01-05T08:03:05.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"64-bit indexing Adam (#1765)\n\n* all i want for christmas is larger binaries and longer compile times\r\n\r\n* actually compare\r\n\r\n* woops","shortMessageHtmlLink":"64-bit indexing Adam (#1765)"}},{"before":"ccffcc43489f2d3556eab2cff1953e4962fba5b4","after":"c07a4cf67102b9cd3f97d1ba36690f985bae4227","ref":"refs/heads/master","pushedAt":"2024-01-01T05:17:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"crcrpar","name":"Masaki Kozuki","path":"/crcrpar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16191443?s=80&v=4"},"commit":{"message":"Make fused layer norm functions backward-compatible (#1760)\n\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Make fused layer norm functions backward-compatible (#1760)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0zMFQxMToyMjo0OC4wMDAwMDBazwAAAASole3x","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0zMFQxMToyMjo0OC4wMDAwMDBazwAAAASole3x","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wMS0wMVQwNToxNzozMy4wMDAwMDBazwAAAAPVRa9Q"}},"title":"Activity · NVIDIA/apex"}