
[Train] Add example of fine-tuning Llama-2 on Intel Gaudi #44667

Merged: 23 commits merged into ray-project:master on May 29, 2024

Conversation

@harborn (Contributor) commented Apr 11, 2024

Why are these changes needed?

To leverage the potential of the Intel Gaudi accelerator, we extend Ray Train's capabilities by adding support for Intel Gaudi (HPU) hardware. This PR includes an example of fine-tuning Llama-2-7b on multiple HPUs.
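For orientation, a minimal sketch of what the Ray Train entry point for such an example could look like is shown below. The backend name, resource keys, worker count, and the body of train_func are assumptions for illustration, not the exact code added in this PR.

```python
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer


def train_func(config):
    # Placeholder for the per-worker fine-tuning loop
    # (load Llama-2, tokenizer, dataset, then train).
    ...


ray.init()
trainer = TorchTrainer(
    train_func,
    train_loop_config={"model_name": "meta-llama/Llama-2-7b-hf"},
    torch_config=TorchConfig(backend="hccl"),  # collective backend assumed for Gaudi
    scaling_config=ScalingConfig(
        num_workers=8,  # one Ray Train worker per HPU card (illustrative)
        resources_per_worker={"CPU": 1, "HPU": 1},
    ),
)
result = trainer.fit()
```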

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@harborn changed the title from "Add example of fine-tuning Llama-2 on Intel Gaudi" to "[Train] Add example of fine-tuning Llama-2 on Intel Gaudi" Apr 11, 2024
@harborn marked this pull request as draft April 15, 2024 05:52
@aslonnie (Collaborator) left a comment

Let the docs reviewer know when this is ready for review.

@harborn force-pushed the examples-hpu-llama branch 2 times, most recently from e1bd5a9 to fb0028f on April 28, 2024 06:36
@harborn marked this pull request as ready for review May 8, 2024 02:51
@woshiyyya self-assigned this May 8, 2024
@woshiyyya (Member) left a comment

Thanks for contributing these two high-quality examples, @harborn! Great demonstration of using LoRA and DeepSpeed ZeRO-3 on Gaudi + Ray for fine-tuning.

Left some comments.

Review comments were left on doc/source/train/examples.yml and doc/source/train/examples/intel_gaudi/llama.ipynb (outdated, now resolved).
@woshiyyya (Member) commented:

@justinvyu Can you take a look and merge it?

@justinvyu (Contributor) left a comment

The examples look good! Just a few requests:

  1. Clear the cell outputs, and just put a mock markdown cell with the important output info. For example, just this information:
train_result = TrainOutput(global_step=62, training_loss=1.500297857869056, metrics={'train_runtime': 93.3311, 'train_samples_per_second': 71.042, 'train_steps_per_second': 2.222, 'total_flos': 4.02963202792489e+16, 'train_loss': 1.500297857869056, 'epoch': 2.0, 'memory_allocated (GB)': 34.51, 'max_memory_allocated (GB)': 78.72, 'total_memory_available (GB)': 94.62})
  2. Is it possible to merge these two notebooks so that I can just flip a flag if I want to use deepspeed? Most of the logic is identical, just some extra configs. (See the sketch after this list.)
  3. (Just a question, not blocking) Should we also allow full parameter finetuning instead of always using lora?
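As a hedged illustration of the "flip a flag" request above, the toggle could look roughly like the snippet below. The flag name, the DeepSpeed config values, and the way the config is passed to the trainer are assumptions for illustration, not the notebook's exact code.

```python
# Hypothetical sketch: one boolean selects between plain training and
# DeepSpeed ZeRO-3; everything else in the training loop stays the same.
use_deepspeed = True  # flip to False for the non-DeepSpeed path

deepspeed_config = None
if use_deepspeed:
    deepspeed_config = {
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }

# A Hugging Face-style TrainingArguments object can then take
# `deepspeed=deepspeed_config` (or a path to an equivalent JSON file),
# so both modes share the rest of the fine-tuning code.
```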

@harborn force-pushed the examples-hpu-llama branch 2 times, most recently from 429cb50 to 98eb4df on May 20, 2024 09:47
@harborn (Contributor, Author) commented May 20, 2024

The examples look good! Just a few requests:

  1. Clear the cell outputs, and just put a mock markdown cell with the important output info. For example, just this information:
train_result = TrainOutput(global_step=62, training_loss=1.500297857869056, metrics={'train_runtime': 93.3311, 'train_samples_per_second': 71.042, 'train_steps_per_second': 2.222, 'total_flos': 4.02963202792489e+16, 'train_loss': 1.500297857869056, 'epoch': 2.0, 'memory_allocated (GB)': 34.51, 'max_memory_allocated (GB)': 78.72, 'total_memory_available (GB)': 94.62})
  2. Is it possible to merge these two notebooks so that I can just flip a flag if I want to use deepspeed? Most of the logic is identical, just some extra configs.
  3. (Just a question, not blocking) Should we also allow full parameter finetuning instead of always using lora?

Hi, updated according to your comments:

  1. Removed unnecessary cell outputs; only the important final outputs are kept.
  2. Merged the two notebooks into one; the final notebook can run different training methods in different execution modes on HPU.
  3. Yes, if not using LoRA training, just remove the LoRA conversion when loading the pre-trained model (see the sketch after this list).
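To illustrate point 3, the LoRA conversion is a wrapper applied after the base model is loaded, so full-parameter fine-tuning simply skips that step. This is a minimal sketch under that assumption; the flag name and the LoraConfig values are illustrative, not the notebook's exact code.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

use_lora = True  # set to False for full-parameter fine-tuning

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
if use_lora:
    # Wrap the base model with LoRA adapters; only these low-rank
    # matrices are trained when use_lora is True.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
# With use_lora=False the base model is trained directly,
# i.e. full-parameter fine-tuning.
```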

@woshiyyya (Member) commented May 22, 2024

Seems that @harborn addressed the comments. @justinvyu could you take a look again?

@harborn (Contributor, Author) commented May 24, 2024

Seems that @harborn addressed the comments. @justinvyu could you take a look again?

@justinvyu please take a look again! Thanks.

@justinvyu (Contributor) left a comment

Thanks for making the changes! One last comment then we can merge.

Review comments were left on doc/source/train/examples.yml (outdated, now resolved).
@harborn (Contributor, Author) commented May 29, 2024

@harborn I think you need to add back the orphan: True metadata: #44667 (comment)

Fixed.

@justinvyu added the "go" label (add ONLY when ready to merge, run all tests) May 29, 2024
@justinvyu enabled auto-merge (squash) May 29, 2024 02:29
@justinvyu merged commit 040b736 into ray-project:master May 29, 2024
8 checks passed
harborn added 23 commits May 29, 2024 09:25
All commits signed off by: Wu, Gangsheng <[email protected]>
ryanaoleary pushed commits to ryanaoleary/ray that referenced this pull request on Jun 6 and Jun 7, 2024:

…t#44667)

Adds an example for fine-tuning Llama-2-7b/70b on multiple HPUs.

---------

Signed-off-by: Wu, Gangsheng <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>