
[Train] Add example of fine-tuning Llama-2 on Intel Gaudi #44667

Merged: 23 commits merged into ray-project:master on May 29, 2024

Conversation

@harborn (Contributor) commented Apr 11, 2024

Why are these changes needed?

To leverage the potential of the Intel Gaudi accelerator, we extend Ray Train's capabilities by adding support for Intel Gaudi (HPU) hardware. This PR includes an example of fine-tuning Llama-2-7b on multiple HPUs.
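For orientation, a minimal sketch of what the Ray Train entry point for such an example could look like is shown below. The backend name, resource keys, worker count, and the body of train_func are assumptions for illustration, not the exact code added in this PR.

```python
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer


def train_func(config):
    # Placeholder for the per-worker fine-tuning loop
    # (load Llama-2, tokenizer, dataset, then train).
    ...


ray.init()
trainer = TorchTrainer(
    train_func,
    train_loop_config={"model_name": "meta-llama/Llama-2-7b-hf"},
    torch_config=TorchConfig(backend="hccl"),  # collective backend assumed for Gaudi
    scaling_config=ScalingConfig(
        num_workers=8,  # one Ray Train worker per HPU card (illustrative)
        resources_per_worker={"CPU": 1, "HPU": 1},
    ),
)
result = trainer.fit()
```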

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@harborn changed the title from "Add example of fine-tuning Llama-2 on Intel Gaudi" to "[Train] Add example of fine-tuning Llama-2 on Intel Gaudi" Apr 11, 2024
@harborn marked this pull request as draft April 15, 2024 05:52
@aslonnie (Collaborator) left a comment

Let the docs reviewer know when this is ready for review.

@harborn force-pushed the examples-hpu-llama branch 2 times, most recently from e1bd5a9 to fb0028f on April 28, 2024 06:36
@harborn marked this pull request as ready for review May 8, 2024 02:51
@woshiyyya self-assigned this May 8, 2024
@woshiyyya (Member) left a comment

Thanks for contributing these two high-quality examples, @harborn! Great demonstration of using LoRA and DeepSpeed ZeRO-3 on Gaudi + Ray for fine-tuning.

Left some comments.

Review comments were left on doc/source/train/examples.yml and doc/source/train/examples/intel_gaudi/llama.ipynb (outdated, now resolved).
@woshiyyya (Member) commented:

@justinvyu Can you take a look and merge it?

@justinvyu (Contributor) left a comment

The examples look good! Just a few requests:

  1. Clear the cell outputs, and just put a mock markdown cell with the important output info. For example, just this information:
train_result = TrainOutput(global_step=62, training_loss=1.500297857869056, metrics={'train_runtime': 93.3311, 'train_samples_per_second': 71.042, 'train_steps_per_second': 2.222, 'total_flos': 4.02963202792489e+16, 'train_loss': 1.500297857869056, 'epoch': 2.0, 'memory_allocated (GB)': 34.51, 'max_memory_allocated (GB)': 78.72, 'total_memory_available (GB)': 94.62})
  2. Is it possible to merge these two notebooks so that I can just flip a flag if I want to use deepspeed? Most of the logic is identical, just some extra configs. (See the sketch after this list.)
  3. (Just a question, not blocking) Should we also allow full parameter finetuning instead of always using lora?
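As a hedged illustration of the "flip a flag" request above, the toggle could look roughly like the snippet below. The flag name, the DeepSpeed config values, and the way the config is passed to the trainer are assumptions for illustration, not the notebook's exact code.

```python
# Hypothetical sketch: one boolean selects between plain training and
# DeepSpeed ZeRO-3; everything else in the training loop stays the same.
use_deepspeed = True  # flip to False for the non-DeepSpeed path

deepspeed_config = None
if use_deepspeed:
    deepspeed_config = {
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }

# A Hugging Face-style TrainingArguments object can then take
# `deepspeed=deepspeed_config` (or a path to an equivalent JSON file),
# so both modes share the rest of the fine-tuning code.
```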

@harborn force-pushed the examples-hpu-llama branch 2 times, most recently from 429cb50 to 98eb4df on May 20, 2024 09:47
@harborn (Contributor, Author) commented May 20, 2024

The examples look good! Just a few requests:

  1. Clear the cell outputs, and just put a mock markdown cell with the important output info. For example, just this information:
train_result = TrainOutput(global_step=62, training_loss=1.500297857869056, metrics={'train_runtime': 93.3311, 'train_samples_per_second': 71.042, 'train_steps_per_second': 2.222, 'total_flos': 4.02963202792489e+16, 'train_loss': 1.500297857869056, 'epoch': 2.0, 'memory_allocated (GB)': 34.51, 'max_memory_allocated (GB)': 78.72, 'total_memory_available (GB)': 94.62})
  2. Is it possible to merge these two notebooks so that I can just flip a flag if I want to use deepspeed? Most of the logic is identical, just some extra configs.
  3. (Just a question, not blocking) Should we also allow full parameter finetuning instead of always using lora?

Hi, updated according to your comments:

  1. Removed unnecessary cell outputs; only the important final outputs are kept.
  2. Merged the two notebooks into one; the final notebook can run different training methods in different execution modes on HPU.
  3. Yes, if not using LoRA training, just remove the LoRA conversion when loading the pre-trained model (see the sketch after this list).
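To illustrate point 3, the LoRA conversion is a wrapper applied after the base model is loaded, so full-parameter fine-tuning simply skips that step. This is a minimal sketch under that assumption; the flag name and the LoraConfig values are illustrative, not the notebook's exact code.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

use_lora = True  # set to False for full-parameter fine-tuning

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
if use_lora:
    # Wrap the base model with LoRA adapters; only these low-rank
    # matrices are trained when use_lora is True.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
# With use_lora=False the base model is trained directly,
# i.e. full-parameter fine-tuning.
```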

@woshiyyya (Member) commented May 22, 2024

Seems that @harborn addressed the comments. @justinvyu could you take a look again?

@harborn (Contributor, Author) commented May 24, 2024

Seems that @harborn addressed the comments. @justinvyu could you take a look again?

@justinvyu please take a look again! Thanks.

@justinvyu (Contributor) left a comment

Thanks for making the changes! One last comment then we can merge.

Review comments were left on doc/source/train/examples.yml (outdated, now resolved).
@harborn (Contributor, Author) commented May 29, 2024

@harborn I think you need to add back the orphan: True metadata: #44667 (comment)

Fixed.

@justinvyu added the "go" label (add ONLY when ready to merge, run all tests) May 29, 2024
@justinvyu enabled auto-merge (squash) May 29, 2024 02:29
@justinvyu merged commit 040b736 into ray-project:master May 29, 2024
8 checks passed
harborn added 23 commits May 29, 2024 09:25
All commits signed off by: Wu, Gangsheng <[email protected]>
ryanaoleary pushed commits to ryanaoleary/ray that referenced this pull request on Jun 6 and Jun 7, 2024:

…t#44667)

Adds an example for fine-tuning Llama-2-7b/70b on multiple HPUs.

---------

Signed-off-by: Wu, Gangsheng <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>