
Difficulty Defining CPU and GPU Machine Types in Kedro-Vertex (vertexai.yml) #128

Open
7pandeys opened this issue Nov 8, 2023 · 5 comments

Comments

@7pandeys

7pandeys commented Nov 8, 2023

Problem:
I'm encountering difficulty in defining the CPU and GPU machine types with respect to nodes and pipelines in vertexai.yml within the Kedro-Vertex framework.

Expected Behavior:
I expect to be able to specify the CPU and GPU machine types for nodes and pipelines in vertexai.yml so that CPU and GPU resources are used where they are needed.

Current Behavior:
I've searched through the documentation and codebase but haven't found clear instructions on how to achieve this. This makes it challenging to optimize the resource utilization for my specific workflow.

Steps to Reproduce:

  1. Create a Kedro-Vertex project.
  2. Attempt to define CPU and GPU types for nodes and pipelines in vertexai.yml.
  3. Encounter difficulties or confusion in the process.

Additional Information:

  • I've reviewed the official documentation, but the guidance on this specific aspect seems to be lacking.
  • I've also searched for relevant examples or discussions on forums and GitHub issues but haven't found any direct solutions.

Environment:

  • Kedro version: 0.18.14
  • Kedro-Vertex version: 0.9.1
  • Python version: 3.8.18
  • Operating System: Mac

Suggested Solution:
It would be helpful to provide more detailed documentation or examples on how to define CPU and GPU machine types for nodes and pipelines in vertexai.yml. Alternatively, if this feature is not yet supported, it would be great to know its current status and any workarounds.

Related links
https://github.com/getindata/kedro-vertexai/blob/develop/kedro_vertexai/config.py
https://kedro-vertexai.readthedocs.io/en/0.9.1/source/02_installation/02_configuration.html

Notes:
vertexai.yml is generated by the kedro vertexai init command.

This issue aims to improve resource management and clarity within Kedro-Vertex, making it easier for users to define CPU and GPU machine types for their nodes and pipelines. Your attention to this matter is greatly appreciated.

7pandeys changed the title from "Difficulty Defining CPU and GPU Machine Types in Kedro-Vertex (vertex.yml)" to "Difficulty Defining CPU and GPU Machine Types in Kedro-Vertex (vertexai.yml)" on Nov 8, 2023
@marrrcin
Contributor

marrrcin commented Nov 9, 2023

Hi @7pandeys, thanks for raising the issue.

The Resources configuration section on the page you've linked has exactly the information about using GPUs. Initial configuration generated by kedro vertexai init also creates the vertexai.yml which contains an example of configuration for nodes with GPUs on Vertex AI.

We're open to improvements on that part - what do you propose?


Config generated by kedro vertexai init:

resources:
  # For nodes that require more RAM you can increase the "memory"
  data_import_step:
    memory: 4Gi
  # Training nodes can utilize more than one CPU if the algorithm
  # supports it
  model_training:
    cpu: 8
    memory: 8Gi
    gpu: 1
  # Default settings for the nodes
  __default__:
    cpu: 1000m
    memory: 2048Mi
node_selectors:
  model_training:
    cloud.google.com/gke-accelerator: NVIDIA_TESLA_T4
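To make the role of `__default__` concrete, here is a minimal Python sketch of a per-node lookup over this config. It assumes, without confirmation from this thread, that keys a node sets override `__default__` and that keys it omits fall back to `__default__`. `resources_for_node` and `node_selectors_for_node` are hypothetical helpers, not part of the kedro-vertexai API, and the dictionaries simply mirror the YAML above after parsing.

```python
from typing import Dict

# Hypothetical in-memory copy of the `resources` section above, as a YAML
# loader would return it (plain ints for cpu/gpu, strings otherwise).
RESOURCES: Dict[str, Dict[str, object]] = {
    "data_import_step": {"memory": "4Gi"},
    "model_training": {"cpu": 8, "memory": "8Gi", "gpu": 1},
    "__default__": {"cpu": "1000m", "memory": "2048Mi"},
}

# Hypothetical in-memory copy of the `node_selectors` section above.
NODE_SELECTORS: Dict[str, Dict[str, str]] = {
    "model_training": {"cloud.google.com/gke-accelerator": "NVIDIA_TESLA_T4"},
}


def resources_for_node(node_name: str, resources=RESOURCES) -> Dict[str, object]:
    """Return effective resources for a node (ASSUMED merge semantics).

    Start from `__default__`, then let the node's own keys override it.
    """
    effective = dict(resources.get("__default__", {}))
    effective.update(resources.get(node_name, {}))
    return effective


def node_selectors_for_node(node_name: str, selectors=NODE_SELECTORS) -> Dict[str, str]:
    """Return node selectors for a node; nodes without an entry get none."""
    return dict(selectors.get(node_name, {}))


if __name__ == "__main__":
    # "evaluate" is a hypothetical node with no entry of its own.
    for name in ("data_import_step", "model_training", "evaluate"):
        print(name, resources_for_node(name), node_selectors_for_node(name))
```

Under this assumption, `data_import_step` keeps the default CPU but gets 4Gi of memory, `model_training` gets its own CPU, memory and GPU plus the T4 node selector, and a node with no entry runs with the `__default__` values only.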


@7pandeys
Author

7pandeys commented Nov 9, 2023

@marrrcin thanks for the response.
Is there a specific parameter or syntax for specifying the machine type or CPU type in the vertexai.yml configuration? If not, what is the recommended approach?

Related links
https://cloud.google.com/compute/docs/cpu-platforms
https://cloud.google.com/compute/docs/machine-resource

@marrrcin
Contributor

Follow this guide, our plugin is fully compatible with this approach: https://cloud.google.com/vertex-ai/docs/pipelines/machine-types

@7pandeys
Author

> Follow this guide, our plugin is fully compatible with this approach: https://cloud.google.com/vertex-ai/docs/pipelines/machine-types

  1. Is it possible to define machine types directly within Kedro-VertexAI without relying on KFP?
  2. If not, are there plans or considerations for enabling this feature in future releases?
  3. Are there recommended workarounds or best practices for specifying machine types when not using KFP?

@marrrcin
Contributor

I don't understand your questions. You can configure machine types as you want in vertexai.yml: the plugin's configuration exposes the options available natively in Vertex AI. Whatever you define in vertexai.yml is used by the plugin to set the appropriate CPU/memory/GPU resources and node selectors on the Vertex AI side, so you don't have to use KFP directly.

    resources:

      # For nodes that require more RAM you can increase the "memory"
      data_import_step:
        memory: 4Gi

      # Training nodes can utilize more than one CPU if the algorithm
      # supports it
      model_training:
        cpu: 8
        memory: 8Gi
        gpu: 1

      # Default settings for the nodes
      __default__:
        cpu: 1000m
        memory: 2048Mi

    node_selectors:
      model_training:
        cloud.google.com/gke-accelerator: NVIDIA_TESLA_T4
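The resource values in this config appear to use Kubernetes-style quantities, the same notation KFP accepts for resource limits: `1000m` means 1000 millicores, i.e. one CPU, and `Mi`/`Gi` are binary (power-of-two) units, so `2048Mi` is the same amount as `2Gi`. The sketch below converts such strings to cores and bytes as a sanity check when sizing nodes. `cpu_to_cores` and `memory_to_bytes` are hypothetical helpers, and only the common suffixes are handled.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) suffixes for memory.
# Only a common subset; checked in this order, longest-unambiguous first.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_to_cores(value) -> Decimal:
    """Convert a CPU quantity such as "1000m", "0.5" or 8 to cores."""
    text = str(value)
    if text.endswith("m"):
        # "m" = millicores, 1000m == 1 core
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


def memory_to_bytes(value) -> int:
    """Convert a memory quantity such as "4Gi", "2048Mi" or "16G" to bytes."""
    text = str(value)
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if text.endswith(suffix):
            return int(Decimal(text[: -len(suffix)]) * factor)
    # No suffix: plain number of bytes
    return int(Decimal(text))


if __name__ == "__main__":
    print(cpu_to_cores("1000m"), memory_to_bytes("2048Mi"), memory_to_bytes("2Gi"))
```

Note that `G` (10^9) and `Gi` (2^30) differ by roughly 7%, which matters when a request sits close to a machine's memory size.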

I suggest you configure our plugin first, then check whether it works for you and meets your requirements in that respect.
