Defining LLM Dataset types in Trainers or during Training Workflow #35766

mimipynb · 2025-01-18T15:37:12Z

Feature request

Hi,
Just wondering if it's possible to define the dataset input-label mapping as an argument to transformers.Trainer, or something similar?

I understand that the input data labels depend on the training loss function and the problem type, for example how attention mask ids are defined for masked learners vs. causal learners. It would also be extremely helpful to have suggestions on how to check which dataset column labels the trainer/model will read, besides help(model.forward), which really just shows the model's inputs as they are passed into the learner.
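For illustration, here is a minimal sketch of one way to check this, assuming the trainer decides which columns to keep by reading the parameter names of `model.forward` and adding a few default label names (`label`, `label_ids`). That assumption matches my understanding of what `Trainer` does when `remove_unused_columns=True`, but it is not confirmed in this thread. `ToyCausalLM`, `expected_columns`, and `unused_columns` are hypothetical names made up for this example, not part of transformers.

```python
import inspect


class ToyCausalLM:
    """Hypothetical stand-in for a model; NOT a real transformers class."""

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        return None


def expected_columns(model, extra_label_names=("label", "label_ids")):
    """List the column names a signature-based trainer would keep.

    Assumption: kept columns = parameters of model.forward plus some
    default label names. The defaults used here are an assumption.
    """
    # inspect.signature on a bound method already excludes `self`.
    params = list(inspect.signature(model.forward).parameters)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(params + list(extra_label_names)))


def unused_columns(model, dataset_columns):
    """Return the dataset columns that would be ignored or dropped."""
    keep = set(expected_columns(model))
    return [c for c in dataset_columns if c not in keep]


model = ToyCausalLM()
print(expected_columns(model))
print(unused_columns(model, ["text", "input_ids", "attention_mask", "labels"]))
```

With a real model, the same `inspect.signature(model.forward)` call can be pointed at the loaded model object, and the result compared against the dataset's column names before training starts.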

Any suggestions would be greatly appreciated! Thanks!

Motivation

This problem is giving me clinical depression

Your contribution

If there are any suggestions for solving this, I would love to contribute.

Rocketknight1 · 2025-01-20T14:52:05Z

cc @muellerzr @SunMarc

SunMarc · 2025-01-20T16:24:16Z

Is there a specific issue or bug that you are tracking in your code, @mimipynb?

mimipynb added the "Feature request" label (Request for a new feature) on Jan 18, 2025
