Defining LLM Dataset types in Trainers or during Training Workflow #35766

Open
mimipynb opened this issue Jan 18, 2025 · 2 comments
Labels
Feature request Request for a new feature

Comments

@mimipynb

Feature request

Hi,
Is it possible to specify a mapping from dataset columns to model inputs and labels as an argument to transformers.Trainer, or something similar?

I understand that the expected input columns depend on the training loss function and the problem, for example how attention mask ids are defined for masked learners vs. causal learners. It would also be extremely helpful to have suggestions on how to check which dataset columns the trainer/model will read, besides help(model.forward), since that only shows the inputs the model itself accepts.
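To make the column question concrete, here is a minimal sketch of one way to list which dataset columns a model would accept by name. It assumes that column selection is driven by the argument names of `model.forward` plus a few label aliases, which is how the `remove_unused_columns` training argument is documented to behave; verify this against your installed version. `ToyCausalModel`, `signature_columns`, and `split_columns` are hypothetical names used only for illustration and are not part of transformers.

```python
import inspect


class ToyCausalModel:
    """Hypothetical stand-in for a model; only the forward signature matters."""

    def forward(self, input_ids, attention_mask=None, labels=None):
        return None


def signature_columns(model, extra=("label", "label_ids")):
    """Return the argument names of model.forward plus assumed label aliases."""
    params = inspect.signature(model.forward).parameters
    return list(params.keys()) + list(extra)


def split_columns(model, dataset_columns):
    """Split dataset columns into those forward accepts and those it does not."""
    accepted = set(signature_columns(model))
    kept = [c for c in dataset_columns if c in accepted]
    ignored = [c for c in dataset_columns if c not in accepted]
    return kept, ignored


kept, ignored = split_columns(
    ToyCausalModel(),
    ["text", "input_ids", "attention_mask", "labels"],
)
print("kept:", kept)
print("ignored:", ignored)
```

The same `inspect.signature` call can be pointed at a real model's `forward` method to compare its argument names against the column names of a dataset before training starts.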

Any suggestions would be greatly appreciated! thanks!

Motivation

This problem is giving me clinical depression

Your contribution

If there are any suggestions for solving this, I would love to contribute.

mimipynb added the Feature request label on Jan 18, 2025
@Rocketknight1
Member

cc @muellerzr @SunMarc

@SunMarc
Member

SunMarc commented Jan 20, 2025

Is there a specific issue or bug that you are tracking in your code, @mimipynb?

3 participants