
Request for AdapterTrainer to support saving entire model #531

Open · erzaliator opened this issue Apr 6, 2023 · 5 comments
Labels: enhancement (New feature or request)

@erzaliator commented Apr 6, 2023
🚀 Feature request

Request for AdapterTrainer to support saving the complete model.

Motivation

AdapterTrainer's `_save()` method saves adapter modules and prediction heads, but does not save the underlying pretrained model weights.

I encountered this issue while using `--load_best_model_at_end=True` in the training arguments. I get the warning: `Could not locate the best model at runs/my_current_model/checkpoint-70/pytorch_model.bin, if you are running a distributed training on multiple nodes, you should activate --save_on_each_node.`

This is because the default AdapterTrainer saves only the adapters and heads, not the rest of the model weights.

Am I missing an argument that explicitly saves a `pytorch_model.bin` file? Otherwise, I believe it would be nice to have a feature in AdapterTrainer to save either just the adapter modules or the complete model.

I am using Pfeiffer's MAD-X task and language adapters.
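
For illustration, a minimal, untested sketch of what such a feature could look like: subclass AdapterTrainer and additionally write the full model weights on every save. The `_save()` signature below is an assumption based on the base Trainer in adapter-transformers 3.x.

```python
# Untested sketch, not library code: extend AdapterTrainer so that checkpoints
# also contain the full pytorch_model.bin that load_best_model_at_end expects.
from transformers import AdapterTrainer  # adapter-transformers 3.x

class FullModelAdapterTrainer(AdapterTrainer):
    def _save(self, output_dir=None, state_dict=None):
        # Save adapters and heads as the default AdapterTrainer does
        super()._save(output_dir=output_dir, state_dict=state_dict)
        # Additionally write the complete model weights
        output_dir = output_dir if output_dir is not None else self.args.output_dir
        self.model.save_pretrained(output_dir)
```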

Your contribution

For now, I am manually saving the best model from a callback via `model.save_pretrained()`, as follows (the best model is subsequently loaded back manually via `from_pretrained`):

```python
from copy import deepcopy
from transformers import AdapterTrainer, TrainerCallback, TrainingArguments

# SEED, lr, epoch2, batch_size2, MODEL_DIR, lang, model, save_best_model_path,
# train_dataset2, valid_dataset2 and compute_metrics are defined elsewhere in the script.

best_acc = -1

class CustomCallback(TrainerCallback):
    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            global best_acc  # only best_acc is reassigned; the rest are read-only
            print('USING HEAD: ', model.active_head)
            control_copy = deepcopy(control)
            # Evaluate on the training set to track accuracy
            output_metrics = self._trainer.evaluate(
                eval_dataset=self._trainer.train_dataset,
                metric_key_prefix="train@" + lang)
            # evaluate() prefixes metric keys with metric_key_prefix
            acc = output_metrics["train@" + lang + "_acc"]
            if state.global_step < state.max_steps and best_acc <= acc:
                print('Saving the model using CustomCallback: ', save_best_model_path)
                # save_pretrained writes the complete model; from_pt is a
                # from_pretrained argument and has no effect here, so it is dropped
                model.save_pretrained(save_best_model_path)
                best_acc = acc
            return control_copy

training_args = TrainingArguments(
    seed=SEED,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    learning_rate=lr,
    num_train_epochs=epoch2,
    per_device_train_batch_size=batch_size2,
    per_device_eval_batch_size=batch_size2,
    output_dir=MODEL_DIR + '_' + lang,
    overwrite_output_dir=False,
    # Important: ensures the dataset labels are properly passed to the model
    remove_unused_columns=False,
    save_total_limit=1,
    load_best_model_at_end=True,
    # resume_from_checkpoint=MODEL_DIR + '/last-checkpoint',
)

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset2,
    eval_dataset=valid_dataset2,
    compute_metrics=compute_metrics,
)

trainer.add_callback(CustomCallback(trainer))
```
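
A minimal sketch of the manual reload (assuming the Stack setup and `head_name` from the model-setup code later in this thread; the adapter composition and head are not re-activated automatically):

```python
# Sketch of reloading the manually saved best model
from transformers import AutoAdapterModel
from transformers.adapters.composition import Stack

best_model = AutoAdapterModel.from_pretrained(save_best_model_path)
# Re-activate the same adapter stack and head used during training
best_model.active_adapters = Stack(lang, "disrpt")
best_model.active_head = head_name
```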
@ghost commented Apr 12, 2023

It's possible to save the entire model using the transformers library. However, when loading the model for inference, the default configuration does not include the additional (adapter) parameters, so the model will only run inference with the non-adapter weights. I haven't tried it myself, but you could try running inference with a custom config and see if that works.
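
A rough, untested sketch of that suggestion (the save path is hypothetical; `adapter_summary()` is from adapter-transformers 3.x):

```python
# Untested sketch: save the complete model, then reload it for inference and
# check what was actually restored.
from transformers import AutoAdapterModel

model.save_pretrained("runs/my_current_model/full")  # hypothetical path; full state dict

model = AutoAdapterModel.from_pretrained("runs/my_current_model/full")
print(model.adapter_summary())  # shows which adapters were restored
```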

@erzaliator (Author)
@Celestinian, on the contrary: the adapter library saves only the adapter modules and heads, but not the BERT weights. As a result, the model attempts to load only the adapter weights. I suspect the `trainer.state` variable needs to be corrected to store both adapter and BERT weights.

@Ch-rode commented May 31, 2023

@erzaliator Thank you for bringing this up; I'm running into the same frustrating issue. However, when I attempted to use your suggested callback method, I encountered the following error:
`AttributeError: 'str' object has no attribute 'evaluate'`
@calpt, do you have any suggestions?

@erzaliator (Author) commented Jun 16, 2023

@Ch-rode can you check whether the value of `trainer` is correct? It should be the Trainer instance itself, not a string.
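
For instance, the error above would occur if the callback received the string `"trainer"` rather than the trainer object (hypothetical reproduction):

```python
# self._trainer.evaluate fails with exactly this AttributeError if a string is passed
trainer.add_callback(CustomCallback(trainer))      # correct: pass the instance
# trainer.add_callback(CustomCallback("trainer")) # wrong: 'str' object has no attribute 'evaluate'
```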

Additionally, here are the model details (the complete code is here; it uses some custom Python files as imports, but it may still be helpful as a reference. With my setup, this code saves the model including the adapter modules and heads):

```python
from transformers import AutoConfig, AutoAdapterModel, AdapterConfig
from transformers.adapters.composition import Stack

lang1 = 'en'
dataset1 = 'eng.rst'
lang2 = 'de'
dataset2 = 'deu.rst'

config = AutoConfig.from_pretrained(
    BERT_MODEL,
)
model = AutoAdapterModel.from_pretrained(
    BERT_MODEL,
    config=config,
)

# Load the language adapter
lang_adapter_config = AdapterConfig.load("pfeiffer", reduction_factor=2)
model.load_adapter(lang1 + "/wiki@ukp", config=lang_adapter_config)

# Add a new task adapter
model.add_adapter("disrpt")

# Add a classification head for our target task
num_labels = len(set(labels1.names))
head_name = "disrpt-" + dataset1.replace('.', '-')
print('Total prediction labels: ', num_labels)
model.add_classification_head(head_name, num_labels=num_labels)

# Set the task adapter as trainable
model.train_adapter(["disrpt"])

# Unfreeze and activate the stack setup
lang = lang1
model.active_adapters = Stack(lang, "disrpt")
model.active_head = head_name
lang = dataset1
```

Additional info:

```
adapter-transformers==3.2.1
transformers==4.20.1
torch==1.12.1+cu102
```

@vabatta commented Jan 10, 2024

@erzaliator I'm missing a piece: are you saying it is possible to save a model with an adapter so that it can later be loaded for inference without using the adapter library?
