While running the training code with the m4c_captioner model, I am getting the following error:
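(For reference, the launch command is not shown verbatim in the log, but it can be reconstructed from the Namespace opts logged further down; it would have been roughly the following, with mmf_run being the entry point named in the traceback:)

mmf_run datasets=textcaps model=m4c_captioner config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml env.save_dir=./save/m4c_captioner/defaults run_type=train_val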
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option datasets to textcaps
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option model to m4c_captioner
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option config to projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option env.save_dir to ./save/m4c_captioner/defaults
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option run_type to train_val
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_LOG_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_REPORT_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_TENSORBOARD_LOGDIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_WANDB_LOGDIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:14337
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:14337
2022-09-08T19:36:25 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 3
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 1
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 2
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 0
2022-09-08T19:36:30 | mmf: Logging to: ./save/m4c_captioner/defaults/train.log
2022-09-08T19:36:30 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['datasets=textcaps', 'model=m4c_captioner', 'config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml', 'env.save_dir=./save/m4c_captioner/defaults', 'run_type=train_val'])
2022-09-08T19:36:30 | mmf_cli.run: Torch version: 1.6.0
2022-09-08T19:36:30 | mmf.utils.general: CUDA Device 0 is: NVIDIA GeForce GTX 1080 Ti
2022-09-08T19:36:30 | mmf_cli.run: Using seed 30613796
2022-09-08T19:36:30 | mmf.trainers.mmf_trainer: Loading datasets
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.trainers.mmf_trainer: Loading model
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'cls.predictions.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.5.output.dense.bias', 
'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 
'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.query.weight']
This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
2022-09-08T19:36:57 | mmf.trainers.mmf_trainer: Loading optimizer
2022-09-08T19:36:57 | mmf.trainers.mmf_trainer: Loading metrics
WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
builtin_warn(*args, **kwargs)
WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
builtin_warn(*args, **kwargs)
WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)
WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)
This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 
'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.3.output.LayerNorm.bias', 'cls.predictions.decoder.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'cls.predictions.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.value.bias', 
'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.weight']
This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 
'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'cls.seq_relationship.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.query.weight', 
'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.weight']
This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
Traceback (most recent call last):
File "/home/root1/anaconda3/envs/mmf/bin/mmf_run", line 8, in
sys.exit(run())
File "/media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf_cli/run.py", line 129, in run
nprocs=config.distributed.world_size,
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 108, in join
(error_index, name)
Exception: process 0 terminated with signal SIGKILL
I followed some tips: increased the system memory to 64 GB, set num_workers to 0, and reduced the batch_size to 64, but it still doesn't work. Kindly help me resolve this issue.
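(For reference, those tweaks correspond to command-line overrides along these lines; training.batch_size and training.num_workers are the usual MMF config keys and are assumed here, so adjust if your config names them differently:)

mmf_run datasets=textcaps model=m4c_captioner config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml env.save_dir=./save/m4c_captioner/defaults run_type=train_val training.batch_size=64 training.num_workers=0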
Hi,
While I am trying the training code with m4c_captioner model, I am getting the following error,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_USER_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option datasets to textcaps
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option model to m4c_captioner
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option config to projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option env.save_dir to ./save/m4c_captioner/defaults
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option run_type to train_val
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_LOG_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_REPORT_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_TENSORBOARD_LOGDIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_WANDB_LOGDIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_USER_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:14337
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_USER_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:14337
2022-09-08T19:36:25 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 3
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_USER_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence
MMF_USER_DIR,
some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.category=UserWarning,
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 1
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 2
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 0
2022-09-08T19:36:30 | mmf: Logging to: ./save/m4c_captioner/defaults/train.log
2022-09-08T19:36:30 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['datasets=textcaps', 'model=m4c_captioner', 'config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml', 'env.save_dir=./save/m4c_captioner/defaults', 'run_type=train_val'])
2022-09-08T19:36:30 | mmf_cli.run: Torch version: 1.6.0
2022-09-08T19:36:30 | mmf.utils.general: CUDA Device 0 is: NVIDIA GeForce GTX 1080 Ti
2022-09-08T19:36:30 | mmf_cli.run: Using seed 30613796
2022-09-08T19:36:30 | mmf.trainers.mmf_trainer: Loading datasets
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.trainers.mmf_trainer: Loading model
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'cls.predictions.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.5.output.dense.bias', 
'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 
'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.query.weight']
All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
2022-09-08T19:36:57 | mmf.trainers.mmf_trainer: Loading optimizer
2022-09-08T19:36:57 | mmf.trainers.mmf_trainer: Loading metrics
WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
builtin_warn(*args, **kwargs)
WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: [same list of unused bert.encoder.* and cls.* weights as above, repeated once for each of the remaining ranks]
All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
2022-09-08T19:37:01 | mmf.trainers.mmf_trainer: ===== Model =====
2022-09-08T19:37:01 | mmf.trainers.mmf_trainer: DistributedDataParallel(
(module): M4CCaptioner(
(text_bert): TextBert(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
)
(text_bert_out_linear): Identity()
(obj_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7(
(lc): Linear(in_features=2048, out_features=2048, bias=True)
)
(linear_obj_feat_to_mmt_in): Linear(in_features=2048, out_features=768, bias=True)
(linear_obj_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True)
(obj_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(obj_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(obj_drop): Dropout(p=0.1, inplace=False)
(ocr_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7(
(lc): Linear(in_features=2048, out_features=2048, bias=True)
)
(linear_ocr_feat_to_mmt_in): Linear(in_features=3002, out_features=768, bias=True)
(linear_ocr_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True)
(ocr_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(ocr_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(ocr_drop): Dropout(p=0.1, inplace=False)
(mmt): MMT(
(prev_pred_embeddings): PrevPredEmbeddings(
(position_embeddings): Embedding(100, 768)
(token_type_embeddings): Embedding(5, 768)
(ans_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ocr_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(emb_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(emb_dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(3): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
)
(ocr_ptr_net): OcrPtrNet(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
)
(classifier): ClassifierLayer(
(module): Linear(in_features=768, out_features=6736, bias=True)
)
(losses): Losses(
(losses): ModuleList(
(0): MMFLoss(
(loss_criterion): M4CDecodingBCEWithMaskLoss()
)
)
)
)
)
2022-09-08T19:37:01 | mmf.utils.general: Total Parameters: 92185168. Trained Parameters: 92185168
2022-09-08T19:37:01 | mmf.trainers.core.training_loop: Starting training...
2022-09-08T19:37:01 | mmf.datasets.processors.processors: Loading fasttext model now from /home/root1/.cache/torch/mmf/wiki.en.bin
Traceback (most recent call last):
File "/home/root1/anaconda3/envs/mmf/bin/mmf_run", line 8, in
sys.exit(run())
File "/media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf_cli/run.py", line 129, in run
nprocs=config.distributed.world_size,
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 108, in join
(error_index, name)
Exception: process 0 terminated with signal SIGKILL
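As far as I can tell, a SIGKILL at this point usually means the Linux out-of-memory killer terminated one of the spawned training processes, rather than MMF itself raising an error. A quick way to confirm is to look for OOM entries in the kernel log right after the crash (sketch below; the exact message wording varies with the kernel version):

# check whether the OOM killer reaped the training process
dmesg -T | grep -i -E "out of memory|killed process"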
I followed some tips: I increased the system memory to 64 GB, set num_workers to 0, and reduced the batch_size to 64, but the run still fails in the same way. Kindly help me resolve this issue.
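For reference, I pass those settings as command-line overrides appended to the same mmf_run command used for this run (assuming MMF's standard training.batch_size and training.num_workers config keys; all other options are unchanged):

# the existing mmf_run invocation, with only the memory-related overrides added
mmf_run <unchanged config/model/dataset options> \
    training.batch_size=64 \
    training.num_workers=0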