Bert:tensorflow:Error recorded from training_loop: Read less bytes than requested #1387

Open
MalaJeans opened this issue Jun 26, 2023 · 2 comments

@MalaJeans

When running the BERT example, the following error occurs:

ERROR:tensorflow:Error recorded from training_loop: Read less bytes than requested
[[node checkpoint_initializer_133 (defined at run_classifier.py:661) ]]

Original stack trace for 'checkpoint_initializer_133':
File "run_classifier.py", line 981, in
tf.app.run()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "run_classifier.py", line 880, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
config)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "run_classifier.py", line 661, in model_fn
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
init_from_checkpoint_fn)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1684, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1691, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 286, in
ckpt_dir_or_file, assignment_map)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 334, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 458, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 412, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

E0626 16:26:39.769550 139724583494016 error_handling.py:70] Error recorded from training_loop: Read less bytes than requested
[[node checkpoint_initializer_133 (defined at run_classifier.py:661) ]]

Original stack trace for 'checkpoint_initializer_133':
File "run_classifier.py", line 981, in
tf.app.run()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "run_classifier.py", line 880, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
config)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "run_classifier.py", line 661, in model_fn
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
init_from_checkpoint_fn)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1684, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1691, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 286, in
ckpt_dir_or_file, assignment_map)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 334, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 458, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 412, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

INFO:tensorflow:training_loop marked as finished
I0626 16:26:39.769962 139724583494016 error_handling.py:96] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0626 16:26:39.770006 139724583494016 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested
[[{{node checkpoint_initializer_133}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run_classifier.py", line 981, in
tf.app.run()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "run_classifier.py", line 880, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/six.py", line 719, in reraise
raise value
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in init
stop_grace_period_secs=stop_grace_period_secs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in init
self._sess = _RecoverableSession(self._coordinated_creator)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in init
_WrappedSession.init(self, self._create_session())
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
return self._sess_creator.create_session()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
self.tf_sess = self._session_creator.create_session()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 647, in create_session
init_fn=self._scaffold.init_fn)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 296, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested
[[node checkpoint_initializer_133 (defined at run_classifier.py:661) ]]

Original stack trace for 'checkpoint_initializer_133':
File "run_classifier.py", line 981, in
tf.app.run()
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "run_classifier.py", line 880, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
config)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "run_classifier.py", line 661, in model_fn
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
init_from_checkpoint_fn)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1684, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1691, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 286, in
ckpt_dir_or_file, assignment_map)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 334, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 458, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 412, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

My running parameters are as follows:
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/

Is there any solution?

@drosenbluth

How are you?

@MalaJeans

> How are you?

I downloaded the data set again and modified the running parameters. It works now.
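
For anyone landing here with the same message: the failing op in the trace is restore_v2, called from tf.train.init_from_checkpoint, so the error is raised while reading the files behind --init_checkpoint. One possible cause (an assumption, not something confirmed in this thread) is an incomplete or truncated download. The sketch below is a cheap pre-flight check in plain Python, assuming the usual TensorFlow layout of a `<prefix>.index` file plus one or more `<prefix>.data-*` shards. The helper names `checkpoint_files` and `looks_complete` are made up for illustration and are not part of BERT or TensorFlow.

```python
import glob
import os


def checkpoint_files(prefix):
    """Return {path: size_in_bytes} for the index file and data shards of a prefix."""
    paths = [prefix + ".index"] + sorted(glob.glob(glob.escape(prefix) + ".data-*"))
    return {p: os.path.getsize(p) for p in paths if os.path.isfile(p)}


def looks_complete(prefix):
    """Cheap sanity check: an index file and at least one data shard, none empty."""
    files = checkpoint_files(prefix)
    has_index = (prefix + ".index") in files
    has_data = any(p[len(prefix):].startswith(".data-") for p in files)
    return has_index and has_data and all(size > 0 for size in files.values())


if __name__ == "__main__":
    # Assumption: BERT_BASE_DIR points at the unpacked model directory,
    # as in the command posted above.
    prefix = os.path.join(os.environ.get("BERT_BASE_DIR", "."), "bert_model.ckpt")
    for path, size in checkpoint_files(prefix).items():
        print(f"{size:>12d}  {path}")
    print("looks complete:", looks_complete(prefix))
```

This only catches missing or empty files. A shard that is non-empty but cut short will still pass, so it is worth comparing the printed sizes against the sizes listed in the downloaded archive, or simply downloading and unpacking it again.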
