While training the model, I encountered the following error:
/content/piper/src/python
2024-12-12 19:11:04.493620: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-12 19:11:04.528286: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-12 19:11:04.538373: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-12 19:11:04.564028: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-12 19:11:06.393035: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:piper_train:Namespace(dataset_dir='/content/drive/MyDrive/colab/piper/jonadubbing', checkpoint_epochs=5, quality='medium', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=9999, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=1000, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/content/epoch=9999-step=1753600.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=12, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234, num_ckpt=0, save_last=True)
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting Trainer(resume_from_checkpoint=) is deprecated in v1.5 and will be removed in v1.7. Please pass Trainer.fit(ckpt_path=) directly instead.
rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 5 epoch(s)
DEBUG:piper_train:0 Checkpoints will be saved
DEBUG:vits.dataset:Loading dataset: /content/drive/MyDrive/colab/piper/jonadubbing/dataset.jsonl
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: trainer.resume_from_checkpoint is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with trainer.fit(ckpt_path=) instead.
ckpt_path = ckpt_path or self.resume_from_checkpoint
Restoring states from the checkpoint path at /content/epoch=9999-step=1753600.ckpt
DEBUG:fsspec.local:open file: /content/epoch=9999-step=1753600.ckpt
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:1659: UserWarning: Be aware that when using ckpt_path, callbacks used to create the checkpoint need to be provided during Trainer instantiation. Please add the following callbacks: ["ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 10, 'train_time_interval': None, 'save_on_train_epoch_end': True}"].
rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /content/drive/MyDrive/colab/piper/jonadubbing/lightning_logs/version_11/hparams.yaml
Restored all states from the checkpoint file at /content/epoch=9999-step=1753600.ckpt
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of DataLoader across ranks is zero. Please make sure this was your intention.
rank_zero_warn(
Trainer.fit stopped: max_epochs=9999 reached.
Hi,
That means you will need to set max_epochs to a value greater than 10k, because the model was fine-tuned for 10k epochs.
Anyway, we are preparing a "massive" upgrade to the trainer to support the newest dependencies.
Cheers.
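For context, here is a minimal, self-contained PyTorch Lightning sketch of the stopping behavior seen in the log above. It is an illustration only: TinyModel and tiny.ckpt are hypothetical stand-ins for piper's model and checkpoint, and it assumes vanilla Lightning 1.x. Resuming from a checkpoint whose epoch counter has already reached max_epochs makes Trainer.fit return immediately; raising max_epochs past the stored epoch lets training continue.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    """Hypothetical stand-in for piper's VITS model."""
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1, 1)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return torch.nn.functional.mse_loss(self.layer(x), x)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

loader = DataLoader(TensorDataset(torch.randn(8, 1)), batch_size=4)

# Train for 3 epochs and save a checkpoint whose epoch counter is exhausted.
trainer = pl.Trainer(max_epochs=3, enable_progress_bar=False)
trainer.fit(TinyModel(), loader)
trainer.save_checkpoint("tiny.ckpt")

# Resuming with the SAME max_epochs stops immediately, exactly like the log:
# "Trainer.fit stopped: max_epochs=3 reached."
pl.Trainer(max_epochs=3, enable_progress_bar=False).fit(
    TinyModel(), loader, ckpt_path="tiny.ckpt")

# Raising max_epochs past the checkpoint's epoch resumes training (epochs 3..5).
pl.Trainer(max_epochs=6, enable_progress_bar=False).fit(
    TinyModel(), loader, ckpt_path="tiny.ckpt")
```

In piper_train the same knob appears to be exposed on the command line (the Namespace dump above shows max_epochs=9999), so the practical fix is presumably to pass a larger --max_epochs when resuming.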
Hi, if I set it to 10k, it gives the same result; I haven't tried going above 10k. I might just wait for the update and build a better dataset in the meantime. If you're open to suggestions: it would help to add an option to export directly within the same notebook, since working across multiple notebooks can sometimes be confusing.
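One detail worth noting: a checkpoint named epoch=9999 was saved at the end of the 10,000th epoch (epochs are 0-indexed), which would explain why max_epochs=10000 stops just as immediately as 9999 did; the value has to be strictly larger. If in doubt, the counters stored in the checkpoint can be read directly. A minimal sketch, assuming the standard Lightning checkpoint keys "epoch" and "global_step":

```python
import torch

# Load only to inspect metadata; map_location avoids needing a GPU.
ckpt = torch.load("/content/epoch=9999-step=1753600.ckpt", map_location="cpu")
print(ckpt["epoch"], ckpt["global_step"])  # expected: 9999 and 1753600

# Epoch 9999 completed means 10,000 epochs are done, so pick a
# max_epochs strictly above 10,000 (e.g. 12000) when resuming.
```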