[Bug] ValueError: not allowed to raise maximum limit (rlimit) #110

Open
iamkhalidbashir opened this issue Jun 14, 2023 · 1 comment
Labels
bug Something isn't working

iamkhalidbashir commented Jun 14, 2023

Describe the bug

The following error occurs while training:

  • Running with sudo gives the same error.
  • I am using the Docker image nvidia/cuda:11.7.0-base-ubuntu22.04.
  • Inside the container, resource.getrlimit(resource.RLIMIT_NOFILE) returns (1048576, 1048576) by default.
| > stats_path:None
2023-06-14T07:29:43.025431079Z  | > base:10
2023-06-14T07:29:43.025437149Z  | > hop_length:256
2023-06-14T07:29:43.025444429Z  | > win_length:1024
2023-06-14T07:29:43.025450699Z  > initialization of speaker-embedding layers.
2023-06-14T07:29:43.025462919Z Traceback (most recent call last):
2023-06-14T07:29:43.025469199Z   File "/workspace/coqui-tts/train.py", line 320, in <module>
2023-06-14T07:29:43.025476859Z     trainer = Trainer(
2023-06-14T07:29:43.025484659Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 405, in __init__
2023-06-14T07:29:43.025494939Z     self.use_cuda, self.num_gpus = self.setup_training_environment(args=args, config=config, gpu=gpu)
2023-06-14T07:29:43.025500099Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 632, in setup_training_environment
2023-06-14T07:29:43.025543959Z     resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
2023-06-14T07:29:43.025560229Z ValueError: not allowed to raise maximum limit

The error is caused by this line:

Trainer/trainer/trainer.py

Lines 653 to 660 in 9879d3d

if platform.system() != "Windows":
    # https://github.com/pytorch/pytorch/issues/973
    import resource  # pylint: disable=import-outside-toplevel

    rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))

# set and initialize Pytorch runtime
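
To narrow down why the call is rejected, it may help to print the limits from inside the same process that runs the training. The sketch below is not part of Trainer; `describe_nofile_limits` is a hypothetical helper name. As far as I understand, Python raises this ValueError when the underlying setrlimit call fails with EPERM, and on Linux one documented cause is a hard RLIMIT_NOFILE value above the kernel ceiling in /proc/sys/fs/nr_open, so the sketch reports that value too when the file exists.

```python
import resource  # Unix-only standard-library module


def describe_nofile_limits():
    """Return the open-file limits visible to the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # On Linux, fs.nr_open is the kernel's ceiling for the hard RLIMIT_NOFILE
    # value. The file does not exist on other systems, so fall back to None.
    try:
        with open("/proc/sys/fs/nr_open", encoding="ascii") as fh:
            nr_open = int(fh.read().strip())
    except (OSError, ValueError):
        nr_open = None

    return {"soft": soft, "hard": hard, "nr_open": nr_open}


if __name__ == "__main__":
    print(describe_nofile_limits())
    # resource.RLIM_INFINITY is the marker for an unlimited value.
    print("RLIM_INFINITY =", resource.RLIM_INFINITY)
```

If the reported hard limit differs from the (1048576, 1048576) seen in an interactive shell, the training process is probably started under different limits than the shell.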

To Reproduce

  1. Install coqui-tts in an nvidia/cuda:11.7.0-base-ubuntu22.04 Docker container.
  2. Try to train a VITS model.
  3. The error is thrown (even with sudo).

Expected behavior

No errors

Logs

| > stats_path:None
2023-06-14T07:29:43.025431079Z  | > base:10
2023-06-14T07:29:43.025437149Z  | > hop_length:256
2023-06-14T07:29:43.025444429Z  | > win_length:1024
2023-06-14T07:29:43.025450699Z  > initialization of speaker-embedding layers.
2023-06-14T07:29:43.025462919Z Traceback (most recent call last):
2023-06-14T07:29:43.025469199Z   File "/workspace/coqui-tts/train.py", line 320, in <module>
2023-06-14T07:29:43.025476859Z     trainer = Trainer(
2023-06-14T07:29:43.025484659Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 405, in __init__
2023-06-14T07:29:43.025494939Z     self.use_cuda, self.num_gpus = self.setup_training_environment(args=args, config=config, gpu=gpu)
2023-06-14T07:29:43.025500099Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 632, in setup_training_environment
2023-06-14T07:29:43.025543959Z     resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
2023-06-14T07:29:43.025560229Z ValueError: not allowed to raise maximum limit

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla V100-FHHL-16GB"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "Trainer": "v0.0.20",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.6",
        "version": "#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020"
    }
}

Additional context

No response

@iamkhalidbashir iamkhalidbashir added the bug Something isn't working label Jun 14, 2023
@SilvioGuedes

Change
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))

in /usr/local/lib/python3.10/dist-packages/trainer/trainer.py (line 632)

to:
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, 4096))
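
Note that passing (4096, 4096) also lowers the hard limit, and an unprivileged process cannot raise it again afterwards. A more defensive variant, sketched here under the assumption that the goal is only to guarantee at least 4096 open files, leaves the hard limit untouched and skips the call when the soft limit is already high enough. `ensure_min_nofile` is a hypothetical helper, not something Trainer provides.

```python
import resource  # Unix-only standard-library module


def ensure_min_nofile(target=4096):
    """Try to raise the soft RLIMIT_NOFILE to at least `target`.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Already high enough (or unlimited): leave everything alone.
    if soft == unlimited or soft >= target:
        return soft, hard

    # The soft limit may not exceed the hard limit, so clamp the request.
    new_soft = target if hard == unlimited else min(target, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        # The request was rejected; carry on with the existing limits.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

With the limits reported above, (1048576, 1048576), this returns immediately without calling setrlimit, so the failing call is never made.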
