MacOs: MPS backend out of memory #84

Open
DanBigi opened this issue Sep 16, 2024 · 26 comments
@DanBigi

DanBigi commented Sep 16, 2024

Good evening,
I have tried looking for a solution in previous discussions, issues, and threads, with no luck - it could also be that I'm ignorant and could not recognize the issue and/or the solution in different configurations, tbh. I'm using noScribe v.0.50 on a MacBook with 8GB RAM and Sonoma 14.1 (I also installed 'rosetta', while trying the advice found here and there), but when I try to feed noScribe an .mp3 interview file, even only its first 20', I always get this error:

" PyAnnote error: MPS backend out of memory (MPS allocated: 5.71 GB, other allocations: 3.50 GB, max allowed: 9.07 GB). Tried to allocate 7.83 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Errore nella fase 2 - identificazione degli speaker. "

The last line translates to "error in phase 2 - speaker identification". To complete the specification I used: I am feeding an interview between two people (and possibly a cat), mode "precise", "auto" for the speaker detection, and with overlapping speech and timestamps opted in. I guess these settings are heavy for noScribe, but I can't really change them since I am working with interviews; I could definitely give it more time, but with this error noScribe stops, so I can't feed it interviews overnight.

I had found in other threads the suggestion to change the configuration so that noScribe uses the cpu instead of the mps, but (1) I can't find the config.yml file for the life of me, and even worse (2) I'm not sure that the computer is not already using the cpu, since, when I keep the Task Manager open in order to check what's going on, I can see that although it doesn't start from zero it certainly does reach the 8GB it can work with... This last sentence may give an idea of my ignorance on the topic; I do nonetheless want to get my PhD sooner or later, so any help or guidance would be greatly appreciated! Thanks in advance!
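For readers trying to make sense of the numbers in the error message quoted above, here is a minimal Python sketch that pulls them out and compares them. The function name `parse_mps_oom` is made up for illustration, the MB-to-GB conversion assumes 1024 MB per GB, and the sketch only adds up the figures PyTorch printed; it does not claim to reproduce how the MPS allocator decides internally.

```python
import re

# The message exactly as quoted in this issue.
MESSAGE = (
    "MPS backend out of memory (MPS allocated: 5.71 GB, other allocations: "
    "3.50 GB, max allowed: 9.07 GB). Tried to allocate 7.83 MB on private pool."
)


def parse_mps_oom(message: str) -> dict:
    """Extract the sizes reported in an MPS out-of-memory message, in GB."""
    pattern = (
        r"MPS allocated: (?P<allocated>[\d.]+) GB, "
        r"other allocations: (?P<other>[\d.]+) GB, "
        r"max allowed: (?P<limit>[\d.]+) GB\)\. "
        r"Tried to allocate (?P<request>[\d.]+) MB"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("not an MPS out-of-memory message in the expected format")
    sizes = {name: float(value) for name, value in match.groupdict().items()}
    sizes["request"] /= 1024  # MB -> GB (assumption: 1024 MB per GB)
    return sizes


sizes = parse_mps_oom(MESSAGE)
in_use = sizes["allocated"] + sizes["other"]
print(f"in use: {in_use:.2f} GB, limit: {sizes['limit']:.2f} GB")
print(f"request: {sizes['request'] * 1024:.2f} MB")
```

In other words, the two reported pools (5.71 GB + 3.50 GB = 9.21 GB) already add up to more than the 9.07 GB limit, so even a request of a few megabytes is refused.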

@kaixxx
Owner

kaixxx commented Sep 17, 2024

You are on the right track: Changing pyannote_xpu to cpu in the file config.yml should solve the issue. You should find the file config.yml in /Users/YOUR_USERNAME/Library/Application Support/noScribe (I'm not on a Mac so I cannot verify this). Make sure to close noScribe before editing this file.
If you also get an error during transcription, you can try to also change the value whisper_xpu to cpu in the config.yml.
Hope this helps with your PhD 👍
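If editing the file by hand feels risky, the change described above can also be sketched in a few lines of Python. This is only an illustration: the path is the one suggested in this thread (unverified on every setup), the helper names `set_config_value` and `switch_pyannote_to_cpu` are hypothetical, and the sketch treats config.yml as plain `key: value` lines rather than parsing full YAML.

```python
from pathlib import Path

# Location suggested in this thread for macOS; treat it as an assumption.
CONFIG_PATH = (
    Path.home() / "Library" / "Application Support" / "noScribe" / "config.yml"
)


def set_config_value(text: str, key: str, value: str) -> str:
    """Return the config text with `key` set to `value` (appended if absent)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Compare only the part before the first colon with the wanted key.
        if line.split(":", 1)[0].strip() == key:
            lines[i] = f"{key}: {value}"
            break
    else:
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def switch_pyannote_to_cpu(path: Path = CONFIG_PATH) -> None:
    """Rewrite config.yml in place. Close noScribe before calling this."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found")
    original = path.read_text(encoding="utf-8")
    path.write_text(
        set_config_value(original, "pyannote_xpu", "cpu"), encoding="utf-8"
    )


# Dry run on a small sample instead of touching the real file.
sample = "last_quality: precise\npyannote_xpu: mps\nwhisper_xpu: mps\n"
print(set_config_value(sample, "pyannote_xpu", "cpu"))
```

Calling `switch_pyannote_to_cpu()` would apply the change to the real file; the same helper works for `whisper_xpu` if transcription also runs out of memory.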

@gernophil
Collaborator

Hey @DanBigi,
yes, the path @kaixxx mentioned is correct. There should be a config.yml with the options you need. However, without MPS it's gonna take hours (sometimes even overnight) to transcribe. That issue was introduced by PyTorch when they updated their MPS compatibility. Before that update it was almost unusable due to wrong transcription. Unfortunately, after the update 8GB isn't enough. There is the mentioned workaround with PYTORCH_MPS_HIGH_WATERMARK_RATIO, but that is not suitable as a global solution and according to some references it's dangerous to use (whatever that means). If you have any experience in Python execution and the command line, I would suggest trying to get it working from source. Maybe there is also a way to start the app with additional env variables, but I have to look into this first.

@gernophil
Collaborator

I was just reminded that this issue should have been solved in PyTorch 2.3, which is the version we use. Seems like it hasn't. However, we are now at PyTorch 2.4.1 so maybe this version solves it. I'll try to test it or compile a pre-release to check on that.

@gernophil
Collaborator

I uploaded a new version with PyTorch 2.4.1 (noScribe 0.5.0b). Could you test whether this solves the issue, @DanBigi?

@DanBigi
Author

DanBigi commented Sep 17, 2024

[Screenshot: 2024-09-17 alle 18 16 21]

Hello, thanks a lot for your very quick reply and for the support. Since yesterday, I tried re-installing the noScribe package I already had, since I had now installed rosetta 2 as well, but it didn't solve the issue: I ran into the same MPS backend memory error after feeding the same first 30' of the interview, with "spanish", quality "precise", "auto" for the speaker detection (I'm using this for fear of any cat's noise), and "overlapping speech" and "timestamps" opted in.

I have tried looking through the mentioned path (in my case, /Users/Daniele, and nice to meet both of you by the way) and I can't find the config.yml file, and there could be a problem here as well, I guess, because there is not a "Library" folder inside: the only "Library" folder I could find lies at the same level as the "Users" one, and when I look inside of it I do find an "Application Support" folder, which doesn't have a "noScribe" folder inside - I am attaching a screenshot to show you. To be sure, there is no "noScribe" folder in the "Library" nor in the "Daniele" folder, so I'm really at a loss for the config.yml file. I am specifying this in case it helps with troubleshooting for another (inexperienced and/or ignorant) MacOs user like me.

If you have any experience in Python execution and command line, I would suggest to try and get it working from source.

I'd be glad to do this, if only I knew where to modify the code..! I have no experience with Python; I have worked with LaTeX and R in the past, but they could be completely different languages and/or systems, so I don't know if my little experience can help. Just to be sure, what do you mean when you write "from source"? Is it a reference to the way noScribe can be executed through the installer, instead of opening its file from the "Application" folder?

Coming to your last reply, @gernophil, I'm currently downloading the 0.5.0b version and I'll try the same file with the same specifications to see if it works, and I'll let you know by tomorrow afternoon at the latest; in the meantime, if there is any other thing that is within my capabilities and that could be useful for you, I'd be glad to help, especially considering your very quick reply - thank you again for this, I really appreciate it :)

...and of course you're already cited in my thesis, I'll let you know if it will ever see the light :)

@gernophil
Collaborator

gernophil commented Sep 17, 2024

Hey @DanBigi,
I am on my phone right now, so I'll only answer quickly. Thanks for your nice reply first :).
The Library folder is hidden within your user folder. Just hit Command+Shift+Dot to see the hidden files and folders. But first try 0.5.0b, before editing the config.yml.

I'll come back to you, if I am on my Mac again.

@DanBigi
Author

DanBigi commented Sep 17, 2024

That's great @gernophil, will definitely do :) Also thank you for the heads-up on the hidden folders: I think it was a Finder option in some older MacOS and couldn't find it on Sonoma, but thanks to you now I can see the famed "config.yml" file, yay!

@MonikaBarget

@kaixxx : not sure if your latest fixes also go for MAC installers, but it worked! 🥇

Here is what I now have in the config file that the tool autogenerates for me - no manual editing required:

app_version: '0.5'
auto_edit_transcript: 'True'
auto_save: 'True'
check_for_update: 'True'
last_filetype: html
last_language: auto
last_overlapping: 1
last_pause: 1sec+
last_quality: precise
last_speaker: auto
last_timestamps: 0
locale: nl
pause_seconds_marker: .
pyannote_xpu: cpu
timestamp_color: '#78909C'
timestamp_interval: 60000
voice_activity_detection_threshold: '0.5'
whisper_fast_beam_size: 1
whisper_fast_compute_type: default
whisper_fast_temperature: 0.0
whisper_precise_beam_size: 1
whisper_precise_compute_type: default
whisper_precise_temperature: 0.0
whisper_xpu: cpu

@kaixxx
Owner

kaixxx commented Sep 18, 2024

not sure if your latest fixes also go for MAC installers

No, your issue was different, only related to Windows. Still: Nice to hear that it works now!

@gernophil
Collaborator

gernophil commented Sep 18, 2024

Yes, setting whisper_xpu and pyannote_xpu to cpu will fix this error, but at the cost of a lot of speed. The latest fix in the macOS installer just pushed pytorch to a newer version, which I hoped would solve this issue for:

whisper_xpu: mps
pyannote_xpu: mps

If you received the MPS backend out of memory error before, could you also try it again with both set to mps (or simply delete the config.yml, because it should identify your device as mps)?

EDIT: Oh, sorry, maybe I am a bit confused. Did you actually test noScribe on a Mac at all?

@kaixxx
Owner

kaixxx commented Sep 18, 2024

Did you actually test noScribe on a Mac at all?

The issue of @MonikaBarget was somewhat similar, but only related to Windows and CUDA: #79 (comment)

@DanBigi
Author

DanBigi commented Sep 18, 2024

In the meantime, yesterday night I tried out the noScribe 0.5.0b version and I ran into the same MPS backend memory error (same specification as above), and although the message says that the error was incurred into during the identification of the speakers, if one looks at noScribe working it actually happens during the next phase, the embeddings one, so I was wondering: should I modify the config.yml file now? Also, would it be possible to only move the "embeddings" process to the CPU, in order to leave the rest of the transcription on the MPS so as to lose less speed than moving the entire process on the CPU?

@gernophil
Collaborator

Actually, this is something @kaixxx has to answer :). But I think we only can set cpu for the whole whisper and the whole pyannote step, but not for the sub steps individually.
What you can definitely do is test with whisper_xpu: cpu or pyannote_xpu: cpu.

@kaixxx
Owner

kaixxx commented Sep 18, 2024

I think we only can set cpu for the whole whisper and the whole pyannote step

Yes, that's right.

The "embedding" step is part of the speaker identification process with pyannote. So you must set pyannote_xpu to cpu. But maybe you can leave whisper_xpu on mps and still get a faster transcription. Would be worth a try.

@DanBigi
Author

DanBigi commented Sep 19, 2024

Everyone, today's update: I have modified the config.yml file in order for pyannote to use the CPU instead of the MPS and noScribe now works, or at least it didn't run into the MPS memory error yet - the "embeddings" phase has gone beyond the percentages where it stopped in the previous days, and the notebook is more responsive than it was with the MPS on the rest of the things it does in the mean time; on the other hand, noScribe has become very slow, so slow that, after more than two hours, it still hasn't reached 50% of the embeddings. If there is no alternative to leaving the whole pyannote part to operate on the CPU to try out on a MacOs, I think I will leave it working for the next nights and let you know if I find any other way to speed up the process. Thanks a lot for the support until now, you've already been very very useful :)

@gernophil
Collaborator

Good to know it works now. Unfortunately, this drastic speed difference is expected when switching from MPS to CPU. It's the same on Windows if you compare CUDA vs. CPU. This MPS out of memory error is really annoying, but unfortunately it can only be solved by the PyTorch people. It only happens with 8GB RAM; it never happened on my 16GB machine.
Maybe we can split the process a little more, so we can make more use of MPS, @kaixxx?

@gernophil
Collaborator

What you could try is setting the PYTORCH_MPS_HIGH_WATERMARK_RATIO before starting noScribe like this:

  1. open Terminal.app
  2. type: PYTORCH_MPS_HIGH_WATERMARK_RATIO=2.0 /Applications/noScribe.app/Contents/MacOS/noScribe

This comment suggests that the default is 1.4, so setting it between 1.5 and 2.0 should allow you to use more memory. As the comment says:

If 2.0 isn't enough setting it 0.0 will allow torch to use as much memory as needed, but all the memory usage in total (including other applications) over 8GB will come from swap. This will increase wear and tear on your system SSD (I'm assuming your on an Apple Silicon Mac) and could potentially crash the OS.

Having said that I've used it a fair bit set to 0.0 on my 8GB M1 and its not caused a system crash since they added the watermark level system to pytorch

I never tried running an app with setting an environment variable before, but to my understanding it should work.
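For anyone who would rather not type the Terminal line each time, the same launch can be sketched in Python. This is a hedged illustration, not something shipped with noScribe: the binary path is copied from the command above, and the helper names `build_env` and `launch_noscribe` are made up here. The point it shows is that the variable is placed in the environment handed to the app at launch.

```python
import os
import subprocess

# Path taken from the Terminal command above; adjust if installed elsewhere.
NOSCRIBE_BINARY = "/Applications/noScribe.app/Contents/MacOS/noScribe"
VARIABLE = "PYTORCH_MPS_HIGH_WATERMARK_RATIO"


def build_env(ratio: float) -> dict:
    """Copy the current environment and add the MPS high watermark ratio."""
    if ratio < 0:
        raise ValueError("ratio must be 0.0 (no upper limit) or positive")
    env = dict(os.environ)  # a copy, so the current process is not modified
    env[VARIABLE] = str(ratio)
    return env


def launch_noscribe(ratio: float = 2.0) -> subprocess.Popen:
    """Start noScribe with the variable set only for that one process."""
    return subprocess.Popen([NOSCRIBE_BINARY], env=build_env(ratio))


# Inspect what would be passed along, without starting the app.
print(build_env(2.0)[VARIABLE])
```

Calling `launch_noscribe(2.0)` on a Mac with noScribe installed at that path should be equivalent to the Terminal line; as quoted above, 0.0 removes the upper limit and carries the risks described there.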

@DanBigi
Author

DanBigi commented Sep 24, 2024

Everyone, I hope you’ve been well the last few days. I have news: first of all, I modified the config.yml file in order for pyannote to use the CPU instead of the MPS and noScribe now works, or at least it didn't run into the MPS memory error yet - the "embeddings" phase has gone beyond the percentages where it stopped in the previous days, and the notebook is more responsive than it was with the MPS on the rest of the things it does in the mean time; on the other hand, noScribe has become very slow, so slow that, after more than two hours, it still hadn't reached 50% of the embeddings (always for the first 30’ of the same interview), so I stopped the transcription because I had other things to do. Next, I tried launching noScribe through the terminal by inputting the line:
PYTORCH_MPS_HIGH_WATERMARK_RATIO=2.0 /Applications/noScribe.app/Contents/MacOS/noScribe
and this is also a viable solution, as noScribe didn’t run into the MPS backend memory error. Since this was also taking a very long time, I changed the config.yml file back so that pyannote uses the MPS instead of the CPU. This also worked, but didn’t solve the (new) problem of the transcription time: I ran it tonight, and it took 336 minutes to transcribe the same 30’, even with the MPS and the high watermark ratio set to 2.0. I now wonder whether the Terminal line setting is compatible with pyannote using the MPS, because the transcription slowed down instead of speeding up as it should (shouldn’t it?). I have the logs of the interrupted and finished transcriptions, and I’m willing to share them if they can be useful for you; in the meantime I can highlight these lines, particularly the last two:

pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.

torchaudio.set_audio_backend("soundfile")

torchvision is not available - cannot save figures

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../Applications/noScribe.app/Contents/Frameworks/models/pytorch_model.bin`

Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.

Model was trained with torch 1.10.0+cu102, yours is 2.4.1. Bad things might happen unless you revert torch to 1.x.

Is there anything I should or can do, with respect to these messages from the Terminal? As always, thanks a lot for your support and have a wonderful day :)

@kaixxx
Owner

kaixxx commented Sep 24, 2024

Don't worry about the warnings in your quote from the logfile, they are normal.

What is your setting for whisper_xpu in config.yml? cpu or mps?

@gernophil
Collaborator

Thanks for testing :). Maybe the easiest would be to delete the config.yml and let noScribe recreate it. With MPS active, it should either be fast or hit a memory error, but slow transcription shouldn't occur to my understanding :). And MPS should be compatible with Terminal.app, because that's just the basic environment in which every program runs (you just don't see it normally).

@DanBigi
Author

DanBigi commented Sep 24, 2024

Something curious happened with the config.yml file: on Thursday I modified the file in order for pyannote to run on the CPU, then I ran it and it didn't stop but was very slow; before launching noScribe from the terminal I modified the file, setting pyannote back on the MPS, but as I was saying before noScribe continued being very slow, so I guess that the newly modified setting was not taken into consideration by Terminal and/or noScribe..? I don't know; this is what I was trying to mention in the last comment, but I wouldn't even know how to verify it. Lastly, I deleted the file and let noScribe create it anew: now config.yml reads "pyannote_xpu: mps" and it does go as fast as it did in the first uses (it only took 40' to transcribe the last 20' of the interview I was working on), so at least my problem should be solved now; please let me know if you could use the logs and/or any other experimentation on an M1 MacOS, I'd be glad to return the support :)

@gernophil
Collaborator

This is really great that it works. @kaixxx, maybe we could make a paragraph in the readme to solve the memory issue. I wouldn't like to include the PYTORCH_MPS_HIGH_WATERMARK_RATIO=2.0 in general. But maybe there would be another option to set this manually in some advanced options and not via the command line. I am not sure, if the config.yml is evaluated early enough?

@kaixxx
Owner

kaixxx commented Sep 24, 2024

Great to hear that it works. Adding an advanced option for this in config.yml would be nice.

I am not sure, if the config.yml is evaluated early enough?

The question is when PyTorch is evaluating this environment variable, on import/initialization or at runtime. Would the following test make sense in MacOS? @DanBigi could perform it on his machine:

  • Start noScribe without PYTORCH_MPS_HIGH_WATERMARK_RATIO=2.0 (so PyTorch is initialized without this setting)
  • Before starting the actual transcription, set a global environment variable PYTORCH_MPS_HIGH_WATERMARK_RATIO=2.0
  • Now test the transcription.

If it works, the environment variable is evaluated at runtime, not during initialization. In this case, we could easily set this variable in code after reading config.yml. Otherwise, we would need to delay the import of PyTorch until config.yml is read, which would be possible but makes the code a little messier.
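The more conservative of the two options described above (set the variable first, import PyTorch afterwards) could look roughly like the sketch below. It is an assumption-laden illustration, not noScribe's actual code: the config key `pytorch_mps_high_watermark_ratio` and both function names are hypothetical, and whether PyTorch reads the variable at import or later is exactly the open question here.

```python
import os

VARIABLE = "PYTORCH_MPS_HIGH_WATERMARK_RATIO"


def apply_mps_watermark(config: dict) -> None:
    """Export the ratio from the config, if present, for this process."""
    ratio = config.get("pytorch_mps_high_watermark_ratio")  # hypothetical key
    if ratio is not None:
        os.environ[VARIABLE] = str(ratio)


def load_torch(config: dict):
    """Set the environment first, then import torch lazily."""
    apply_mps_watermark(config)
    import torch  # deferred so the variable exists before PyTorch initializes

    return torch


# Stand-in for the values read from config.yml.
config = {"pyannote_xpu": "mps", "pytorch_mps_high_watermark_ratio": 2.0}
apply_mps_watermark(config)
print(os.environ[VARIABLE])
```

If the variable turns out to be read at runtime, `apply_mps_watermark` alone would be enough and the deferred import in `load_torch` could be dropped.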

@DanBigi
Author

DanBigi commented Sep 24, 2024

I'd be glad to perform this test, how (where) do I set the global environment variable PYTORCH[...] to 2.0? In the mean time, I ran another transcription and I got the MPS backend memory error, so I'm thinking of trying the watermark at 0.0 to see what happens... I could do this and the test you mentioned by tomorrow afternoon, if you tell me how to do it, and tell you the results afterwards :)

@gernophil
Collaborator

  • Before starting the actual transcription, set a global environment variable PYTORCH_MPS_HIGH_WATERMARK_RATIO=2.0

I don't think this will work. Can't think of a way to set an env variable, if the process is already running.

@kaixxx
Owner

kaixxx commented Sep 24, 2024

Can't think of a way to set an env variable, if the process is already running.

You are probably right. So the only way would be to modify the code. It's quite simple to set an env variable via os.environ['MY_VARIABLE'] = 'my_value'
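The point about already-running processes can be illustrated with a small, self-contained Python experiment. It uses two throwaway Python child processes rather than noScribe itself, and the function name `run_demo` is made up for this sketch: a process receives its environment when it is launched, so a change made elsewhere afterwards is only seen by processes started later.

```python
import os
import subprocess
import sys

VARIABLE = "PYTORCH_MPS_HIGH_WATERMARK_RATIO"

# Child program: wait briefly, then report what its own environment contains.
CHILD_CODE = (
    "import os, time; time.sleep(0.5); "
    f"print(os.environ.get('{VARIABLE}', 'unset'))"
)


def run_demo() -> tuple:
    """Start one child before and one after the variable is set."""
    os.environ.pop(VARIABLE, None)  # make sure we start without the variable
    early = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE, text=True
    )
    # Changed in this process only after the first child was already started.
    os.environ[VARIABLE] = "2.0"
    late = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE, text=True
    )
    return early.communicate()[0].strip(), late.communicate()[0].strip()


print(run_demo())
```

The first child never sees the value even though it reads its environment after the change, which is why the variable has to be set either before launch (Terminal) or from inside the app via `os.environ`.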

@kaixxx changed the title from "HELP PLEASE noScribe on MacOs - Error in phase 2 speaker identification & PyAnnote" to "MacOs: MPS backend out of memory" on Dec 19, 2024

4 participants