Minor issues and questions running code (salsa+salsa_lite) #2
Comments
Hi Andres, thank you very much for your detailed notes on running the code in this repo. I am terribly sorry for replying to you this late.
For SALSA-Lite: the other repo was for publication purposes. This current repo will be used to maintain both SALSA and SALSA-Lite.
For the relative path: I should have put
For the installation: I exported my current anaconda environment, and it was bloated as I kept installing more packages ^^. Thanks for fixing it.
For the pretrained models: do you want to upload any of your models? My previously trained models have some extra components that I experimented with but that do not seem to help. I will try to upload a pretrained model soon.
We are happy to receive your PR to merge the changes that you have addressed. If you do not mind, please also feel free to edit the README to include the information on running time and storage size.
I am very happy that this repo seems to be useful for you. Thanks again.
Hi Andres, when I use this instruction: `pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html`, I get an error like this: `ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'`. Could you tell me how to solve it? Thank you very much!
Hi @xieyin666, it looks like you might not be in the correct directory. Please make sure you are in the directory where the requirements file is present, or call `pip install -r` with the full path to the file.
Thank you for your reply. I just downloaded your original code, but after unzipping it, the file requirements.txt is not in it.
That is because there is no "requirements.txt". As I understand it, you have to create one yourself and add to it the dependencies that @andres-fr mentioned worked for him.
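For illustration only, a minimal `requirements.txt` might look like the sketch below. The package list is a guess assembled from what this thread mentions (PyTorch, pytorch-lightning, HDF5/h5py, YAML configs), not the exact pinned list @andres-fr used:

```
# Hypothetical minimal requirements.txt -- the package set is an assumption
torch
pytorch-lightning
librosa
h5py
numpy
pandas
PyYAML
```

Pinning versions (e.g. `torch==1.9.0`) is safer for reproducibility, but which versions work depends on your CUDA setup.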
@xieyin666 @karnwatcharasupat @Peter-72
OK, thank you!
@andres-fr
@xieyin666 I believe that is an enhancement that is going to require some effort and can't be addressed here. In issue #4 I have provided code for on-the-fly, parallelized computation. You could use that as a basis, and then check e.g. the DataParallel API in PyTorch, or Ray, to distribute the batches across GPUs. If you manage to do it, it would be great if you could share your process and results in a separate issue. Cheers
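For reference, a minimal sketch of the `nn.DataParallel` idea in plain PyTorch could look like the following; `ToyModel` is a placeholder, not a model from this repo:

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be the SELD model from the repo.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(128, 12)

    def forward(self, x):
        return self.net(x)

model = ToyModel()
if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU and splits each batch across them.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(32, 128).to(device)
out = model(batch)  # each GPU processes a slice of the batch
```

For multi-node or generally more robust setups, `DistributedDataParallel` is usually preferred over `DataParallel`.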
Hi @thomeou and @karnwatcharasupat, I would like to congratulate and thank you for your amazing work on this project. I am a computer science student at the German University in Cairo. I am currently in my bachelor semester, and my bachelor project is about one of the approaches to the DCASE2021 challenge, namely "Spectrotemporally-Aligned Features and Long Short-Term Memory for Sound Event Localization and Detection", which is yours. My task is to understand the challenge and your approach, and, if possible, to help improve it. As I am new to the world of deep learning and PyTorch, I am having problems running/training the model. Thanks to @andres-fr I have progressed a lot in setting the stage for doing so, but I am still not able to run the model yet. So, here is what I have done; please help fill in the gaps if possible:
I understand that this is a useful option, but I am not sure whether I should replace the whole file with those lines or just a specific part. These are the things I did. After this, as I understand it, I should execute the training command. That's it.
@andres-fr
@Peter-72 Everything you've done should be right. However, I haven't found a better way to solve the problem of GPU memory not being enough. All I can think of so far is multi-GPU training, but I don't know how to change the code (it is PyTorch Lightning code).
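For what it's worth, in PyTorch Lightning multi-GPU training is usually switched on through the `Trainer` arguments rather than the model code. The exact argument names depend on the Lightning version, and the surrounding training script is only sketched here:

```python
import pytorch_lightning as pl

# Hypothetical sketch: train on 2 GPUs with distributed data parallel.
# Argument names differ across pytorch-lightning versions:
#   ~1.0-1.4:  pl.Trainer(gpus=2, accelerator="ddp")
#   >=1.5:     pl.Trainer(devices=2, accelerator="gpu", strategy="ddp")
trainer = pl.Trainer(gpus=2, accelerator="ddp")

# The LightningModule and DataModule would come from the repo's own
# training script, e.g.:
# trainer.fit(model, datamodule=datamodule)
```

Whether this actually helps with out-of-memory errors depends on whether the per-device batch shrinks, so reducing the batch size, as suggested below, is still the simpler first step.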
Try a smaller batch size.
@karnwatcharasupat @thomeou @andres-fr @xieyin666
Each make target gives a slightly different message in terms of the module it fails to find. I have checked everything in my settings, including the environment variables and the VS Code Python interpreter path, and both point to the salsa env created for the project. I have run out of solutions; any help, please? Also, is there something in the code that targets this issue?
@Peter-72 Try removing the space after PYTHONPATH=. Sorry, I don't have a computer at the moment.
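To make the difference concrete: with a space, the shell treats `PYTHONPATH=` as an empty assignment and `.` as a separate word, so the repo root never lands on Python's module search path. The script path below is only illustrative:

```bash
# Wrong: the space makes PYTHONPATH empty and "." a separate token
PYTHONPATH= . python experiments/train.py

# Right: no space, so the current directory (repo root) is added to
# Python's module search path for this command only
PYTHONPATH=. python experiments/train.py
```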
@andres-fr
I switched to PyCharm as many people online suggested, but the problem still persists.
Dear @Peter-72, to your original questions:
To reduce GPU RAM usage, reduce the batch size in the config files (look for conf inside the repo). Again, sorry, no PC here. Cheers!
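For instance, the batch size typically sits somewhere in the experiment config; the key names below are assumptions, so look for the equivalent entry in the repo's conf folder:

```yaml
# Hypothetical excerpt of an experiment config -- key names are assumptions
training:
  batch_size: 32   # halve this (16, 8, ...) until the model fits in GPU memory
```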
@andres-fr @thomeou @karnwatcharasupat @Peter-72
@andres-fr The authors' code was used to train three times in a row and to test the best model three times, but the three sets of results were very different. That is, the model is not stable.
@thomeou @karnwatcharasupat @andres-fr
Notice that the numbers don't change, no matter how many times I change the chunk size.
Hi @Peter-72, glad to see you made it to training! Maybe the others have some idea, but let me give you my 2c.
What you describe sounds like a memory leak: in Python we usually don't worry about memory management because the garbage collector takes care of "removing" unused objects, but leaks that lead to RAM bloating still can and do happen, mainly when our code keeps creating new objects and holding references to them even after they aren't needed anymore (e.g. some inner variable in a for loop keeps being added to a data structure).
This happens mainly through two patterns: either we do it explicitly, via deep recursion or by accumulating objects into some data structure (rather unusual), or we use some library that implicitly collects our computations into a global memory scope without telling us. Whenever I have leaks, that is usually the case, e.g. when generating large matplotlib renderings through the global context (creating figures via `plt` without ever closing them).
Maybe the others can chip in with a better idea (since I didn't write the original code), but to fix a leak I'd comment out the whole loop and then add/remove parts of the code until you find the ones that lead to RAM bloating. Cheers!
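One way to carry out that add/remove hunt more systematically is to log the process's resident memory after each candidate block. The sketch below assumes `psutil` is available and uses a placeholder `process_chunk` for whatever code is under suspicion:

```python
import gc
import os
import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    """Resident memory of this process, in MB."""
    return proc.memory_info().rss / 1e6

def process_chunk(i):
    # Placeholder for the suspected code (feature loading, plotting, ...).
    return [0.0] * 1000

print(f"start: RSS = {rss_mb():.1f} MB")
for i in range(400):
    process_chunk(i)
    if i % 50 == 0:
        gc.collect()  # rule out objects that are merely awaiting collection
        print(f"chunk {i:3d}: RSS = {rss_mb():.1f} MB")
```

If the matplotlib global context turns out to be the culprit, closing figures explicitly with `plt.close(fig)` inside the loop usually stops the growth.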
@Peter-72 After looking more carefully, it could be that you simply don't have enough RAM. If I recall correctly, the script tries to load all 400 chunks at the same time, which would mean you need to make some fundamental changes in the code and/or here.
@andres-fr Hey man, thanks for replying.
I made sure it reached that condition, but after getting inside the condition, it stays there and keeps computing this line:
Hi @Peter-72, @dududukekeke, would you mind creating separate issues for each of your concerns so we can address them more cleanly? Thank you! As for the memory issues raised by @Peter-72: our version was trained on a very large server (a few hundred GB of RAM), so we admittedly didn't quite optimize that part of the code. I will have to take a closer look and do a proper optimization of it, but a very duct-tape fix you can try now would be the following:
This should help reduce the RAM requirement during the concat ops. I will write a more proper edit to that code when I have more time.
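Since the fix steps above are not reproduced in this capture, here is one generic way to lower peak RAM when many chunks end up concatenated (my own sketch, not necessarily the edit @karnwatcharasupat had in mind): preallocate the output array once and write each chunk into its slice, instead of keeping every chunk in a list and calling `np.concatenate` at the end. All shapes and names are placeholders:

```python
import numpy as np

n_chunks, chunk_len, n_feat = 400, 100, 64  # placeholder sizes

def load_chunk(i):
    # Placeholder for however one precomputed feature chunk is loaded.
    return np.zeros((chunk_len, n_feat), dtype=np.float32)

# Preallocate once; each chunk is written into its slice and can be freed
# right away, so peak memory stays close to the size of the final array
# instead of roughly double (list of chunks + concatenated copy).
features = np.empty((n_chunks * chunk_len, n_feat), dtype=np.float32)
for i in range(n_chunks):
    features[i * chunk_len:(i + 1) * chunk_len] = load_chunk(i)
```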
Shameless plug: whenever you are dealing with large amounts of numerical data that don't fit in RAM, the HDF5 format is a very flexible way of storing them in persistent memory, while allowing for relatively fast read/write operations and leaving most of your RAM untouched. I developed this (IMO extremely useful) class to create and parse incremental HDF5 files in Python, which I've been using in PyTorch dataloaders on large datasets. If the "chunk size" is properly chosen, this can lead to huge speedups compared to regularly loading from the filesystem + precomputing: https://gist.github.com/andres-fr/3ed6b080adafd72ee03ed01518106a15 Cheers
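As a plain-`h5py` illustration of the same idea (this is not the gist above, just a minimal sketch of incremental writing with resizable datasets):

```python
import numpy as np
import h5py

# Write features incrementally into a dataset that can grow along axis 0.
with h5py.File("features.h5", "w") as f:
    dset = f.create_dataset(
        "salsa", shape=(0, 64), maxshape=(None, 64),
        dtype="float32", chunks=(128, 64))
    for _ in range(10):
        block = np.random.rand(128, 64).astype("float32")  # stand-in for one chunk
        dset.resize(dset.shape[0] + block.shape[0], axis=0)
        dset[-block.shape[0]:] = block

# Later, arbitrary slices can be read without loading the whole file into RAM.
with h5py.File("features.h5", "r") as f:
    window = f["salsa"][256:512]
    print(window.shape)  # (256, 64)
```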
Hey @karnwatcharasupat @andres-fr, thanks for both of your replies.
@karnwatcharasupat I have implemented the helper method that you suggested, and it worked like a charm. But I don't understand the importance of the
@thomeou @karnwatcharasupat I want to ask about your choice of
@andres-fr Cheers for your work on the class you created. Could you tell me how to add/use it in this project? :)
@Peter-72 Thanks a lot. We can continue the discussion in #7. With regards to
Anyway, you can read up more about using HDF5 here: https://docs.h5py.org/en/stable/. They have quite a decent explanation, and there are lots of other resources on the internet as well.
May I ask how you solved this in the end? Which parts of the code did you modify? @andres-fr @Peter-72
Hello, I have a question about how to calculate the phase_vector.
Hi everyone, if you have a specific question that hasn't been tackled yet, kindly open a new issue, making sure that you provide us with all the necessary info to help you. This may include:
Note that these are not compulsory, but they vastly improve the chances that we are able to help you. For example, regarding the question above: normalized complex numbers only differ in their angles, or "phases". Conjugating a complex number amounts to multiplying its angle by -1, and multiplying complex numbers causes their angles to add up. Probably this was a way to obtain the phase differences across the different channels. But it's been a long time, and without a link to the code I don't recall the context... so please follow the guidelines above for more info. In the meantime I will close this issue. Cheers!
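To make that complex-number remark concrete, here is a small numpy sketch of inter-channel phase differences. It illustrates the general idea only; it is not claimed to be the exact `phase_vector` computation used in the repo:

```python
import numpy as np

# Toy multichannel STFT: 4 mics, 257 frequency bins, 100 time frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 257, 100)) + 1j * rng.standard_normal((4, 257, 100))

# Multiplying by the conjugate of a reference channel subtracts its phase,
# so the angle of the product is the phase difference w.r.t. that reference.
ref = X[0:1]
phase_diff = np.angle(X[1:] * np.conj(ref))   # shape: (3, 257, 100)

# Dropping the magnitude leaves unit-modulus "phase vectors".
phase_vector = np.exp(1j * phase_diff)
```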
Hi! Many congratulations on this outstanding line of work, and thanks a lot for sharing it.
I am running this repo on Ubuntu 20.04 + CUDA and gathered a few notes on the process, in the hope that they are helpful to others.
I also encountered a few minor issues, and I have some open questions that I couldn't answer by reading the paper and docs; I was wondering if someone could take a look at them.
As for the changes I propose, I'll be happy to submit a PR if appropriate.
Cheers!
Andres
Installation:
Although mentioned in the README, there is no pip-compatible `requirements.txt` file, and the `requirements.yml` imposes more constraints than needed; a minimal list of packages worked for me. Then, the environment can be initialized as follows (inside of `<REPO_ROOT>`):
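A rough sketch of such an initialization, assuming a conda environment named `salsa` and Python 3.8 (both assumptions, not the exact commands used here):

```bash
# Sketch only: the environment name and Python version are assumptions
conda create -n salsa python=3.8
conda activate salsa
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
```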
Precomputing SALSA features:
The README specifies that the dataset should be found inside of `<REPO_ROOT>/dataset/data`. For that reason, we can get rid of the absolute paths in the config files and replace them with relative paths, e.g. in `tnsse2021_salsa_feature_config.yml`:
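A sketch of what such relative paths could look like (the actual key names in the config may differ; they are assumptions here):

```yaml
# Hypothetical keys -- check the real names in tnsse2021_salsa_feature_config.yml
data_dir: './dataset/data'
feature_dir: './dataset/features'
```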
Then, running `make salsa` from `<REPO_ROOT>` (with the env activated) works perfectly and yields results inside `<REPO_ROOT>/dataset/features`. In my case, both `data` and `features` were softlinks to an external memory drive, and it still worked fine. Computing the SALSA features for the 600 MIC wav files (1 minute each, 4 channels, 24 kHz, 6.9 GB total) on my CPU took ca. 35 minutes and 21.5 GB with the default settings.
Precomputing SALSA-Lite features:
Analogous remarks apply as with SALSA. Computation took 2 minutes and 20.5 GB.
Here, my question is how the dedicated SALSA-Lite repo interacts with this one: will both be maintained, or is this the "main" one and the other was for publication purposes?
Training:
Question: are any pretrained models available? I couldn't find any upon a brief search.
Regarding the config, here too we can replace absolute paths with relative ones, as above.
Training setup had a couple of minor issues:
In the `README` we can currently see the following instruction:
This should be updated, because the training script also expects the metadata `.csv` files to be in the outer folder, so those have to be moved as well; otherwise we get a file-not-found error.
Side note for further readers: the "train/val split" information gets lost when mixing all the files, but the repo actually has this information in the form of CSV files, stored at `dataset/meta/dcase2021/original`. So mixing is fine; still, it is probably not a bad idea to make a backup of the original dev metadata before mixing everything together (it is not very large).
As for `make train`, it is currently hardcoded to `salsa`; the instructions to train on `salsa-lite` didn't work for me. I changed the "Training and inference" section in the `Makefile` to the following, so that we can train on both via either `make train-salsa` or `make train-salsa-lite`:
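The original edit is not reproduced in this capture; a hypothetical sketch of the two-target shape it describes (the recipe bodies are placeholders, not the repo's real training commands) could be:

```makefile
# Hypothetical sketch -- each recipe should invoke the repo's actual training
# entry point with the corresponding feature/config (salsa vs. salsa_lite).
# Note: make recipes must be indented with a tab.
train-salsa:
	@echo "run the training script with the SALSA config here"

train-salsa-lite:
	@echo "run the training script with the SALSA-Lite config here"
```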
After a few epochs, the models seemed to converge well, so I believe all the above modifications were successful. Let me know if I am missing something!