Error when running Evaluate : axis 1 is out of bounds for array of dimension 0 #230

Julmatap opened this issue Aug 10, 2020 · 21 comments


Hello everyone,

Firstly I would like to say thank you for this amazing project and the release of your code.
I am very new to programming and ML and I wanted to try your project.

I downloaded the data, installed the environment with the latest package versions, followed the instructions for reproducing the results, looked through the existing issues and tried to solve the errors I got by myself as much as possible, but right now I'm stuck.

I first prepared the masks and metadata, then downloaded your best model weights, created the transformers folder in data_experiments/mapping_challenge_baseline and copied "unet" and "scoring_model" there. I then changed the values in neptune.yaml as suggested and tried to evaluate by running : "python evaluate --pipeline_name unet"

Here is what I got :

I hope I gave you enough information and that you can help me resolve this issue.

Again, thank you guys !

PS : I saw that issue #228 reports the same problem, but I don't get the "valid data is None" message, as you can see in the attached screenshots.
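
For readers who land here from the error message: this is the text NumPy produces when an operation asks for axis 1 of a 0-dimensional (scalar) array. The snippet below only reproduces the message in isolation, assuming NumPy is installed. It does not show where in the pipeline the scalar comes from.

```python
import numpy as np

# A 0-dimensional array: it has no axes at all, so its shape is ().
scalar = np.array(0.5)
print(scalar.ndim)  # 0

message = None
try:
    # Asking for axis 1 on an array without axes fails.
    scalar.sum(axis=1)
except ValueError as error:  # NumPy's AxisError is a subclass of ValueError
    message = str(error)
    print(message)
```

In practice it usually means that some step returned a single number (or an empty result) where a 2-D array was expected, so the step before the failing call is the place to look.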

Hi @Julmatap, thanks for the kind words.

Could you please paste your directory structure (data paths) and the content of the neptune.yaml file (data paths) ?

Julmatap commented Aug 10, 2020

Sure ! (Thanks for your quick reply !)

Based on yours, here's mine :

|-- ...
|-- data_raw
    |-- train 
         |-- images 
         |-- annotation.json
    |-- val 
         |-- images 
         |-- annotation.json
    |-- test (and not test_images) 
         |-- img1.jpg
         |-- img2.jpg
         |-- ...
|-- data_meta
    |-- masks_overlayed_eroded_{}_dilated_{} 
         |-- train 
             |-- distances 
             |-- masks 
             |-- sizes 
         |-- val 
             |-- distances 
             |-- masks 
             |-- sizes 
    |-- metadata.csv
|-- data_experiments
    |-- mapping_challenge_baseline 
         |-- transformers
         |-- outputs 
         |-- tmp

And the parameters content of my neptune.yaml looks like this :
project: shared/showroom

name: mapping_challenge_baseline
tags: [solution_5]


# Data Paths

data_dir: data_raw
meta_dir: data_meta
masks_overlayed_prefix: masks_overlayed
experiment_dir: data_experiments/mapping_challenge_baseline

Hope I understood what you asked me.

I'll try to use the exact same directory structure as in your example and correct neptune.yaml accordingly to see if that resolves the problem. I couldn't find where in the code this kind of structure would cause a problem, but since I'm a beginner that may be normal lmao.

Mhm, so do you have data_raw, data_meta and the others inside of the data directory or at the same level as ?

Dear Jakub,
I had data_raw and data_meta folders at the same level as .
After you answered I made the directory structure exactly like what you suggested which now looks like this :

|-- ...
|-- data
    |-- raw
         |-- train 
            |-- images 
            |-- annotation.json
         |-- val 
            |-- images 
            |-- annotation.json
         |-- test_images 
            |-- img1.jpg
            |-- img2.jpg
            |-- ...
    |-- meta
         |-- masks_overlayed_eroded_0_dilated_0 
            |-- train 
                |-- distances 
                |-- masks 
                |-- sizes 
            |-- val 
                |-- distances 
                |-- masks 
                |-- sizes 
    |-- experiments
        |-- mapping_challenge_baseline
            |-- checkpoints 
            |-- transformers 
            |-- outputs 

I changed my neptune.yaml to look like this :

# Data Paths

data_dir: data/raw
meta_dir: data/meta
masks_overlayed_prefix: masks_overlayed
experiment_dir: data/experiments/mapping_challenge_baseline
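
Since these paths are relative, they are resolved against the directory you run the command from. A minimal sketch for checking that the layout above is visible from there; the helper name `missing_paths` and the `EXPECTED` list are illustrative and not part of the repo, and the entries are taken from the tree posted above:

```python
from pathlib import Path

# Hypothetical checklist built from the directory tree in this thread.
EXPECTED = [
    "data/raw/train/images",
    "data/raw/train/annotation.json",
    "data/raw/val/images",
    "data/raw/val/annotation.json",
    "data/meta",
    "data/experiments/mapping_challenge_baseline/transformers",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in expected if not (root / entry).exists()]


# Run from the project directory; an empty list means every path was found.
print(missing_paths("."))
```

An empty result only confirms that the folders exist, not that their contents are complete.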

However, when I run evaluate, it still throws the exact same error.

Could you try and run the training pipeline first?

python train --pipeline_name unet

I did it as soon as you told me to, but it doesn't seem like it's running, here's a screenshot :

Could it be linked to my environment ? I had to change some package versions for it to work (I'm on Windows 10 x64). I couldn't install Torch v 0.3.1 and some other packages via your environment.yml.

Here is my complete env for information :

$ conda list

Ok, I see.

I think that is the problem.
Everything was written for torch==0.3.1, and newer releases (torch==1.4 for sure) have changed things.
Could you try and install it by hand:

pip install torch==0.3.1
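
To see how far an installed version is from the pinned one, something like the sketch below can help. The helpers `version_tuple` and `matches_pin` are made up for illustration, and comparing only major.minor is an assumption about what counts as "close enough", not a rule from the repo:

```python
import re


def version_tuple(version):
    """Turn '0.4.1' or '1.4.0+cu92' into a tuple of ints such as (0, 4, 1)."""
    core = version.split("+")[0]  # drop local build tags like '+cu92'
    return tuple(int(part) for part in re.findall(r"\d+", core))


def matches_pin(installed, pinned="0.3.1", parts=2):
    """True when the first `parts` version components agree with the pin."""
    return version_tuple(installed)[:parts] == version_tuple(pinned)[:parts]


print(matches_pin("1.4.0"))  # False
print(matches_pin("0.3.1"))  # True
```

With torch installed, the string to pass in is `torch.__version__`.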

I tried again installing torch==0.3.1 and here's what I got

jakubczakon commented Aug 11, 2020

I think you need to downgrade python for this.
As environment.yml suggests, it should be python=3.6.8
The easiest way to do it is to go:

conda env create -f environment.yml

but as I understand that isn't working.
In that case I'd just create a clean conda environment with python 3.6:

conda create -n py_36_env python=3.6

activate it

conda activate py_36_env

and then install the dependencies from environment.yml
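
Before installing anything into the new environment, it can be worth confirming that the activated interpreter really is 3.6. A small sketch; `python_matches` is an illustrative helper, not something from the repo:

```python
import sys


def python_matches(required=(3, 6), current=None):
    """Compare major.minor of the running interpreter with `required`."""
    current = sys.version_info if current is None else current
    return tuple(current[:2]) == tuple(required)


print(sys.version_info[:3], python_matches())
```

If this prints False inside py_36_env, the environment was probably not activated in the current shell.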

Julmatap commented Aug 12, 2020

Dear @jakubczakon thank you for your time helping me !

It was indeed a problem with Torch version.

However I still couldn't install Torch==0.3.1 even after downgrading python and so on... I tried manually installing it, but I still had the same error as per my previous screenshot, I also tried finding a .whl but there was none for v 0.3.1 win64.
So I tried using the closest version I could find and ended up using Torch version 0.4.1, and it seems to be working just fine.

The evaluation has now been running for more than 5 hours, and I can tell it's running from Neptune and from my memory usage. (But it's not using the GPU even though my GPU is CUDA compatible, is that normal ?)



Once it is finished and I am sure it is working, I will post my conda list, if it can help someone else facing this issue.

Again, thank you very much for your time 👍

That is awesome thank you @Julmatap!

You can see if it is running by going to the Charts or Logs section of the UI.
You should see some activity there.

Also, it seems that there is a message in the terminal that explains why the GPU is not logged -> sending GPU metrics was blocked by the system. This is very much unexpected.

Unfortunately there is nothing there, only "No charts here" and "waiting for data". I think there's a problem, because someone else launched an experiment 3 minutes ago and already has some data.


jakubczakon commented Aug 12, 2020

You are running the evaluation now correct?
If so can you try running training and see if something is happening?

Also you can go to your terminal and run:

nvidia-smi

to see if your GPU is actually doing something.
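
A complementary check from the Python side is to ask PyTorch whether it can see a CUDA device at all. This is a rough sketch: `cuda_status` is an illustrative helper, and it only reports what the installed torch build can see, not whether this pipeline actually moves the model to the GPU.

```python
def cuda_status():
    """Describe whether the installed torch build can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "torch {} cannot see a CUDA device".format(torch.__version__)
    return "torch {} sees {} CUDA device(s)".format(
        torch.__version__, torch.cuda.device_count()
    )


print(cuda_status())
```

If torch cannot see a device, a CPU-only build or a driver mismatch is a more likely cause than the project code.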


Hi Jakub,

Yes, I was running the evaluation. I tried running the training today, and there are still no charts.
I ran nvidia-smi before and again today, and it says that no processes are running.

I guess I'll just start again on a Linux VM. Has anyone already succeeded in running it on Windows ? I can't find any information about the operating system blocking the request.


I see @Julmatap,

Unfortunately, I don't know if anyone succeeded on Windows -> everyone I know who used this repo was using Linux.

I confirm it is working on Linux, I just did an evaluation and it worked, just had to wait a long time at "steps >>> step unet transforming...".

Again, thanks for your time @jakubczakon ! 👍

That is awesome!

I only wish I could be of more help but I am proud of you getting it done.

Julmatap commented Aug 24, 2020

Dear Jakub,

One last question :

I have my metrics, which look like this :

However, when I run the notebook "results on exploration" my predictions are blank :

Edit : I did open predictions.json and manually added the first image_id I saw on the notebook and the prediction runs fine.
However, it seems that not all of the val folder has been predicted, so when I run random choices it usually doesn't work.
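
One way to avoid picking images without predictions is to sample only from the ids that actually appear in predictions.json. The sketch below assumes that annotation.json is a COCO-style dict with an "images" list of {"id": ...} entries and that predictions.json is a list of dicts each carrying an "image_id". Both helpers are illustrative and not part of the notebook.

```python
import random


def split_by_prediction(annotations, predictions):
    """Return (ids with predictions, ids without) for the annotated images."""
    all_ids = {image["id"] for image in annotations["images"]}
    predicted = {entry["image_id"] for entry in predictions}
    return sorted(all_ids & predicted), sorted(all_ids - predicted)


def pick_predicted(annotations, predictions, k=1, seed=None):
    """Randomly pick up to `k` image ids that have at least one prediction."""
    with_predictions, _ = split_by_prediction(annotations, predictions)
    rng = random.Random(seed)
    return rng.sample(with_predictions, min(k, len(with_predictions)))


# Toy data standing in for the two JSON files.
toy_annotations = {"images": [{"id": 1}, {"id": 2}, {"id": 3}]}
toy_predictions = [{"image_id": 2}, {"image_id": 2}, {"image_id": 3}]
print(split_by_prediction(toy_annotations, toy_predictions))
```

With the real files, both objects would come from `json.load`. The second list then shows which validation images ended up without any prediction.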

