an error with the training set direction #22

Open
i-ting4931 opened this issue Oct 9, 2024 · 5 comments
@i-ting4931 commented Oct 9, 2024

Hello, I recently installed this model to train a custom dataset. The environment setup is complete, and I first attempted to use the MOT17 dataset to test whether the training process works properly. However, during the training, I encountered some abnormal data, and I was wondering if you could provide any guidance on how to resolve this issue.

Currently, I have downloaded both the Crowdhuman and MOT17 datasets. However, while training, I noticed that all the loss values are zero, which seems to suggest that the data is not being properly loaded. To check the data loading path, I added the following line of code: print(f"Frame path: {frame_path}"). The result shows that the dataset is loading Crowdhuman, but in the command I issued, I set the dataset to MOT17. I'm not entirely sure where the problem lies; could you kindly take a look? Thank you very much.
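For reference, the check I added looks roughly like the following (I wrapped it in a small helper here for clarity; report_frame_source and its placement are only illustrative, since frame_path is simply whatever image path the dataset resolves for the current sample):

# Hypothetical debugging helper (not part of the repository): print the
# resolved image path and flag samples that actually come from CrowdHuman.
def report_frame_source(frame_path: str) -> None:
    print(f"Frame path: {frame_path}")
    if "crowdhuman" in frame_path.lower():
        print("NOTE: this frame was loaded from CrowdHuman, not MOT17.")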

Also, just to mention, my computer has only one GPU: an NVIDIA GeForce RTX 3060. Since my GPU is limited, do I need to modify any lambda functions in the code? I appreciate your help.

If there's anything that I didn't explain clearly, please feel free to let me know, and I will provide any additional details you may need.
Thank you again.

[Screenshots attached]

Here is the related information regarding the failed training.

[Screenshots of the failed training output]

@HELLORPG (Collaborator)
The data loading process is not related to the GPU. Therefore, I don't think you need to modify any data loading functions for your GPU (3060).

I have never seen this issue before. Have you run the data preparation scripts (like ./data/gen_crowdhuman_gts.py) before running the training script?
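If you want to double-check, a quick script along these lines can confirm that the generated GT files actually exist (the directory paths below are only illustrative; substitute the output paths the scripts use in your setup):

import os

# Illustrative paths only; replace them with the GT output directories
# produced by gen_mot17_gts.py and gen_crowdhuman_gts.py in your setup.
gt_dirs = ["./DATADIR/MOT17/gts", "./DATADIR/CrowdHuman/gts"]

for d in gt_dirs:
    if not os.path.isdir(d):
        print(f"Missing GT directory: {d}")
    else:
        n_files = sum(len(files) for _, _, files in os.walk(d))
        print(f"{d}: {n_files} GT files found")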

@i-ting4931 (Author)
Hello, considering that there might have been some mistakes in my previous operations, I deleted the previously cloned repository and cloned it again. This time, I only changed the number of GPUs, downloaded the pre-trained weights and datasets, and placed them in the designated locations. I also used the gen_mot17_gts.py and gen_crowdhuman_gts.py scripts to generate the necessary files.

The training process is now running, but the loss values seem abnormal (most loss values start at 0 during the initial training). I suspect that the images might not have been successfully read by the model for training. Could you kindly advise if I made any mistakes? If the training were running correctly, what would the expected behavior look like? Thank you very much.
[Screenshot of the training log with abnormal loss values]

@HELLORPG (Collaborator)
To be honest, this is really weird. I need to wait until I have a spare GPU server to show what the correct logging looks like.

You could check the content of the data loaded into the training loop, after these lines:

MeMOTR/train_engine.py, lines 192 to 199 in 7de13f4:
tracks = TrackInstances.init_tracks(batch=batch,
                                    hidden_dim=get_model(model).hidden_dim,
                                    num_classes=get_model(model).num_classes,
                                    device=device, use_dab=use_dab)
criterion.init_a_clip(batch=batch,
                      hidden_dim=get_model(model).hidden_dim,
                      num_classes=get_model(model).num_classes,
                      device=device)

You can add code like this:

print(batch["infos"][0])                          # the GTs
print(batch["imgs"][0][0].shape)                  # the image's shape
# Or others

You could analyze the output yourself, or upload it here.
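If it helps, here is a slightly more thorough sketch along the same lines; the keys are only assumptions that follow the prints above, so adjust them if the actual batch structure is different:

# Sketch only: walk over the per-frame GT info for the first sample and
# flag frames with no annotations. Assumes batch["infos"][0] matches the
# structure printed above; adjust the keys if it differs.
infos = batch["infos"][0]
frames = infos if isinstance(infos, (list, tuple)) else [infos]
for frame_idx, info in enumerate(frames):
    labels = info.get("labels", None)
    n_objects = 0 if labels is None else len(labels)
    print(f"frame {frame_idx}: {n_objects} GT objects")
    if n_objects == 0:
        print(f"WARNING: frame {frame_idx} has no ground-truth annotations.")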

@i-ting4931 (Author)
Hello, I followed your suggestion and added the print statements. The current output is shown in the image below (Image 1). The printed tensor, tensor([], dtype=torch.int64), is empty, and ids, areas, and labels all indicate that no objects were loaded. I'm not quite sure why this is happening.

I noticed that in the train_mot17.yaml file, there is a setting "USE_CROWDHUMAN: True". I initially suspected that the issue might be due to training with MOT17 while having Crowdhuman included in the configuration. So, I changed "USE_CROWDHUMAN: True" to false, but this resulted in an error (Image 2).

I also tried some of the commands related to Submit and Evaluation, but I ran into a small issue. When using eval mode, I got the following error (Image 3); I don't have the file it refers to. Could you please advise whether this file is supposed to be generated automatically? If so, did I make a mistake somewhere?

Sorry for the multiple questions, and I truly appreciate your help.

Thank you very much.

(Image 1) [Screenshot of the printed batch output]

(Image 2) [Screenshot of the error after setting USE_CROWDHUMAN to False]

(Image 3) [Screenshot of the error in eval mode]

@HELLORPG (Collaborator)
According to (Image 2), it seems that you did not successfully load any images or annotations from MOT17. You can add some breakpoints during the data loading process to determine where the problem is.

For example, here:

MeMOTR/data/mot17.py, lines 59 to 68 in 7de13f4:
for vid in self.mot17_seq_names:
    mot17_gts_dir = os.path.join(self.mot17_gts_dir, vid, "img1")
    mot17_gt_paths = [os.path.join(mot17_gts_dir, filename) for filename in os.listdir(mot17_gts_dir)]
    for mot17_gt_path in mot17_gt_paths:
        for line in open(mot17_gt_path):
            _, i, x, y, w, h, v = line.strip("\n").split(" ")
            i, x, y, w, h, v = map(float, (i, x, y, w, h, v))
            i, x, y, w, h = map(int, (i, x, y, w, h))
            t = int(mot17_gt_path.split("/")[-1].split(".")[0])
            self.mot17_gts[vid][t].append([i, x, y, w, h])

After this loop runs, all GTs should be loaded.
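For example, right after this loop you could print how many annotations were actually loaded per sequence (a minimal sketch, assuming self.mot17_gts maps sequence name to frame index to a list of boxes, as the loop above suggests):

# Sanity check: count the loaded GT boxes per sequence. If any sequence
# reports 0 frames or 0 boxes, the gts directory (or the output of
# gen_mot17_gts.py) is probably missing or empty.
for vid in self.mot17_seq_names:
    n_boxes = sum(len(boxes) for boxes in self.mot17_gts[vid].values())
    print(f"{vid}: {len(self.mot17_gts[vid])} frames, {n_boxes} GT boxes")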
