train_topo_loss is Nan.0 #35

Open
Teassassin opened this issue Oct 9, 2024 · 0 comments

I tried to train this model (ViT-B) on SpaceNet on a single GPU, but train_topo_loss and train_loss became NaN partway through the first epoch, as the log below shows.

Epoch 0:  21%|██        | 1108/5292 [05:18<20:01,  3.48it/s, v_num=ej4y, train_mask_loss=0.605, train_topo_loss=nan.0, train_loss=nan.0]
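
Since train_mask_loss is still finite in that line, the NaN seems to enter through the topo term (assuming train_loss is a combination that includes it, which I have not verified). A minimal sketch of a check that could help locate the first non-finite component at a given step; `first_non_finite` and `step_losses` are hypothetical names for illustration, not part of this repository:

```python
import math


def first_non_finite(losses):
    """Return the name of the first loss that is NaN or +/-inf, else None."""
    for name, value in losses.items():
        if not math.isfinite(value):
            return name
    return None


# Values copied from the progress bar above. In a real training step they
# would come from the loss tensors, e.g. via `.item()` (assumption).
step_losses = {
    "train_mask_loss": 0.605,
    "train_topo_loss": float("nan"),
    "train_loss": float("nan"),
}

culprit = first_non_finite(step_losses)
if culprit is not None:
    print(f"non-finite loss: {culprit}")
```

Logging the step index alongside the result would show whether the topo loss goes non-finite abruptly or after a gradual blow-up.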

I changed the default BATCH_SIZE to fit my GPU with 16 GB of memory and kept the other settings unchanged. This is the YAML config:

DATASET: 'spacenet'

# IN1k + MAE only
NO_SAM: False

SAM_VERSION: 'vit_b'
SAM_CKPT_PATH: 'sam_ckpts/sam_vit_b_01ec64.pth'
PATCH_SIZE: 256
BATCH_SIZE: 16
DATA_WORKER_NUM: 1
TRAIN_EPOCHS: 30
BASE_LR: 0.001
FREEZE_ENCODER: False
ENCODER_LR_FACTOR: 0.1
ENCODER_LORA: False
FOCAL_LOSS: False
USE_SAM_DECODER: False

# TOPONET
# sample per patch
TOPO_SAMPLE_NUM: 128
TOPONET_VERSION: 'normal'

# Inference
INFER_BATCH_SIZE: 2
SAMPLE_MARGIN: 0
INFER_PATCHES_PER_EDGE: 16

# ======= keypoint ======
# Best threshold 0.1949462890625, P=0.34380707144737244 R=0.326823890209198 F1=0.3351004719734192
# ======= road ======
# Best threshold 0.3408203125, P=0.6585257053375244 R=0.7146456837654114 F1=0.6854389309883118
# ======= topo ======
# Best threshold 0.705078125, P=0.9746968150138855 R=0.9701263904571533 F1=0.9724062085151672

ITSC_THRESHOLD: 0.195
ROAD_THRESHOLD: 0.341
TOPO_THRESHOLD: 0.705
# pixels
ITSC_NMS_RADIUS: 8
ROAD_NMS_RADIUS: 16
NEIGHBOR_RADIUS: 64
MAX_NEIGHBOR_QUERIES: 16

Looking forward to your reply. Thanks!
