error at def expBaseAE(adata, exp_metrics): #4

Open

Roger-GOAT opened this issue Apr 1, 2022 · 1 comment

Roger-GOAT commented Apr 1, 2022
Hi dear team, thanks for the package. I get an error; could you give some tips?

def expBaseAE(adata, exp_metrics):
    n_cells, n_genes = adata.X.shape
    in_dim = n_genes
    z_dim = args.z_dim
    h_dim = args.h_dim
    
    model = get_baseline_AE(in_dim, z_dim, h_dim).to(device)
    model = main_AE(args, model, save_name=f"baseAE_{args.model_name}")
    model.eval()
    with torch.no_grad():
        x = model.encoder(tensor_x)
        s = model.encoder(tensor_s)
        u = model.encoder(tensor_u)
        
        v = estimate_ld_velocity(s, u, device=device).cpu().numpy()
        x = x.cpu().numpy()
        s = s.cpu().numpy()
        u = u.cpu().numpy()
        
    adata = new_adata(adata, x, s, u, v, g_basis=args.nb_g_src)
    scv.tl.velocity_graph(adata, vkey='new_velocity')

    scv.pl.velocity_embedding_stream(adata, vkey="new_velocity", basis='X_umap', color='leiden',
                                    title="Baseline AutoEncoder",
                                    )  
    scv.tl.velocity_confidence(adata, vkey='new_velocity')
    exp_metrics['Baseline AutoEncoder'] = evaluate(adata, cluster_edges, 'leiden', "new_velocity")
    
expBaseAE(adata, exp_metrics)

Train Epoch: 100/20000 	Loss: 58.533680
Train Epoch: 200/20000 	Loss: 58.349663
Train Epoch: 300/20000 	Loss: 58.188026
Train Epoch: 400/20000 	Loss: 58.048077
Train Epoch: 500/20000 	Loss: 57.929665
.......
Train Epoch: 11400/20000 	Loss: 20.946295
Train Epoch: 11500/20000 	Loss: 20.501766
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [10], in <module>
     26     scv.tl.velocity_confidence(adata, vkey='new_velocity')
     27     exp_metrics['Baseline AutoEncoder'] = evaluate(adata, cluster_edges, 'leiden', "new_velocity")
---> 29 expBaseAE(adata, exp_metrics)

Input In [10], in expBaseAE(adata, exp_metrics)
      5 h_dim = args.h_dim
      7 model = get_baseline_AE(in_dim, z_dim, h_dim).to(device)
----> 8 model = main_AE(args, model, save_name=f"baseAE_{args.model_name}")
      9 model.eval()
     10 with torch.no_grad():

Input In [6], in main_AE(args, model, lr, weight_decay, save_name)
      9 while i < args.n_epochs:
     10     i += 1
---> 11     loss = train_step_AE([tensor_s, tensor_u], model, optimizer, xyids=[0, 1], device=device)
     12     losses.append(loss)
     13     if i % args.log_interval == 0:

File ~/miniconda3/envs/velo/lib/python3.8/site-packages/veloproj/util.py:370, in train_step_AE(Xs, model, optimizer, xyids, device, aux_weight, rt_all_loss, perc, norm_lr)
    367     lr_loss = vloss.item()
    368     loss += vloss
--> 370 loss.backward()
    371 optimizer.step()
    372 if rt_all_loss:

File ~/miniconda3/envs/velo/lib/python3.8/site-packages/torch/_tensor.py:307, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    298 if has_torch_function_unary(self):
    299     return handle_torch_function(
    300         Tensor.backward,
    301         (self,),
   (...)
    305         create_graph=create_graph,
    306         inputs=inputs)
--> 307 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File ~/miniconda3/envs/velo/lib/python3.8/site-packages/torch/autograd/__init__.py:154, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    151 if retain_graph is None:
    152     retain_graph = create_graph
--> 154 Variable._execution_engine.run_backward(
    155     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    156     allow_unreachable=True, accumulate_grad=True)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100]], which is output 0 of IndexPutBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
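
As the last line of the error suggests, enabling anomaly detection makes autograd also report the forward operation that produced the modified tensor. A minimal sketch of that hint, reusing args, model, and main_AE from the snippet above; it slows training, so it is only worth turning on while debugging:

import torch

# Debugging aid suggested by the error message: with anomaly detection on,
# the backward error also prints a traceback of the forward op responsible.
torch.autograd.set_detect_anomaly(True)
model = main_AE(args, model, save_name=f"baseAE_{args.model_name}")
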
qiaochen (Owner) commented Apr 8, 2022

Thank you!
It seems the vanilla AE is projecting the input matrices into spurious representations, which results in NaN values when fitting the low-dimensional linear regression.
The code at the current line 427 of model.py filters the NaN values by assigning 0 in place:
offset[torch.isnan(offset)], gamma[torch.isnan(gamma)] = 0, 0

I guess the error is raised by this line of code.
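
For context, this is the general failure mode: an in-place masked assignment modifies a tensor that autograd has already saved for the backward pass, bumping its version counter, so backward() refuses to use it. A minimal illustrative sketch, with made-up tensors rather than the actual computation in model.py:

import torch

# Illustrative only -- not the actual computation in veloAE's model.py.
w = torch.randn(100, requires_grad=True)
offset = torch.sqrt(w)              # NaN where w < 0; sqrt saves its output for backward
offset[torch.isnan(offset)] = 0     # in-place masked write bumps offset's version counter
offset.sum().backward()             # RuntimeError: "... modified by an inplace operation"
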
The good news is that PyTorch has provided a safer NaN-filtering function, torch.nan_to_num, since version 1.8; veloAE has been updated to use it, which should hopefully tackle the issue.

I have updated model.py with the following NaN-filtering operations:

nans_offset, nans_gamma = torch.isnan(offset), torch.isnan(gamma)
if torch.any(nans_offset) or torch.any(nans_gamma):
    # torch.nan_to_num is only available from PyTorch 1.8 onwards.
    version_1_8 = sum(int(this) >= that
                      for this, that in zip(torch.__version__.split('.')[:2], [1, 8])) == 2
    if version_1_8:
        # Out-of-place: returns new tensors instead of writing into the originals.
        offset = torch.nan_to_num(offset)
        gamma = torch.nan_to_num(gamma)
    else:
        # Older PyTorch: torch.where is also out-of-place, so nothing saved for
        # the backward pass gets mutated.
        offset = torch.where(nans_offset, torch.zeros_like(offset), offset)
        gamma = torch.where(nans_gamma, torch.zeros_like(gamma), gamma)
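
The important property of both branches is that torch.nan_to_num and torch.where return new tensors rather than mutating offset and gamma in place, so whatever autograd saved earlier stays at the version it expects. Continuing the illustrative sketch from above (made-up tensors, not the ones in model.py):

import torch

# Illustrative only: with out-of-place filtering, the tensor autograd saved
# (the sqrt output) is never mutated, so backward() runs without the error.
w = torch.randn(100, requires_grad=True)
offset = torch.sqrt(w)               # NaN where w < 0
offset = torch.nan_to_num(offset)    # requires PyTorch >= 1.8; returns a new tensor
offset.sum().backward()              # no version-counter RuntimeError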

Hope it helps!
