
One error #12

Open
mumuyeye opened this issue Oct 3, 2023 · 3 comments

mumuyeye commented Oct 3, 2023

Could you please tell me how to handle this error?

```
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lyn/memit/experiments/evaluate.py", line 299, in <module>
    main(
  File "/home/lyn/memit/experiments/evaluate.py", line 146, in main
    edited_model, weights_copy = apply_algo(
  File "/home/lyn/memit/memit/memit_main.py", line 44, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/home/lyn/memit/memit/memit_main.py", line 196, in execute_memit
    adj_k = torch.linalg.solve(
torch._C._LinAlgError: linalg.solve: The diagonal element 2 is zero, the solve could not be completed because the input matrix is singular.
```
I just ran it with this command:

```shell
CUDA_VISIBLE_DEVICES=2 python3 -m experiments.evaluate --alg_name=MEMIT --model_name=/home/lyn/EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=1
```


dtamayo-nlp commented Oct 9, 2023

I ran into the same problem when using a different LLM. The error is related to equation (14) of the MEMIT paper. In my case, the "aggregate statistic $C_0$" had rows and columns that were entirely zero, and even after adding $K_1 K_1^T$ those rows remained zero. A matrix with an all-zero row is singular, which means its inverse cannot be computed. Given how these matrices are constructed, the zeros imply that some coordinates of the hidden states are unused. So how can you work around it?
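To see why this makes `torch.linalg.solve` fail, here is a minimal, self-contained reproduction with a toy 3x3 stand-in for $C_0 + K_1 K_1^T$ (not the real matrix): one all-zero row gives determinant zero, and the solve raises the same `_LinAlgError` (a subclass of `RuntimeError`):

```python
import torch

# Toy stand-in for C_0 + K_1 K_1^T where hidden coordinate 1 is unused.
A = torch.tensor([[2.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],   # all-zero row -> singular matrix
                  [1.0, 0.0, 3.0]])
b = torch.ones(3, 1)

print("det:", torch.linalg.det(A).item())  # zero determinant: no inverse exists

try:
    torch.linalg.solve(A, b)
except RuntimeError as e:  # torch._C._LinAlgError subclasses RuntimeError
    print("solve failed:", e)
```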

I found two real solutions:

  1. Easy solution: do not edit the layers that are causing the problem. If you go to `hparams/MEMIT/EleutherAI_gpt-j-6B.json` you'll see the layers being edited:
    "layers": [ 3, 4, 5, 6, 7, 8 ],
    Swap out the layers whose matrices are causing the problem; if you look at the causal trace, you'll see you have some freedom in choosing among them.
  2. Hard solution: remove the rows/columns that are entirely zero, compute the inverse of the reduced matrix, and add the zero rows/columns back. Note that the result is not a true inverse (some columns are zero), but it is an approximation that does not add noise. The problem I ran into with this solution is that even after removing the zero rows/columns, some "unimportant" coordinates were still inflating the norm of my delta matrix, which makes me cautious.
    To implement this, go to `memit_main.py` and add these functions at the top:
```python
import torch


def make_null_i(matrix, i):
    # Zero out row i and column i (not used below, but handy for experiments).
    new_matrix = matrix.clone()
    new_matrix[:, i] = 0
    new_matrix[i, :] = 0
    return new_matrix


def identify_null_cols(matrix):
    # A row is "null" only if every entry is zero, so sum absolute values
    # (a plain sum could cancel to zero with mixed signs).
    row_sums = matrix.abs().sum(dim=1)
    # flatten(), not squeeze(): squeeze() returns a 0-dim tensor when there
    # is exactly one zero row, and .tolist() would then yield a scalar.
    zero_rows = torch.nonzero(row_sums == 0).flatten()
    return zero_rows.numel(), zero_rows.tolist()


def remove_column(matrix, i):
    # Drop row i and column i.
    new_matrix = matrix.clone()
    new_matrix = torch.cat((new_matrix[:i], new_matrix[i + 1:]), dim=0)
    new_matrix = torch.cat((new_matrix[:, :i], new_matrix[:, i + 1:]), dim=1)
    return new_matrix


def add_zero_column(matrix, i):
    # Re-insert an all-zero row and column at position i.
    new_matrix = matrix.clone()
    new_row = torch.zeros(1, matrix.shape[1], device=matrix.device, dtype=matrix.dtype)
    new_col = torch.zeros(matrix.shape[0] + 1, 1, device=matrix.device, dtype=matrix.dtype)
    new_matrix = torch.cat((new_matrix[:i], new_row, new_matrix[i:]), dim=0)
    new_matrix = torch.cat((new_matrix[:, :i], new_col, new_matrix[:, i:]), dim=1)
    return new_matrix


def compute_pseudoinverse_matrix(matrix):
    n, ids = identify_null_cols(matrix)
    print(f"There are {n} columns with zeros")
    if n == 0:
        return torch.linalg.inv(matrix)
    # Remove the zero rows/columns that make the matrix singular
    # (descending index order, so earlier indices stay valid).
    new_matrix = matrix.clone()
    for id_ in ids[::-1]:
        new_matrix = remove_column(new_matrix, id_)
    # Invert the reduced (nonsingular) matrix
    new_matrix = torch.linalg.inv(new_matrix)
    # Re-insert the zero rows/columns at their original positions
    for id_ in ids:
        new_matrix = add_zero_column(new_matrix, id_)
    return new_matrix
```

and then change lines 196-199 of `execute_memit` to:

```python
matrix = (
    hparams.mom2_update_weight * cov.double().detach().cpu()
    + layer_ks.detach().cpu() @ layer_ks.T.detach().cpu()
)
n_null_cols, _ = identify_null_cols(matrix)
if n_null_cols != 0:
    adj_k = compute_pseudoinverse_matrix(matrix) @ layer_ks.detach().cpu()
else:
    adj_k = torch.linalg.solve(matrix, layer_ks.detach().cpu())
```
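As a quick self-contained sanity check of the remove/invert/reinsert construction (on a hypothetical toy matrix, not the real $C_0$): `torch.linalg.pinv` computes exactly this kind of approximation when the only rank deficiency is an all-zero row/column, so it keeps the unused coordinate at zero and acts as the identity on the used ones.

```python
import torch

# Toy symmetric matrix whose coordinate 1 is unused (zero row and column).
A = torch.tensor([[2.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 3.0]], dtype=torch.float64)

# Moore-Penrose pseudo-inverse handles the zero row/column gracefully.
A_pinv = torch.linalg.pinv(A)

# The pseudo-inverse keeps the unused coordinate at zero ...
assert torch.allclose(A_pinv[1, :], torch.zeros(3, dtype=torch.float64))

# ... and A @ A_pinv acts as the identity on the used coordinates only.
expected = torch.diag(torch.tensor([1.0, 0.0, 1.0], dtype=torch.float64))
assert torch.allclose(A @ A_pinv, expected, atol=1e-8)
print("pseudo-inverse matches the remove/invert/reinsert construction")
```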
  3. Extra possible solution: increase the number of edits.
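The intuition behind that last suggestion, sketched on hypothetical random keys: $K_1 K_1^T$ is a sum of outer products of key vectors, so its rank (and the chance that the zero rows get filled in) grows with the number of keys contributed by the edits.

```python
import torch

torch.manual_seed(0)
d = 8  # toy hidden dimension

# With few key vectors, K @ K.T is rank-deficient (singular on its own).
K_few = torch.randn(d, 2, dtype=torch.float64)
print(torch.linalg.matrix_rank(K_few @ K_few.T).item())   # at most 2

# With at least d independent keys, K @ K.T is (almost surely) full rank.
K_many = torch.randn(d, 32, dtype=torch.float64)
print(torch.linalg.matrix_rank(K_many @ K_many.T).item())  # full rank: 8
```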

I hope it helps. Good luck!


mumuyeye commented Oct 9, 2023

I found your response to be quite valuable. Thank you very much!

