Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mash fix #263

Merged
merged 5 commits into from
Oct 3, 2024
Merged

Mash fix #263

merged 5 commits into from
Oct 3, 2024

Conversation

rfeng2023
Copy link
Collaborator

No description provided.

R/mash_wrapper.R Outdated
# remove the data driven U in U list, and recalculate w
U[phenos_out] <- NULL
w <- w[!names(w) %in% phenos_out]
sum_w = sum(w)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum_w has to be computed before removing any elements from w

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ie, before the for loop where w gets changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to the below comments, if we removed more context by doing your suggestion the sum of w won't be 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum of w don't have to be 1. but it is important to maintain the same summed value as original in your output w. Suppose here is the input:

U = [U1, U2,U3]
w = [0.1, 0.2,0.1]

also suppose U3 is removed, so the updated is

U = [U1, U2]

define sum_w = 0.1 + 0.2 + 0.1 = 0.4. In this case you can understand it as the remaining 0.6 are the NULL

When U3 is removed, the sum of the remaining weights becomes:

sum_w_new = 0.1 + 0.2 = 0.3

To maintain the original total weight sum of 0.4, adjust the weights proportionally:

w1 = (0.1 / 0.3) * 0.4 = 0.1333
w2 = (0.2 / 0.3) * 0.4 = 0.2667

So the updated weights are:

w = [0.1333, 0.2667]

My initial suggestion on slack was not correct eitehr so here i flush out it more refined fashion.

Copy link
Collaborator Author

@rfeng2023 rfeng2023 Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can switch to this version, but just a reminder: the sum_w is close to 1. So, even though we normalize with (w1 / sum_w_new) * sum_w, it essentially simplifies to w1 / sum_w_new, giving us the same value as the current version.
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i understand. my proposal is for when null components are also considered, as a more generic case and I can anticipate that usage.

R/mash_wrapper.R Outdated
message(paste(length(U), "components of matrices remained after filtering. "))
return(list(U = U, w = w))
# remove the data driven U in U list, and recalculate w
U[phenos_out] <- NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line does not look correct to me. After subsetting U (line 58 to 65), if the remaining elements in U are all zeros, as in the case of a singleton matrix removed from the mixture, then we should already set those matrices NULL.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original U is like below:

'XtX''tFLASH_default''FLASH_default_1''FLASH_default_2''FLASH_default_3''FLASH_default_4''FLASH_default_5''FLASH_default_6''FLASH_default_7''FLASH_default_8''FLASH_default_9''FLASH_default_10''FLASH_default_11''tFLASH_nonneg''FLASH_nonneg_1''FLASH_nonneg_2''FLASH_nonneg_3''FLASH_nonneg_4''FLASH_nonneg_5''FLASH_nonneg_6''FLASH_nonneg_7''FLASH_nonneg_8''FLASH_nonneg_9''FLASH_nonneg_10''FLASH_nonneg_11''FLASH_nonneg_12''PCA_1''PCA_2''tPCA''identity''Knight_eQTL_brain''MiGA_GFM_eQTL''MiGA_GTS_eQTL''MiGA_SVZ_eQTL''MiGA_THA_eQTL''BM_10_MSBB_eQTL''BM_22_MSBB_eQTL''BM_36_MSBB_eQTL''BM_44_MSBB_eQTL''monocyte_ROSMAP_eQTL''Mic_DeJager_eQTL''Ast_DeJager_eQTL''Oli_DeJager_eQTL''Exc_DeJager_eQTL''Inh_DeJager_eQTL''DLPFC_DeJager_eQTL''PCC_DeJager_eQTL''AC_DeJager_eQTL''Mic_Kellis_eQTL''Ast_Kellis_eQTL''Oli_Kellis_eQTL''OPC_Kellis_eQTL''Exc_Kellis_eQTL''Inh_Kellis_eQTL''Ast_mega_eQTL''Exc_mega_eQTL''Inh_mega_eQTL''Oli_mega_eQTL''STARNET_eQTL_Mac''DLPFC_Bennett_pQTL''OPC_DeJager_eQTL''OPC_mega_eQTL''Ast_10_Kellis_eQTL''Mic_12_Kellis_eQTL''Mic_13_Kellis_eQTL''Mic_mega_eQTL''STARNET_eQTL''equal_effects''simple_het_1''simple_het_2''simple_het_3'

as you can see, there included some data driven U from each context. To my understanding, we should not keep the filter out contexts in this list, so this line is going to remove the U such as BM_22_MSBB_eQTL when we decide to look into ROSMAP data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. Still, my reply is "After subsetting U (line 58 to 65), if the remaining elements in U are all zeros, as in the case of a singleton matrix removed from the mixture, then we should already set those matrices NULL." To elaborate, if a matrix is BM_22_MSBB_eQTL it would be non-zero only at the BM_22_MSBB_eQTL entry. Once we subset and removed that context, this entire compoment BM_22_MSBB_eQTL will be all zeros and therefore get removed.

Copy link
Collaborator Author

@rfeng2023 rfeng2023 Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your problem. The issue is, for the U as shown in the figure, even though we subset the contexts which is not Knight, there could not be all zeros because of a much smaller value than Knight itself in diagonal. that is why it is not removed.
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we have another option here: to either check if all elements are zero to if all elements, except for the diagonal, are zero.

Copy link
Contributor

@gaow gaow Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this totally makes sense now for what @Chunmingl observed those redundant diagonal matrices!

you are right, i forgot how it works ... it was using context specific matrices only as initial values into the EM algorithm but the output will actually change the matrices as it sees fit, possibly slightly, like the above. Perhaps just to make it full rank. But this creates an issue for us.

We cannot simply remove diagonal matrices because signals on the diagonal can be strong and relevant.

If we stick to exactly our pipeline with all the column name conventions being consistent with the context names and singleton matrices also consistent, then perhaps what you did makes the most sense after all, to get things roughly working .... let's do that. My only request then is to add some comment strings in the code to point to this discussion here, for future references.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing, will do

R/mash_wrapper.R Outdated
return(list(U = U, w = w))
# remove the data driven U in U list, and recalculate w
U[phenos_out] <- NULL
w <- w[!names(w) %in% phenos_out]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here -- w is already taken care of in the for loop above. I dont think it is correct to have this line here. Unless am missing someting?

@gaow gaow merged commit 0602cd4 into cumc:main Oct 3, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants