-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mash fix #263
Mash fix #263
Conversation
R/mash_wrapper.R
Outdated
# remove the data driven U in U list, and recalculate w | ||
U[phenos_out] <- NULL | ||
w <- w[!names(w) %in% phenos_out] | ||
sum_w = sum(w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sum_w has to be computed before removing any elements from w
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ie, before the for loop where w
gets changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is related to the below comments, if we removed more context by doing your suggestion the sum of w won't be 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sum of w don't have to be 1. but it is important to maintain the same summed value as original in your output w. Suppose here is the input:
U = [U1, U2,U3]
w = [0.1, 0.2,0.1]
also suppose U3
is removed, so the updated is
U = [U1, U2]
define sum_w = 0.1 + 0.2 + 0.1 = 0.4. In this case you can understand it as the remaining 0.6 are the NULL
When U3
is removed, the sum of the remaining weights becomes:
sum_w_new = 0.1 + 0.2 = 0.3
To maintain the original total weight sum of 0.4, adjust the weights proportionally:
w1 = (0.1 / 0.3) * 0.4 = 0.1333
w2 = (0.2 / 0.3) * 0.4 = 0.2667
So the updated weights are:
w = [0.1333, 0.2667]
My initial suggestion on slack was not correct eitehr so here i flush out it more refined fashion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i understand. my proposal is for when null components are also considered, as a more generic case and I can anticipate that usage.
R/mash_wrapper.R
Outdated
message(paste(length(U), "components of matrices remained after filtering. ")) | ||
return(list(U = U, w = w)) | ||
# remove the data driven U in U list, and recalculate w | ||
U[phenos_out] <- NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line does not look correct to me. After subsetting U
(line 58 to 65), if the remaining elements in U are all zeros, as in the case of a singleton matrix removed from the mixture, then we should already set those matrices NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original U is like below:
'XtX''tFLASH_default''FLASH_default_1''FLASH_default_2''FLASH_default_3''FLASH_default_4''FLASH_default_5''FLASH_default_6''FLASH_default_7''FLASH_default_8''FLASH_default_9''FLASH_default_10''FLASH_default_11''tFLASH_nonneg''FLASH_nonneg_1''FLASH_nonneg_2''FLASH_nonneg_3''FLASH_nonneg_4''FLASH_nonneg_5''FLASH_nonneg_6''FLASH_nonneg_7''FLASH_nonneg_8''FLASH_nonneg_9''FLASH_nonneg_10''FLASH_nonneg_11''FLASH_nonneg_12''PCA_1''PCA_2''tPCA''identity''Knight_eQTL_brain''MiGA_GFM_eQTL''MiGA_GTS_eQTL''MiGA_SVZ_eQTL''MiGA_THA_eQTL''BM_10_MSBB_eQTL''BM_22_MSBB_eQTL''BM_36_MSBB_eQTL''BM_44_MSBB_eQTL''monocyte_ROSMAP_eQTL''Mic_DeJager_eQTL''Ast_DeJager_eQTL''Oli_DeJager_eQTL''Exc_DeJager_eQTL''Inh_DeJager_eQTL''DLPFC_DeJager_eQTL''PCC_DeJager_eQTL''AC_DeJager_eQTL''Mic_Kellis_eQTL''Ast_Kellis_eQTL''Oli_Kellis_eQTL''OPC_Kellis_eQTL''Exc_Kellis_eQTL''Inh_Kellis_eQTL''Ast_mega_eQTL''Exc_mega_eQTL''Inh_mega_eQTL''Oli_mega_eQTL''STARNET_eQTL_Mac''DLPFC_Bennett_pQTL''OPC_DeJager_eQTL''OPC_mega_eQTL''Ast_10_Kellis_eQTL''Mic_12_Kellis_eQTL''Mic_13_Kellis_eQTL''Mic_mega_eQTL''STARNET_eQTL''equal_effects''simple_het_1''simple_het_2''simple_het_3'
as you can see, there included some data driven U from each context. To my understanding, we should not keep the filter out contexts in this list, so this line is going to remove the U such as BM_22_MSBB_eQTL
when we decide to look into ROSMAP data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. Still, my reply is "After subsetting U (line 58 to 65), if the remaining elements in U are all zeros, as in the case of a singleton matrix removed from the mixture, then we should already set those matrices NULL." To elaborate, if a matrix is BM_22_MSBB_eQTL it would be non-zero only at the BM_22_MSBB_eQTL entry. Once we subset and removed that context, this entire compoment BM_22_MSBB_eQTL
will be all zeros and therefore get removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we have another option here: to either check if all elements are zero
to if all elements, except for the diagonal, are zero
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this totally makes sense now for what @Chunmingl observed those redundant diagonal matrices!
you are right, i forgot how it works ... it was using context specific matrices only as initial values into the EM algorithm but the output will actually change the matrices as it sees fit, possibly slightly, like the above. Perhaps just to make it full rank. But this creates an issue for us.
We cannot simply remove diagonal matrices because signals on the diagonal can be strong and relevant.
If we stick to exactly our pipeline with all the column name conventions being consistent with the context names and singleton matrices also consistent, then perhaps what you did makes the most sense after all, to get things roughly working .... let's do that. My only request then is to add some comment strings in the code to point to this discussion here, for future references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing, will do
R/mash_wrapper.R
Outdated
return(list(U = U, w = w)) | ||
# remove the data driven U in U list, and recalculate w | ||
U[phenos_out] <- NULL | ||
w <- w[!names(w) %in% phenos_out] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here -- w
is already taken care of in the for loop above. I dont think it is correct to have this line here. Unless am missing someting?
No description provided.