Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add missing-tolerant adjust method #117

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from
Open

add missing-tolerant adjust method #117

wants to merge 4 commits into from

Conversation

pdimens
Copy link

@pdimens pdimens commented Jul 20, 2020

This PR addresses issue #108 of including a method ignorant of missing values in a vector of P values. The reason this method is necessary/useful is that when using split-apply-combine methods for categorical data grouping when missing values are present, situations may arise where one of the grouped vectors contains only missing values and thus the resulting P value would be missing as well.

The introduced method to adjust creates a copy of the Pvalue vector, performs the correction method on all the non-missing Pvalues, and overwrites the original nonmissing Pvalues in the copied vector, returning the modified copy with all the missing values preserved in their original locations.

julia> a = [0.1, 0.2, missing, 0.4] ;

julia> adjust(a, Holm())
4-element Array{Union{Missing, Float64},1}:
 0.30000000000000004
 0.4
  missing
 0.4

This method should leave the original unaffected:

julia> b = adjust(rand(10), Holm())
10-element Array{Float64,1}:
 1.0
 1.0
 1.0
 1.0
 1.0
 0.18431430876447497
 1.0
 1.0
 1.0
 1.0

@nignatiadis
Copy link
Contributor

Thanks for this!

I have just one concern, that was already pointed out in #108 by @juliangehring. The behaviour here should depend a lot on how the missing came about. If, say the i-th test is null, then is the missingness independent of the p-value? If this is true for all nulls tests, then this PR is correct.

Otherwise, it is not. For example, iirc some eQTL software only computes p-values that are smaller than t for some small t; the rationale is that large p-values will never be rejected by a multiple testing procedure so it is not worth spending effort to compute them. In this case the missing p-values should be replaced by 1 and not filtered out ( or they can be filtered out and then one can use the adjust(pvals, n, PValueAdjustmentMethod) functionality in this package).

In any case, I do not mind having this PR as the default, but I think it should be made explicit to users.

@pdimens
Copy link
Author

pdimens commented Jul 20, 2020

That's a very good point I had not considered. It could, in theory, have a kwarg of a Bool to specify the handling of missing as the ignore or include-as-1 method, with a suitable docstring explaining the difference. What are your thoughts on that?

@pdimens
Copy link
Author

pdimens commented Jul 21, 2020

@nignatiadis the committed change suggests a method accounting for the scenarios your describe:

julia> a = [0.1, 0.2, missing, 0.4] ;

julia> adjust(a, Holm())
4-element Array{Union{Missing, Float64},1}:
 0.30000000000000004
 0.4
  missing
 0.4

julia> adjust(a, Holm(), include_missing = false)    # verbose version of default kwarg
4-element Array{Union{Missing, Float64},1}:
 0.30000000000000004
 0.4
  missing
 0.4

julia> adjust(a, Holm(), include_missing = true)
4-element Array{Float64,1}:
 0.4
 0.6000000000000001
 1.0
 0.8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants