Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong probe types used for EPIC samples in preprocessNoob due to b2-based manifest and b4-based annotation #243

Open
maximilian-leitheiser opened this issue Mar 4, 2023 · 0 comments

Comments

@maximilian-leitheiser
Copy link

maximilian-leitheiser commented Mar 4, 2023

Thanks a lot for making minfi available to the scientific community! I have used it extensively in my research over the last few years. However, I think I might have come across a mistake in preprocessNoob that occurs for EPIC (v1) samples:

After reading such an EPIC sample from GEO

> library(GEOquery)
> library(minfi)
> library(stringr)

> # get IDAT files from GEO
> GEO_files <- getGEOSuppFiles("GSM5239506")
> sapply(rownames(GEO_files), gunzip, overwrite = TRUE)

> # read IDATs
> IDAT_basename <- rownames(GEO_files)[[1]]
> IDAT_basename <- paste(head(unlist(str_split(IDAT_basename, "_")), n = -1), collapse = "_")
> rgSet <- minfi::read.metharray(basenames = IDAT_basename_2)
> dim(rgSet)
[1] 1051815       1
> annotation(rgSet)
                         array                     annotation 
"IlluminaHumanMethylationEPIC"                 "ilm10b4.hg19"

and executing this first code chunk from preprocessNoob

> oob <- getOOB(rgSet)
> GreenOOB <- oob[["Grn"]]
> RedOOB <- oob[["Red"]]
> MSet <- preprocessRaw(rgSet)
> probe.type <- getProbeType(MSet, withColor = TRUE)
> Green_probes <- which(probe.type == "IGrn")
> Red_probes <- which(probe.type == "IRed")
> d2.probes <- which(probe.type == "II")
> Meth <- getMeth(MSet)
> Unmeth <- getUnmeth(MSet)

the number of CpGs in the Meth/Unmeth matrices and probe.type are not the same:

> nrow(Meth)
[1] 866091
> length(probe.type)
[1] 865859

Hence, probe.type does not correspond to the actual probe types of Meth and Unmeth and the following subsequent indexing of them in .preprocessNoob with Green_probes and so on will produce incorrect results:

# NormExp estimates for Green and Red
       dat <- list(
           Green = list(
               M =  Meth[Green_probes, , drop = FALSE],
               U =  Unmeth[Green_probes, , drop = FALSE],
               D2 = Meth[d2.probes, , drop = FALSE]),
           Red = list(
               M =  Meth[Red_probes, , drop = FALSE],
               U =  Unmeth[Red_probes, , drop = FALSE],
               D2 = Unmeth[d2.probes, , drop = FALSE]))

The original discrepency in CpG sites occurs because

  • to create MSet, preprocessRaw uses IlluminaHumanMethylationEPICmanifest (through getProbeInfo and getManifest), which is based on the B2 manifest file from Illumina
  • to create probe.type, getProbeType relies on IlluminaHumanMethylationEPICanno.ilm10b4.hg19 (through getAnnotation and due to .default.epic.annotation <- "ilm10b4.hg19" in utils.R), which is based on the B4 manifest file from Illumina

and these two versions of the manifest file have a differing number of probes.

For our EPIC sample from above, making a quick-and-dirty b2-based version of probe.info leads to the following results:

> # make probe.info version based on b2 manifest
> type_1 <- getManifest(rgSet)@data$TypeI
> type_1 <- type_1[, c("Name", "AddressA", "AddressB", "Color")]
> type_1$Type <- "I"
  
> type_2 <- getManifest(rgSet)@data$TypeII
> type_2 <- type_2[c("Name", "AddressA")]
> type_2$Color <- ""
> type_2$AddressB <- ""
> type_2$Type <- "II"
  
> type_df <- rbind(type_1, type_2)
> type_df <- type_df[match(rownames(MSet), type_df$Name), ]
> probe.type_b2 <- paste0(type_df $Type, type_df_fitted$Color)
> Green_probes_b2 <- which(probe.type_b2 == "IGrn")
> Red_probes_b2 <- which(probe.type_b2 == "IRed")
> d2.probes_b2 <- which(probe.type_b2 == "II")

> # check probes the original method is actually indexing
> table(probe.type_b2[Green_probes])

 IGrn  IRed 
17984 31955 
> table(probe.type_b2[Red_probes])

 IGrn  IRed 
31956 60242 
> table(probe.type_b2[d2.probes])

  IGrn     II   IRed 
    14 723678     30 

I'd be curious to hear your thoughts! Am I missing something, or did I setup something incorrectly?

Kind regards,
Max

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant