A question about general mode in logistic mode #89

shangguandong1996 · 2021-06-11T02:47:43Z

Hi, dear developer,

I have a question about the general model used when training the logistic model. As I understand it, the TF-specific models are based on 237 TFs from ReMap. Take TF A as an example.
A label of 1 marks enhancer peaks where TF A binds according to the ChIP-seq data in the database, and 0 marks the other enhancer peaks, where other TFs bind. Fitting the logistic model then produces TF A-specific β parameters for the motif z-score and for the ATAC, H3K27ac and ChIP scores. I can then use these parameters to predict the probability of TF A binding at my own enhancer sites.
But if I only have a motif database for a species other than human or mouse, how can I train the logistic model? After all, I have no training data. I noticed that you offer a general model for this situation, but I am confused about the theory behind it.

Best wishes

Guandong Shang

simonvh · 2021-06-11T06:38:20Z

Hi @shangguandong1996, great question! There are two possibilities:

1. General model
This is a model based on taking all 237 TFs together (in practice: a randomly sampled subset of peaks) and training a single model on these data. The result is that the model captures parameters that work for predicting TF binding in general, based on motif, ATAC-seq and/or H3K27ac ChIP-seq. This works relatively well, but for some factors it will not be as good, as it won't learn TF-specific features. Take CTCF for instance, which is not well predicted by H3K27ac. However, in general this works better than taking the motif score alone, as you incorporate some measure of enhancer activity. This is the default mode, which will work for any motif database in any species.
2. Mapping the motif database to your species of interest
Here, you can use the new gimme motif2factors command from GimmeMotifs to map the motif database that was used for training (gimme.vertebrate.v5.0) to TFs in your new species. If you use this newly mapped database, TFs can be mapped to similar TFs in human. For instance, Xenopus tropicalis has a POU5F1 (OCT4) ortholog called pou5f3.1. This TF will share motifs with POU5F1 (see Fig. 3D in the preprint), and therefore it will use the model parameters from POU5F1. Note that we have only tested this with other vertebrates. If you are interested in plants, for instance, you are probably best off using the general model.
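As a rough illustration of the difference between a general model and a TF-specific model, here is a minimal sketch on synthetic data. It is not the actual training code: the feature names, the two simulated TFs, their "true" weights, and the assumption that each pooled row keeps the 0/1 label of its own TF are all made up for illustration.

```python
import math
import random

FEATURES = ["motif_z", "atac", "h3k27ac"]  # hypothetical feature names


def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Binding probability for one peak; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Plain batch gradient descent on the logistic log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad)]
    return w


def simulate(n, true_w, rng):
    """Draw n peaks with standard-normal features and 0/1 binding labels."""
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        rows.append(x)
        labels.append(1 if rng.random() < predict(true_w, x) else 0)
    return rows, labels


rng = random.Random(0)
# Made-up ground truth: [intercept, motif_z, atac, h3k27ac].
rows_a, y_a = simulate(400, [-1.0, 1.5, 1.0, 1.5], rng)  # H3K27ac-dependent TF
rows_b, y_b = simulate(400, [-1.0, 2.5, 1.0, 0.0], rng)  # CTCF-like TF

w_a = fit_logistic(rows_a, y_a)  # TF-specific model for TF A
w_b = fit_logistic(rows_b, y_b)  # TF-specific model for TF B
# Assumed "general" setup: pool all rows and fit one shared set of weights.
w_general = fit_logistic(rows_a + rows_b, y_a + y_b)

for name, w in [("TF A", w_a), ("TF B", w_b), ("general", w_general)]:
    print(name, [round(v, 2) for v in w])
```

In a toy setup like this, the pooled weights end up as a compromise between the two TFs, which is the kind of trade-off described above for factors such as CTCF: the shared H3K27ac weight is not tailored to a TF whose binding does not depend on it.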

shangguandong1996 · 2021-06-11T07:21:51Z

Thanks for your reply :) Actually, I am interested in plants.

Regarding this part:

taking all 237 TFs together (in practice: a randomly sampled subset of peaks) and training a model based on these data

Does this mean that all TFs share the same parameters (or weights) for motif, ATAC-seq and H3K27ac?
Also, as I understand it, if all TFs are taken together, all peaks of the 237 TFs will be labelled 1, so which sites are labelled 0? The y of a logistic model should contain both 1s and 0s. Or do you sample genomic intervals and treat those as 0?

Please forgive me if I have misunderstood something :)

Guandong Shang
