A question about general mode in logistic mode #89

Open
shangguandong1996 opened this issue Jun 11, 2021 · 2 comments

Comments

@shangguandong1996

Hi, dear developers

I have a question about the general model when training the logistic regression model. As I understand it, the TF-specific models are based on 237 TFs from ReMap. Take TF A as an example.
A label of 1 represents a peak where TF A is bound according to the ChIP-seq data in the database, and 0 represents an enhancer peak bound by other TFs, taken from all enhancer peak sites. Training the logistic model then produces TF-A-specific β parameters for the motif z-score and the ATAC, H3K27ac and ChIP scores. I can then predict the probability of TF A binding at my enhancer sites using these parameters.
But if I only have a motif database for a species other than human or mouse, how can I train the logistic model? After all, I have no training data. I noticed that you offer a general model for this situation, but I am confused about the theory behind it.

Best wishes

Guandong Shang

@simonvh
Member

simonvh commented Jun 11, 2021

Hi @shangguandong1996, great question! There are two possibilities:

  1. General model
    This is a model based on taking all 237 TFs together (in practice: a randomly sampled subset of peaks) and training a model based on these data. The result is that the model will capture parameters that work for predicting TF binding in general, based on motif, ATAC-seq and/or H3K27ac ChIP-seq. This works relatively well, but for some factors it will not be as good, as it won't learn TF-specific features. Take CTCF for instance, which is not well-predicted by H3K27ac. However, in general this works better than taking motif score alone, as you incorporate some measure of enhancer activity. This is the default mode, which will work for any motif database in any species.

  2. Mapping the motif database to your species of interest.
    Here, you can use the new gimme motif2factors command from GimmeMotifs to map the motif database that has been used for training (gimme.vertebrate.v5.0) to TFs in your new species. If you use this newly mapped database, TFs can be mapped to similar TFs in human. For instance, Xenopus tropicalis has a POU5F1 (OCT4) ortholog called pou5f3.1. This TF will share motifs with POU5F1 (see Fig. 3D in the preprint), and therefore it will use the model parameters from POU5F1. Note that we have only tested this with other vertebrates. If you are interested in plants, for instance, you are probably best off using the general model.
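
To illustrate the difference between a general and a TF-specific model as described above, here is a minimal sketch on simulated data. It is not the project's actual training code: the three features, the "true" weights, the TF names and the helpers `fit_logistic` and `simulate_tf` are all assumptions made up for this example, and the bound/unbound labels are simply simulated. It pools peaks from several hypothetical TFs to fit one shared set of weights, then fits a separate model for a CTCF-like factor whose binding does not depend on H3K27ac.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, beta_1, ..., beta_k].
    """
    n_features = len(X[0])
    beta = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def simulate_tf(rng, true_beta, n=400):
    """Simulate n peaks for one TF (hypothetical toy data).

    Features are standardized scores for (motif, ATAC, H3K27ac);
    labels are 1 (bound) or 0 (unbound).
    """
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        p = sigmoid(true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)

# Assumed "true" weights: [intercept, motif, ATAC, H3K27ac]
typical_tf = [-1.0, 1.5, 1.0, 1.0]
ctcf_like = [-1.0, 2.5, 1.0, 0.0]  # binding independent of H3K27ac

datasets = {f"TF{i}": simulate_tf(rng, typical_tf) for i in range(5)}
datasets["CTCF_like"] = simulate_tf(rng, ctcf_like)

# General model: pool peaks from all TFs and fit one shared set of weights.
X_all = [x for X, _ in datasets.values() for x in X]
y_all = [label for _, y in datasets.values() for label in y]
general = fit_logistic(X_all, y_all)

# TF-specific model: fit on the CTCF-like peaks only.
specific = fit_logistic(*datasets["CTCF_like"])

names = ["intercept", "motif", "ATAC", "H3K27ac"]
for name, g, s in zip(names, general, specific):
    print(f"{name:>9}  general={g:+.2f}  CTCF-like specific={s:+.2f}")
```

In this toy setup the general model assigns a clearly positive weight to H3K27ac because most of the pooled factors depend on it, while the CTCF-like model learns a weight near zero. That is the kind of TF-specific feature a pooled model cannot capture.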

@shangguandong1996
Author

Thanks for your reply :) Actually, I am interested in plants.

Regarding this part of your answer:

taking all 237 TFs together (in practice: a randomly sampled subset of peaks) and training a model based on these data

Does this mean that all TFs will share the same parameters (or, put differently, weights) for motif, ATAC-seq and H3K27ac?
Also, as I understand it, if all TFs are taken together, all peaks of the 237 TFs will be labelled 1, so which peaks will be labelled 0? The y of a logistic model should contain both 1s and 0s. Or do you sample genomic intervals and treat those as 0?

Please forgive me if I misunderstand something :)

Guandong Shang
