Fail to reproduce Fig.5 results for human data #65

jonathan-f · 2022-04-11T14:28:53Z

Hello,
First, thanks for your impressive work!

I am having trouble reproducing the Fig. 5 results of the paper for the human datasets using the STRING ground-truth network, although I can reproduce the results for the mouse datasets.

I used the script generateExpInputs.py, getting the following statistics:

hHep STRING: #TFs: 409, #Genes: 656, #Edges: 15046, Density: 0.056
hESC STRING: #TFs: 343, #Genes: 517, #Edges: 8514, Density: 0.048

These densities differ from those in Fig. 5. Why?

I ran PIDC several times using BLRunner.py and then used BLEvaluator.py to compute the early precision (EP). I always get:

hHep STRING: EP=0.08, while it should be 0.105 according to Fig. 5 (multiplying the EPR by the network density from Fig. 5, 0.03).
hESC STRING: EP=0.074, which seems consistent with the density indicated in Fig. 5, but not with the one reported above.
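
The comparison above relies on the relation EP = EPR × density. A minimal sketch of that conversion, assuming EPR is defined as early precision divided by the ground-truth network density; the function names are hypothetical and the numbers are the ones quoted in this post:

```python
def ep_from_epr(epr: float, density: float) -> float:
    """Early precision implied by an EPR value and a network density."""
    return epr * density


def epr_from_ep(ep: float, density: float) -> float:
    """Early precision ratio: EP divided by the network density."""
    if density <= 0:
        raise ValueError("density must be positive")
    return ep / density


# hHep / STRING values quoted in this post.
fig5_density = 0.03      # density read from Fig. 5
script_density = 0.056   # density printed by generateExpInputs.py
expected_ep = 0.105      # EP implied by Fig. 5
observed_ep = 0.08       # EP obtained from BLEvaluator.py

print("EPR implied by Fig. 5:", epr_from_ep(expected_ep, fig5_density))
print("EPR from observed EP, Fig. 5 density:", epr_from_ep(observed_ep, fig5_density))
print("EPR from observed EP, script density:", epr_from_ep(observed_ep, script_density))
```

The same EP maps to a much lower EPR when the script's density (0.056) is used instead of the Fig. 5 density (0.03), so the two densities need to be reconciled before the EP values can be compared.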

For the other algorithms I also get an EP smaller than the one shown in Fig. 5. Do you have any idea why? Could you please provide your rankedEdges.csv file for PIDC on the hHep dataset for comparison?

Thank you
Best wishes
Jonathan

adyprat · 2022-04-12T23:54:38Z

Hi @jonathan-f,

Thank you for bringing this to our notice. For some strange reason, all the lines in STRING-network.csv are duplicated, so the #Edges and the density are both 2x (i.e., the EPR is 0.5x) the values you see in Fig 5. While we make changes to generateExpInputs.py to compute the correct number of edges, please use len(netDF.drop_duplicates()) instead of netDF.shape[0] to get the correct "#Edges".
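
A minimal sketch of that workaround on toy data, not the actual change to generateExpInputs.py. The density formula is the one printed by the script, edges / (nTFs*nGenes - nTFs); the function name `network_stats`, the column names `Gene1`/`Gene2`, and the toy edges are assumptions for illustration:

```python
import pandas as pd


def network_stats(net_df: pd.DataFrame, n_tfs: int, n_genes: int) -> dict:
    """Edge count and density, before and after dropping duplicated rows."""
    possible_edges = n_tfs * n_genes - n_tfs  # same denominator as the script
    raw_edges = net_df.shape[0]
    unique_edges = len(net_df.drop_duplicates())
    return {
        "raw_edges": raw_edges,
        "unique_edges": unique_edges,
        "raw_density": raw_edges / possible_edges,
        "unique_density": unique_edges / possible_edges,
    }


# Toy network in which every line appears twice, as described for the STRING file.
edges = pd.DataFrame({"Gene1": ["A", "A", "B"], "Gene2": ["B", "C", "C"]})
duplicated_net = pd.concat([edges, edges], ignore_index=True)

stats = network_stats(duplicated_net, n_tfs=2, n_genes=3)
print(stats)
```

On this toy input the raw edge count and density are exactly twice the deduplicated ones, which mirrors the 2x discrepancy described above.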

Best,
Aditya

tmmurali · 2022-04-13T00:49:34Z

@adyprat thanks for looking into this issue. Can we upload the correct version of the STRING network to prevent this issue from recurring? I presume this file is on Zenodo.

jonathan-f · 2022-04-13T08:49:46Z

Dear @adyprat, thanks for the swift reply and for finding the error.

Best
Jonathan

JaneJiayiDong · 2022-05-09T09:22:37Z

Hello, sorry to bother you. I am also having trouble reproducing the Fig. 5 results of the paper. I downloaded the data (BEELINE-data and Networks) from Zenodo and used generateExpInputs.py.

I used the mESC expression data and the network Non-Specific-ChIP-seq-network.csv, and left the other parameters at their defaults. The error is as follows:

Traceback (most recent call last):
  File "generateExpInputs_raw.py", line 171, in <module>
    print("\n#TFs: %d, #Genes: %d, #Edges: %d, Density: %.3f" % (nTFs,nGenes,netDF.shape[0],netDF.shape[0]/((nTFs*nGenes)-nTFs)))
ZeroDivisionError: division by zero

I found that the gene names in Non-Specific-ChIP-seq-network.csv are uppercase, unlike those in ExpressionData.csv, so I added
expr_df.index = expr_df.index.to_series().apply(lambda x:x.upper())
before
expr_df.to_csv(opts.outPrefix+'-ExpressionData.csv')
The result is:
#TFs: 27, #Genes: 144, #Edges: 264, Density: 0.068
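
A minimal sketch of the case mismatch on toy data. One plausible reading of the traceback is that no gene names overlapped, so nTFs was 0 and the density denominator nTFs*nGenes - nTFs became 0; that reading, the helper `shared_genes`, the column names `Gene1`/`Gene2`, and the toy gene names are assumptions for illustration:

```python
import pandas as pd

# Toy expression matrix with mixed-case gene names as the row index.
expr_df = pd.DataFrame(
    {"cell1": [1.0, 0.0, 2.0], "cell2": [0.5, 3.0, 0.0]},
    index=["Pou5f1", "Sox2", "Nanog"],
)
# Toy network with uppercase gene names.
net_df = pd.DataFrame({"Gene1": ["POU5F1", "SOX2"], "Gene2": ["NANOG", "NANOG"]})


def shared_genes(expr: pd.DataFrame, net: pd.DataFrame) -> set:
    """Genes that appear both in the network and in the expression index."""
    net_genes = set(net["Gene1"]) | set(net["Gene2"])
    return net_genes & set(expr.index)


before = shared_genes(expr_df, net_df)

# Same effect as the .apply(lambda x: x.upper()) line, via the string accessor.
expr_upper = expr_df.copy()
expr_upper.index = expr_upper.index.str.upper()
after = shared_genes(expr_upper, net_df)

print(len(before), len(after))
```

Checking the overlap before computing the density makes an empty intersection visible directly instead of surfacing later as a ZeroDivisionError.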

After reading this thread, I tried to reproduce the results for the hESC dataset using the STRING ground-truth network, and the result is:
#TFs: 28, #Genes: 82, #Edges: 112, Density: 0.049

I need some help with these problems. Maybe there are some data preprocessing steps that I have missed; please give me some advice.

Thank you
Best wishes
Jiayi Dong

tmmurali assigned ktakers and adyprat Apr 11, 2022

JaneJiayiDong mentioned this issue May 11, 2022: Some problem of reproducing Fig.5 results #67 (Open)
