Improving cluster size threshold choice #328

Open
leoisl opened this issue Feb 23, 2023 · 1 comment

Comments

@leoisl
Collaborator

leoisl commented Feb 23, 2023

We could either choose a cluster size threshold automatically or, at least, provide the user with a cluster size histogram. Right now, cluster sizes can be retrieved by parsing the debugging files, but might it be worth upgrading this to a histogram that is created by default? See mbhall88/drprg-paper#2
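A minimal sketch of the "histogram by default" idea, in Python. The names `cluster_size_histogram` and `render_histogram` are hypothetical and not part of the tool. The sketch assumes the cluster sizes are already available as a list of integers, for instance after parsing the debugging files.

```python
from collections import Counter


def cluster_size_histogram(sizes):
    """Count how many clusters have each size (hypothetical helper)."""
    return dict(sorted(Counter(sizes).items()))


def render_histogram(hist, width=40):
    """Render a {cluster_size: n_clusters} mapping as a plain-text bar chart."""
    if not hist:
        return ""
    max_count = max(hist.values())
    lines = []
    for size, count in sorted(hist.items()):
        # Scale each bar relative to the most common cluster size.
        bar = "#" * max(1, round(width * count / max_count))
        lines.append(f"{size:>4} {count:>8} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical cluster sizes, e.g. collected while clustering reads.
    sizes = [1, 1, 2, 3, 3, 3, 3, 4]
    print(render_histogram(cluster_size_histogram(sizes)))
```

Writing this text histogram next to the normal output would let users eyeball a threshold without digging through debugging files.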

@mbhall88
Member

As an example, here is the cluster size distribution for a HiSeq 2000 run with 75bp reads:

    176 1
    194 2
    324 3
    399 4
    647 5
    927 6
   2747 7
   5190 8
   5987 9
   5047 10
   2236 11
    727 12
    328 13
    135 14
     51 15
      6 16
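The listing above looks like `sort -n | uniq -c` output. Assuming the first column is the number of clusters and the second is the cluster size, a small parser (the name `parse_uniq_counts` is hypothetical) can turn it into a histogram that is easy to summarise:

```python
def parse_uniq_counts(text):
    """Parse "count size" lines into a {cluster_size: n_clusters} dict.

    Assumes the first column is the number of clusters and the second the
    cluster size, as produced by `sort -n | uniq -c`.
    """
    hist = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        count, size = int(fields[0]), int(fields[1])
        hist[size] = hist.get(size, 0) + count
    return hist


# HiSeq 2000, 75bp reads (copied from the listing above).
HISEQ_75BP = """
    176 1
    194 2
    324 3
    399 4
    647 5
    927 6
   2747 7
   5190 8
   5987 9
   5047 10
   2236 11
    727 12
    328 13
    135 14
     51 15
      6 16
"""

if __name__ == "__main__":
    hist = parse_uniq_counts(HISEQ_75BP)
    total = sum(hist.values())
    mode = max(hist, key=hist.get)  # most common cluster size
    print(f"{total} clusters; most common cluster size = {mode}")
```

For this run it prints `25121 clusters; most common cluster size = 9`. The counts rise steadily from size 1 up to 9, so there is no separate peak of very small clusters to cut off.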

And here is a 250bp Illumina sample for the same region:

  81806 1
  22335 2
   1485 3
    693 4
    382 5
    374 6
    455 7
    520 8
    434 9
    487 10
    441 11
    539 12
    541 13
    643 14
    504 15
    615 16
    696 17
    578 18
    713 19
    698 20
    641 21
    723 22
    674 23
    728 24
    673 25
    717 26
    697 27
    749 28
    746 29
    767 30
    761 31
    836 32
    949 33
   1312 34
   1495 35
   1772 36
   2021 37
   2358 38
   2402 39
   2235 40
   1875 41
   1358 42
    848 43
    666 44
    518 45
    355 46
    271 47
    148 48
    150 49
     41 50
     35 51
     10 52
      5 53
     17 54
      2 55
      7 56
      9 57
      7 58
      2 59
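Unlike the 75bp run, this distribution has a large peak of very small clusters before the main peak (around size 39). One hypothetical way to pick a threshold automatically is to take the first local minimum of the histogram, similar to how k-mer count histograms are often cut at their first minimum. The `first_valley` helper below is an illustration under that assumption, not an existing feature:

```python
def first_valley(hist):
    """Return the cluster size at the first local minimum of a histogram.

    `hist` maps cluster size -> number of clusters. Hypothetical heuristic:
    if the counts fall at the start (a peak of tiny clusters), walk forward
    until they start rising again and return that size. Returns None when
    there is no initial fall or the counts never rise again.
    """
    if not hist:
        return None
    lo, hi = min(hist), max(hist)
    # Treat sizes missing from the histogram as having zero clusters.
    counts = [hist.get(size, 0) for size in range(lo, hi + 1)]
    if len(counts) < 2 or counts[1] >= counts[0]:
        return None  # no low-size peak to cut off
    for i in range(1, len(counts) - 1):
        if counts[i + 1] > counts[i]:
            return lo + i  # counts rise again after this size
    return None  # counts never rise: no second peak


if __name__ == "__main__":
    # First ten rows of the 250bp Illumina table (cluster size -> clusters).
    hist_250bp = {1: 81806, 2: 22335, 3: 1485, 4: 693, 5: 382,
                  6: 374, 7: 455, 8: 520, 9: 434, 10: 487}
    print(first_valley(hist_250bp))
```

On the first ten rows of the 250bp table this returns 6 (374 clusters, just before the counts rise again). It returns `None` for the 75bp run, whose counts rise from size 1. Because it reads raw counts, a noisy histogram may need smoothing first. It also ignores the long, fairly flat shoulder between sizes 7 and 33, so whether 6 is a sensible cutoff for this sample remains a judgement call.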

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants