Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nanostring update 20181010 #50

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mollie-barnard
Copy link

The following pull request adds, fixes, and/or modifies:

  1. Insert text here referencing the exact scripts that update
  2. If possible, I will make sure that the pull request only updates one feature at a time
  3. Add to this list if necessary

I performed the following prior to filing this pull request:

  • Tested that my change does not break the analysis pipeline
  • Ran a linter through my code
  • Update environment.yml if my code introduces new packages.

Lastly, I will assign individuals to review my code and be proud of making the pull request!

Copy link
Collaborator

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on the first pull request! 🎉 🎉 looks great - just a couple of comments/questions

dplyr::arrange(desc(rfFreq)) %>%
dplyr::top_n(n = top_n_genes) %>%
dplyr::mutate(genes = toupper(genes))
dplyr::filter(genes == "BOP1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do it this way instead - I think it is cleaner:

capture_genes <- c("BOP1", "DNAI1", "HSF1", "LRRC50", "MS4A3",
                   "NTN2L", "SHARPIN", "SLC12A3", "SOX10", "TSNAXIP1")

classifier_df <- readr::read_csv(file) %>%
    dplyr::filter(genes %in% capture_genes)

@@ -14,12 +14,40 @@ library(dplyr)

file <- file.path("7.Nanostring", "data", "overallFreqs.csv")

# OPTION 1: Classifier genes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the plan currently with this? Are we going to go with the different options and run independently?

@mollie-barnard
Copy link
Author

mollie-barnard commented Oct 10, 2018 via email

@gwaybio
Copy link
Collaborator

gwaybio commented Oct 11, 2018

I made this choice because
I was unsure of what steps in parts B-F I would need to change if I had all
of the gene sets lead to separate outputs (e.g., would I need to have 4
separate pipelines or somehow further streamline the output file naming
process?). With this "quick fix" all I have to do is rename the "figures"
and "results" folders at the completion of each full-pipeline run and run
again with the next gene set. I'm open to more elegant solutions.

This sounds good if you're looking to get results and iterate quickly. If this is the case, I am on board. However, it is not a good long term solution from a maintainability perspective.

I think we should run the pipeline through with an alternative set (like the extra 10 genes) and see what other issues we run into and in which scripts. When I wrote this I originally didn't intend for it to be run using different genes, and we're paying the price now! We should probably address any issues we run into.

For this particular solution, we may want to consider switching to an optparse solution. See here for an example. With this functionality we can also programmatically change the results and figures directories too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants