-
Notifications
You must be signed in to change notification settings - Fork 628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using custom objects with paths #2085
Comments
I wonder if it would be possible by using a different type as a member of the import nextflow.io.ValueObject
import nextflow.processor.TaskPath
@ValueObject
class SampleData {
/** The sample name */
String name
/** The path to the read one FASTQ */
TaskPath r1
/** The path to the read two FASTQ */
TaskPath r2
} Or maybe |
Would just |
I'd love to have things properly typed (eg. |
In principle |
Does nextflow inspect the object for members of type |
What do you mean for localize? |
If I pass in a
|
Files being localized from the working directory to the analysis node (mainly for cloud workflows) |
we are entering in a minefield .. 😄 no, the file staging (ie localisation) is determined by the |
There's a variety of situations where this capability would make it less error prone to deal with collections of genomic files, including VCF (+ index), BAM (+ index), FASTA (+ index). In the above example, you could imagine additional behaviors such as "is_paired" to allow scripts to handle single- or paired-end data in a more expressive way. Is this something that would completely break the process handling design or is there an opportunity for a user contribution in this area? |
This comment was marked as outdated.
This comment was marked as outdated.
Still relevant, here’s what I do: import groovy.transform.Immutable
import nextflow.io.ValueObject
import nextflow.util.KryoHelper
@ValueObject
@Immutable(copyWith=true, knownImmutables = ['path'])
class Metadata {
static {
// Register this class with the Kryo framework that serializes and deserializes objects
// that pass through channles. This allows for caching when this object is used.
KryoHelper.register(Metadata)
}
Path path
} |
@nh13 Do you have an example of a pipeline you've used that in? |
@emiller88 not in one I can share publicly, happy to help with a small hello world example. |
I figured 😞. Not sure if that was you offering to write it or answer questions. I'm guessing you toss the |
I was offering, but it’s not high on my list. |
@nh13 , I was hoping that a fully worked example wouldn't be necessarily be needed if it is just a question of how to structure the "output" line in the process definition such that the custom class will be appropriately instantiated by the framework, but perhaps that is wishful thinking. Also, one additional request for clarification, are you saying in the approach that you use above that process results are appropriately cached and any suitably annotated paths in the custom class are also localized? |
|
@nh13 Okay I threw together an example
|
@emiller88 , I suspect compilation of your custom class is failing on resolution of the "Path" type (needs to be imported?). Maybe can strip down example to use native types as a first pass. |
This was it. I've pushed up a working test, but the class just replaces the map, so it's not really doing much. |
@nh13 , I'm assuming that in your previous example, the described workaround only allows for definition of the custom class, but not any associated localization of the files? If that is the case, I tried an alternate approach that would require some modification to the underlying Nextflow framework. Briefly, the custom class must be treated as a path input and provides a collection of associated paths that must be staged. While this addresses the file staging need, none of the custom attributes are accessible. For that, there is an additional interface generated, CustromFileAttributesProvider that the custom class can implement. If available, this interface method is used to define additional properties on the object exposed during script generation. The changes to Nextflow are summarized in a PR here, ryankelley-lm#1, with an associated example leveraging the interface at https://github.com/ryankelley-lm/hello-custom-objects-1/tree/custom-object-download but perhaps you were able to achieve this in a simpler way? @pditommaso , the PR above is a rough first pass, but do you think the general concept is something that could be reworked into an acceptable PR? CC @emiller88 |
@ryankelley-lm if you use a path it will work just fine for local paths. I’m not sure how or if remote files will work. |
I think to handle this properly it's required to introduce a new input/output qualifier A tentative syntax could be as shown below: @ValueObject
class SampleData {
String name
Path r1
Path r2
}
process foo {
output:
record SampleData(name: 'some_name', r1: '*_1.bam', r2: '*_2.bam')
'''
some_command
'''
}
process bar {
input:
record data: SampleData()
"""
other_command --sample $data.name --reads $data.r1 $data.r2
"""
}
workflow {
foo | bar
}
The idea is to continue to have tuple-like semantics both for the input and output declaration to have the ability to use custom record types |
@pditommaso , At some point, would you still require some contract to be implemented by the custom class, such that the framework knows what paths the custom object is dealing with? |
@pditommaso , or is the idea that a "record" is essentially a named tuple, such that the framework knows each of the data members and for path, can handle them appropriately? |
FWIW, I believe most of the desired behavior here is achievable with the current release of Nextflow if you
The idea is that Nextflow will see your custom object as a tuple and treat it accordingly. In your own scripts, you can use the specific behaviors of your custom object. Since Nextflow knows how to stage tuple appropriately, all of the staging will be handled correctly. This idea is captured in the updated example here, https://github.com/ryankelley-lm/hello-custom-objects-1/tree/custom-object-download nextflow.enable.dsl = 2
process STEPONE_PROCESS {
input:
tuple val(myname), path(r1), path(r2)
output:
tuple val("${myname}"), path("output1.txt"), path("output2.txt"), emit: paths
script:
"""
cat ${r1} ${r2} >> output1.txt
cat ${r2} ${r1} >> output2.txt
"""
}
workflow STEPONE_WORKFLOW {
take: workflowInput
main:
processResult = STEPONE_PROCESS(workflowInput).map {
name, path1, path2 ->
return new CustomObject(name, path1, path2)
}
emit:
processResult
} import groovy.transform.Immutable
import nextflow.io.ValueObject
import nextflow.util.KryoHelper
import java.nio.file.Path
import nextflow.util.ArrayBag
import java.util.List
import nextflow.file.FileHolder
import java.util.Map
class CustomObject extends ArrayBag {
static {
// Register this class with the Kryo framework that serializes and deserializes objects
// that pass through channles. This allows for caching when this object is used.
KryoHelper.register(CustomObject)
}
String name
Path r1
Path r2
CustomObject(String name, Path r1, Path r2)
{
super()
this.name = name
this.r1 = r1
this.r2 = r2
this.target.add(name)
this.target.add(r1)
this.target.add(r2)
}
public String toString()
{
return this.name + " " + this.r1.toString() + " " + this.r2.toString()
}
} |
Now that we can pass around custom (value/metadata) classes and have them cached, it would be wonderful if Nextflow could recognize
Path
members/properties so they are localized and available for the process that uses the class. For example:There could be a way annotate
r1
andr2
(or return the list of paths etc) for nextflow to localize them.The text was updated successfully, but these errors were encountered: