As encountered by @Andrew42, when running with `MultiProductionDataset`, Lobster blithely decides that it should stream gridpack files, even though CMSSW doesn't know how to do that. This leads to the gridpack file being passed into the config as `root://deepthrought.crc.nd.edu://...`. A workaround is to disable streaming, but if we are doing multistage production (e.g. GEN-SIM+DIGI-RECO+MiniAOD), none of the steps can then stream their inputs, since `disable_input_streaming` is a global parameter of `StorageConfiguration`. It would be nice to have finer-grained control over XRootD streaming so that we could stream some input files but not others.
I can think of two options for accomplishing this:
1. Quick and Dirty: Provide an option for enabling streaming only for files that match a particular pattern. I would probably make the default value `.*\.root$` or something like that, so that only files ending in `.root` would be streamed unless the user changed that behavior.
2. Bigger Re-Engineering: We could make the `StorageConfiguration` object a property of the `Workflow` instead of the global `Config` for the whole Lobster run. This would provide a lot more flexibility, since each `Workflow` could have a separate input and output configuration, but I think it would require a major re-engineering of Lobster: every time files are accessed (e.g. even in the master), you'd need to know which `Workflow` those files came from and load the appropriate configuration.
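A minimal sketch of the first option, assuming a hypothetical `should_stream` helper and a user-settable pattern that defaults to `.*\.root$`. None of these names exist in Lobster today. Files matching the pattern get an XRootD URL, while anything else, such as a gridpack tarball, keeps its plain name so it can be staged in instead:

```python
import re

# Hypothetical default: only files whose names end in ".root" are streamed.
DEFAULT_STREAM_PATTERN = r".*\.root$"


def should_stream(filename, pattern=DEFAULT_STREAM_PATTERN):
    """Return True if this input file should be read over XRootD.

    `pattern` is a hypothetical, user-configurable regular expression;
    files that do not match it are staged in rather than streamed.
    """
    return re.match(pattern, filename) is not None


def input_location(filename, xrootd_prefix, pattern=DEFAULT_STREAM_PATTERN):
    """Return the name handed to CMSSW for one input file.

    Assumes `xrootd_prefix` already ends with "/" (e.g. "root://host//store/").
    Matching files get the XRootD URL; everything else keeps its plain name.
    """
    if should_stream(filename, pattern):
        return xrootd_prefix + filename
    return filename
```

With this default, `input_location("gridpack.tar.xz", "root://xrootd.example.org//store/user/")` returns `"gridpack.tar.xz"` unchanged, while `"output_1.root"` comes back as `"root://xrootd.example.org//store/user/output_1.root"`. The host in the prefix is a placeholder.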
Although I like the thought of being more flexible, I'm leaning towards the "Quick and Dirty" solution. I suppose another response would be that nothing's broken, so don't fix it: it's a feature, not a bug. Input welcome, especially from @annawoodard and @matz-e!
One alternative quick and dirty approach, more flexible than pattern matching but similarly simple, would be to make `disable_input_streaming` a property of the `Workflow` (passed as an argument to its constructor) instead of the `StorageConfiguration`. In other words, don't re-engineer everything, just that one property. Then, instead of setting `parameters['disable streaming']` where it is currently set, you would set it in `Workflow.adjust`. Note that if you go that route, it would probably make sense to also make `disable_stage_in_acceleration` a property of the workflow.
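A minimal sketch of that suggestion, using a hypothetical, stripped-down `Workflow` class: both flags become constructor arguments, and `adjust` copies the streaming choice into a per-task `parameters` dict under the existing `'disable streaming'` key. The `adjust(parameters)` signature is simplified for illustration and is an assumption, not Lobster's actual method signature:

```python
class Workflow:
    """Hypothetical, stripped-down workflow carrying its own streaming flags."""

    def __init__(self, label, disable_input_streaming=False,
                 disable_stage_in_acceleration=False):
        self.label = label
        # Per-workflow instead of a global StorageConfiguration setting.
        self.disable_input_streaming = disable_input_streaming
        # Kept alongside it, as suggested; how it is consumed is not shown here.
        self.disable_stage_in_acceleration = disable_stage_in_acceleration

    def adjust(self, parameters):
        """Record this workflow's streaming choice in the task parameters."""
        parameters['disable streaming'] = self.disable_input_streaming
        return parameters


# Only the stage that reads gridpacks turns streaming off.
gen = Workflow("gen-sim", disable_input_streaming=True)
digi = Workflow("digi-reco")
```

In a multistage project, only the workflow that reads gridpacks would set `disable_input_streaming=True`, while the DIGI-RECO and MiniAOD workflows keep streaming their `.root` inputs.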
I think that would completely solve this specific problem. So the next question would be: what are the other use cases of the bigger re-engineering approach, and are they worth the development effort?
@annawoodard: I like that suggestion. That's what I'll plan to do, unless I run into a problem when I start working out the implementation. Regarding the more expansive solution, no one is asking for this. The only use case I can dream up is one where, in a single Lobster project, you'd be coordinating a multistage/multisite production where, for example, you want to store GEN-SIM at ND, DIGI-RECO at the LPC CAF, and mini-AOD/nano-AOD at UVa, or something crazy like that. I think we can safely defer any idea of doing that until someone actually asks whether such a thing would be feasible.