"What is a sample?" #50

misialq · 2023-07-21T14:47:58Z

misialq
Jul 21, 2023
Maintainer

Hey @bokulich-lab/moshpit-team,

This topic is inspired by #49 (or rather, is a follow-up to it). @nbokulich and I were discussing how to go from MAGs to annotations, and more specifically how to generate some form of "feature table" representing counts of observed annotations per genome (MAG). We felt that calling it a FeatureTable may not be entirely appropriate, as traditionally one axis in those is supposed to represent samples. This raised a question: what actually is a sample? Perhaps we could consider a genome a kind of sample (representing a collection of annotations)? What do you all think?

Now, what I would like to propose here is the following:

given an input FeatureData[MAG], representing a collection of dereplicated MAGs, the eggnog-annotate action could produce a GenomeData[NOG] artifact (since we effectively obtain a set of annotations per genome, and that's exactly what we designed that type for)
we introduce a new type: GenomeTable[Frequency], which works in a very similar way to a typical FeatureTable, but with a genome axis instead of the sample axis (using a separate type allows us to have a distinction between samples and genomes for the purpose of not using genome table where it simply should not be used)
we introduce a new action which accepts the GenomeTable[Frequency] and the corresponding FeatureTable[Frequency] (representing frequencies of MAGs per sample; still to be figured out) and collapses the two to produce a new FeatureTable[Frequency], this time representing samples vs. functional features/annotations
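
The collapsing step in the last item can be sketched in plain Python. This is only an illustration under stated assumptions, not the proposed action itself: the function name `collapse_tables`, the toy IDs, and the choice to weight each annotation count by the MAG's frequency in the sample are all hypothetical (the post notes that the MAG-per-sample table is still to be figured out).

```python
from collections import defaultdict


def collapse_tables(mag_per_sample, annotations_per_genome):
    """Combine sample x MAG frequencies with genome x annotation counts.

    Assumption: each annotation count of a MAG is weighted by that MAG's
    frequency in the sample, then summed over all MAGs in the sample.
    """
    collapsed = {}
    for sample_id, mag_freqs in mag_per_sample.items():
        totals = defaultdict(int)
        for mag_id, freq in mag_freqs.items():
            # MAGs without annotations contribute nothing.
            annotations = annotations_per_genome.get(mag_id, {})
            for annotation, count in annotations.items():
                totals[annotation] += freq * count
        collapsed[sample_id] = dict(totals)
    return collapsed


# Hypothetical toy data: samples x MAGs and genomes x annotations.
mag_per_sample = {
    "sample1": {"mag_a": 2, "mag_b": 1},
    "sample2": {"mag_b": 3},
}
annotations_per_genome = {
    "mag_a": {"COG0001": 1, "COG0002": 4},
    "mag_b": {"COG0002": 2},
}

sample_by_annotation = collapse_tables(mag_per_sample, annotations_per_genome)
```

With both tables viewed as matrices, this is a matrix product of the samples x MAGs table with the genomes x annotations table, so the result has samples on one axis and annotations on the other.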

That's just a dump of some "initial" thoughts. Please let me know what you all think - thanks! 🙏

gregcaporaso · 2023-07-21T16:54:22Z

gregcaporaso
Jul 21, 2023
Collaborator

Interesting! Here are my initial thoughts.

In this context I would consider a genome to be a sample. "Sample" and "feature" are intentionally generic, and a lot of the downstream things we'd do would make perfect sense in this context (e.g., many alpha/beta diversity metrics would stand to provide useful information for comparing genomes, as would "taxa" barplots illustrating functional composition by genome).
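
As a rough illustration of treating a genome as a sample, the sketch below computes Shannon entropy over the annotation counts of each genome, the same way an alpha diversity metric would be computed per sample. The table contents and the helper name `shannon_entropy` are hypothetical, and this is plain Python rather than any QIIME 2 action.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (base 2) of a vector of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts are skipped because they contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical genome x annotation table: each genome plays the role of a sample.
genome_table = {
    "mag_a": {"COG0001": 5, "COG0002": 5},
    "mag_b": {"COG0001": 10, "COG0002": 0},
}

alpha_by_genome = {
    genome_id: shannon_entropy(list(row.values()))
    for genome_id, row in genome_table.items()
}
```

A genome whose annotations are spread evenly across categories scores higher than one dominated by a single annotation, which is the kind of per-genome comparison described above.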

using a separate type allows us to have a distinction between samples and genomes for the purpose of not using genome table where it simply should not be used

What are you thinking falls in that category of things we'd do with a FeatureTable that you shouldn't do with a GenomeTable?

nbokulich · 2023-07-21T18:28:03Z

nbokulich
Jul 21, 2023
Maintainer

In this context I would consider a genome to be a sample

Yeah this is what @misialq and I were discussing... that a sample is basically just a collection of observations (perhaps an oversimplification), in which case a genome fits the bill.

What are you thinking falls in that category of things we'd do with a FeatureTable that you shouldn't do with a GenomeTable?

This is what it all comes down to, of course, and so far we don't have any such actions. All current actions that operate on a feature table should probably also work with genomes... e.g., filtering, alpha and beta diversity, differential abundance tests, supervised classification... probably even rarefaction and grouping. In the end, it feels like there may just be some theoretical, not-yet-extant edge cases, which, as @misialq proposed, we could always control with properties.

Maybe the one case in which a GenomeTable would be distinct from a FeatureTable is for the action that @misialq mentions above, and other cases where a GenomeTable (but not a FeatureTable) should be passed as input. So I suppose this could be a property. 🤷

I personally would prefer using FeatureTable, to avoid heaps of work. But GenomeTable just feels safer. (perhaps too safe)

misialq · 2023-07-24T10:00:45Z

misialq
Jul 24, 2023
Maintainer Author

I also felt that having a separate GenomeTable type would play nicely with GenomeData, as the pair would obviously resemble the FeatureData/FeatureTable pair of artifacts (I know, that's not a reason on its own to justify a new type, but rather a cherry on top). Also, could we actually make it inherit from FeatureTable, in which case it could still work in many places which require an actual FeatureTable? Or would this not work for a type?

gregcaporaso · 2023-07-28T21:28:04Z

gregcaporaso
Jul 28, 2023
Collaborator

As far as I know, there is no mechanism for type inheritance. I also feel that a property makes the most sense here, as @nbokulich mentions, to avoid having to update lots and lots of relevant actions.
