Discrepancy between metadata search results & piped fetch results #125

nvpatin · 2023-02-13T16:24:04Z

I am trying to download a set of samples based on their metadata. When I search with my parameters, I find a certain number of samples, but when I pipe those results into 'redbiom fetch' (with a particular context), it downloads a different number of samples. I think there is a similar problem when I pipe the search results into 'redbiom summarize contexts': it lists several contexts, only some of which are associated with my samples, so I have to guess which one to use for fetching. So I have two questions: 1) How can I see only the contexts associated with my searched samples? 2) How can I fetch only the samples returned by my metadata search? The steps below illustrate the problem behind question 2.

Looking for marine water samples within the EMP

% redbiom search metadata "where qiita_study_id == 13114 and empo_4 == 'Water (saline)'" | wc -l
39

Defining a context based on previous search results (it took several attempts to find one that worked)

% echo $CTX
Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b

Fetching samples based on metadata and context

% redbiom search metadata "where qiita_study_id == 13114 and empo_4 == 'Water (saline)'" | redbiom fetch samples --context $CTX --output EMP_marine_samples.biom
38 sample ambiguities observed. Writing ambiguity mappings to: EMP_marine_samples.biom.ambiguities

The data summary shows many more samples (97) than the metadata search originally found (39)

% biom summarize-table -i EMP_marine_samples.biom | head
Num samples: 97
Num observations: 16,547
Total count: 1,354,853
Table density (fraction of non-zero values): 0.030

Counts/sample summary:
Min: 4,111.000
Max: 38,769.000
Median: 12,268.000
Mean: 13,967.557

nvpatin · 2023-02-13T16:42:23Z

Update: I see that the list of samples found in the metadata search and the list of samples in the downloaded biom table do match, but the biom table seems to have split each sample into sub-samples. For example, "13114.palenik.42.s001" in the search results corresponds to the sample IDs "13114.palenik.42.s001.134469" and "13114.palenik.42.s001.134523" in the biom table. The sample IDs in the metadata table match those in the biom table, but all metadata values are identical within each sample "grouping"; e.g., "13114.palenik.42.s001.134469" and "13114.palenik.42.s001.134523" have exactly the same metadata.

Is there documentation about how and why that sub-sampling was done? I guess I can combine sample replicates (if that's what they are).
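A quick way to check that the extra columns collapse back onto the searched samples is to strip the run suffix from each table ID and count what is left. This is a hypothetical sketch: the function names are made up, and it assumes every ID in the table carries exactly one trailing numeric suffix, as in the "13114.palenik.42.s001.134469" example. It should not be applied to the unsuffixed search results, whose sample names could themselves end in `.<digits>`.

```shell
# Hypothetical helpers (names are illustrative, not part of redbiom or biom).
# Assumes every ID carries exactly one trailing ".<digits>" run suffix, e.g.
#   13114.palenik.42.s001.134469 -> 13114.palenik.42.s001

# Read sample IDs on stdin, one per line, and drop the trailing run suffix.
strip_run_suffix() {
    sed -E 's/\.[0-9]+$//'
}

# Count the distinct physical samples behind a list of suffixed IDs.
count_base_samples() {
    strip_run_suffix | sort -u | wc -l | tr -d ' '
}
```

For example, `biom table-ids -i EMP_marine_samples.biom | count_base_samples`, assuming your biom-format version provides the `table-ids` subcommand. If the count equals the 39 samples from the search, the extra columns are explained by repeated runs alone.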

antgonza · 2023-02-13T17:11:26Z

@nvpatin, thank you for the question and the update. I think @justinshaffer might be able to answer your question.

wasade · 2023-02-15T17:03:24Z

Hi @nvpatin, sorry for the delay; I was out of office (OOO) the last few days.

For (1), that is an excellent idea. It is not currently exposed to the user, but it would be a great addition. As a stop gap, I would be happy to suggest a way to do this with a bash or Python script.
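Until something like this is built in, a stop gap along these lines could answer question (1). It is a minimal shell sketch under stated assumptions: the search results have been saved to a file, and there is already one text file per context, named `<context>.txt`, listing that context's sample IDs. How those per-context lists are produced is left open here, and the function name is hypothetical. It prints each context containing at least one searched sample, with the number of hits over the number of searched samples.

```shell
# Hypothetical stop gap (the function name and file layout are assumptions):
#   $1        file with the metadata-search results, one sample ID per line
#   $2 ...    one file per context, named <context>.txt, listing the sample
#             IDs available in that context, one per line
# Prints "<context><TAB><hits>/<searched>" for each context with >= 1 hit.
contexts_for_samples() {
    local search_file=$1
    shift
    local total ctx_file ctx hits
    # Number of distinct searched samples.
    total=$(sort -u "$search_file" | wc -l | tr -d ' ')
    for ctx_file in "$@"; do
        ctx=$(basename "$ctx_file" .txt)
        # Count distinct context IDs that exactly match a searched ID
        # (-F fixed strings, -x whole line, -c count, -f patterns from file).
        hits=$(sort -u "$ctx_file" | grep -F -x -c -f "$search_file")
        if [ "$hits" -gt 0 ]; then
            printf '%s\t%s/%s\n' "$ctx" "$hits" "$total"
        fi
    done
}
```

For instance, redirect the output of the `redbiom search metadata` command above into `marine_ids.txt` and call `contexts_for_samples marine_ids.txt contexts/*.txt`. A context reporting all searched samples as hits would cover the whole search. If the context lists carry run suffixes such as `.134469`, strip them first, otherwise nothing will match.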

For (2), the issue is that the same physical sample has been sequenced multiple times. The command shown is correct, but each sequencing run is kept as a separate sample in the output, which is why the IDs carry an extra suffix. These "ambiguities" are recorded in the ambiguity map written alongside the table. You can get around this by passing --resolve-ambiguities to the fetch call. For redbiom fetch samples, I usually use --resolve-ambiguities merge, which combines the data from the multiple runs of each sample.
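Re-fetching with `--resolve-ambiguities merge` is the direct route. For a table that has already been downloaded, the idea can be approximated after the fact. The awk sketch below is an illustration, not redbiom's implementation: it reads a TSV export of the table (for example from `biom convert -i EMP_marine_samples.biom -o EMP_marine_samples.tsv --to-tsv`) and sums the counts of all columns sharing a base sample ID. It assumes that summing per-feature counts is what "combining" runs should mean for your analysis, and that every column ID ends in a single `.<digits>` run suffix; the function name is hypothetical.

```shell
# Hypothetical post-hoc merge of run columns in a biom TSV export.
# Assumes every column ID ends in one ".<digits>" run suffix and that
# combining runs means summing their per-feature counts.
merge_replicate_columns() {
    awk -F '\t' -v OFS='\t' '
    # Pass through comment lines other than the column header.
    /^#/ && $1 != "#OTU ID" { print; next }
    $1 == "#OTU ID" {
        # Map each column to its base sample ID (drop the last ".<digits>").
        n = 0
        for (i = 2; i <= NF; i++) {
            base = $i
            sub(/\.[0-9]+$/, "", base)
            if (!(base in slot)) { slot[base] = ++n; names[n] = base }
            col[i] = slot[base]
        }
        line = $1
        for (j = 1; j <= n; j++) line = line OFS names[j]
        print line
        next
    }
    {
        # Sum the counts of columns that share a base ID.
        for (j = 1; j <= n; j++) sum[j] = 0
        for (i = 2; i <= NF; i++) sum[col[i]] += $i
        line = $1
        for (j = 1; j <= n; j++) line = line OFS sum[j]
        print line
    }'
}
```

Usage: `merge_replicate_columns < EMP_marine_samples.tsv > EMP_marine_merged.tsv`. Merged columns appear in the order in which their first run occurs in the header.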

If you haven't seen it, there is a longer usage tutorial on the QIIME 2 forum.

nvpatin · 2023-02-16T23:57:43Z

Thank you @wasade, that's very helpful! I will check back for future functionality that lists the contexts associated with the samples in metadata search results.
