This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

feature_id from aligned_features not translated to NeatMS output #6

Open
drewszabo opened this issue Oct 31, 2022 · 6 comments

Comments

@drewszabo

Hi there,
I have exported XCMS features from patRoon (rickhelmus/patRoon) and created an aligned feature list. After running NeatMS, the feature_id values that were originally in aligned_features.csv are overwritten with sequential numbers. Is it possible to keep the original feature_id values throughout the analysis?

Thanks,

@drewszabo
Author

aligned_feature_table.csv
Example of patRoonData extracted and aligned with XCMS. The mzML files are available at rickhelmus/patRoonData.

@yoglo
Collaborator

yoglo commented Oct 31, 2022

Hi Drew,
This is unfortunately not supported yet. I'll add it to the list of features for the next release (out by the end of this year).
As a temporary fix, you could recover the feature_id column by merging the original dataframe with the output dataframe, using 'maxo' (height), 'into' (area), and 'intb' (baseline-corrected area) as joining columns (NeatMS does not touch those values). Make sure to export those properties from NeatMS (export_properties = ["rt", "mz", "height", "area", "area_bc", "label" ...]). Let me know if this does not work and I'll be happy to help.
Please leave this issue open so that I can make sure this feature is included in the next version.
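
The merge workaround described above can be sketched in pandas as follows. This is a minimal illustration, not NeatMS code: the two toy dataframes stand in for the original aligned feature table and the NeatMS export, the presence of a shared `sample` column is an assumption, and the label strings are purely illustrative. Only the column names `maxo`, `into`, `intb`, `height`, `area` and `area_bc` come from the comment above.

```python
import pandas as pd

# Toy stand-in for the original aligned feature table (XCMS column names).
original = pd.DataFrame({
    "feature_id": ["M100_R50", "M100_R50", "M200_R75"],
    "sample": ["s1", "s2", "s1"],  # assumed sample-name column
    "maxo": [1500.0, 1625.5, 980.25],
    "into": [12000.5, 13100.75, 7400.0],
    "intb": [11800.25, 12900.5, 7300.5],
})

# Toy stand-in for the NeatMS export, where feature_id has been
# replaced by sequential numbers. Label values are illustrative only.
neatms = pd.DataFrame({
    "feature_id": [0, 0, 1],
    "sample": ["s1", "s2", "s1"],
    "height": [1500.0, 1625.5, 980.25],
    "area": [12000.5, 13100.75, 7400.0],
    "area_bc": [11800.25, 12900.5, 7300.5],
    "label": ["High_quality", "Low_quality", "Noise"],
})

# Drop the sequential ids, then pull the original ids back in by joining
# on the sample name plus the three intensity columns.
merged = neatms.drop(columns="feature_id").merge(
    original[["feature_id", "sample", "maxo", "into", "intb"]],
    how="left",
    left_on=["sample", "height", "area", "area_bc"],
    right_on=["sample", "maxo", "into", "intb"],
)

# The XCMS-named copies of the key columns are redundant after the join.
merged = merged.drop(columns=["maxo", "into", "intb"])
print(merged)
```

A left join keeps every NeatMS row, so any peak that fails to match shows up with a missing `feature_id` rather than silently disappearing.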

@yoglo yoglo self-assigned this Oct 31, 2022
@yoglo yoglo added the feature New feature or request label Oct 31, 2022
@drewszabo
Author

Good to know, thanks! I'll use the workaround in the meantime.

@margotbligh

Hi Yoann,
I have a couple of somewhat related questions, so I thought I would ask them here; if you would prefer that I open a separate issue, please just say so. My original aligned feature table from XCMS contains 334,675 peaks (rows) from 28 samples. The feature table that I exported after running predictions with NeatMS (using my own model, generated by transfer learning from the default model) contains information on 297,551 peaks (all classes exported without filtering). I joined the NeatMS table to my original feature dataframe as you described above. The joined table has NA values in the peak_id column for 136,216 rows (~45%). The input aligned feature table contains no NA values for peak_id. My questions are:

  1. Is the decrease in number of peaks between the input feature dataframe and the NeatMS output table only due to the minimum scan number filtering during prediction or is there another process that could result in a "loss" of peaks?

  2. How is it possible that ~45% of the peaks in my NeatMS output do not match any peak in my input (based on sample name, height, area, and baseline-corrected area)? I am relatively confident that the join 'worked', as the other ~55% of peaks in the output table are successfully matched and peaks were matched for all samples. The 'successful' annotations also seem to make sense based on m/z and rt values.

@yoglo
Collaborator

yoglo commented Dec 7, 2022

Hi Margot,

As this is related to this issue, it is fine to continue the discussion here. To answer your first question: yes, the minimum scan number is the only filter that is applied automatically. Many other filters can be applied at the export stage, but the default behaviour won't filter anything out (I just realised the doc is not fully up to date on this and does not reflect the true default behaviour of the function; I will update it shortly).

Assuming that you use the default export parameters, the join should match many more peaks than that. The only explanation I can think of without seeing the data itself is floating-point precision. NeatMS uses float32 to reduce memory usage. If R uses a higher precision, then an exact table join won't work. Could you check the precision of the joining columns for the peaks that cannot be matched and see if there is a difference? If you can send me the two dataframes, I would be happy to look into it on my side as well.
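
The precision effect described above can be reproduced with a short pandas sketch. The values are made up for illustration and the snippet does not call NeatMS; it only shows that a 64-bit value can change when it is stored as float32, so an exact equality join against the 64-bit original can fail.

```python
import pandas as pd

# Two made-up intensities stored at 64-bit precision.
values = pd.Series([12345.678901, 1500.5], dtype="float64")

# Simulate storage as float32, then read back as float64.
roundtrip = values.astype("float32").astype("float64")

# True where the float32 round trip altered the value, i.e. where an
# exact join against the float64 original would not match.
changed = roundtrip != values

report = pd.DataFrame({
    "float64": values,
    "after_float32": roundtrip,
    "changed": changed,
})
print(report)

# Comparing both sides at float32 precision removes the discrepancy.
equal_as_float32 = values.astype("float32") == roundtrip.astype("float32")
```

The first value is altered because float32 cannot hold that many significant digits, while 1500.5 is exactly representable and survives unchanged. This is consistent with a join that matches some peaks and misses others.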

@margotbligh

Hi Yoann,

For the first question, thank you, this makes sense; I just wanted to double-check my understanding.

On the second point, I think you are correct that it is an issue of precision. When I round maxo, into and intb in both tables to 5 decimal places before matching, only 53 out of 297,551 peaks in the NeatMS output remain unmatched (i.e. peak_id is NA). I checked a couple of the peaks that were previously unmatched and are now matched, and they seem to make sense (i.e. m/z and rt are similar to the input peak values). So I think this solves it, thank you!
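
The rounding fix described above could look like this in pandas. It is a sketch under stated assumptions: the helper `join_on_rounded` is hypothetical, the `sample` column name and the toy values are invented, and the discrepancy between the two tables is assumed to be smaller than the rounding step.

```python
import pandas as pd

# NeatMS column name -> XCMS column name, as given earlier in this thread.
PAIRS = {"height": "maxo", "area": "into", "area_bc": "intb"}


def join_on_rounded(neatms, original, decimals=5):
    """Left-join original peak_id values onto a NeatMS table (hypothetical helper).

    The intensity columns are rounded to `decimals` places on both sides
    and the rounded copies are used as join keys.
    """
    left = neatms.copy()
    right = original.copy()
    key_cols = []
    for neat_col, xcms_col in PAIRS.items():
        key = f"key_{xcms_col}"
        left[key] = left[neat_col].round(decimals)
        right[key] = right[xcms_col].round(decimals)
        key_cols.append(key)
    right = right[["peak_id", "sample"] + key_cols]
    joined = left.merge(right, how="left", on=["sample"] + key_cols)
    return joined.drop(columns=key_cols)


# Toy data: the first peak differs only beyond the 5th decimal place.
original = pd.DataFrame({
    "peak_id": ["P1", "P2"],
    "sample": ["s1", "s1"],
    "maxo": [1500.1234561, 980.25],
    "into": [12000.5, 7400.0],
    "intb": [11800.25, 7300.5],
})
neatms = pd.DataFrame({
    "sample": ["s1", "s1"],
    "height": [1500.1234564, 980.25],
    "area": [12000.5, 7400.0],
    "area_bc": [11800.25, 7300.5],
})

# Exact join: the first peak is left without a peak_id.
exact = neatms.merge(
    original,
    how="left",
    left_on=["sample", "height", "area", "area_bc"],
    right_on=["sample", "maxo", "into", "intb"],
)

# Rounded join: both peaks recover their peak_id.
rounded = join_on_rounded(neatms, original)
print(exact["peak_id"].isna().sum(), rounded["peak_id"].isna().sum())
```

Rounding can still separate two values that fall on opposite sides of a rounding boundary, and float32 error grows with the magnitude of the value, so a small number of peaks may remain unmatched. Casting both sets of join columns to float32 before joining is one alternative to try in that case.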
