Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in extracting sample id from script aggregate_rsem_results.py #524

Open
mw201608 opened this issue Feb 8, 2023 · 1 comment
Open

Comments

@mw201608
Copy link

mw201608 commented Feb 8, 2023

In the container
The rsem aggregation step failed to extract the proper sample ids when sample names contained character ., and may even crashed (error message shown below) when the partially extracted sample ids are not unique.

Traceback (most recent call last):
  File "/opt/src/aggregate_rsem_results.py", line 90, in <module>
    df_dict[c].to_hdf(tmp_store.name, '{0:s}{1:d}'.format(c,k))
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 2490, in to_hdf
    pytables.to_hdf(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 282, in to_hdf
    f(store)
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 265, in <lambda>
    f = lambda store: store.put(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 1030, in put
    self._write_to_group(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 1697, in _write_to_group
    s.write(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 3093, in write
    raise ValueError("Columns index has to be unique for fixed format")
ValueError: Columns index has to be unique for fixed format

This is because the sample ids are extracted through string split by .,
sample_id = filename.split('.')[0]

It might be better to use a regular expression to extract the complete sample ids, like
re.sub(".rsem.[a-z]+.results", "", filename)

Minghui

@gaow
Copy link
Contributor

gaow commented Feb 8, 2023

Thanks @mw201608 ! @hsun3163 please correct me if I'm wrong but that files was from GTEx repository right? I wonder if we should just change it and maintain a local copy of it along the lines @mw201608 suggested or if you can check they have an update? I say we start changing it locally and then we send PR on their repo as we see fit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants