Different format of snp for Knight(WashU) genotype data #525

Open
zq2209 opened this issue Feb 14, 2023 · 2 comments

zq2209 commented Feb 14, 2023

When running cis on the Knight data, I get the following error:

Traceback (most recent call last):
  File "/home/zq2209/.sos/e0257c72c6594a84/singularity_run_4940.py", line 61, in <module>
    pairs_df = pairs_df.assign(
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/frame.py", line 3827, in assign
    data[k] = com.apply_if_callable(v, data)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/common.py", line 329, in apply_if_callable
    return maybe_callable(obj, **kwargs)
  File "/home/zq2209/.sos/e0257c72c6594a84/singularity_run_4940.py", line 63, in <lambda>
    ref = lambda dataframe: dataframe['variant_id'].map(lambda variant_id:variant_id.split("_")[-2])).assign(
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/series.py", line 3879, in map
    new_values = super()._map_values(arg, na_action=na_action)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/base.py", line 937, in _map_values
    new_values = map_f(values, mapper)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "/home/zq2209/.sos/e0257c72c6594a84/singularity_run_4940.py", line 63, in <lambda>
    ref = lambda dataframe: dataframe['variant_id'].map(lambda variant_id:variant_id.split("_")[-2])).assign(
IndexError: list index out of range

Running the script line by line shows that the cause is the snp column in the Knight genotype file, which uses a different format from the ROSMAP genotype data.

In ROSMAP, the snp format looks like chr1:248945797_G_C, but in Knight it is chr1:732994:G:A. Knight uses : instead of _ between the position and the alleles. Details are shown in the attached screenshots. Because a Knight ID contains no _, variant_id.split("_") returns a single-element list, and indexing it with [-2] raises the IndexError above.
Screen Shot 2023-02-13 at 9 31 09 PM
Screen Shot 2023-02-13 at 9 31 44 PM
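
To make the failure concrete, here is a minimal sketch that reproduces it on the two example IDs and shows one delimiter-tolerant way to parse them. The helper name split_variant_id is hypothetical and not part of the pipeline; it also assumes the chromosome name itself contains no : or _, which would not hold for some alternate contig names.

```python
import re

def split_variant_id(variant_id):
    """Split a variant ID into (chrom, pos, ref, alt).

    Hypothetical helper: accepts either ':' or '_' as the separator,
    so both the ROSMAP and the Knight style are handled.
    """
    parts = re.split(r"[:_]", variant_id)
    if len(parts) != 4:
        raise ValueError(f"unexpected variant ID format: {variant_id!r}")
    chrom, pos, ref, alt = parts
    return chrom, int(pos), ref, alt

rosmap_id = "chr1:248945797_G_C"
knight_id = "chr1:732994:G:A"

# The expression from the traceback works for the ROSMAP style ...
print(rosmap_id.split("_")[-2])  # G

# ... but a Knight ID has no "_", so split() returns one element
# and indexing with [-2] would raise IndexError.
print(knight_id.split("_"))  # ['chr1:732994:G:A']

# The tolerant parser handles both styles.
print(split_variant_id(rosmap_id))
print(split_variant_id(knight_id))
```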

Two possible solutions:

  1. Manually change the format of snp in the Knight genotype file.
  2. Incorporate some changes in our qc_no_prune in the GWAS_QC pipeline.

I will use option 1 for now and make the changes to the pipeline later.
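
As a rough illustration of option 1, the sketch below rewrites Knight-style IDs into the ROSMAP style. It assumes the genotype data is in PLINK format, where the variant ID is the second tab-separated column of the .bim file; that is an assumption about the Knight files, not something confirmed here. The function names to_rosmap_style and rename_bim_ids are hypothetical, and this is not how qc_no_prune does it.

```python
import csv
import io

def to_rosmap_style(variant_id):
    """Convert 'chr:pos:ref:alt' to 'chr:pos_ref_alt'; leave other IDs unchanged."""
    parts = variant_id.split(":")
    if len(parts) != 4:
        return variant_id
    chrom, pos, ref, alt = parts
    return f"{chrom}:{pos}_{ref}_{alt}"

def rename_bim_ids(src, dst):
    """Copy tab-separated .bim rows from src to dst, rewriting the ID column.

    Assumes the PLINK .bim layout, with the variant ID in the second field.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        row[1] = to_rosmap_style(row[1])
        writer.writerow(row)

# Hypothetical one-line .bim content using the Knight ID from this issue.
example = "1\tchr1:732994:G:A\t0\t732994\tA\tG\n"
out = io.StringIO()
rename_bim_ids(io.StringIO(example), out)
print(out.getvalue(), end="")
```

For real files, the two StringIO objects would be replaced by open file handles, writing to a new .bim rather than overwriting the original.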

@hsun3163

@zq2209 Actually, sorry, I didn't realize last night that the mechanisms needed to fix both the FID and the SNP names are already in qc_no_prune. Instead of changing the pipeline, can you try doing item 1 via the qc_no_prune command?

zq2209 commented Feb 14, 2023

I have tried qc_no_prune on the Knight data. It fixed the SNP names, but the FID was not fixed. Should we incorporate some changes to deal with the FID?
