dataprep error and empty output files #170
Comments
Hi Michael (@Michael-m6A), Do you mind sharing the download links to the Best wishes, |
Hi Yuk (@yuukiiwa), Index of /pub/metazoa/release-55/fasta/aedes_aegypti_lvpagwg/cdna Index of /pub/metazoa/release-55/fasta/aedes_aegypti_lvpagwg/ncrna Please note: I have concatenated the cdna and ncrna fasta file in order to obtain one reference fasta file. Index of /pub/metazoa/release-55/gtf/aedes_aegypti_lvpagwg Thanks, |
Hi Michael (@Michael-m6A), Here is a python script that will convert your merged fasta file into a format that
Thanks! Best wishes, |
Hi Yuk (@yuukiiwa), Thank you very much for the python script to make my fasta files usable for xpore. Please see fasta file before and after applying the python script below: fasta file before grep ">" /scratch/project_mnt/S0081/Aedes_aegypti_lvpagwg.AaegL5.cdna.ncrna.all.fa | head -10
fasta file after applying python script grep ">" /scratch/project_mnt/S0081/Aedes_aegypti_lvpagwg.AaegL5.cdna.ncrna.all.xpore.modified.fa | head -10
The fasta file looks good and I have set up a new xpore dataprep genomic coordinate run using the xpore modified fasta and original ENSEMBL gtf file. I will let you know how the run goes. Thanks again! Regards, |
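For readers following along, a header clean-up of this kind can be sketched as below. This is not the script posted in the thread (its contents are not shown here); it is a minimal illustration that assumes the goal is to keep only the first whitespace-delimited token of each FASTA header. The function name `simplify_fasta_headers` and the example header are made up for illustration.

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, keeping only the first
    whitespace-delimited token of every header line."""
    for line in src:
        if line.startswith(">"):
            tokens = line[1:].split()
            # An empty header stays empty instead of raising IndexError.
            dst.write(">" + (tokens[0] if tokens else "") + "\n")
        else:
            dst.write(line)


# Invented example record; real ENSEMBL headers carry more description fields.
example = io.StringIO(">AAEL010314-RA cdna example description\nACGTACGT\n")
out = io.StringIO()
simplify_fasta_headers(example, out)
result = out.getvalue()
print(result)
```

With real files, `src` and `dst` would be file handles opened for reading and writing; the in-memory buffers only keep the sketch self-contained.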
Hi Yuk (@yuukiiwa), The current xpore dataprep run finished but unfortunately gives a different error in line 419 and it also exceeded 40 hours of walltime. Please see the error message below. Error message cat xpore-dataprep-ENSEMBL-55-genomic-gtf-xpore-modified-PNXP22239.o985129 A value is trying to be set on a copy of a slice from a DataFrame. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy Process Consumer-2: Many thanks again. Regards, |
Hi Michael (@Michael-m6A), Do you mind trying to strip off the
Thanks! Best wishes, |
Hi Yuk (@yuukiiwa), Thanks for your suggestion, but I am a bit reluctant to delete the -XX suffix (AAEL000000-XX, e.g. the -RA in AAEL023294-RA) from the first column of eventalign.txt. The reason is that -RA, -RB, etc. are the identifiers for the different transcripts/isoforms of a gene. My thinking is that if I delete the -XX suffix in the eventalign.txt file, I would also need to delete it in the fasta and gtf files. In my view, that would mean deleting all the transcript/isoform identifiers. Can you think of another possible solution for the error at line 419 of the xpore python script? Would it help to apply the python script you previously provided for cleaning up the fasta file to the gtf file as well? Please see the head of the gtf file below. head /scratch/project_mnt/S0081/Aedes_aegypti_lvpagwg.AaegL5.55.gtf Thanks again! |
Hi Yuk (@yuukiiwa), I have been going through all my scripts from the beginning and just realised that I used the unmodified reference fasta file to align the basecalled fastq reads with minimap2 and then to generate the eventalign files with Nanopolish. Would I be correct in saying that I need to repeat the minimap2 and Nanopolish steps using the xpore_modified reference fasta file? Best wishes, |
Hi Michael (@Michael-m6A), You don't have to rerun your samples from the beginning using the newly modified reference fasta file. Regarding not stripping the Thanks! Best wishes, |
Hi Yuk (@yuukiiwa), I very much appreciate you taking the time to explain and troubleshoot this fasta and eventalign file compatibility issue with me. Your explanation totally makes sense to me and we’ve applied the python script you provided to delete the suffix (-XX in AAEL000000-XX) in the eventalign file. Job log head /scratch/project_mnt/S0081/PNXP22239-WB-1-ENSEMBL-55-eventalign.txt head /scratch/project_mnt/S0081/PNXP22239-WB-1-ENSEMBL-55-xpore-modified-eventalign.txt I have also just submitted a new xpore dataprep run using the xpore-modified fasta and eventalign file. I should have an answer by Monday and will update you on the outcome. Have a good weekend. Cheers, |
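The suffix removal described in this comment can be sketched as follows. It is an illustration only, not the script exchanged in the thread, and it assumes the isoform suffix always has the form `-R` plus capital letters at the end of the first tab-separated column. The name `strip_isoform_suffix` and the shortened three-column rows are hypothetical.

```python
import io
import re

# Assumed suffix pattern: "-R" followed by capital letters at the end of the ID.
ISOFORM_SUFFIX = re.compile(r"-R[A-Z]+$")


def strip_isoform_suffix(src, dst):
    """Rewrite each line with the suffix removed from its first column."""
    for line in src:
        contig, sep, rest = line.partition("\t")
        dst.write(ISOFORM_SUFFIX.sub("", contig) + sep + rest)


# Shortened, made-up rows; a real eventalign.txt has more columns per line.
rows = io.StringIO(
    "contig\tposition\treference_kmer\n"
    "AAEL023294-RA\t304\tTCGGT\n"
    "AAEL023294-RB\t305\tCGGTA\n"
)
cleaned = io.StringIO()
strip_isoform_suffix(rows, cleaned)
print(cleaned.getvalue())
```

Note that both rows end up under the same ID after the rewrite, which is exactly the loss of isoform information raised earlier in the thread.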
Hi Yuk (@yuukiiwa), The xpore-dataprep run finished without giving me an error message, only listed the usual performance warnings. Please see job log below. cat xpore-dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239.o1042590 /sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy However, I've continued checking the outcome files of which the data.index data.json data.log data.readcount files are empty and only the eventalign.index contains the information of transcript_id,read_index,pos_start,pos_end, see below. ls /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239 head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.index more /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.index tail /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.index head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.json tail /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.json head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.log tail /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.log head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.readcount tail /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/data.readcount head 
/scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/eventalign.index tail /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-modified-PNXP22239/eventalign.index It looks like the xpore dataprep step worked, but the empty data.index, data.json, data.log, and data.readcount files make me a bit suspicious. Can you please tell me what you think and, if this is still an issue, point me to a possible solution? Many thanks again! Cheers, |
Hi Michael (@Michael-m6A), Do you mind using the following GTF to run Best wishes, |
Hi Yuk (@yuukiiwa), Absolutely, happy to do so. I have submitted a new xpore dataprep run using your modified gtf file. I am currently in a queue on our HPC waiting for my job submission to run. We should have an outcome by tomorrow which I’ll post again. Thank you! Cheers, |
Hi Yuk (@yuukiiwa), I am afraid it looks like using the modified gtf file you provided did not solve the issue. The xpore dataprep run using the modified gtf file produced the following error messages (lines 10, 67, 753, 342, and 88) and the output files are empty. Please see below. Job log cat xpore-dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239.o1055583 /sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy /sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance. Inspected output files ls /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239 head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239/data.index tail /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239/data.index head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239/data.json head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239/data.log Usually here we see head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-gtf-file-PNXP22239/eventalign.index Thank you again. Cheers, |
Hi Michael (@Michael-m6A), I have tried running xpore dataprep with the 10-line eventalign.txt you provided above and the following modified fasta and gtf. Here are the top 10 lines of the files:
My Thanks! Best wishes, |
Hi Yuk (@yuukiiwa), Thanks very much for running xpore dataprep with the 10-line eventalign.txt file. I have kind of good news with one remaining barrier to overcome (error in line 419), please see below. Fasta file head /scratch/project_mnt/xx/Aedes_aegypti_lvpagwg.AaegL5.cdna.ncrna.all.xpore.modified.fa
We modified your python script and removed the -RA, see below. file=open('/scratch/project_mnt/xx/Aedes_aegypti_lvpagwg.AaegL5.cdna.ncrna.all.fa','r') xpore fully modified fasta file head /scratch/project_mnt/xx/Aedes_aegypti_lvpagwg.AaegL5.cdna.ncrna.all.xpore.modified.fa
Xpore dataprep run – genomic coordinates using the fully modified fasta, gtf, and eventalign files Now to the kind of good news: following the xpore-dataprep run using the fully xpore-modified fasta, gtf, and eventalign files, all the output files contain the expected data, see below. However, the xpore-dataprep run still gets stuck at around 6 hours, and the job log gave me this error message, also see below. ls /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239 head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/data.index head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/data.json head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/data.log head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/data.readcount head /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/eventalign.index Running time when xpore dataprep job got stuck Job id Name User Time Use S Queue 1067207.tinmgr2 xpore-dataprep- xx 06:27:24 R General Job log /sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy It appears that there is still an issue with the genomic coordinate conversion. Given that it specifically listed KeyError: ('AAEL024786', 2984), I have inspected the modified gtf and eventalign.txt files.
grep AAEL024786 /scratch/project_mnt/xx/xpore_modified_Aedes_aegypti_lvpagwg.AaegL5.55.gtf Contains AAEL024786 Data log grep AAEL024786 /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/data.log wc -l /scratch/project_mnt/xx/dataprep-ENSEMBL-55-genomic-xpore-modified-All-files-PNXP22239/data.log modified-eventalign – search for AAEL024786 (mentioned in error message) grep AAEL024786 /scratch/project_mnt/xx/PNXP22239-WB-1-ENSEMBL-55-xpore-modified-eventalign.txt AAEL024786 304 TCGGT 1708964 t 9 96.18 3.014 0.00500 TCGGT 95.29 4.98 0.16 52031 52046 AAEL024786 3900 GTATA 1709033 t 1308 86.73 1.652 0.00700 GTATA 89.66 2.63 -0.87 2336 2357 Contains AAEL024786; however, data.log shows that it only progressed to 133 and then produced the error message. I am very appreciative of all your help and the time invested in resolving the xpore dataprep genomic coordinate issue, and we’re getting close. Please have a look at the error message and my attempted file investigation; I am hopeful that the error at line 419 is resolvable. Thanks again! Cheers, |
Hi Michael (@Michael-m6A), Sorry for the delayed reply! (It was Chinese New Year out here) The transcript_id
I think this is more of a 0-base vs 1-base problem, which doesn't quite exist in widely used genomes (e.g. human, mouse, and Arabidopsis) from the same reference source. I have edited the function in the
The annotation for
Thanks! Best wishes, |
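To make the 0-base vs 1-base point concrete, here is a toy transcript-to-genome mapping. It is not xpore's `t2g` function, whose code is not shown in this thread; it only sketches, under assumed conventions, how a lookup keyed by transcript position can miss by exactly one base at the end of a transcript. The function `build_t2g`, its `first_tx_pos` parameter, and the exon coordinates are all hypothetical.

```python
def build_t2g(exons, strand="+", first_tx_pos=0):
    """Map transcript positions to genomic positions.

    exons: (start, end) genomic intervals, 1-based and inclusive as in GTF.
    first_tx_pos: 0 for 0-based transcript positions, 1 for 1-based.
    """
    # Exons are walked in transcription order, so the order flips on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    mapping = {}
    tx_pos = first_tx_pos
    for start, end in ordered:
        positions = range(start, end + 1)
        if strand == "-":
            positions = reversed(positions)
        for genomic_pos in positions:
            mapping[tx_pos] = genomic_pos
            tx_pos += 1
    return mapping


# Two made-up exons with a combined length of 8 bases.
exons = [(101, 105), (201, 203)]
zero_based = build_t2g(exons)
one_based = build_t2g(exons, first_tx_pos=1)

# Position 8 exists only under the 1-based convention, so a 1-based
# position looked up in a 0-based table raises KeyError at the last base.
print(8 in zero_based, 8 in one_based)
```

A mismatch like this would only surface for reads that reach the final base of a transcript, which might explain why a run can proceed for hours before failing.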
Hi Yuk (@yuukiiwa), No problem at all! My apologies for not responding to you for the last 3 weeks. I was on leave as we welcomed a new baby girl into our family. I am incredibly thankful to you for providing me with this edited function (michael branch). I have requested that this edited function be installed on our institutional HPC server; unfortunately, that is not the fastest process. I will keep you up to date as soon as I have any developments. Thanks again, |
Hi Michael (@Michael-m6A), Congratulations!! Take your time! Best wishes, |
Hi Yuk (@yuukiiwa), I will post here again once I have made actual progress with the analysis using the xPore edited function (michael branch) you kindly provided. Thanks, |
Hi Yuk (@yuukiiwa), We were finally able to install your modified xpore version on the new HPC infrastructure. I used fasta, eventalign, and gtf (containing cDNA and ncRNA information) files without the -RA suffix and submitted the xpore-developer dataprep job. However, it produced an error message similar to the previous one; the difference is that the position leading to the error in gene 'AAEL024786' is now 2985 instead of 2984, please see below. Error message modified xpore developer File "/home/s4303883/.local/lib/python3.9/site-packages/xpore-2.1-py3.9.egg/xpore/scripts/dataprep.py", line 419, in preprocess_gene Error message standard xpore Error message modified xpore developer in detail -rw-r--r--. 1 xx qris-uq 1433120 Apr 16 04:34 s3535379_job.xpore-dev-dataprep-ENSEMBL-55-genomic-gtf.error cat xx_job.xpore-dev-dataprep-ENSEMBL-55-genomic-gtf.error [xx@bunya3 ~]$ cat xx_job.xpore-dev-dataprep-ENSEMBL-55-genomic-gtf.error See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy Error message standard xpore in detail Traceback (most recent call last): I have also inspected all output files and all of them contain data, please see below. Output: modified xpore developer run on 17-04-2023 drwxr-sr-x.
2 user Q5334RW 4096 Apr 14 16:58 dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239 ls /scratch/project_mnt/user/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239 data.index data.json data.log data.readcount eventalign.index head /scratch/project_mnt/user/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239/data.index idx,start,end head /scratch/project_mnt/user/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239/data.json {"AAEL017263":{"78078919":{"GGAAA":[119.4,115.5,71.2,116.0,104.5,106.4,68.5,107.8,109.7,104.6,105.7,113.8,114.0,102.8,109.9,108.3,108.4,111.4,90.6,110.3,112.1,111.5,102.0,111.2,110.9,111.5,108.6,108.7,85.0,111.6,109.6,107.9,109.9,107.0,108.1,100.4,112.4,107.3,111.4,86.5,109.9,107.9,111.4,106.1,110.9,110.9,113.9,105.8,110.1,90.9,101.3,86.8,109.4,110.8,85.7,108.1,108.0,110.7,109.1,70.0,117.4,113.6,111.0,106.0,93.5,108.7,110.4,111.5,93.4,88.2,68.2,108.3,111.6,112.1,112.8,109.4,111.3,108.4,109.7,91.1,113.5,116.7,70.6,108.9,111.7,109.4,105.5,91.4,110.1,108.5,100.0,103.7,108.5,121.4,110.5,103.7,91.5,86.5,107.7,109.0,108.9,92.1,103.3,112.3,106.6,94.1,85.5,110.8,108.9,73.7,107.8,110.0,108.8,91.3,110.5,111.7,112.9,109.1,110.8,64.4,108.3,109.0,106.6,109.9,103.3,108.4,107.6,108.0,115.1,113.1,110.6,108.0,106.3,111.9,87.6,105.6,109.4,111.5,107.5,108.6,110.6,110.9,97.4]},"78078920":{"TGGAA": etc. 
TAT":[84.1]},"84837":{"GAATA":[111.4]},"84838":{"AGAAT":[128.7]},"84839":{"CAGAA":[121.1]},"84840":{"GCAGA":[95.8]},"84841":{"CGCAG":[92.5]},"84842":{"CCGCA":[84.0]},"84843":{"GCCGC":[76.5]},"84844":{"TGCCG":[99.9]},"84845":{"ATGCC":[83.0]},"84846":{"CATGC":[74.0]},"84847":{"ACATG":[80.4]},"84848":{"AACAT":[89.0]},"84849":{"CAACA":[91.6]},"84850":{"GCAAC":[83.1]},"84851":{"GGCAA":[110.9]},"84852":{"GGGCA":[105.4]},"84853":{"TGGGC":[113.5]},"84854":{"ATGGG":[97.8]},"84855":{"TATGG":[84.7]},"84856":{"ATATG":[85.9]},"84857":{"AATAT":[92.1]},"84858":{"CAATA":[112.4]},"84859":{"CCAAT":[90.0]},"84860":{"GCCAA":[74.3]},"84861":{"GGCCA":[103.8]},"84862":{"TGGCC":[110.0]},"84863":{"CTGGC":[104.3]},"84864":{"GCTGG":[89.3]}}} head /scratch/project_mnt/user/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239/data.log AAEL017263: True head /scratch/project_mnt/user/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239/data.readcount idx,n_reads head /scratch/project_mnt/user/dataprep-ENSEMBL-55-genomic-gtf-fasta-eventalign-xpore-dev-modified-PNXP22239/eventalign.index transcript_id,read_index,pos_start,pos_end Any help will be greatly appreciated again. Many thanks, |
Hi Yuk (@yuukiiwa), Would it be possible to get your help on this xpore dataprep genomic coordinate error? I have used the modified version (michael branch) you kindly provided to me in January but still get an error message one position further down the line. Error message modified xpore developer Please see further details and the previous error message in the post from the 19th of April above. Many thanks in advance, |
Hi Ploy (@ploy-np), Yuk (@yuukiiwa), and Jonathan (@jonathangoeke), I am writing to all of you in the hope of getting your help with the xpore dataprep genomic coordinate error. Please see the full error message in my previous post from the 19th of April above, where I used the modified version (michael branch) you kindly provided. Error message modified xpore developer Thank you in advance, |
Hi Michael, I am wondering if you have resolved this issue or have any update on it. I am currently running Xpore and experiencing a very similar issue as you. Thanks, |
Hi Laur (@lyj95618) and Yuk (@yuukiiwa), Unfortunately, I have not been able to resolve the issue with xPore dataprep converting transcriptome positions to genomic coordinates. However, I did investigate whether there are variations between the reference annotation files and the number of exons from two different databases. I have compared the Aedes aegypti annotation files from ENSEMBL and VectorBase for the gene The location coordinates and number of exons for “AAEL024786” are identical in both the ENSEMBL and VectorBase annotation files. My conclusion so far is that the source of the problem is within the xPore transcriptome to genomic coordinates (t2g_mapping) code, see the error message below, which is the same as yours. File "/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 419, in preprocess_gene Please let me know if you find a way to overcome this xPore dataprep genomic coordinates problem. Many thanks, |
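A complementary check to comparing annotation sources is to compare the summed exon length of a transcript in the GTF with the positions that appear for it in eventalign.txt (or with its sequence length in the fasta). The sketch below sums exon lengths per `transcript_id`; it is an assumption-laden illustration, not part of xPore, and the helper name `exon_lengths` is made up. The two exon rows reuse the AAEL020088-RB coordinates from the gtf header shown in the original post, with attributes shortened.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exon_lengths(gtf_lines):
    """Sum exon lengths per transcript_id from tab-separated GTF lines."""
    totals = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive, hence the +1.
            totals[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return dict(totals)


attrs = 'gene_id "AAEL020088"; transcript_id "AAEL020088-RB";'
gtf = [
    "#!genome-build AaegL5\n",
    "\t".join(["2", "VectorBase", "exon", "97401212", "97401577", ".", "+", ".", attrs]) + "\n",
    "\t".join(["2", "VectorBase", "exon", "97401632", "97401925", ".", "+", ".", attrs]) + "\n",
]
lengths = exon_lengths(gtf)
print(lengths)
```

If the largest eventalign position for a transcript plus the k-mer length exceeds the summed exon length, or lands exactly one past it, that would point toward a coordinate-convention mismatch; identical annotations in two databases would not rule that out.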
Hi Michael @Michael-m6A, I have also done some investigations on the errors and found that it was the issue in the transcriptome to genomic coordinates conversion code. I tried to run Xpore without the I might go to try to download the most recent version of Ensembl cdna and GTF and see if that helps to overcome the genomic coordinate issue. Thanks! |
Hi Laur (@lyj95618), Thank you very much for sharing! I also ran xPore without the --genome flag, which worked well and gave me the locations within the transcriptome. However, being able to get the genomic coordinates of the modifications would be really valuable. Hopefully, the @GoekeLab can fix the bug in their xPore transcriptome to genomic coordinates conversion code. Thanks, |
Hi,
I keep getting an error message and empty output files (data.index, data.json, data.log, data.readcount, eventalign.index) when running the xpore dataprep step for genomic coordinates.
I have downloaded the latest reference fasta and gtf files for Aedes aegypti from ENSEMBL as recommended.
Please see the header of the gtf file, script, and error messages below.
Head of gtf file
head /scratch/project_mnt/S0081/Aedes_aegypti_lvpagwg.AaegL5.55.gtf
#!genome-build AaegL5
#!genome-version AaegL5
#!genome-build-accession GCA_002204515.1
2 VectorBase gene 97401212 97402380 . + . gene_id "AAEL020088"; gene_source "VectorBase"; gene_biotype "protein_coding";
2 VectorBase transcript 97401212 97402380 . + . gene_id "AAEL020088"; transcript_id "AAEL020088-RB"; gene_source "VectorBase"; gene_biotype "protein_coding"; transcript_source "VectorBase"; transcript_biotype "protein_coding"; tag "Ensembl_canonical";
2 VectorBase exon 97401212 97401577 . + . gene_id "AAEL020088"; transcript_id "AAEL020088-RB"; exon_number "1"; gene_source "VectorBase"; gene_biotype "protein_coding"; transcript_source "VectorBase"; transcript_biotype "protein_coding"; exon_id "AAEL020088-RB-E1"; tag "Ensembl_canonical";
2 VectorBase CDS 97401561 97401577 . + 0 gene_id "AAEL020088"; transcript_id "AAEL020088-RB"; exon_number "1"; gene_source "VectorBase"; gene_biotype "protein_coding"; transcript_source "VectorBase"; transcript_biotype "protein_coding"; protein_id "AAEL020088-PB"; tag "Ensembl_canonical";
2 VectorBase start_codon 97401561 97401563 . + 0 gene_id "AAEL020088"; transcript_id "AAEL020088-RB"; exon_number "1"; gene_source "VectorBase"; gene_biotype "protein_coding"; transcript_source "VectorBase"; transcript_biotype "protein_coding"; tag "Ensembl_canonical";
2 VectorBase exon 97401632 97401925 . + . gene_id "AAEL020088"; transcript_id "AAEL020088-RB"; exon_number "2"; gene_source "VectorBase"; gene_biotype "protein_coding"; transcript_source "VectorBase"; transcript_biotype "protein_coding"; exon_id "AAEL020088-RB-E2"; tag "Ensembl_canonical";
2 VectorBase CDS 97401632 97401925 . + 1 gene_id "AAEL020088"; transcript_id "AAEL020088-RB"; exon_number "2"; gene_source "VectorBase"; gene_biotype "protein_coding"; transcript_source "VectorBase"; transcript_biotype "protein_coding"; protein_id "AAEL020088-PB"; tag "Ensembl_canonical";
Script
#!/bin/bash
#PBS -l walltime=80:00:00
#PBS -N xpore-dataprep-ENSEMBL-55-genomic-gtf-PNXP22239
#PBS -j oe
#PBS -A UQ-SCI-BiolSci
#PBS -l select=1:ncpus=24:mem=80G
# location of scratch project space
dir=/scratch/project/m6a-ml-2022
# location of the eventalign file on scratch project space
eventalign=${dir}/PNXP22239-WB-1-ENSEMBL-55-eventalign.txt
# location of the ENSEMBL release 55 gtf reference file on scratch project space
gtf=${dir}/Aedes_aegypti_lvpagwg.AaegL5.55.gtf
# location of the ENSEMBL release 55 cDNA and ncRNA concatenated reference fasta file on scratch project space
fasta=${dir}/Aedes_aegypti_lvpagwg.AaegL5.cdna.ncrna.all.fa
# load the xpore module
module load xpore/2.1
# xpore command (backslashes continue the command across lines)
xpore dataprep \
  --eventalign "${eventalign}" \
  --gtf_or_gff "${gtf}" \
  --transcript_fasta "${fasta}" \
  --out_dir "${dir}/dataprep-ENSEMBL-55-genomic-gtf-PNXP22239" \
  --genome
Job log error message
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
…………………………..
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
Traceback (most recent call last):
File "/sw/QFAB/miniconda3/envs/xpore_2.1/bin/xpore", line 10, in <module>
sys.exit(main())
File "/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/xpore.py", line 67, in main
options.func(options)
File "/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 753, in dataprep
parallel_preprocess_gene(eventalign_filepath,fasta_dict,annotation_dict,is_gff,out_dir,n_processes,readcount_min,readcount_max,resume)
File "/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 342, in parallel_preprocess_gene
n_reads, tx_ids, t2g_mapping = t2g(gene_id,fasta_dict,annotation_dict,g2t_mapping,df_eventalign_index,readcount_min)
File "/sw/QFAB/miniconda3/envs/xpore_2.1/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 88, in t2g
tx_seq = fasta_dict[tx][0]
KeyError: 'AAEL010314-RA'
=>> PBS: job killed: walltime 288068 exceeded limit 288000
########################### Job Execution History #############################
JobId:970386.tinmgr2
UserName:
GroupName:qris-uq
JobName:xpore-dataprep-ENSEMBL-55-genomic-gtf-PNXP22239
SessionId:35971
ResourcesRequested:mem=80gb,ncpus=24,place=free,walltime=80:00:00
ResourcesUsed:cpupercent=185,cput=06:15:32,mem=2491616kb,ncpus=24,vmem=9301936kb,walltime=80:01:08
QueueUsed:General
AccountString:UQ-SCI-BiolSci
ExitStatus:271
###############################################################################
I’m not worried about the performance warnings. However, I don’t understand the errors at lines 10, 67, 753, 342, and 88 in the traceback, which seem to be the source of the failure.
Any guidance would be greatly appreciated.
Many thanks,
Michael
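Since the traceback ends in `KeyError: 'AAEL010314-RA'` at `fasta_dict[tx][0]`, one quick diagnostic is to list transcript IDs that appear in the gtf but not among the fasta headers. The sketch below is a hedged illustration: how xpore derives the keys of `fasta_dict` is not shown in this post, so it assumes the key is the first whitespace-delimited token of each header. The helper names and the second gtf row (coordinates invented) are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Collect the first token of every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_transcript_ids(lines):
    """Collect every transcript_id attribute from non-comment GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


fasta = [">AAEL020088-RB cdna\n", "ACGT\n"]
gtf = [
    "#!genome-build AaegL5\n",
    '2\tVectorBase\ttranscript\t97401212\t97402380\t.\t+\t.\t'
    'gene_id "AAEL020088"; transcript_id "AAEL020088-RB";\n',
    # Invented coordinates, used only to trigger a mismatch.
    '3\tVectorBase\ttranscript\t1\t100\t.\t+\t.\t'
    'gene_id "AAEL010314"; transcript_id "AAEL010314-RA";\n',
]
missing = sorted(gtf_transcript_ids(gtf) - fasta_ids(fasta))
print(missing)
```

On real data the two lists would be read from the gtf and fasta files given to `xpore dataprep`; a non-empty result would suggest the ID formats differ between the two files.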