Releases: uclahs-cds/package-moPepGen
v0.4.1
Changed
- Fixed the problem that, in `summarizeFasta` output, the order of variant sources in the same group was not consistent across runs. #428
- Added the `--ignore-missing-source` argument to `summarizeFasta` so that sources not present in any GVF can be ignored without raising an error. #436
- In `filterFasta`, when filtering with an expression table, peptides are now filtered out when their expression is smaller than, instead of smaller than or equal to, the value of `--quant-cutoff`.
- Fixed the issue that, in `splitFasta`, variant sources were not grouped as specified by `--group-source`. #439
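The boundary change for `--quant-cutoff` can be illustrated with a minimal sketch. The function name `keep_peptide` and the small expression dictionary are hypothetical and only demonstrate the comparison; this is not moPepGen's actual implementation.

```python
def keep_peptide(quant: float, cutoff: float) -> bool:
    """Keep a peptide unless its value is strictly smaller than the cutoff."""
    return not (quant < cutoff)


# Hypothetical expression values keyed by made-up peptide names.
expression = {"PEP_A": 0.5, "PEP_B": 1.0, "PEP_C": 2.0}
cutoff = 1.0

# PEP_B sits exactly on the cutoff and is kept under the new behaviour.
kept = [name for name, quant in expression.items() if keep_peptide(quant, cutoff)]
print(kept)
```

Under the earlier "smaller than or equal to" rule, the peptide sitting exactly on the cutoff would have been removed as well.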
Added
- Resource usage, including memory, CPU, and time, is now printed to stdout at the end of all command-line programs.
Fixed
- Fixed the issue that `--additional-split` was not recognized properly in `splitFasta`. #443
v0.4.0
Added
- Added CLI command `summarizeFasta` to output a summary table of the variant peptide FASTA file output by `callVariant`.
Changed
- Attribute key for transcript ID is changed from 'TRANSCRIPT' to 'TRANSCRIPT_ID' in the circRNA GVF files output by `parseCIRCExplorer`, to be consistent with other GVF files.
- Genomic position for each record is added to the GVF file output by `parseCIRCExplorer`.
v0.3.1
v0.3.0
Added
- Enabled `filterFasta` to filter by number of miscleavages per peptide. #382
- Added CLI command `mergeFasta` to merge multiple variant peptide database FASTA files into one. This could be useful when working with multiplexed proteomic experiments such as TMT. #380
- Added CLI command `decoyFasta` to generate a decoy database by shuffling or reversing each sequence. #386
- Added parameter `--min-coverage-rna` to `parseREDItools` to filter by total RNA reads at a given position. #392
- Added CLI command `encodeFasta` to replace the variant peptide headers with UUIDs. The original FASTA headers are stored in a text file together with the UUIDs. This makes the FASTA headers short enough for library search engines. #389
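The header-encoding idea behind `encodeFasta` can be sketched as follows. This is a minimal illustration under stated assumptions, not the command's actual implementation: the function name `encode_headers`, the in-memory list of records, and the example header strings are all hypothetical.

```python
import uuid
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def encode_headers(records: List[Record]) -> Tuple[List[Record], Dict[str, str]]:
    """Replace each header with a UUID and return the UUID-to-header mapping."""
    encoded: List[Record] = []
    mapping: Dict[str, str] = {}
    for header, sequence in records:
        key = str(uuid.uuid4())
        mapping[key] = header            # original header is kept for lookup
        encoded.append((key, sequence))  # sequence itself is unchanged
    return encoded, mapping


# Hypothetical headers; real variant peptide headers may look different.
records = [
    ("EXAMPLE_TX_1|EXAMPLE_VARIANT_A|1", "MKTAYIAK"),
    ("EXAMPLE_TX_2|EXAMPLE_VARIANT_B|1", "MSTNPKPQR"),
]
encoded, mapping = encode_headers(records)

for key, sequence in encoded:
    print(f">{key}\n{sequence}")
```

Each UUID string is 36 characters long regardless of how long the original header was, and the mapping allows the original headers to be restored after the search.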
Changed
- Donor and acceptor transcript IDs are now explicitly included in the variant IDs of fusions in both GVFs and variant peptide FASTA headers. Closed #376 via #377
- For fusions, `callVariant` now looks at the entire acceptor sequence for potential variant peptides, rather than only the peptides that contain the breakpoint. #377
- `filterFasta` updated to support filtering by number of miscleavages. #383
- In `parseVEP`, the chromosome seqname for each record is now read directly from the gene annotation, to avoid the 'chr' prefix issue. #391
- The `--transcript-id-column` parameter of `parseREDItools` is changed to take a 1-based index. #392
- Renamed `splitDatabase` to `splitFasta` for consistency. #397
- Updated `generateIndex` to reduce the size of genomic annotation data and the memory usage when loaded. #395
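The two decoy strategies named for `decoyFasta` in this release, reversing and shuffling each sequence, can be sketched as below. The function names and the fixed seed are assumptions made for illustration; the actual command may treat specific residues or sequence ends differently.

```python
import random


def reverse_decoy(sequence: str) -> str:
    """Build a decoy by reversing the amino acid sequence."""
    return sequence[::-1]


def shuffle_decoy(sequence: str, seed: int = 0) -> str:
    """Build a decoy by shuffling residues; the seed makes the result reproducible."""
    residues = list(sequence)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


target = "MKTAYIAK"
reversed_seq = reverse_decoy(target)
shuffled_seq = shuffle_decoy(target)
print(reversed_seq)
print(shuffled_seq)
```

Both strategies preserve the amino acid composition and length of the target sequence, which is what makes the decoys comparable to the targets.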
v0.2.0
This is the first unstable release of moPepGen, the graph-based multi-omics peptide generator. Below is what has been updated since v0.1.0-beta.1.
Added
- Multi-threading is enabled for `callVariant` to run in parallel.
- CLI command `indexGVF` added to generate an index file for quick access to variant data in the corresponding GVF file. Note that running this command is not required.
Changed
- To handle the complexity of subgraphs introduced by fusion and, especially, alternative splicing insertion and substitution, the `SubgraphTree` class is added to keep track of the graph-subgraph relationship between nodes.
- Variant records are now kept on disk rather than reading the entire GVF file(s) into memory, and only the file pointers to variant records are kept in memory. This significantly reduces the memory usage of `callVariant`.
- The command line arguments are standardized across all commands, for example '-i/--input-path' for inputs and '-o/--output-path' for outputs.
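The on-disk approach described above, keeping only file pointers in memory, can be sketched with plain `tell()` and `seek()` calls. This is a generic illustration, not moPepGen's code: the function names and the toy tab-separated layout with the record ID in the first column are assumptions and do not reflect the real GVF format.

```python
import io
from typing import Dict, TextIO


def build_offset_index(handle: TextIO) -> Dict[str, int]:
    """Map each record ID to the offset where its line starts."""
    index: Dict[str, int] = {}
    while True:
        offset = handle.tell()      # position before reading the line
        line = handle.readline()
        if not line:
            break
        if line.startswith("#"):    # skip header and comment lines
            continue
        record_id = line.split("\t")[0]
        index[record_id] = offset
    return index


def fetch_record(handle: TextIO, index: Dict[str, int], record_id: str) -> str:
    """Jump to the stored offset and read only the requested record."""
    handle.seek(index[record_id])
    return handle.readline().rstrip("\n")


# An in-memory stand-in for a file on disk, using a made-up layout.
handle = io.StringIO("# comment\nV1\tchr1\t100\nV2\tchr1\t200\n")
index = build_offset_index(handle)
record = fetch_record(handle, index, "V2")
print(record)
```

Only the small dictionary of offsets stays in memory; each record is read from the file when it is needed. The loop uses `readline()` rather than iterating over the file object, because `tell()` is not available on a real text file while it is being iterated.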
v0.1.0-beta.1
The first beta release for moPepGen includes:
- Graph-based data structure and algorithm for calling noncanonical peptides caused by genomic and transcriptional variants.
- Command line interface that parses genomic/transcriptional variant results into GVF, calls variant or noncoding peptides, splits databases, and filters FASTA files.
- Util package that is not intended for general usage but is handy for development.