thoraxe
thoraxe is a tool to identify orthologous exonic regions (s-exons) in a set of transcripts from orthologous genes in a set of species.
usage: thoraxe [-h] [-i INPUTDIR] [-o OUTPUTDIR] [-a ALIGNER] [-s MAXTSL]
               [-m MINLEN] [-g MINGENES] [-t MINTRANSCRIPTS] [-c COVERAGE]
               [-p IDENTITY] [--gapopen GAPOPEN] [--gapextend GAPEXTEND] [-r]
               [--padding PADDING] [-y] [--no_movements] [--no_disintegration]
               [--plot_chimerics] [-l SPECIESLIST]
               [--canonical_criteria CANONICAL_CRITERIA] [--version]
Named Arguments
- -i, --inputdir
Input directory. It should contain an Ensembl subfolder like the one generated by transcript_query.
Default: “.”
- -o, --outputdir
Output directory. By default, the input directory is used.
Default: “”
- -a, --aligner
Path to ProGraphMSA.
Default: “ProGraphMSA”
- -s, --maxtsl
Maximum Transcript Support Level (TSL) to use when TSL is available for a transcript.
Default: 3
- -m, --minlen
Minimum exon length.
Default: 4
- -g, --mingenes
Minimum number of genes to consider a path in the splice graph.
Default: 1
- -t, --mintranscripts
Minimum number of transcripts to consider a path in the splice graph.
Default: 2
- -c, --coverage
Minimum alignment coverage of the shorter exon to include both exons in the same cluster.
Default: 80.0
- -p, --identity
Minimum percent identity to include exons in the same cluster.
Default: 30.0
- --gapopen
Penalty for a gap opening.
Default: -10
- --gapextend
Penalty for gap extensions.
Default: -1
- -r, --rescue_unaligned_subexons
Sub-exons that do not align with any other sub-exon are deleted from their cluster and can be reassigned during the sub-exon rescue phase. By default, they are kept in their original exon cluster.
Default: False
- --padding
Number of X characters used as padding in the chimeric alignment.
Default: 10
- -y, --phylosofs
Save inputs to run PhyloSofS in the phylosofs folder.
Default: False
- --no_movements
Do not move sub-exon blocks of one or two residues.
Default: False
- --no_disintegration
Do not disintegrate one-residue-length s-exons.
Default: False
- --plot_chimerics
Save Plotly HTML plots of the chimeric alignments in the _intermediate folder.
Default: False
- -l, --specieslist
Either a comma-separated list of species without spaces, e.g. homo_sapiens,mus_musculus, or a file containing the species list (one species per line). If nothing is given, all available species are used.
Default: “”
- --canonical_criteria
Comma-separated list of path_table column names used to sort the rows. If nothing is given, the following list is used: MinimumConservation,MinimumTranscriptWeightedConservation,MeanTranscriptWeightedConservation,TranscriptLength,TSL
Default: “”
- --version
Show the program’s version number and exit.
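The comma-separated values accepted by --specieslist and --canonical_criteria could be interpreted along the lines of the sketch below. It is only an illustration of the documented behaviour, not thoraxe's actual code: the helper names `split_commas`, `resolve_species` and `resolve_criteria` are hypothetical, and returning `None` to mean "all available species" is an assumption.

```python
from pathlib import Path

# Default sort columns, quoted from the --canonical_criteria help text.
DEFAULT_CRITERIA = [
    "MinimumConservation",
    "MinimumTranscriptWeightedConservation",
    "MeanTranscriptWeightedConservation",
    "TranscriptLength",
    "TSL",
]


def split_commas(value):
    """Split a comma-separated string, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def resolve_species(value):
    """Hypothetical helper: return a species list, or None for all species."""
    if not value:
        return None  # empty default: every available species is used
    path = Path(value)
    if path.is_file():
        # File form: one species per line.
        lines = path.read_text().splitlines()
        return [line.strip() for line in lines if line.strip()]
    # Inline form: names separated by commas, without spaces.
    return split_commas(value)


def resolve_criteria(value):
    """Hypothetical helper: fall back to the documented default columns."""
    return split_commas(value) if value else list(DEFAULT_CRITERIA)


if __name__ == "__main__":
    print(resolve_species("homo_sapiens,mus_musculus"))
    print(resolve_criteria(""))
```

Checking for an existing file first is one way to tell the two forms of --specieslist apart; thoraxe itself may distinguish them differently.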
thoraxe has been developed at LCQB (Laboratory of Computational and Quantitative Biology), UMR 7238 CNRS, Sorbonne Université.
Tip
It is possible to avoid filtering by Transcript Support Level (TSL) by setting --maxtsl to 5.
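As a minimal sketch of how such a call might be assembled from a script, the following Python snippet builds a thoraxe command line with --maxtsl set to 5. The helper `build_thoraxe_command` and the directory name `my_gene_dir` are hypothetical; only the flags themselves come from the usage text above.

```python
import shlex


def build_thoraxe_command(input_dir, output_dir=None, max_tsl=5, species=None):
    """Hypothetical helper: assemble a thoraxe argument list."""
    # --maxtsl 5 avoids filtering transcripts by TSL, as noted in the tip.
    command = ["thoraxe", "-i", input_dir, "--maxtsl", str(max_tsl)]
    if output_dir:
        command += ["-o", output_dir]
    if species:
        # Species are joined with commas and without spaces.
        command += ["-l", ",".join(species)]
    return command


# "my_gene_dir" is a placeholder for a folder containing an Ensembl subfolder.
command = build_thoraxe_command(
    "my_gene_dir", species=["homo_sapiens", "mus_musculus"]
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a system where thoraxe is installed and available on the PATH.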