thoraxe

thoraxe is a tool to identify orthologous exonic regions (s-exons) in a set of transcripts from orthologous genes in a set of species.

usage: thoraxe [-h] [-i INPUTDIR] [-o OUTPUTDIR] [-a ALIGNER] [-s MAXTSL]
               [-m MINLEN] [-g MINGENES] [-t MINTRANSCRIPTS] [-c COVERAGE]
               [-p IDENTITY] [--gapopen GAPOPEN] [--gapextend GAPEXTEND] [-r]
               [--padding PADDING] [-y] [--no_movements] [--no_disintegration]
               [--plot_chimerics] [-l SPECIESLIST]
               [--canonical_criteria CANONICAL_CRITERIA] [--version]
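As a minimal sketch, a command matching the usage line above can be assembled from Python. The helper build_thoraxe_command and the folder name MAPK8 are hypothetical and only serve the illustration; the sketch uses only flags listed in the usage line and prints the command rather than running it.

```python
import shlex


def build_thoraxe_command(inputdir, outputdir=None, maxtsl=None, species=None):
    """Assemble a thoraxe argument list (illustrative helper, not part of thoraxe)."""
    command = ["thoraxe", "-i", inputdir]
    if outputdir:
        command += ["-o", outputdir]
    if maxtsl is not None:
        command += ["--maxtsl", str(maxtsl)]
    if species:
        # --specieslist expects comma-separated names without spaces.
        command += ["--specieslist", ",".join(species)]
    return command


# Hypothetical input folder, e.g. one created by transcript_query.
cmd = build_thoraxe_command(
    "MAPK8", maxtsl=3, species=["homo_sapiens", "mus_musculus"]
)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, assuming thoraxe is installed and on the PATH.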

Named Arguments

-i, --inputdir

Input directory. It should contain an Ensembl subfolder like the one generated by transcript_query.

Default: “.”

-o, --outputdir

Output directory. By default, the indicated input directory is used.

Default: “”

-a, --aligner

Path to ProGraphMSA.

Default: “ProGraphMSA”

-s, --maxtsl

Maximum Transcript Support Level (TSL) to use when TSL is available for a transcript.

Default: 3
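The phrase "when TSL is available" suggests that transcripts without a TSL annotation are not discarded by this filter. The sketch below shows that reading, assuming TSL values are integers from 1 (best supported) to 5 and that a missing TSL is represented as None. The function name and the example data are hypothetical and are not taken from thoraxe's code.

```python
def passes_tsl_filter(tsl, maxtsl=3):
    """Keep a transcript if its TSL is unknown or not above maxtsl."""
    if tsl is None:
        # Assumption: transcripts without a TSL are never filtered out.
        return True
    return tsl <= maxtsl


# Hypothetical transcripts mapped to their TSL (None means not available).
transcripts = {"T1": 1, "T2": 4, "T3": None, "T4": 3}
kept = [name for name, tsl in transcripts.items() if passes_tsl_filter(tsl)]
print(kept)
```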

-m, --minlen

Minimum exon length.

Default: 4

-g, --mingenes

Minimum number of genes to consider a path in the splice graph.

Default: 1

-t, --mintranscripts

Minimum number of transcripts to consider a path in the splice graph.

Default: 2

-c, --coverage

Minimum alignment coverage of the shorter exon to include both exons in the same cluster.

Default: 80.0

-p, --identity

Minimum percent identity to include exons in the same cluster.

Default: 30.0
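To make the coverage and identity thresholds concrete, the sketch below computes both values for a pairwise alignment and applies the default cutoffs. The definitions used here (coverage as aligned columns over the ungapped length of the shorter sequence, identity as identical columns over aligned columns) are one common convention and an assumption; thoraxe's exact formulas may differ. All names and sequences are hypothetical.

```python
def coverage_and_identity(aligned_a, aligned_b, gap="-"):
    """Return (coverage of the shorter sequence, percent identity)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    len_a = sum(1 for residue in aligned_a if residue != gap)
    len_b = sum(1 for residue in aligned_b if residue != gap)
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a != gap and b != gap:
            aligned += 1
            if a == b:
                identical += 1
    shorter = min(len_a, len_b)
    coverage = 100.0 * aligned / shorter if shorter else 0.0
    identity = 100.0 * identical / aligned if aligned else 0.0
    return coverage, identity


def same_cluster(aligned_a, aligned_b, min_coverage=80.0, min_identity=30.0):
    """Apply both thresholds, using the documented default values."""
    coverage, identity = coverage_and_identity(aligned_a, aligned_b)
    return coverage >= min_coverage and identity >= min_identity


coverage, identity = coverage_and_identity("MKTAYIAK", "MKSA--AK")
print(round(coverage, 1), round(identity, 1))
```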

--gapopen

Penalty for a gap opening.

Default: -10

--gapextend

Penalty for gap extensions.

Default: -1

-r, --rescue_unaligned_subexons

Sub-exons that do not align with any other sub-exon are removed from their cluster and can be reassigned during the sub-exon rescue phase. By default, they are kept in their original exon cluster.

Default: False

--padding

Length of the padding (a run of Xs) in the chimeric alignment.

Default: 10

-y, --phylosofs

Save inputs to run PhyloSofS in the phylosofs folder.

Default: False

--no_movements

Do not move sub-exon blocks of one or two residues.

Default: False

--no_disintegration

Do not disintegrate one-residue-length s-exons.

Default: False

--plot_chimerics

Save plotly/html plot for the chimeric alignments in the _intermediate folder.

Default: False

-l, --specieslist

Either a comma-separated list of species without spaces, e.g. homo_sapiens,mus_musculus, or a single file with the species list (one species per line). If nothing is indicated, all the available species are used.

Default: “”
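The two accepted forms of the species list can be handled as sketched below. This is an illustration of the behaviour described above, not thoraxe's own parsing code; read_species_list is a hypothetical helper, and returning None to mean "use all available species" is an assumption of the sketch.

```python
import os


def read_species_list(value):
    """Parse a comma-separated species string or a one-species-per-line file."""
    if not value:
        # Empty default: no restriction on species.
        return None
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]
    return [name for name in value.split(",") if name]


print(read_species_list("homo_sapiens,mus_musculus"))
```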

--canonical_criteria

Comma-separated list of path_table column names used to sort the rows. If nothing is indicated, the following list is used: MinimumConservation,MinimumTranscriptWeightedConservation,MeanTranscriptWeightedConservation,TranscriptLength,TSL

Default: “”
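The fallback to the default column list can be sketched as follows. The column names are copied from the description above; the helper parse_canonical_criteria is hypothetical, and the sort direction applied to each column is not covered here because it is not stated in this reference.

```python
DEFAULT_CRITERIA = (
    "MinimumConservation,"
    "MinimumTranscriptWeightedConservation,"
    "MeanTranscriptWeightedConservation,"
    "TranscriptLength,"
    "TSL"
)


def parse_canonical_criteria(value=""):
    """Split the option value into column names, using the default if empty."""
    text = value if value else DEFAULT_CRITERIA
    return [column.strip() for column in text.split(",") if column.strip()]


print(parse_canonical_criteria())
print(parse_canonical_criteria("TranscriptLength,TSL"))
```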

--version

Show the program’s version number and exit.

thoraxe has been developed at LCQB (Laboratory of Computational and Quantitative Biology), UMR 7238 CNRS, Sorbonne Université.

Tip

It is possible to avoid filtering by Transcript Support Level (TSL) by setting --maxtsl to 5.