Pipeline steps

Note

The main script, aspring, runs the entire pipeline automatically.

step_01_preprocess

STEP 1 : Reformats the s-exon FASTA files to A2M format.

usage: step_01_preprocess [-h] --gene GENENAME --dataPATH DATAPATH
                          --path_hhsuite_scripts PATH_HHSUITE_SCRIPTS
                          [--len LEN] [--version]

Named Arguments

--gene

name of gene

--dataPATH

path to dir containing Thoraxe outputs

--path_hhsuite_scripts

path to the folder containing the scripts of hhsuite

--len

do not create a profile for an MSA whose sequences are shorter than X aa (def=5)

Default: 5

--version

show program’s version number and exit

step_02_hmm_maker

STEP 2 : Generates a Hidden Markov Model (HMM) profile for each s-exon.

usage: step_02_hmm_maker [-h] --gene GENENAME [--id ID] --path_data PATH_DATA
                         [--version]

Named Arguments

--gene

name of gene

--id

[0,100] maximum pairwise sequence identity (%) (def=100)

Default: 100

--path_data

path to dir containing Thoraxe outputs

--version

show program’s version number and exit

step_03_hmm_aligner

STEP 3 : HMM-HMM alignment of all the s-exons combinations.

usage: step_03_hmm_aligner [-h] --gene GENENAME [--id ID] --path_data
                           PATH_DATA --norealign NOREALIGN --glo_loc GLO_LOC
                           --mact MACT [--version]

Named Arguments

--gene

name of gene

--id

[0,100] maximum pairwise sequence identity (%) applied to query MSA, template MSA, and result MSA (def=100)

Default: 100

--path_data

path to dir containing Thoraxe outputs

--norealign

bool, 1 if norealign else 0, do NOT realign displayed hits with Maximum Accuracy algorithm (MAC) (def=0)

Default: 0

--glo_loc

bool, 1 if global else 0, use global/local alignment mode for searching/ranking (def=local)

Default: 0

--mact

[0,1) posterior probability threshold for MAC realignment, controlling greediness at alignment ends: 0: global, >0.1: local (default=0.35)

Default: 0.35

--version

show program’s version number and exit
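The value ranges documented above can be checked before the step is launched. The following is a minimal sketch, assuming step_03_hmm_aligner is available as a command on the PATH; the helper name build_step03_command and the example gene and path are hypothetical and not part of the tool.

```python
def build_step03_command(gene, path_data, max_id=100, norealign=0, glo_loc=0, mact=0.35):
    """Return the argument list for step_03_hmm_aligner (hypothetical helper).

    Defaults mirror the documented ones: --id 100, --norealign 0,
    --glo_loc 0 (local), --mact 0.35.
    """
    if not 0 <= max_id <= 100:
        raise ValueError("--id must lie in [0, 100]")
    if norealign not in (0, 1) or glo_loc not in (0, 1):
        raise ValueError("--norealign and --glo_loc take 0 or 1")
    if not 0 <= mact < 1:
        raise ValueError("--mact must lie in [0, 1)")
    return [
        "step_03_hmm_aligner",
        "--gene", gene,
        "--id", str(max_id),
        "--path_data", path_data,
        "--norealign", str(norealign),
        "--glo_loc", str(glo_loc),
        "--mact", str(mact),
    ]


# Placeholder gene name and directory, for illustration only.
cmd = build_step03_command("GENE", "/path/to/thoraxe_out")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) so that a non-zero exit status raises an exception.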

step_04_gettable

STEP 4 : Parses the alignment files and creates a table.

usage: step_04_gettable [-h] --gene GENENAME --path_data PATH_DATA [--version]

Named Arguments

--gene

name of queried gene

--path_data

path to dir containing Thoraxe outputs

--version

show program’s version number and exit

step_05_filter

Filters the table to keep gene duplication pairs based on identity, coverage, p-value, and number of species in the MSAs.

usage: step_05_filter [-h] --gene GENENAME --dataPATH DATAPATH --id_pair
                      ID_PAIR --idCons_pair IDCONS_PAIR --pval PVAL --nbSpe
                      NBSPE --cov COV [--version]

Named Arguments

--gene

name of gene

--dataPATH

path to dir containing Thoraxe outputs

--id_pair

Identity percentage threshold between the first sequences of the MSAs of the two s-exons in a pair

--idCons_pair

Identity percentage threshold between the consensus sequences of the MSAs of the two s-exons in a pair

--pval

p-value threshold for the HMM-HMM alignment of an s-exon pair

--nbSpe

minimum number of species in the MSA of each s-exon in the pair

--cov

Coverage threshold for s-exons A and B in the alignment of A with B

--version

show program’s version number and exit
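To make the role of the five thresholds concrete, here is a minimal sketch of how such a filter could combine them. It is not the tool's implementation. It assumes that all criteria must hold at once, that identities, coverages and species counts are lower bounds, and that the p-value is an upper bound. The PairRow field names and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairRow:
    """One s-exon pair from the alignment table (field names are hypothetical)."""
    sexon_a: str
    sexon_b: str
    id_pair: float       # identity (%) between the first sequences of the two MSAs
    id_cons_pair: float  # identity (%) between the two consensus sequences
    pval: float          # p-value of the HMM-HMM alignment
    nb_species_a: int    # number of species in the MSA of s-exon A
    nb_species_b: int    # number of species in the MSA of s-exon B
    cov_a: float         # coverage of s-exon A in the alignment
    cov_b: float         # coverage of s-exon B in the alignment


def keep_pair(row, id_pair, id_cons_pair, pval, nb_spe, cov):
    """Return True if the pair passes every threshold (assumed AND logic)."""
    return (
        row.id_pair >= id_pair
        and row.id_cons_pair >= id_cons_pair
        and row.pval <= pval
        and min(row.nb_species_a, row.nb_species_b) >= nb_spe
        and min(row.cov_a, row.cov_b) >= cov
    )


# Illustrative values only; they are not recommended settings.
row = PairRow("A", "B", 60.0, 70.0, 1e-6, 12, 15, 0.90, 0.85)
print(keep_pair(row, id_pair=30, id_cons_pair=30, pval=1e-3, nb_spe=10, cov=0.8))
```

Taking the minimum over the two s-exons means that both members of a pair must satisfy the species and coverage thresholds, which matches the wording "for s-exons in the pair" but remains an assumption about the actual behaviour.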

step_06_stats

Generates statistics on the filtered duplicated regions.

usage: step_06_stats [-h] --gene GENE --path_data PATH_DATA [--version]

Named Arguments

--gene

Gene name

--path_data

Path to dir containing Thoraxe outputs

--version

show program’s version number and exit

step_07_reformat

Reformats the previous outputs to add the information about the duplicated regions.

usage: step_07_reformat [-h] --gene GENENAME --dataPATH DATAPATH [--version]

Named Arguments

--gene

name of gene

--dataPATH

path to dir containing Thoraxe outputs

--version

show program’s version number and exit

step_08_ASRUs

Identifies the Alternative Splicing Repetitive Units (ASRUs) on the gene.

usage: step_08_ASRUs [-h] --gene GENENAME --dataPATH DATAPATH [--version]

Named Arguments

--gene

name of gene

--dataPATH

path to dir containing Thoraxe outputs

--version

show program’s version number and exit

step_09_clean

Removes the intermediate files generated during the pipeline.

usage: step_09_clean [-h] --gene GENENAME --path_data PATH_DATA [--version]

Named Arguments

--gene

name of queried gene

--path_data

path to dir containing Thoraxe outputs

--version

show program’s version number and exit

step_10_struct

STEP 10 : Generates PyMOL scripts to visualize protein structures with highlighted s-exons and ASRUs.

usage: step_10_struct [-h] --gene GENENAME --dataPATH DATAPATH

Named Arguments

--gene

name of gene

--dataPATH

path to dir containing Thoraxe outputs
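
The usage lines above spell the data directory option in two ways: --dataPATH for steps 1, 5, 7, 8 and 10, and --path_data for steps 2, 3, 4, 6 and 9. When the steps are called one by one from a script, a small lookup table avoids mixing them up. This is a minimal sketch, assuming each step is installed as a command under the name shown in its usage line; PATH_FLAG and base_args are hypothetical names.

```python
# Data directory option used by each step, as listed in the usage lines.
PATH_FLAG = {
    "step_01_preprocess": "--dataPATH",
    "step_02_hmm_maker": "--path_data",
    "step_03_hmm_aligner": "--path_data",
    "step_04_gettable": "--path_data",
    "step_05_filter": "--dataPATH",
    "step_06_stats": "--path_data",
    "step_07_reformat": "--dataPATH",
    "step_08_ASRUs": "--dataPATH",
    "step_09_clean": "--path_data",
    "step_10_struct": "--dataPATH",
}


def base_args(step, gene, data_dir):
    """Return the arguments shared by every step: gene name and data directory.

    Steps 1, 3 and 5 list further required options in their usage lines,
    which the caller still has to append.
    """
    return [step, "--gene", gene, PATH_FLAG[step], data_dir]


# Placeholder gene name and directory, for illustration only.
print(base_args("step_04_gettable", "GENE", "/path/to/thoraxe_out"))
print(base_args("step_07_reformat", "GENE", "/path/to/thoraxe_out"))
```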