Pipeline steps
Note
The main script, aspring, runs the entire pipeline automatically.
step_01_preprocess
STEP 1: Reformats the s-exon FASTA files to A2M format.
usage: step_01_preprocess [-h] --gene GENENAME --dataPATH DATAPATH
--path_hhsuite_scripts PATH_HHSUITE_SCRIPTS
[--len LEN] [--version]
Named Arguments
- --gene
name of the gene
- --dataPATH
path to the directory containing the ThorAxe outputs
- --path_hhsuite_scripts
path to the folder containing the HH-suite scripts
- --len
do not create a profile for MSAs whose sequences are shorter than X amino acids
Default: 5
- --version
show program’s version number and exit
step_02_hmm_maker
STEP 2: Generates a hidden Markov model (HMM) profile for each s-exon.
usage: step_02_hmm_maker [-h] --gene GENENAME [--id ID] --path_data PATH_DATA
[--version]
Named Arguments
- --gene
name of the gene
- --id
[0,100] maximum pairwise sequence identity (%)
Default: 100
- --path_data
path to the directory containing the ThorAxe outputs
- --version
show program’s version number and exit
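To make the --id threshold more concrete, the sketch below computes a percent identity between two aligned sequences and checks it against a maximum. It is an illustration only: the function names percent_identity and exceeds_max_identity are hypothetical, and the exact definition used by the underlying HMM tools (for example, how gap columns are counted) may differ.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: gap columns are ignored. The tool behind --id may count them differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def exceeds_max_identity(seq_a: str, seq_b: str, max_id: float = 100.0) -> bool:
    """True if the pair is more similar than the allowed maximum (hypothetical helper)."""
    return percent_identity(seq_a, seq_b) > max_id
```

With the default of 100, no pair can exceed the maximum, so nothing would be filtered out under this reading of the option.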
step_03_hmm_aligner
STEP 3: HMM-HMM alignment of all s-exon combinations.
usage: step_03_hmm_aligner [-h] --gene GENENAME [--id ID] --path_data
PATH_DATA --norealign NOREALIGN --glo_loc GLO_LOC
--mact MACT [--version]
Named Arguments
- --gene
name of the gene
- --id
[0,100] maximum pairwise sequence identity (%) applied to the query MSA, template MSA, and result MSA
Default: 100
- --path_data
path to the directory containing the ThorAxe outputs
- --norealign
bool, 1 to skip realignment, 0 otherwise; when set to 1, displayed hits are NOT realigned with the Maximum Accuracy (MAC) algorithm
Default: 0
- --glo_loc
bool, 1 for global alignment, 0 for local alignment; sets the alignment mode used for searching/ranking
Default: 0 (local)
- --mact
[0,1) posterior probability threshold for MAC realignment, controlling greediness at alignment ends: 0 gives global alignments, values above 0.1 give local alignments
Default: 0.35
- --version
show program’s version number and exit
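The value ranges listed above can be checked before launching the step. The following is a stand-alone sketch, not the project's own parser: the helper names percent and mact_value are hypothetical, "GENE" and "data/" are placeholder values, and the three alignment options are given defaults here even though the usage line lists them without brackets.

```python
import argparse


def percent(value: str) -> float:
    """Accept a number in the closed interval [0,100]."""
    number = float(value)
    if not 0 <= number <= 100:
        raise argparse.ArgumentTypeError(f"{value} is outside [0,100]")
    return number


def mact_value(value: str) -> float:
    """Accept a number in the half-open interval [0,1)."""
    number = float(value)
    if not 0 <= number < 1:
        raise argparse.ArgumentTypeError(f"{value} is outside [0,1)")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented options; defaults are the ones listed above.
    parser = argparse.ArgumentParser(prog="step_03_hmm_aligner")
    parser.add_argument("--gene", required=True)
    parser.add_argument("--path_data", required=True)
    parser.add_argument("--id", type=percent, default=100)
    parser.add_argument("--norealign", type=int, choices=(0, 1), default=0)
    parser.add_argument("--glo_loc", type=int, choices=(0, 1), default=0)
    parser.add_argument("--mact", type=mact_value, default=0.35)
    return parser


# Placeholder gene name and data directory, for illustration only.
args = build_parser().parse_args(["--gene", "GENE", "--path_data", "data/"])
```

Passing an out-of-range value such as --mact 1 makes argparse exit with an error message instead of handing an invalid threshold to the aligner.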
step_04_gettable
STEP 4: Parses the alignment files and creates a table.
usage: step_04_gettable [-h] --gene GENENAME --path_data PATH_DATA [--version]
Named Arguments
- --gene
name of the queried gene
- --path_data
path to the directory containing the ThorAxe outputs
- --version
show program’s version number and exit
step_05_filter
Filters the table to keep gene duplication pairs based on identity, coverage, p-value, and number of species in the MSAs.
usage: step_05_filter [-h] --gene GENENAME --dataPATH DATAPATH --id_pair
ID_PAIR --idCons_pair IDCONS_PAIR --pval PVAL --nbSpe
NBSPE --cov COV [--version]
Named Arguments
- --gene
name of the gene
- --dataPATH
path to the directory containing the ThorAxe outputs
- --id_pair
percent identity threshold between the first sequences of the two s-exon MSAs in a pair
- --idCons_pair
percent identity threshold between the consensus sequences of the two s-exon MSAs in a pair
- --pval
p-value threshold for the HMM-HMM alignment of an s-exon pair
- --nbSpe
minimum number of species in the MSA of each s-exon in the pair
- --cov
coverage threshold for s-exons A and B in the alignment of A with B
- --version
show program’s version number and exit
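As a rough illustration of how the five thresholds could combine, the sketch below keeps a pair only if it passes every criterion. Everything beyond the option names is an assumption: the row keys (id, id_cons, pval, nb_species_a, cov_a, and so on), the direction of each comparison, and the example values are hypothetical and are not taken from the tool's actual table or code.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    id_pair: float      # --id_pair
    idCons_pair: float  # --idCons_pair
    pval: float         # --pval
    nbSpe: int          # --nbSpe
    cov: float          # --cov


def keep_pair(row: dict, t: Thresholds) -> bool:
    """Return True if an s-exon pair passes all filters.

    Assumed directions: identities and coverages must be at least the
    threshold, the p-value at most the threshold, and both MSAs must
    contain at least nbSpe species.
    """
    return (
        row["id"] >= t.id_pair
        and row["id_cons"] >= t.idCons_pair
        and row["pval"] <= t.pval
        and min(row["nb_species_a"], row["nb_species_b"]) >= t.nbSpe
        and min(row["cov_a"], row["cov_b"]) >= t.cov
    )


# Made-up thresholds and rows, for illustration only.
thresholds = Thresholds(id_pair=30, idCons_pair=30, pval=0.001, nbSpe=5, cov=0.8)
rows = [
    {"id": 60, "id_cons": 70, "pval": 1e-5,
     "nb_species_a": 10, "nb_species_b": 12, "cov_a": 0.9, "cov_b": 0.85},
    {"id": 60, "id_cons": 70, "pval": 0.05,
     "nb_species_a": 10, "nb_species_b": 12, "cov_a": 0.9, "cov_b": 0.85},
]
kept = [row for row in rows if keep_pair(row, thresholds)]
```

Here the second row is dropped only because its p-value is above the threshold; the units of the identity and coverage thresholds (fractions or percentages) should be checked against the tool itself.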
step_06_stats
Generates statistics on the filtered duplicated regions.
usage: step_06_stats [-h] --gene GENE --path_data PATH_DATA [--version]
Named Arguments
- --gene
name of the gene
- --path_data
path to the directory containing the ThorAxe outputs
- --version
show program’s version number and exit
step_07_reformat
Reformats the previous outputs to add the information about the duplicated regions.
usage: step_07_reformat [-h] --gene GENENAME --dataPATH DATAPATH [--version]
Named Arguments
- --gene
name of the gene
- --dataPATH
path to the directory containing the ThorAxe outputs
- --version
show program’s version number and exit
step_08_ASRUs
Identifies the Alternative Splicing Repetitive Units (ASRUs) in the gene.
usage: step_08_ASRUs [-h] --gene GENENAME --dataPATH DATAPATH [--version]
Named Arguments
- --gene
name of the gene
- --dataPATH
path to the directory containing the ThorAxe outputs
- --version
show program’s version number and exit
step_09_clean
Removes the intermediate files generated during the pipeline.
usage: step_09_clean [-h] --gene GENENAME --path_data PATH_DATA [--version]
Named Arguments
- --gene
name of the queried gene
- --path_data
path to the directory containing the ThorAxe outputs
- --version
show program’s version number and exit
step_10_struct
STEP 10: Generates PyMOL scripts to visualize protein structures with highlighted s-exons and ASRUs.
usage: step_10_struct [-h] --gene GENENAME --dataPATH DATAPATH
Named Arguments
- --gene
name of the gene
- --dataPATH
path to the directory containing the ThorAxe outputs
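The steps do not all use the same flag for the data directory: some take --dataPATH and others --path_data. When scripting individual steps, a small lookup such as the one below can help avoid mixing them up. The mapping is copied from the usage lines above; the helper base_command and the placeholder values "GENE" and "path/to/thoraxe_output" are hypothetical. The sketch only builds and prints command lines, it does not run anything.

```python
import shlex

# Data-directory flag used by each step, as shown in the usage lines.
DATA_FLAG = {
    "step_01_preprocess": "--dataPATH",
    "step_02_hmm_maker": "--path_data",
    "step_03_hmm_aligner": "--path_data",
    "step_04_gettable": "--path_data",
    "step_05_filter": "--dataPATH",
    "step_06_stats": "--path_data",
    "step_07_reformat": "--dataPATH",
    "step_08_ASRUs": "--dataPATH",
    "step_09_clean": "--path_data",
    "step_10_struct": "--dataPATH",
}


def base_command(step: str, gene: str, data_dir: str) -> list[str]:
    """Build the arguments shared by every step: gene name and data directory."""
    return [step, "--gene", gene, DATA_FLAG[step], data_dir]


# Print the shared part of each command line (placeholder gene and path).
for step in DATA_FLAG:
    print(shlex.join(base_command(step, "GENE", "path/to/thoraxe_output")))
```

Note that step_01_preprocess, step_03_hmm_aligner, and step_05_filter list further required options in their usage lines, so these base commands would need to be extended before being run.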