Pipeline steps

Note

The main script, aspring, runs the entire pipeline automatically.
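If you need to run the steps individually, note that they do not all use the same name for the data-directory flag: steps 1, 5, 7, 8 and 10 take --dataPATH, whereas steps 2, 3, 4, 6 and 9 take --path_data. The sketch below is a hypothetical helper (build_step_command) that assembles a step's argument list from the flags documented on this page. It assumes each step is available as a command named as in its usage line, and it only builds the command without running it. The gene name and paths are placeholders.

```python
import shlex

# Flag each step uses for the ThorAxe output directory, as listed in the usage lines.
DATA_DIR_FLAG = {
    "step_01_preprocess": "--dataPATH",
    "step_02_hmm_maker": "--path_data",
    "step_03_hmm_aligner": "--path_data",
    "step_04_gettable": "--path_data",
    "step_05_filter": "--dataPATH",
    "step_06_stats": "--path_data",
    "step_07_reformat": "--dataPATH",
    "step_08_ASRUs": "--dataPATH",
    "step_09_clean": "--path_data",
    "step_10_struct": "--dataPATH",
}


def build_step_command(step, gene, data_dir, **extra):
    """Return the argv list for one pipeline step (hypothetical helper).

    Extra keyword arguments become '--name value' pairs, e.g.
    path_hhsuite_scripts="/opt/hhsuite/scripts" -> --path_hhsuite_scripts /opt/hhsuite/scripts
    """
    if step not in DATA_DIR_FLAG:
        raise ValueError(f"unknown step: {step}")
    argv = [step, "--gene", gene, DATA_DIR_FLAG[step], data_dir]
    for name, value in extra.items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    # Placeholder gene name and paths.
    cmd = build_step_command("step_01_preprocess", "MAPK8", "/data/thoraxe",
                             path_hhsuite_scripts="/opt/hhsuite/scripts")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Extra keyword arguments are passed through verbatim, so step-specific options such as idCons_pair or nbSpe for step_05_filter keep the capitalization shown in their usage lines.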

step_01_preprocess

STEP 1: Reformats the s-exon FASTA files to A2M format.

usage: step_01_preprocess [-h] --gene GENENAME --dataPATH DATAPATH
                          --path_hhsuite_scripts PATH_HHSUITE_SCRIPTS
                          [--len LEN] [--version]

Named Arguments

--gene

name of gene

--dataPATH

path to dir containing Thoraxe outputs

--path_hhsuite_scripts

path to the folder containing the scripts of hhsuite

--len

don't create a profile for an MSA whose sequences are shorter than X amino acids

Default: 5

--version

show program’s version number and exit
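The --path_hhsuite_scripts argument suggests that this conversion relies on the HH-suite scripts. To make the target format concrete, here is a minimal, independent sketch of FASTA-to-A2M conversion; fasta_to_a2m is a hypothetical name, not part of ASPRING, and the --len filter is not modeled. It assumes match columns are the columns where the first sequence has a residue, the convention behind the -M first option of HH-suite's reformat.pl. In A2M, match columns hold upper-case residues and '-' gaps, while insert columns hold lower-case residues and '.' gaps.

```python
def fasta_to_a2m(records):
    """Convert aligned FASTA records to A2M, taking match columns from the first sequence.

    records: list of (header, aligned_sequence) tuples, all of the same length.
    Match columns (first sequence has a residue): residues upper case, gaps '-'.
    Insert columns (first sequence has a gap): residues lower case, gaps '.'.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("aligned sequences must all have the same length")
    first = records[0][1]
    is_match = [ch not in "-." for ch in first]
    out = []
    for header, seq in records:
        chars = []
        for ch, match in zip(seq, is_match):
            gap = ch in "-."
            if match:
                chars.append("-" if gap else ch.upper())
            else:
                chars.append("." if gap else ch.lower())
        out.append((header, "".join(chars)))
    return out


def to_text(records):
    """Serialize (header, sequence) records as FASTA-style text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    aln = [("sp1", "MK-LV"), ("sp2", "MKALV"), ("sp3", "M--LV")]
    print(to_text(fasta_to_a2m(aln)), end="")
    # >sp1 MK.LV, >sp2 MKaLV, >sp3 M-.LV (column 3 is an insert column)
```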

step_02_hmm_maker

STEP 2: Generates a Hidden Markov Model (HMM) profile for each s-exon.

usage: step_02_hmm_maker [-h] --gene GENENAME [--id ID] --path_data PATH_DATA
                         [--version]

Named Arguments

--gene

name of gene

--id

maximum pairwise sequence identity (%), in [0, 100]

Default: 100

--path_data

path to dir containing Thoraxe outputs

--version

show program’s version number and exit
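To illustrate what a maximum pairwise identity cutoff does, the sketch below greedily keeps a sequence only if its identity to every sequence already kept is at most the cutoff, so the default of 100 keeps everything. The identity definition (identical residues over columns where neither sequence has a gap) and the greedy input order are assumptions for illustration; the exact filtering performed by ASPRING or HH-suite may differ.

```python
def percent_identity(a, b, gaps="-."):
    """Percent identity between two aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (same length)")
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b)
             if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def filter_max_identity(seqs, max_id=100.0):
    """Greedy filter: keep a sequence only if its identity to every kept one is <= max_id."""
    kept = []
    for s in seqs:
        if all(percent_identity(s, k) <= max_id for k in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    msa = ["MKALV", "MKALV", "MKGLV", "PQRST"]
    print(filter_max_identity(msa, 100))  # keeps all four
    print(filter_max_identity(msa, 90))   # drops the exact duplicate
```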

step_03_hmm_aligner

STEP 3: Performs HMM-HMM alignments for all pairwise combinations of s-exons.

usage: step_03_hmm_aligner [-h] --gene GENENAME [--id ID] --path_data
                           PATH_DATA --norealign NOREALIGN --glo_loc GLO_LOC
                           --mact MACT [--version]

Named Arguments

--gene

name of gene

--id

maximum pairwise sequence identity (%), in [0, 100], applied to the query MSA, the template MSA, and the result MSA

Default: 100

--path_data

path to dir containing Thoraxe outputs

--norealign

bool (0 or 1): if 1, do NOT realign displayed hits with the Maximum Accuracy (MAC) algorithm

Default: 0

--glo_loc

bool (0 or 1): alignment mode for searching/ranking, 1 for global and 0 for local

Default: 0

--mact

posterior probability threshold in [0, 1) for MAC realignment, controlling greediness at the alignment ends: 0 gives global alignment, values > 0.1 give local alignment

Default: 0.35

--version

show program’s version number and exit
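Because every combination of s-exons is aligned, the number of HMM-HMM alignments grows quadratically with the number of s-exons. The sketch below enumerates the pairs with itertools and checks option values against the ranges documented above. Whether ASPRING aligns each unordered pair once or both orientations is not stated here, so both counts are shown; the function names and s-exon identifiers are hypothetical.

```python
from itertools import combinations, permutations


def sexon_pairs(sexon_ids, ordered=False):
    """Return the s-exon pairs to align (assumption: no self-pairs).

    ordered=False -> each unordered pair once, n*(n-1)/2 alignments.
    ordered=True  -> both A-vs-B and B-vs-A, n*(n-1) alignments.
    """
    ids = list(sexon_ids)
    return list(permutations(ids, 2) if ordered else combinations(ids, 2))


def check_aligner_options(id_=100, norealign=0, glo_loc=0, mact=0.35):
    """Validate step_03_hmm_aligner values against the ranges documented above."""
    if not 0 <= id_ <= 100:
        raise ValueError("--id must be in [0, 100]")
    if norealign not in (0, 1) or glo_loc not in (0, 1):
        raise ValueError("--norealign and --glo_loc must be 0 or 1")
    if not 0 <= mact < 1:
        raise ValueError("--mact must be in [0, 1)")


if __name__ == "__main__":
    ids = ["1_0", "2_0", "3_0", "3_1"]  # hypothetical s-exon identifiers
    print(len(sexon_pairs(ids)))                # 6
    print(len(sexon_pairs(ids, ordered=True)))  # 12
    check_aligner_options(mact=0.35)
```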

step_04_gettable

STEP 4: Parses the alignment files and creates a table.

usage: step_04_gettable [-h] --gene GENENAME --path_data PATH_DATA [--version]

Named Arguments

--gene: name of queried gene
--path_data: path to dir containing Thoraxe outputs
--version: show program’s version number and exit

step_05_filter

Filters the table to keep the duplicated s-exon pairs, based on thresholds on identity, coverage, p-value, and number of species in the MSAs.

usage: step_05_filter [-h] --gene GENENAME --dataPATH DATAPATH --id_pair
                      ID_PAIR --idCons_pair IDCONS_PAIR --pval PVAL --nbSpe
                      NBSPE --cov COV [--version]

Named Arguments

--gene: name of gene
--dataPATH: path to dir containing Thoraxe outputs
--id_pair: identity percentage threshold between the first sequences of the MSAs of the two s-exons in a pair
--idCons_pair: identity percentage threshold between the consensus sequences of the MSAs of the two s-exons in a pair
--pval: p-value threshold for the HMM-HMM alignment of an s-exon pair
--nbSpe: minimum number of species in the MSA of each s-exon in the pair
--cov: coverage threshold for s-exons A and B in the alignment of A with B
--version: show program’s version number and exit
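The thresholds above can be read as a row filter. The sketch below is a hypothetical illustration: the PairRow field names and the threshold directions (identities, species counts and coverages treated as minimums, the p-value as a maximum, with both s-exons of a pair required to pass) are assumptions, not the actual layout of the step 4 table.

```python
from dataclasses import dataclass


@dataclass
class PairRow:
    """One row of the alignment table (hypothetical field names)."""
    sexon_a: str
    sexon_b: str
    id_pair: float       # % identity between the first sequences of the two MSAs
    id_cons_pair: float  # % identity between the two consensus sequences
    pval: float          # p-value of the HMM-HMM alignment
    nb_spe_a: int        # number of species in the MSA of s-exon A
    nb_spe_b: int        # number of species in the MSA of s-exon B
    cov_a: float         # coverage of s-exon A in the A-B alignment
    cov_b: float         # coverage of s-exon B in the A-B alignment


def keep_pair(row, id_pair, id_cons_pair, pval, nb_spe, cov):
    """Assumed semantics: minimum identities/species/coverage, maximum p-value,
    and both s-exons of the pair must pass."""
    return (
        row.id_pair >= id_pair
        and row.id_cons_pair >= id_cons_pair
        and row.pval <= pval
        and min(row.nb_spe_a, row.nb_spe_b) >= nb_spe
        and min(row.cov_a, row.cov_b) >= cov
    )


def filter_table(rows, **thresholds):
    """Keep the rows that pass every threshold."""
    return [r for r in rows if keep_pair(r, **thresholds)]


if __name__ == "__main__":
    rows = [
        PairRow("1_0", "2_0", 45, 50, 1e-5, 10, 12, 80, 85),
        PairRow("1_0", "3_0", 20, 25, 0.2, 10, 3, 40, 90),
    ]
    kept = filter_table(rows, id_pair=30, id_cons_pair=30, pval=0.001, nb_spe=5, cov=50)
    print([(r.sexon_a, r.sexon_b) for r in kept])
```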

step_06_stats

Generates statistics on the filtered duplicated regions.

usage: step_06_stats [-h] --gene GENE --path_data PATH_DATA [--version]

Named Arguments

--gene: Gene name
--path_data: Path to dir containing Thoraxe outputs
--version: show program’s version number and exit

step_07_reformat

Reformats the previous outputs to add information about the duplicated regions.

usage: step_07_reformat [-h] --gene GENENAME --dataPATH DATAPATH [--version]

Named Arguments

--gene: name of gene
--dataPATH: path to dir containing Thoraxe outputs
--version: show program’s version number and exit

step_08_ASRUs

Identifies the Alternative Splicing Repetitive Units (ASRUs) in the gene.

usage: step_08_ASRUs [-h] --gene GENENAME --dataPATH DATAPATH [--version]

Named Arguments

--gene: name of gene
--dataPATH: path to dir containing Thoraxe outputs
--version: show program’s version number and exit
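This page does not describe how ASRUs are derived from the filtered pairs. Purely as an illustration of one way to group pairwise relationships, the sketch below merges duplicated s-exon pairs into groups with a union-find structure (connected components of the pair graph). This grouping rule, the function name group_pairs, and the identifiers are assumptions for illustration, not ASPRING's documented method.

```python
def group_pairs(pairs):
    """Group items linked by pairs into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # Deterministic output: each group sorted, groups ordered by their first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    dup_pairs = [("1_0", "3_0"), ("3_0", "5_0"), ("2_0", "4_0")]  # hypothetical
    print(group_pairs(dup_pairs))
```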

step_09_clean

Removes the intermediate files generated during the pipeline.

usage: step_09_clean [-h] --gene GENENAME --path_data PATH_DATA [--version]

Named Arguments

--gene: name of queried gene
--path_data: path to dir containing Thoraxe outputs
--version: show program’s version number and exit

step_10_struct

STEP 10: Generates PyMOL scripts to visualize protein structures with the s-exons and ASRUs highlighted.

usage: step_10_struct [-h] --gene GENENAME --dataPATH DATAPATH

Named Arguments

--gene: name of gene
--dataPATH: path to dir containing Thoraxe outputs
