thoraxe.transcript_query package

Submodules

thoraxe.transcript_query.transcript_query module

Created on Fri Apr 14 16:32:51 2017

@author: huguesrichard & diegozea

A first set of functions for querying the Ensembl RESTful API directly in order to retrieve all genes homologous to a given gene (identified by its common name).

thoraxe.transcript_query.transcript_query.dictseq2fasta(dseq, geneid, out)

Write fasta sequences from the exons.

thoraxe.transcript_query.transcript_query.filter_ortho(dortho, species=None, relationship='1:n')

Filter the dictionary of orthologues according to the list of names.

thoraxe.transcript_query.transcript_query.generic_ensembl_rest_request(extension, params, header)

Perform a generic request.

thoraxe.transcript_query.transcript_query.get_biomart_exons_annot(species_name, geneid, header=True)

Return transcript information from an Ensembl gene ID and species name.

thoraxe.transcript_query.transcript_query.get_exons_sequences(listensexons)

Return exon sequences.

From a list of Ensembl exon IDs, get the list of exons with their sequences.

thoraxe.transcript_query.transcript_query.get_geneids_from_symbol(species, symbol, **params)

Return gene ID from symbol.

From a species and a gene symbol, return the set of gene IDs corresponding to that symbol. Uses the /xrefs/symbol RESTful endpoint. Example: get_geneids_from_symbol("human", "MAPK8")
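
As a rough illustration of the /xrefs/symbol lookup described above, the sketch below builds a request URL and extracts gene IDs from an already decoded JSON response. The helper names (build_symbol_url, gene_ids_from_payload), the server URL, and the sample payload are assumptions made for this example and are not taken from thoraxe; no network request is made.

```python
from urllib.parse import quote, urlencode

# Assumed public Ensembl REST server; thoraxe may configure this differently.
SERVER = "https://rest.ensembl.org"


def build_symbol_url(species, symbol, **params):
    """Build a URL for the xrefs/symbol endpoint (hypothetical helper)."""
    path = f"/xrefs/symbol/{quote(species)}/{quote(symbol)}"
    query = urlencode(sorted(params.items()))
    return SERVER + path + ("?" + query if query else "")


def gene_ids_from_payload(payload):
    """Collect the IDs of entries typed as genes from a decoded JSON list."""
    return {entry["id"] for entry in payload if entry.get("type") == "gene"}


# Illustrative payload shaped like an xrefs/symbol response.
# The values are placeholders for the example, not real query results.
sample = [
    {"type": "gene", "id": "ENSG00000107643"},
    {"type": "transcript", "id": "ENST00000000001"},
]

url = build_symbol_url("human", "MAPK8", object_type="gene")
ids = gene_ids_from_payload(sample)
```

Keeping URL construction and response parsing separate from the HTTP call makes both steps easy to check offline.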

thoraxe.transcript_query.transcript_query.get_genetree(ensgeneid)

Return the gene tree.

Get the gene tree around the gene ensgeneid. As of now, the whole tree is returned.

thoraxe.transcript_query.transcript_query.get_listofexons(ensgeneid, **params)

Return list of exons.

From an Ensembl gene ID, get the list of exons that compose this gene. By default, the list is restricted to coding exons.

thoraxe.transcript_query.transcript_query.get_listoftranscripts(ensgeneid, species, **params)

Return list of transcripts.

From an Ensembl gene ID, get the list of transcripts overlapping this gene.

thoraxe.transcript_query.transcript_query.get_orthologs(ensgeneid, **params)

Get the orthologs of the gene with ID ensgeneid.

thoraxe.transcript_query.transcript_query.get_transcripts_orthologs(ensgeneid, lorthologs)

Return transcript list from orthologs.

Wrapper function that calls get_listoftranscripts multiple times, given an Ensembl gene ID and a list of orthologs provided by get_orthologs. The data structure for each ortholog is:

{dn_ds : float, method_link_type : str,
    source : {},  target : {}, taxonomy_level : str,
    type: Enum(ortholog_one2one,
               ortholog_one2many,
               within_species_paralog)}

The source and target dicts store information about the gene sequence, with the following structure:

{"align_seq" : str, "perc_pos" : float, "id" : str,
"protein_id" : str, "perc_id" : float, "cigar_line" : str,
"taxon_id" : int, "species" : str}
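
To make the nested structure above concrete, here is a minimal sketch that selects entries from a list of such ortholog dicts by homology type and target species. The function select_orthologs and the sample records are hypothetical and only use the field names listed above; this is not the logic of filter_ortho or get_transcripts_orthologs.

```python
def select_orthologs(orthologs, species=None, types=("ortholog_one2one",)):
    """Keep orthologs whose type and target species match (hypothetical helper)."""
    selected = []
    for ortholog in orthologs:
        # The homology type is stored at the top level of each record.
        if ortholog["type"] not in types:
            continue
        # The species name is stored inside the nested "target" dict.
        if species is not None and ortholog["target"]["species"] not in species:
            continue
        selected.append(ortholog)
    return selected


# Placeholder records with only the fields this example needs.
sample = [
    {"type": "ortholog_one2one",
     "target": {"id": "gene_a", "species": "mus_musculus"}},
    {"type": "ortholog_one2many",
     "target": {"id": "gene_b", "species": "danio_rerio"}},
    {"type": "within_species_paralog",
     "target": {"id": "gene_c", "species": "homo_sapiens"}},
]

kept = select_orthologs(
    sample,
    species={"mus_musculus", "danio_rerio"},
    types=("ortholog_one2one", "ortholog_one2many"),
)
kept_ids = [ortholog["target"]["id"] for ortholog in kept]
```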

thoraxe.transcript_query.transcript_query.is_esemble_id(name)

Return True if name is a stable Ensembl ID.

Stable IDs are created in the form ENS[species prefix][feature type prefix][a unique eleven digit number].

thoraxe.transcript_query.transcript_query.lodict2csv(listofdicts, out, fnames=None, header=True)

Write a dictionary list with csv formatting to the stream out.

Parameters:
  • fnames – when provided as a list, it is used as the column selection; otherwise all keys occurring at least once are used.

  • header – whether to write the header row.

thoraxe.transcript_query.transcript_query.main()

Main script function to download transcript data from ENSEMBL.

thoraxe.transcript_query.transcript_query.parse_command_line()

Parse command line.

It uses argparse to parse the command line arguments of transcript_query and returns the argparse parser.

thoraxe.transcript_query.transcript_query.save_ensembl_version(output_folder)

Save an ensembl_version.txt file with the version number of Ensembl.

thoraxe.transcript_query.transcript_query.write_tsl_file(path, l_of_sptr)

Write a TSL file from a list of transcripts.

Module contents

transcript_query: Download transcript data from Ensembl.

Functions to obtain information about gene transcripts and exons from the Ensembl database. The module creates the folder tree needed for the thoraxe pipeline.