thoraxe.transcript_query package
Submodules
thoraxe.transcript_query.transcript_query module
Created on Fri Apr 14 16:32:51 2017
@author: huguesrichard & diegozea
A first set of functions for querying the Ensembl RESTful API directly, in order to retrieve all the genes homologous to a given gene (identified by its common name).
- thoraxe.transcript_query.transcript_query.dictseq2fasta(dseq, geneid, out)
Write fasta sequences from the exons.
- thoraxe.transcript_query.transcript_query.filter_ortho(dortho, species=None, relationship='1:n')
Filter the dictionary of orthologues according to the list of names.
- thoraxe.transcript_query.transcript_query.generic_ensembl_rest_request(extension, params, header)
Perform a generic request.
- thoraxe.transcript_query.transcript_query.get_biomart_exons_annot(species_name, geneid, header=True)
Return transcript information from an Ensembl gene ID and a species name.
- thoraxe.transcript_query.transcript_query.get_exons_sequences(listensexons)
Return exon sequences.
From a list of Ensembl exon IDs, get the list of exons with their sequences.
- thoraxe.transcript_query.transcript_query.get_geneids_from_symbol(species, symbol, **params)
Return gene ID from symbol.
From a species and a symbol, return the set of gene IDs corresponding to the given gene symbol. Uses the /xrefs/symbol RESTful endpoint. Example: get_geneids_from_symbol("human", "MAPK8")
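As a rough illustration of the lookup described above, the sketch below builds an /xrefs/symbol URL and pulls gene IDs out of an already decoded JSON response. It is not the thoraxe implementation: the helper names `build_xrefs_symbol_url` and `gene_ids_from_xrefs`, the `homo_sapiens` species spelling, and the sample payload are assumptions made for illustration, and no network request is sent.

```python
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def build_xrefs_symbol_url(species, symbol, **params):
    """Build the URL for an /xrefs/symbol lookup (hypothetical helper)."""
    query = {"content-type": "application/json", **params}
    path = f"/xrefs/symbol/{quote(species)}/{quote(symbol)}"
    return f"{ENSEMBL_REST}{path}?{urlencode(query)}"


def gene_ids_from_xrefs(entries):
    """Collect the IDs of entries whose type is 'gene' (hypothetical helper)."""
    return {entry["id"] for entry in entries if entry.get("type") == "gene"}


url = build_xrefs_symbol_url("homo_sapiens", "MAPK8")

# Hand-written payload in the assumed shape of a response;
# the IDs are placeholders, not real server output.
sample = [
    {"id": "PLACEHOLDER_GENE_1", "type": "gene"},
    {"id": "PLACEHOLDER_GENE_1", "type": "gene"},
    {"id": "PLACEHOLDER_TRANSCRIPT_1", "type": "transcript"},
]
gene_ids = gene_ids_from_xrefs(sample)
```

Returning a set mirrors the documented behaviour of returning "the set of gene IDs": duplicate entries for the same gene collapse into one.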
- thoraxe.transcript_query.transcript_query.get_genetree(ensgeneid)
Return the gene tree.
Get the gene tree around the gene with the given ID. As of now, the whole tree is returned.
- thoraxe.transcript_query.transcript_query.get_listofexons(ensgeneid, **params)
Return list of exons.
From an Ensembl gene ID, get the list of exons that compose this gene. By default, the list is restricted to the coding exons.
- thoraxe.transcript_query.transcript_query.get_listoftranscripts(ensgeneid, species, **params)
Return list of transcripts.
From an Ensembl gene ID, get the list of transcripts overlapping this gene.
- thoraxe.transcript_query.transcript_query.get_orthologs(ensgeneid, **params)
Get the orthologs from the gene with id ensgeneid.
- thoraxe.transcript_query.transcript_query.get_transcripts_orthologs(ensgeneid, lorthologs)
Return transcript list from orthologs.
Wrapper function that calls get_listoftranscripts multiple times, given an Ensembl gene ID and a list of orthologs provided by get_orthologs. The data structure for each ortholog is:
{dn_ds : float, method_link_type : str, source : {}, target : {}, taxonomy_level : str, type: Enum(ortholog_one2one, ortholog_one2many, within_species_paralog)}
The source and target dicts store information about the gene sequence, with the following structure:
{"align_seq" : str, "perc_pos" : float, "id" : str, "protein_id" : str, "perc_id" : float, "cigar_line" : str, "taxon_id" : int, "species" : str}
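To make the record layout above concrete, here is a small sketch that filters a list of such records by homology type and target species. The helper `select_orthologs`, its defaults, and the sample records (with placeholder IDs and only a few of the documented fields) are illustrative assumptions; this is not the logic of filter_ortho.

```python
def select_orthologs(orthologs, species=None, kinds=("ortholog_one2one",)):
    """Keep records of the wanted types and target species (hypothetical helper)."""
    return [
        record
        for record in orthologs
        if record["type"] in kinds
        and (species is None or record["target"]["species"] in species)
    ]


# Hand-written records following the documented layout; IDs are placeholders.
records = [
    {"type": "ortholog_one2one",
     "target": {"id": "GENE_A", "species": "mus_musculus"}},
    {"type": "ortholog_one2many",
     "target": {"id": "GENE_B", "species": "danio_rerio"}},
    {"type": "within_species_paralog",
     "target": {"id": "GENE_C", "species": "homo_sapiens"}},
]

one_to_one = select_orthologs(records)
fish_only = select_orthologs(
    records,
    species={"danio_rerio"},
    kinds=("ortholog_one2one", "ortholog_one2many"),
)
```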
- thoraxe.transcript_query.transcript_query.is_esemble_id(name)
Return True if name is a stable Ensembl ID.
Stable IDs are created in the form ENS[species prefix][feature type prefix][a unique eleven digit number].
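The format described above can be checked with a regular expression. The sketch below is one possible approach, not necessarily how is_esemble_id is implemented: the function name `looks_like_stable_id` is hypothetical, and restricting the feature type prefix to E (exon), G (gene), P (protein) and T (transcript) is a simplifying assumption.

```python
import re

# ENS + optional species prefix (uppercase letters) + feature type prefix
# + exactly eleven digits. Only four feature prefixes are accepted here,
# which is an assumption made to keep the sketch short.
STABLE_ID = re.compile(r"ENS[A-Z]*[EGPT]\d{11}")


def looks_like_stable_id(name):
    """Return True if name matches the stable ID pattern (hypothetical helper)."""
    return STABLE_ID.fullmatch(name) is not None


human_gene = looks_like_stable_id("ENSG00000107643")      # no species prefix
mouse_gene = looks_like_stable_id("ENSMUSG00000017167")   # species prefix MUS
gene_symbol = looks_like_stable_id("MAPK8")               # a symbol, not an ID
```

Using `fullmatch` rather than `search` rejects strings that merely contain something ID-like, such as an ID with trailing characters.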
- thoraxe.transcript_query.transcript_query.lodict2csv(listofdicts, out, fnames=None, header=True)
Write a dictionary list with csv formatting to the stream out.
- Parameters:
fnames – when provided as a list, it is used as the column selection; otherwise, all keys occurring at least once are used.
header – whether the header should be written.
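The behaviour of the two parameters can be sketched with the standard library `csv.DictWriter`. This is a minimal illustration under stated assumptions, not the thoraxe code: the name `write_dicts_as_csv`, the comma delimiter, the empty string used for missing keys, and the first-seen column order are all choices made for this example.

```python
import csv
import io


def write_dicts_as_csv(rows, out, fnames=None, header=True):
    """Write a list of dicts to the stream out as CSV (hypothetical helper)."""
    if fnames is None:
        # Union of all keys occurring at least once, in first-seen order.
        fnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(
        out,
        fieldnames=fnames,
        restval="",              # value written when a row lacks a column
        extrasaction="ignore",   # drop keys that are not in fnames
        lineterminator="\n",
    )
    if header:
        writer.writeheader()
    writer.writerows(rows)


rows = [{"exon": "e1", "start": 10}, {"exon": "e2", "end": 99}]

all_columns = io.StringIO()
write_dicts_as_csv(rows, all_columns)

one_column = io.StringIO()
write_dicts_as_csv(rows, one_column, fnames=["exon"], header=False)
```

Here `all_columns` receives the columns exon, start and end with blanks where a row has no value, while `one_column` receives only the exon column and no header line.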
- thoraxe.transcript_query.transcript_query.main()
Main script function to download transcript data from ENSEMBL.
- thoraxe.transcript_query.transcript_query.parse_command_line()
Parse command line.
It uses argparse to parse the transcript_query command line arguments and returns the argparse parser.
- thoraxe.transcript_query.transcript_query.save_ensembl_version(output_folder)
Save an ensembl_version.txt file with the version number of Ensembl.
- thoraxe.transcript_query.transcript_query.write_tsl_file(path, l_of_sptr)
Write a TSL file from a list of transcripts.
Module contents
transcript_query: Download transcript data from Ensembl.
Functions to obtain information about gene transcripts and exons from the Ensembl database. The module creates the folder tree needed for the thoraxe pipeline.