REDIdb Help Page

HOW TO USE REDIdb

This is the main help page on how to search RNA Editing sites in REDIdb and use the embedded tools.

Searching RNA Editing sites
Browsing RNA Editing sites
REDIdb scripts download

Search RNA Editing sites in REDIdb

Searching REDIdb is intuitive, so even users with no bioinformatics skills can run accurate searches on the database. RNA editing sites can be retrieved by organism, by location or by gene, and these fields can be combined to build more selective queries. The final result can be further filtered by "RefSeq", "Exons" or "Full Orfs" by selecting the corresponding checkboxes on the query form.
Note: If the user does not make any choice, the search will be performed against the ENTIRE DATABASE.

Search RNA Editing sites by ORGANISM

RNA editing events can be retrieved by organism by selecting one of the entries in the Organism field.

Search RNA Editing sites by LOCATION

Searches can also be restricted to an organelle by choosing mitochondrion or chloroplast in the Location field.

Search RNA Editing sites by GENENAME

Users interested in all the editing events occurring in a specific gene can also search by gene name.

Additional Filters

Once the user has made a choice, the search can be further refined using the filters below.
Note: Filters can be combined freely according to the user's preferences.
The following options are available:

Name	Effects
RefSeq	If the RefSeq box is checked, the final results will include ONLY records from the non-redundant NCBI Reference Sequence (RefSeq) database.
Exons	If the Exons filter is selected, the final result will also include exons. Note: Exons are available only if present in the GenBank flatfile from which the REDIdb record was obtained. Editing events are numbered according to the exonic order.
Full Orfs	If the Full Orfs filter is selected, the query will consider only complete genes.
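The way the search fields and filters combine can be pictured as a logical AND in which any field left empty matches every record, so that a query with no choices returns the entire database. The sketch below illustrates that reading only; the `EditingRecord` fields and the `search` function are hypothetical and do not describe REDIdb's actual schema or backend.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EditingRecord:
    # Hypothetical, simplified record used only for illustration.
    accession: str
    organism: str
    location: str      # "mitochondrion" or "chloroplast"
    gene: str
    is_refseq: bool    # record comes from NCBI RefSeq
    full_orf: bool     # record covers a complete gene


def search(records: List[EditingRecord],
           organism: Optional[str] = None,
           location: Optional[str] = None,
           gene: Optional[str] = None,
           refseq_only: bool = False,
           full_orfs_only: bool = False) -> List[EditingRecord]:
    """AND-combine the selected fields; a field left as None matches everything."""
    hits = []
    for rec in records:
        if organism is not None and rec.organism != organism:
            continue
        if location is not None and rec.location != location:
            continue
        if gene is not None and rec.gene != gene:
            continue
        if refseq_only and not rec.is_refseq:
            continue
        if full_orfs_only and not rec.full_orf:
            continue
        hits.append(rec)
    return hits


# Made-up records with placeholder accessions (not real REDIdb entries).
records = [
    EditingRecord("ACC_1", "Arabidopsis thaliana", "mitochondrion", "nad1", True, True),
    EditingRecord("ACC_2", "Arabidopsis thaliana", "chloroplast", "ndhB", False, True),
    EditingRecord("ACC_3", "Oryza sativa", "mitochondrion", "nad1", False, False),
]

print(len(search(records)))                                         # → 3
print([r.accession for r in search(records, gene="nad1")])           # → ['ACC_1', 'ACC_3']
print([r.accession for r in search(records, gene="nad1", refseq_only=True)])  # → ['ACC_1']
```

The Exons option is left out of the sketch because, as described above, it adds exon information to the result rather than removing records.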

Table Report

Once a query has been submitted, the corresponding results will be displayed in a Table Report including the following columns:

Column Name	Description
Genbank	The GenBank ACCESSION number of the flatfile from which the editing events were extracted. Clicking on this number redirects the user to the GenBank record page.
Organism	The NAME (genus and species) of the organism corresponding to that accession number. The binomial nomenclature is the same adopted by the NCBI Taxonomy database.
Taxonomy	The relevant TAXONOMY according to the NCBI Taxonomy database. Clicking on the Taxonomy button displays a circular phylogenetic tree based on jsPhyloSVG (see the taxonomy visualization section for further details).
Genome	An icon indicates whether the COMPLETE GENOMIC SEQUENCE is available on GenBank; if it is not, the button is shown in red. Clicking on the button renders the complete genome graphically, allowing the user to visualize the relative order of the edited genes (see below for more details).
Location	The CELLULAR LOCALIZATION of the editing events (mitochondrion or chloroplast).
Source	The MOLECULAR SOURCE of the record (gene, tRNA, etc.). An icon after the molecular source indicates whether the annotated sequence is complete.
Gene name	The GENE NAME. Note that some genes may be known under different names, so before submitting a query to REDIdb, check for gene aliases on this page. Clicking on the gene_name button shows the REDIdb RECORD PAGE for the entry (see below for more details).
Editing type	The TYPE OF EDITING PROCESS. It can take the values "Substitution", "Insertion" or "Deletion"; since a single sequence can undergo different types of RNA editing at the same time, more than one value may be listed.
Details	The specific DETAILS of each editing type. Substitutions are shown as "genomic nucleotide-->modified cDNA nucleotide"; for example, C-to-U and U-to-C substitutions are represented as "C-->U" and "U-->C", respectively.
Number of events	The total NUMBER OF EVENTS for each editing type. When several editing types occur, the numbers are separated by commas.

All the data in the Table Report can be copied/pasted (1), printed (2), rearranged by hiding certain columns (3), or exported as Portable Document Format (PDF) (4) or Comma Separated Values (CSV) (5) files. CSV files can be easily opened with spreadsheet software such as LibreOffice® or Microsoft Excel®.
The number of rows displayed per table can be set from 10 up to 100 entries (6); moreover, a search field (7) in the right corner of the page lets the user restrict the rows to those matching a specific term.
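Users who prefer scripting to a spreadsheet could process the CSV export directly. The sketch below assumes the export uses the Table Report column names as headers and that, like "Number of events", the "Editing type" field lists several types separated by commas; both are assumptions, and the sample rows are made up.

```python
import csv
import io

# Made-up export for illustration; the column names mirror the Table Report,
# but the exact layout of REDIdb's CSV export is an assumption.
SAMPLE = """Genbank,Organism,Gene name,Editing type,Number of events
ACC_1,Species one,nad1,Substitution,20
ACC_2,Species two,cox1,"Substitution, Insertion","12, 1"
"""


def parse_details(detail):
    """Split a substitution detail such as 'C-->U' into (genomic, edited) bases."""
    genomic, edited = detail.split("-->")
    return genomic.strip(), edited.strip()


def load_report(handle):
    """Pair each editing type with its event count for every exported row."""
    rows = []
    for row in csv.DictReader(handle):
        types = [t.strip() for t in row["Editing type"].split(",")]
        counts = [int(n.strip()) for n in row["Number of events"].split(",")]
        rows.append((row["Genbank"], row["Gene name"], dict(zip(types, counts))))
    return rows


for accession, gene, events in load_report(io.StringIO(SAMPLE)):
    print(accession, gene, events)
print(parse_details("C-->U"))  # → ('C', 'U')
```

The csv module handles the quoted fields, so commas inside "Substitution, Insertion" do not split the row.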

REDIdb RECORD PAGE

Clicking on an entry's gene_name button opens the REDIdb RECORD PAGE.

The page is organized in a GenBank-like style and is composed of four main sections:

GENERAL INFORMATION. The header of the entry, containing its main descriptors:
- Organism. Genus and species: the binomial nomenclature is the same adopted by the NCBI Taxonomy database.
  By clicking on the organism name the user will be redirected to the Genbank record's main page.
- Division. The main taxonomy ranks according to the NCBI Taxonomy database.
- Location. Where the editing events have been reported (mitochondrion or chloroplast).
- Status. Whether the genome of that organism has been completely sequenced.
- Sequence. Whether the sequence in which the editing events have been reported is complete (from start to end).
- Source. The type of the sequence (gene, tRNA, etc.).
- Name. The standardized gene name, according to this page.
- Genbank. The GenBank ACCESSION number of the flatfile from which the editing events have been retrieved.
- Pubmed. PubMed IDs of papers related to that organism.
  By clicking on each accession number the user will be redirected to the corresponding Pubmed page.

GO GENE ONTOLOGY. For protein-coding genes, the Gene Ontology section describes the properties of the gene product.
Following the Gene Ontology Consortium, three main aspects are covered:
- Molecular Function. The elementary activities of the protein at the molecular level (e.g. ion binding, electron transport, etc.).
- Biological Process. The biological events in which the gene product is involved.
- Cellular Component. The parts of a cell or its extracellular environment where the protein is located.
For more detailed information about the Gene Ontology, please visit the official Gene Ontology Consortium page.

Note. Clicking on a GO Accession Number (e.g. GO:0045156) redirects the user to the AmiGO page for that GO term.

EDITING FEATURES. This section displays, in tabular form, all the editing features that characterize the REDIdb record.
Each row is organized in six columns:
- GENOMIC POSITION. If the organism's genome has been completely sequenced, this column reports the genomic coordinate of the RNA editing event.
- CDNA POSITION. The position of the RNA editing event along the corresponding transcript.
- GENOMIC CODON. The unedited genomic codon; when double or triple editing affects the same codon, all the possible nucleotide combinations are examined.
- EDITED CODON. The corresponding edited codon (on the reverse-transcribed transcript).
- GENOMIC AA. The amino acid (AA) encoded by the genomic sequence.
- EDITED AA. The AA resulting from the RNA editing event.
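The relationship between the genomic codon, the edited codon and the two amino acids, including the examination of every nucleotide combination when several edits hit the same codon, can be sketched as below. This is an illustration rather than REDIdb's own code: it assumes the standard genetic code (NCBI table 1, written with T instead of U), and `codon_variants` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons written with T instead of U.
# Plastid genes use table 11, which assigns the same amino acids.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_variants(genomic_codon, edits):
    """Return every codon obtained by applying any non-empty subset of the edits.

    `edits` maps a codon position (0, 1 or 2) to the edited base,
    e.g. {0: "T"} for a C-to-U change at the first codon position.
    """
    positions = sorted(edits)
    variants = []
    for choice in product((False, True), repeat=len(positions)):
        if not any(choice):
            continue  # skip the fully unedited (genomic) codon
        bases = list(genomic_codon)
        for pos, applied in zip(positions, choice):
            if applied:
                bases[pos] = edits[pos]
        variants.append("".join(bases))
    return variants


# A double C-to-U edit in a proline codon (CCC) yields three possible codons.
for edited in codon_variants("CCC", {0: "T", 1: "T"}):
    print("CCC", CODON_TABLE["CCC"], "->", edited, CODON_TABLE[edited])
```

For a single edit such as CGT to TGT the sketch reports one variant, matching an R>C change.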

SEQUENCES. The last section contains the sequences related to the REDIdb record (each with its length and base/AA composition):
- GENOMIC SEQUENCE. The nucleotide genomic sequence.
- cDNA SEQUENCE. The corresponding transcript, annotated as cDNA.
- PROTEIN SEQUENCE. In case of protein coding genes, the edited protein sequence is included as well.

HOW TO DOWNLOAD SEQUENCES FROM REDIdb

Each sequence (genomic, edited transcript, protein) can be downloaded in a FASTA-like format by clicking on the corresponding download button.
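FASTA stores each sequence as a ">" header line followed by the residues, usually wrapped at a fixed width. The sketch below writes and reads that layout; the 60-character width and the example header are assumptions, since the exact header REDIdb writes is not described here.

```python
def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA entry, wrapping residues every `width` characters."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def read_fasta(text):
    """Parse FASTA text back into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical header; REDIdb's actual header layout may differ.
entry = to_fasta("example_id cDNA", "ATG" * 30)
print(entry)
print(read_fasta(entry))
```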

Browsing RNA Editing sites in REDIdb

Analysis of RNA EDITING CONSERVATION with MSAViewer®

A major novelty in REDIdb is the inclusion of multiple alignments of orthologous sequences, which are very useful for assessing the conservation of editing across the species annotated in the database.
The multiple sequence alignment for each gene (or protein) is generated with the iterative progressive algorithm implemented in Clustal Omega. The alignment is displayed with MSAViewer, a fast and lightweight BioJS (BioJavaScript) component. Clicking on the button next to the transcript or protein sequence shows the multiple alignment of that sequence's orthologues on a dedicated page.
Note: The button is available only for complete sequences stored in the database. Each MSA box (shown below) contains:

REDIdb Accession Numbers relative to each aligned sequence.
Sequence logo with conservation patterns at each position in the MSA.
Bar chart showing nucleotide/AA conservation per position.
Vertical/horizontal scroll-bars.

Navigation through the alignment is possible using the scroll-bars (4) or simply by dragging the mouse pointer over the sequences. Clicking on a REDIdb Accession Number highlights its sequence (5). Edited sites are highlighted in red (6).
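The per-position conservation bar chart summarizes how uniform each alignment column is. One simple way to compute such a score is the fraction of sequences sharing the most common residue, sketched below; this metric is an assumption for illustration, and MSAViewer may compute its bar chart and sequence logo differently (logos are usually based on information content).

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue at each column.

    Gaps ('-') count in the denominator but never as the conserved residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = Counter(seq[col] for seq in alignment if seq[col] != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))
    return scores


# Toy alignment of three sequences (not REDIdb data).
aln = ["ACGT-", "ACGTA", "TCGAA"]
print([round(s, 2) for s in column_conservation(aln)])
```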

Under each MSA box, a table describes each RNA editing site in more detail, according to its position in the multiple alignment. The table contains the following columns:

Alignment position (Aln). The position in the aligned sequences of each editing site.
Number of sequences (N. Seqs). The total number of sequences contained in the multialignment.
Number of edited sequences (N. Edited Seqs). The number of sequences containing a particular editing site;
e.g. for edited position 9, only EDI_000000860 shows the AA change R>C, so N. Edited Seqs for that position is 1.
ID. The REDIdb sequence ID for each editing position.
Organism. The Organism (genus/species) corresponding to that sequence id.
Seq Position. The position of each editing event in the corresponding (ungapped) sequence.
Codon. The unedited genomic codon.
Edited codon. The corresponding edited codon (on the reverse-transcribed transcript).
AA. The AA encoded by the genomic sequence.
Edited AA. The AA resulting from the RNA editing event.

Sequences sharing the same editing position are shown in the same colour in the first column (e.g. Aln 24).
As in the other REDIdb tables, the user can select how many rows to show per page.
A search filter in the right corner of the table can also be used to restrict the rows to those matching specific criteria.
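Relating an editing position in an individual sequence to its alignment column (the Aln value), and counting how many sequences are edited at each column (N. Edited Seqs), can be sketched as follows. The dictionaries used as input and the helper names are hypothetical; positions are assumed to be 1-based and gaps written as "-".

```python
def seq_to_aln_position(aligned_seq, seq_pos):
    """Map a 1-based position in the ungapped sequence to its 1-based alignment column."""
    count = 0
    for col, residue in enumerate(aligned_seq, start=1):
        if residue != "-":
            count += 1
            if count == seq_pos:
                return col
    raise ValueError("position beyond the end of the sequence")


def edited_seqs_per_column(alignment, edits):
    """Count, for each alignment column, how many sequences carry an edit there.

    alignment: dict sequence id -> aligned sequence
    edits:     dict sequence id -> list of 1-based ungapped edited positions
    """
    counts = {}
    for seq_id, positions in edits.items():
        for pos in positions:
            col = seq_to_aln_position(alignment[seq_id], pos)
            counts[col] = counts.get(col, 0) + 1
    return counts


# Toy data: the edit at position 3 of seq1 and at position 4 of seq2
# fall in the same alignment column because seq1 has a gap.
alignment = {"seq1": "AC-GT", "seq2": "ACCGT"}
edits = {"seq1": [3], "seq2": [4]}
print(edited_seqs_per_column(alignment, edits))  # → {4: 2}
```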

Protein domain analysis with FeatureViewer®

The distribution of RNA editing events along functional domains and predicted protein secondary structures is rendered graphically by means of the neXtProt Feature Viewer tool.
Protein domains have been detected using the InterPro engine, while secondary structures have been predicted using the stand-alone version of the SPIDER2 program. Clicking on the button next to a protein sequence opens the PROTEIN DOMAINS AND STRUCTURE PAGE.
This page contains three main sections:

A graphical box which displays the primary structure of the protein, its domains (inferred with InterProScan) and the editing events along the sequence.

A detailed table for each edited codon in the mature transcript, containing the following fields:

Gene name. The standardized gene name, according to this page.
AA position. The position of the RNA editing event along the protein sequence.
AA. The AA encoded by the genomic sequence.
Edited AA. The AA resulting from the RNA editing event.
Codon. The unedited genomic codon.
Edited codon. The corresponding edited codon (on the reverse-transcribed transcript).

A table for each protein domain which contains the following columns:

Domain DB. The database from which the domain has been retrieved.
Start. Position of the first AA belonging to the domain.
End. Position of the last AA belonging to the domain.
Accession. Accession number of the domain in the primary database from which it was extracted.
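Combining the two tables, i.e. finding which edited AA positions fall within each domain's Start and End, amounts to a simple interval check. The sketch below assumes 1-based inclusive coordinates; the `Domain` class, the helper name and the example accessions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Domain:
    db: str         # database the domain comes from (e.g. an InterPro member database)
    accession: str  # domain accession in that database
    start: int      # first residue of the domain, 1-based, inclusive
    end: int        # last residue of the domain, 1-based, inclusive


def edits_in_domains(edited_positions: List[int],
                     domains: List[Domain]) -> List[Tuple[str, List[int]]]:
    """For each domain, list the edited AA positions that fall inside it."""
    return [
        (d.accession, [p for p in sorted(edited_positions) if d.start <= p <= d.end])
        for d in domains
    ]


# Hypothetical domains and edited positions, for illustration only.
domains = [Domain("Pfam", "DOMAIN_A", 10, 50), Domain("Pfam", "DOMAIN_B", 60, 80)]
print(edits_in_domains([5, 12, 49, 70, 90], domains))
# → [('DOMAIN_A', [12, 49]), ('DOMAIN_B', [70])]
```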

As in the other REDIdb tables, the user can select how many rows to show per page. A search filter in the right corner of the tables can also be used to restrict the rows to those matching specific criteria.

Taxonomy visualization with jsPhyloSVG®

jsPhyloSVG® is a JavaScript library specifically designed for rendering phylogenetic trees.
Its recent integration into REDIdb allows the user to visualize the taxonomy of each organism in a pleasant and interactive way.
A circular phylogenetic tree based on jsPhyloSVG is displayed by clicking on the Taxonomy button in the Table Report page.

Taxonomy representation follows the Newick tree format. By moving the mouse cursor over a taxonomic rank, the corresponding tree node will appear highlighted in red.
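A taxonomy in Newick format can be derived from lineage lists such as those GenBank records provide (root first). The sketch below nests lineages into a tree and serializes it, labelling internal nodes with their taxon names; how REDIdb actually builds its Newick strings is not described here, so this is an illustrative assumption, and the lineages are shortened.

```python
def build_tree(lineages):
    """Nest lineages (lists of taxon names, root first) into a dict-of-dicts tree."""
    tree = {}
    for lineage in lineages:
        node = tree
        for taxon in lineage:
            node = node.setdefault(taxon, {})
    return tree


def to_newick(tree):
    """Serialize a dict-of-dicts tree to Newick, labelling internal nodes too."""
    def render(name, children):
        label = name.replace(" ", "_")  # unquoted Newick labels use '_' for blanks
        if not children:
            return label
        inner = ",".join(render(n, c) for n, c in children.items())
        return f"({inner}){label}"

    if len(tree) == 1:
        (name, children), = tree.items()
        return render(name, children) + ";"
    return "(" + ",".join(render(n, c) for n, c in tree.items()) + ");"


# Shortened lineages, for illustration only.
lineages = [
    ["Eukaryota", "Viridiplantae", "Streptophyta", "Arabidopsis thaliana"],
    ["Eukaryota", "Viridiplantae", "Streptophyta", "Oryza sativa"],
]
print(to_newick(build_tree(lineages)))
# → (((Arabidopsis_thaliana,Oryza_sativa)Streptophyta)Viridiplantae)Eukaryota;
```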

Genome visualization

Another novelty introduced in this new version of REDIdb is the possibility of studying editing events in their genomic context.
Complete genomes are rendered graphically in a dedicated web page by a set of custom Python scripts.
The page is accessible by clicking on the Genome button in the Table Report page.
Note: If the genome has not been completely sequenced, the same button appears coloured in red.

The genome page contains three main sections:

The genome.

Every complete genome is automatically assembled by a series of custom Python scripts according to the order of genes reported in the GenBank flatfile.
If both RefSeq and non-RefSeq data are available for the same organism, the RefSeq record takes priority.
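The RefSeq priority and gene ordering described above can be sketched as follows. The sketch relies on RefSeq accessions carrying a two-letter prefix followed by an underscore (such as NC_), which GenBank accessions lack, and it orders genes by start coordinate, assuming this matches the flatfile order; the helper names and accessions are made-up placeholders, not REDIdb's scripts.

```python
def is_refseq(accession):
    """RefSeq accessions have a two-letter prefix plus an underscore (e.g. NC_)."""
    return len(accession) > 3 and accession[2] == "_"


def pick_genome_record(accessions):
    """Prefer a RefSeq accession when both RefSeq and GenBank records exist."""
    refseq = [a for a in accessions if is_refseq(a)]
    return (refseq or list(accessions))[0]


def gene_order(features):
    """Order (name, start, end, strand) tuples by their genomic start coordinate."""
    return [name for name, start, end, strand in sorted(features, key=lambda f: f[1])]


# Made-up placeholder accessions and coordinates.
print(pick_genome_record(["AB123456", "NC_123456"]))                 # → NC_123456
print(gene_order([("nad1", 500, 900, "+"), ("cox1", 10, 400, "-")]))  # → ['cox1', 'nad1']
```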


The gene statistics.

Statistics such as the coding potential of the genome and the fraction of edited genes are available in a dedicated section under the genome graph. The section is organized into four pie charts:

Coding bases. The percentage of coding vs non-coding bases.
Main Functions. Functional distribution of the annotated sequences, according to the following categories:

ORFs
Energy
rRNAs
tRNAs
Other

Mt/Cp complexes. Electron transport chain proteins:

Mitochondrial complexes (I,II,III,IV)
Thylakoid membrane complexes (Photosystem II, Cytochrome b6/f, NADH dehydrogenase, ATP synthase, Photosystem I)

Edited/Unedited genes. The percentage of edited vs unedited genes.
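The "Coding bases" and "Edited/Unedited genes" charts reduce to two percentages. The sketch below computes them under stated assumptions: every base covered by an annotated feature interval counts as coding, overlapping intervals are merged so no base is counted twice, and a gene counts as edited when it has at least one editing event. The helper names and numbers are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals; 1-based, inclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coding_percentage(genome_length, feature_intervals):
    """Percentage of genome bases covered by at least one annotated feature."""
    covered = sum(end - start + 1 for start, end in merge_intervals(feature_intervals))
    return 100.0 * covered / genome_length


def edited_percentage(genes):
    """genes: dict gene name -> number of editing events."""
    edited = sum(1 for n in genes.values() if n > 0)
    return 100.0 * edited / len(genes)


# Toy numbers, not taken from any real genome.
print(coding_percentage(1000, [(1, 100), (50, 150), (301, 400)]))          # → 25.0
print(edited_percentage({"nad1": 20, "cox1": 0, "atp9": 5, "rrn18": 0}))  # → 50.0
```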

The Gene table.

A table detailing each gene is available at the bottom of the genome page.
The table contains the following columns:

Organism. The name (genus and species) of the organism corresponding to the displayed genome.
The binomial nomenclature is the same adopted by the NCBI Taxonomy database.
Gene Name. The standardized gene name, according to this page.
Alternate Name. Alternative name with which the gene can be found (e.g. trnE/trnE-UUC).
Strand. The polarity of the strand from which the gene is transcribed (+/-).
Product. The product encoded by the gene: a protein or a non-coding RNA (e.g. tRNA, rRNA).
Function. Biological function of the gene product.
N. Events. If the gene is subjected to RNA editing, the number of editing events is reported.
Editing Type. If the gene is subjected to RNA editing, the type of editing events is reported.

REDIdb scripts download

Following the pipeline used to generate the previous releases of REDIdb, all the records containing changes due to RNA editing are derived from GenBank.
The initial search is performed using the query '("RNA edited"[All Fields] OR "RNA editing"[All Fields])', and positive entries are collected in text format by an ad hoc Python script.
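The same query can be submitted programmatically through the NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL with the standard library; choosing db=nuccore and the retmax/retstart paging values are assumptions for illustration, not a description of the original REDIdb script.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint and the query quoted in the text above.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY = '("RNA edited"[All Fields] OR "RNA editing"[All Fields])'


def esearch_url(term, db="nuccore", retmax=100, retstart=0):
    """Build an ESearch URL; db='nuccore' and the paging values are assumptions."""
    params = {"db": db, "term": term, "retmax": retmax, "retstart": retstart}
    return ESEARCH + "?" + urlencode(params)


print(esearch_url(QUERY))
```

The IDs returned by ESearch could then be downloaded as GenBank flatfiles with EFetch (rettype=gb, retmode=text); NCBI asks clients to respect its request-rate limits.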
By using separate scripts, each entry is catalogued as mitochondrial or plastidial, and the corresponding feature table is parsed.
Each GenBank record is dissected into its main features (CDS, intron, exon, tRNA, rRNA, misc_feature) using the "genbank" format of the Biopython SeqIO parser (e.g. SeqIO.parse("gb.record", "genbank")).
This converts each record feature into a manageable sequence object with its main methods (e.g. count, find, complement, transcribe, translate, extract, etc.) and qualifiers (gene, product, strand, location, etc.).
A dictionary containing the nucleotide sequence, the genomic strand, the genomic coordinates and the sequence length is then generated for each type of record feature. For misc_feature fields, under which editing events are typically reported, the associated dictionary contains the annotation (e.g. /note="C to U RNA editing"), the genomic position and the strand.
Since each misc_feature shares the same qualifier with the feature it refers to (CDS, tRNA, etc.), using this qualifier as the key of the dictionaries above maps each editing event uniquely to its sequence.
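The key-based mapping can be sketched with plain dictionaries, assuming the features have already been extracted (for example with SeqIO) and that the shared qualifier is the gene name; the real scripts may use a different qualifier and data layout. The sketch also ignores introns, so offsets refer to the unspliced feature span, and all helper names and coordinates are hypothetical.

```python
def index_features(features, key="gene"):
    """Index CDS/tRNA/rRNA-like feature dicts by a shared qualifier (assumed: 'gene')."""
    return {f[key]: f for f in features}


def map_editing_events(misc_features, feature_index, key="gene"):
    """Map each editing misc_feature to a 1-based offset inside its parent feature.

    Offsets are computed on the unspliced feature span (introns are ignored),
    which is a simplification of this sketch.
    """
    mapped = {}
    for misc in misc_features:
        if "RNA editing" not in misc.get("note", ""):
            continue  # not an editing annotation
        target = feature_index.get(misc.get(key))
        if target is None:
            continue  # no feature shares this qualifier value
        if target["strand"] == -1:
            offset = target["end"] - misc["position"] + 1
        else:
            offset = misc["position"] - target["start"] + 1
        mapped.setdefault(misc[key], []).append(offset)
    return mapped


# Made-up coordinates for illustration.
features = [
    {"gene": "nad1", "start": 101, "end": 400, "strand": 1},
    {"gene": "cox1", "start": 1001, "end": 1300, "strand": -1},
]
misc = [
    {"gene": "nad1", "position": 110, "note": "C to U RNA editing"},
    {"gene": "cox1", "position": 1290, "note": "C to U RNA editing"},
    {"gene": "nad1", "position": 120, "note": "unrelated annotation"},
]
print(map_editing_events(misc, index_features(features)))  # → {'nad1': [10], 'cox1': [11]}
```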
Details such as PubMed references and taxonomy are obtained from the annotations of each GenBank record object generated by the SeqIO parser (e.g. record.annotations['references'], record.annotations['taxonomy'], etc.).
Because the Biopython SeqIO translation module offers limited control (especially for truncated sequences and non-standard codon tables), protein translations and amino acid changes are calculated with a custom Python module. Ad hoc Python code was also written to manage the remaining information in REDIdb (gene ontologies, protein domains, etc.). All the scripts used to generate the REDIdb database are open source and freely accessible upon request (at this page).
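A custom translation step of the kind described above, one that tolerates truncated sequences, accepts an arbitrary codon table and lists the resulting amino acid changes, might look like the sketch below. It is not the REDIdb module: the standard code is used as the default table, a trailing partial codon is ignored and unknown codons become "X", all of which are illustrative choices.

```python
# Standard genetic code (NCBI table 1), codons written with T instead of U.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, table=STANDARD, frame=0):
    """Translate codon by codon, ignoring a trailing partial codon; unknown codons give 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def aa_changes(genomic, edited, table=STANDARD):
    """List (AA position, genomic AA, edited AA) wherever editing changes the residue."""
    g, e = translate(genomic, table), translate(edited, table)
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(g, e)) if a != b]


print(aa_changes("ATGCGTTCA", "ATGTGTTTA"))  # → [(2, 'R', 'C'), (3, 'S', 'L')]

# A non-standard table can be passed as a modified dict; in several
# mitochondrial codes, for example, TGA codes for tryptophan.
print(translate("ATGTGA", {**STANDARD, "TGA": "W"}))  # → MW
```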
From this page you can download: