Reannotation of the genome of Carsonella ruddii using non-collinear methods
Description will follow shortly
Layout of project
Project progress
Identification of the species set
- I will only use genomes from Gammaproteobacteria. The set of species will be extended later.
- For validation purposes I will analyze the species used in the 2007 paper by Moya et al.:
- U00096.2 (E. coli)
- Four different strains of Buchnera aphidicola:
- BA000003
- AE013218
- AE016826
- CP000263
- In the original paper the authors do not consider any plasmids of the species.
- Set of species I have so far (all sequences were downloaded on the 01.06.10):
- Carsonella ruddii PV: 160 kb (213 genes) (gammaproteobacteria) (AP009180.1)
- Buchnera aphidicola BCc (Cc) (+ a plasmid): 450 kb (gammaproteobacteria) (CP000263.1)
- Candidatus Blochmannia floridanus: 705 kb (631 genes) (gammaproteobacteria) (BX248583.1)
- Wigglesworthia glossinidia (+ a plasmid): 698 kb (651 genes) (gammaproteobacteria) (BA000021.3)
- Baumannia cicadellinicola str. Hc: 686 kb (651 genes) (gammaproteobacteria) (CP000238.1)
- To identify the phylogenetic tree of these species I intend to use the 16S-rRNA sequence.
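Getting the 16S rRNA sequence out of each downloaded genome could look roughly like the sketch below. It assumes the genome is available as a FASTA file and that the 16S coordinates have been looked up in the corresponding GenBank feature table; the function names `read_fasta` and `extract_feature` are made up for illustration, and the toy sequence is not real data.

```python
# Complement table for reverse-strand features (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Return {record_id: sequence} from an iterable of FASTA lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def extract_feature(genome, start, end, strand="+"):
    """Cut out a feature given 1-based inclusive coordinates.

    For strand "-" the reverse complement is returned.
    """
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("coordinates outside of the genome")
    seq = genome[start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


# Toy example (hypothetical sequence, not a real genome).
toy = read_fasta([">toy example genome", "AACCGGTT", "ACGT"])
forward = extract_feature(toy["toy"], 3, 6)
reverse = extract_feature(toy["toy"], 1, 4, strand="-")
```

With a real genome one would pass an open file handle to `read_fasta` and the annotated 16S coordinates to `extract_feature`; the extracted sequences can then be written to one multi-FASTA file for the alignment.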
Results of 16S-rRNA comparison
- I've extracted the 16S-rRNA sequence from all species. Then:
- I will use the ML-tree to guide the building of a progressive alignment (e.g. in S-LAGAN)
REINERT: Yes. Use S-LAGAN or SuperMap. Use a working program; do not spend too much time making others work. What are the next intended steps?
Think about one manageable outcome. You certainly cannot re-evaluate the complete annotation.
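As an illustration of how an ML tree can guide a progressive alignment, the sketch below reads a tree in Newick format and lists the order in which groups of sequences would be merged (closest leaves first, the root last). This is not what S-LAGAN or SuperMap do internally; the functions `parse_newick` and `merge_order` and the leaf names `sp1` to `sp5` are hypothetical, and the example tree is not a result for the species above.

```python
def parse_newick(text):
    """Parse a Newick string into nested tuples; leaves are name strings.

    Branch lengths and internal node labels are ignored.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Drop an optional ":branch_length" suffix.
        return text[start:pos].split(":")[0]

    def parse_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # consume ")"
            read_label()  # skip internal label and branch length
            return tuple(children)
        return read_label()

    return parse_node()


def merge_order(tree):
    """Return the list of (group_a, group_b) merges in post-order."""
    steps = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        groups = [visit(child) for child in node]
        merged = groups[0]
        for group in groups[1:]:
            steps.append((tuple(merged), tuple(group)))
            merged = merged + group
        return merged

    visit(tree)
    return steps


# Hypothetical tree with placeholder leaf names.
tree = parse_newick("((sp1:0.1,sp2:0.2):0.05,(sp3:0.3,sp4:0.1):0.02,sp5:0.4);")
for group_a, group_b in merge_order(tree):
    print("align", group_a, "with", group_b)
```

Each printed step corresponds to one pairwise (sequence or profile) alignment that a progressive method would carry out.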
Next steps
- Parse the currently annotated version of the Carsonella genome and the ABA result.
- Statistical evaluation of the result with R.
- Do I find the same genes? Do I find other/new genes? Is the number of genes found significantly different?
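A minimal sketch of the comparison step, assuming both the existing annotation and the ABA-derived result have already been parsed into lists of `(start, end, strand)` tuples with 1-based inclusive coordinates. The matching rule (same strand, overlap of at least 80% of the shorter gene) and the names `overlap_fraction` and `compare_gene_sets` are assumptions chosen for illustration; the coordinates are made up. The resulting counts could then be exported for the statistical evaluation in R.

```python
def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter interval."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return shared / shorter


def compare_gene_sets(annotated, predicted, min_overlap=0.8):
    """Count shared, annotation-only and new genes.

    Two genes match if they lie on the same strand and overlap by at
    least min_overlap of the shorter one (an assumed criterion).
    """
    matched_annotated = set()
    matched_predicted = set()
    for i, gene_a in enumerate(annotated):
        for j, gene_p in enumerate(predicted):
            same_strand = gene_a[2] == gene_p[2]
            if same_strand and overlap_fraction(gene_a, gene_p) >= min_overlap:
                matched_annotated.add(i)
                matched_predicted.add(j)
    return {
        "shared": len(matched_annotated),
        "annotated_only": len(annotated) - len(matched_annotated),
        "new": len(predicted) - len(matched_predicted),
    }


# Made-up coordinates, for illustration only.
annotated = [(1, 100, "+"), (200, 300, "-"), (400, 500, "+")]
predicted = [(5, 100, "+"), (200, 300, "+"), (600, 700, "+")]
summary = compare_gene_sets(annotated, predicted)
print(summary)
```

The quadratic loop is fine for a genome with a few hundred genes; for larger sets, sorting both lists by start position would avoid comparing every pair.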
Problems and Questions
- Identifying a proper set of species is not straightforward. C. ruddii is listed under unclassified Gammaproteobacteria. I was told that one can "normally" define a phylogenetic tree using the sequence of the 16S rRNA, which has to be present in all bacteria in order for them to exist. (todo: check for a publication)
- Can anyone name an alignment program that uses ABA (A-Bruijn Alignment)? I discovered AliWABA, but the web service they provide is not available.
BIRTE: The authors provide an implementation for download. The link is given on page 2 of the paper:
IVAN: Thx. I've downloaded and installed it successfully.
- SuperMap:
- What is the CHAOS format (CHAins Of Seeds)?
- What format does the scoring file need?
- SuperMap needs some GPDB (Genome Profile DataBase) config file… Their website is sort of down right now.
- ABA Problems:
- I do understand the output now, but it is still strange.
- Here (out1.pdf) and Here ( is the output of the example run on two small sequences (chloroplasts of two plants). The nodes contain the positions in each of the relevant sequences. The problem is that I only have two sequences here, yet the output contains numbers ranging from 0 to 4…
- I don't understand the color of some edges => why are some edges colored? Do they represent a strongly supported "path"?