``Never trust a tall dwarf. He's lying about something.'' (Solomon Short)
The principles presented in this thesis have been implemented in the mira assembler. Development of the assembler started in 1997. Version 1.2 of mira, which contains the methods discussed in this thesis, passed intensive testing in fall 1999, during which the overall concept was refined, and was subsequently used at the Institute of Molecular Biotechnology (IMB) Genome Sequencing Centre Jena (GSCJ) and at several other public institutes and private companies. Since November 1999, the assembler has been subject to constant scrutiny, performance improvement and algorithmic redesign; the current version, 2.2.8, was released in May 2004.
As stated in chapter 1, the aim of this thesis was to reduce assembly errors caused by repetitive sequences and to increase the reliability of consensus sequences derived from automatically assembled projects. During the course of this thesis, it became apparent that two types of assembly projects were suitable for evaluating whether the methods and algorithms developed here could meet these expectations: i) assembly of highly repetitive eukaryotic genome sequences and ii) assembly of non-normalised EST projects, which per se contain a high degree of very similar mRNA sequences.
This chapter presents the results in three sections: the first presents and discusses qualitative results of the mira assembler for genome assembly, the second gives an overview of the results achieved for EST assembly and SNP detection, and the third discusses the results obtained in general.