9  Sanger analysis

This is a section about perhaps one of the first generation sequencing technology, that is Sanger sequencing and its principles and evolution, strengths and limitations

This chapter will also cover the bioinformatics side of the method: what kind of files we get out Sanger, how we process them and an important paradigm that came out from this sequence method.

Finally we will dive into a study case that its very common in microbiology: the use of the 16S rRNA gene for taxonomy identification

9.1 The first sequencing methods

Back in 70s there two different methods aiming to determine the base sequences of DNA and they basically operated at the level of base per base reconstruction of the information. The most used during those days was so called :Maxam-Gilbert method and the second one was precisely the chain termination method or dideoxy sequencing or more commonly nowadays :Sanger sequencing. Both methods named after their creators. It was, however the Sanger method the one that gradually became the facto method and continue to be the standard for the next thirty years. In fact, assembled genomes before the 2000s were sequenced with Sanger method, which included the first draft of the Human genome and many other organism Brown (2018).

Figure 9.1: A globular structure of a bacterial DNA polymerase

9.1.1 The chain termination method

To under stand the chain termination method is important have a clear view of the DNA structure and chemical reaction that enable its polymerization. Whe the

Figure 9.2: The deoxy-ribose and dideoxy-ribose and consequences on the polymerization of a DNA strand
Figure 9.3: A step-by-step of the Sanger sequencing method or the chain termination method.

9.1.2 Sanger with capillary electrophoresis

9.1.3 Strengths and limitations of Sanger methods

9.2 Files from Sanger sequencing

Several files are generated from a modern Sanger sequencing project.

PHD.1
BEGIN_SEQUENCE H200824-010_A11_455-WT_spo0B_spo0B-R

BEGIN_COMMENT

CHROMAT_FILE: H200824-010_A11_455-WT_spo0B_spo0B-R
BASECALLER_VERSION: KB 1.4.0
TRACE_PROCESSOR_VERSION: KB 1.4.0
QUALITY_LEVELS: 99
TIME: Mon Aug 24 22:05:23 2020
TRACE_ARRAY_MIN_INDEX: 0
TRACE_ARRAY_MAX_INDEX: 18141
TRIM: -1 -1 -1.000000e+000
TRACE_PEAK_AREA_RATIO: -1.000000e+000
CHEM: term
DYE: big

END_COMMENT

BEGIN_DNA
C 7 3
C 2 17
T 3 44
…
END_DNA

END_SEQUENCE

However the most used file is the binary version AB1 which could be red out from different programs including cutepeaks. It displays the DNA sequence along with its quality peak in each position Fig. 9.4

Figure 9.4: A graphic of the AB1 file format from a Sanger sequencing project displayed in cutepeaks software

9.3 Sanger processing workflow

Despite being one of the oldest sequencing methods, Sanger sequencing has some remarkable strengths that are enable thanks to the automation era. For instance modern sequencers working in parallel can read almost 384 different sequences (of 750bp) in 1h (7Mb in 24h per machine), clearly automation introduces a second advantage and that is the low-price of sequencing small fragments Brown (2018). It is important to highlight some limitations as well like the necessity of round-the-clock technical support (this also leverages a little bit from robotic operators though). The error-free Sanger sequencing requires at least 5X of sample coverage, which could be seen as disadvantage too Brown (2018).

9.4 The 16S rRNA and its relevance for sequencing

Figure 9.5: Bacillus subtilis subsp. subtilis str. 168 bacterial SSU 16S rRNA
Challenge

Professor Valeska gives you a compressed file containing two genes (spo0B and rpoB) from three different strains (302, 321, 455). These genes were sequenced using pair end Sanger method, then the compressed file has the two raw sequences per gene. She tells you to process and analyse them using the Sanger sequencing pipeline analyis. Since she doesn’t know from which species they belong, she ask you to identify the organism to whom it belongs by using the resulting consensus sequence. She finally reminds you to document each step of the process including the identification step

See the solution :here