Brian Byrne works in the Dairy Science Laboratory of the National Reference Laboratory (NRL) for Listeria monocytogenes. He recently attended 'The 5th Next Generation Sequencing Workshop: Listeria monocytogenes' at the Austrian Agency for Health and Food Safety (AGES).
Funding for this visit was provided under the Training and Mobility Funding Programme.
AGES are the leading expert organisation in risk minimisation for health and food safety, as well as consumer protection. There are six AGES institutes in Austria, with 82 reference centres and laboratories providing scientific expertise in food testing, veterinary medicine and agriculture to the public.
In the area of microbiology, AGES are the National Reference Laboratory (NRL) for a number of pathogenic bacteria, including the NRL for L. monocytogenes covering both clinical (human) and food (animal) isolates. They are heavily involved in microbial research and consequently have published widely across different areas of microbiology. One area in which they have published extensively is the molecular characterisation of L. monocytogenes using multiple-locus variable-number tandem-repeat analysis (MLVA), pulsed-field gel electrophoresis (PFGE) and, more recently, whole genome sequencing (WGS).
In our laboratory (the National Reference Laboratory for Listeria monocytogenes) we currently use PFGE, which is the reference method for the molecular characterisation of L. monocytogenes. However, this method is slow and laborious, and it is not as discriminatory as sequence-based methods.
The next step in the evolution of routine molecular characterisation of L. monocytogenes is the use of WGS. WGS, or next generation sequencing (NGS), was initially very expensive and impractical as a routine characterisation method. In recent years, however, the cost of WGS has dropped dramatically, allowing it to be used as a routine tool. WGS is faster and offers much better discriminatory power than PFGE or MLVA.
The method involves sequencing the majority (approx. 80%) of the bacterial genome. The sequence data are assembled, annotated and compared to public sequence databases populated with sequence information for specific target genes within a particular genome. The resulting sequence is assigned an identification code, allowing a test isolate to be rapidly identified within a population.
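As a rough illustration of that last step, classic seven-gene MLST assigns a sequence type (ST) from the combination of allele numbers found at each locus (the allelic profile). The Python sketch below uses the locus names of the seven-gene L. monocytogenes MLST scheme. The profile table, allele numbers and type labels (`ST_X`, `ST_Y`) are hypothetical placeholders; real profiles come from the curated MLST database. wgMLST and cgMLST apply the same idea across far more loci.

```python
from typing import Dict, Optional, Tuple

# Loci of the seven-gene L. monocytogenes MLST scheme.
LOCI: Tuple[str, ...] = ("abcZ", "bglA", "cat", "dapE", "dat", "ldh", "lhkA")

# HYPOTHETICAL profile table (allele numbers -> type label). Real profiles
# come from the curated MLST database, not from this example.
PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 1, 1, 1, 1, 1, 1): "ST_X",
    (1, 1, 1, 1, 1, 1, 2): "ST_Y",
}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Look up the sequence type for a complete allelic profile.

    Returns None when the profile is not in the table (a potential new ST).
    """
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"incomplete profile, missing loci: {missing}")
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)


isolate = {"abcZ": 1, "bglA": 1, "cat": 1, "dapE": 1,
           "dat": 1, "ldh": 1, "lhkA": 2}
print(assign_st(isolate))  # ST_Y
```

Keying the table on the full ordered tuple means that a single changed allele produces a different profile. That different profile is either another known ST or an unassigned one.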
Aims and Objectives
AGES and partners (University of Muenster and Illumina) designed a WGS workshop that provided practical training on WGS molecular typing skills using a MiSeq sequencer. This training will aid laboratories with the development and implementation of a WGS typing method for L. monocytogenes in their own laboratories.
The workshop divided the WGS method into five steps:
1. DNA Extraction
The DNA extracted from the sample must be:
- High molecular weight (~50-100 kb)
- Highly pure: no contaminating RNA, proteins and/or reagents
  - NanoDrop used to check DNA purity
- Of sufficient quantity:
  - Ion Torrent PGM
    - 1 µg
    - 100 ng (with extra amplification)
  - Illumina MiSeq
    - 50 ng (Nextera)
    - 1 ng (Nextera XT)
  - Fluorometer used to evaluate the DNA quantity
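The input requirements above can be turned into a simple checklist to run before library preparation. This Python sketch encodes the minimum inputs listed above. The purity windows (A260/280 of 1.8 to 2.0, A260/230 of at least 1.8) are assumed rules of thumb, not values given at the workshop. `DnaExtract`, `qc_problems` and the "standard" kit label are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Minimum DNA input per platform and kit, from the list above (ng).
# The "standard" label for the 1 µg Ion Torrent input is our own naming.
MIN_INPUT_NG: Dict[Tuple[str, str], float] = {
    ("Ion Torrent PGM", "standard"): 1000.0,
    ("Ion Torrent PGM", "extra amplification"): 100.0,
    ("Illumina MiSeq", "Nextera"): 50.0,
    ("Illumina MiSeq", "Nextera XT"): 1.0,
}

# ASSUMED purity windows: common spectrophotometer rules of thumb,
# not values given at the workshop.
A260_280_RANGE = (1.8, 2.0)
A260_230_MIN = 1.8


@dataclass
class DnaExtract:
    name: str
    conc_ng_per_ul: float  # fluorometer reading
    volume_ul: float
    a260_280: float        # NanoDrop ratio, lowered by protein contamination
    a260_230: float        # NanoDrop ratio, lowered by reagent carry-over


def qc_problems(sample: DnaExtract, platform: str, kit: str) -> List[str]:
    """Return the reasons an extract fails QC (an empty list means pass)."""
    problems = []
    low, high = A260_280_RANGE
    if not low <= sample.a260_280 <= high:
        problems.append(f"A260/280 {sample.a260_280:.2f} outside {low}-{high}")
    if sample.a260_230 < A260_230_MIN:
        problems.append(f"A260/230 {sample.a260_230:.2f} below {A260_230_MIN}")
    available_ng = sample.conc_ng_per_ul * sample.volume_ul
    required_ng = MIN_INPUT_NG[(platform, kit)]
    if available_ng < required_ng:
        problems.append(f"{available_ng:.1f} ng available, {required_ng:.0f} ng needed")
    return problems


extract = DnaExtract("isolate-01", conc_ng_per_ul=0.5, volume_ul=10,
                     a260_280=1.85, a260_230=2.0)
print(qc_problems(extract, "Illumina MiSeq", "Nextera XT"))  # []
print(qc_problems(extract, "Illumina MiSeq", "Nextera"))
```

The same 5 ng extract passes for Nextera XT but fails for Nextera. This is one practical reason the low-input kit suits bacterial isolates.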
2. Library Preparation
When producing a DNA library, the DNA for each sample must first be fragmented. There are many different ways of fragmenting DNA, such as hydrodynamic shearing, enzymatic digestion, nebulisation, focused acoustics, ultrasonication and tagmentation.
The method used depends largely on the platform on which the fragments will be sequenced, e.g. Ion Proton or MiSeq.
The first step of library prep is to choose a fragmentation kit. The Illumina website provides a kit selection tool to identify the kit that best suits the target DNA. The Nextera XT kit is considered the most appropriate kit for sequencing small (bacterial) genomes.
With the Nextera XT kit, the sample DNA is tagmented (fragmented and tagged) by transposomes, and a PCR step then adds adapters and indices to each fragment. The fragments for each sample are then normalised so that every sample has a uniform DNA concentration.
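The indices added during the PCR step let many samples be pooled on one flow cell and separated again after the run. The sketch below shows that demultiplexing step and assumes exact matching on a dual (i7, i5) index pair. The index sequences and reads are hypothetical. Real demultiplexing software usually tolerates a small number of mismatches in the index reads.

```python
from typing import Dict, Iterable, List, Tuple

IndexPair = Tuple[str, str]  # (i7 index, i5 index)


def demultiplex(reads: Iterable[Tuple[IndexPair, str]],
                samples: Dict[IndexPair, str]) -> Dict[str, List[str]]:
    """Assign each read to a sample by its exact (i7, i5) index pair.

    Reads whose index pair is not in the sample table go to "Undetermined".
    """
    bins: Dict[str, List[str]] = {name: [] for name in samples.values()}
    bins.setdefault("Undetermined", [])
    for index_pair, sequence in reads:
        bins[samples.get(index_pair, "Undetermined")].append(sequence)
    return bins


# HYPOTHETICAL index sequences and reads, for illustration only.
samples = {("AAAACCCC", "GGGGTTTT"): "isolate-01",
           ("CCCCAAAA", "TTTTGGGG"): "isolate-02"}
reads = [(("AAAACCCC", "GGGGTTTT"), "ACGTTGCA"),
         (("CCCCAAAA", "TTTTGGGG"), "TTGACCGA"),
         (("AAAACCCC", "TTTTGGGG"), "GGCATACG")]  # unexpected combination
for name, seqs in demultiplex(reads, samples).items():
    print(name, len(seqs))
```

Dual indexing means a read assigned to the wrong index combination ends up in "Undetermined" and is not silently credited to a sample.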
Normalisation can be achieved using two methods:
• Quantify the DNA in triplicate (qPCR), calculate the dilutions and manually dilute the pool to the desired concentration. This method can be very time-consuming when there are many samples.
• Use magnetic beads (MagAttract) that bind only a fixed quantity of DNA. The beads have a specific DNA capacity. Once that capacity is reached, the remaining DNA is removed, leaving a uniform quantity of DNA per sample.
The PhiX control library is added to the library pool at this point.
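The manual normalisation route relies on two standard calculations. The first converts a mass concentration to molarity, using an average of 660 g/mol per base pair of double-stranded DNA. The second is the dilution relation C1V1 = C2V2. The sketch below also models bead-based normalisation as simple saturation. The 2 ng/µl concentration, 500 bp mean fragment size, 4 nM pool target and bead capacity are example values chosen for illustration, not kit specifications.

```python
from typing import Tuple

# Average molar mass of one base pair of double-stranded DNA (g/mol).
MASS_PER_BP = 660.0


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/µl) to a molar concentration (nM)."""
    return conc_ng_per_ul * 1e6 / (MASS_PER_BP * mean_fragment_bp)


def dilution(stock_nm: float, target_nm: float,
             final_ul: float) -> Tuple[float, float]:
    """Volumes of library and diluent satisfying C1*V1 = C2*V2."""
    if stock_nm < target_nm:
        raise ValueError("library is below the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


def bead_normalised_ng(input_ng: float, bead_capacity_ng: float) -> float:
    """Bead-based normalisation: the beads retain at most their capacity."""
    return min(input_ng, bead_capacity_ng)


stock = ng_per_ul_to_nm(2.0, 500)  # example values, about 6.06 nM
lib_ul, diluent_ul = dilution(stock, 4.0, 20.0)
print(f"{stock:.2f} nM: {lib_ul:.1f} µl library + {diluent_ul:.1f} µl diluent")
```

Bead saturation gives every sample the same ceiling, whatever its input amount. This is why the bead route avoids a per-sample dilution calculation.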
3. Sequencing Run (MiSeq)
A sample sheet is set up to communicate the sequencing parameters, sample information and kit information to the sequencer. Once the sequencing parameters have been added, the library pool (fragmented samples) is loaded into the reagent cartridge. The cartridge and flow cell are then loaded into the MiSeq and the sequencing run can start.
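A sample sheet is a plain CSV file divided into bracketed sections. The sketch below writes a simplified sheet loosely modelled on the MiSeq layout ([Header], [Reads], [Data]). It rejects duplicate index combinations, because duplicates would make two samples indistinguishable. The exact keys and columns required depend on the instrument software version, so treat these fields as assumptions and check them against Illumina's documentation. The index sequences are hypothetical. The value of 251 cycles per read reflects a common setting for a 2 × 250 bp run.

```python
import csv
import io
from typing import List, Tuple


def make_sample_sheet(experiment: str, read_cycles: int,
                      samples: List[Tuple[str, str, str]]) -> str:
    """Build a simplified MiSeq-style sample sheet (fields are assumptions).

    samples: (sample_id, i7 index, i5 index) for each library in the pool.
    """
    pairs = [(i7, i5) for _, i7, i5 in samples]
    if len(set(pairs)) != len(pairs):
        raise ValueError("each sample needs a unique index combination")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Header]"])
    writer.writerow(["Experiment Name", experiment])
    writer.writerow(["Workflow", "GenerateFASTQ"])
    writer.writerow(["Assay", "Nextera XT"])
    writer.writerow([])
    writer.writerow(["[Reads]"])
    writer.writerow([read_cycles])  # read 1
    writer.writerow([read_cycles])  # read 2 (paired-end run)
    writer.writerow([])
    writer.writerow(["[Data]"])
    writer.writerow(["Sample_ID", "index", "index2"])
    for sample_id, i7, i5 in samples:
        writer.writerow([sample_id, i7, i5])
    return buf.getvalue()


sheet = make_sample_sheet("LM-run-01", 251, [
    ("isolate-01", "AAAACCCC", "GGGGTTTT"),  # hypothetical indices
    ("isolate-02", "CCCCAAAA", "TTTTGGGG"),
])
print(sheet)
```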
4. Run Data Quality Control
A critical part of WGS is the quality control associated with each step of the process. The following tools can be used to aid in this process:
- The PhiX control can determine where in the sequencing process (library preparation or sequencing run) an error occurred.
- The Illumina Sequence Analysis Viewer (SAV) checks:
  - Q-score distribution: ensures the proportion of bases at or above Q30 is within tolerance, e.g. at least 75% of bases ≥Q30 for a 2 × 250 bp run
  - Clusters passing filter: the percentage of clusters passing filter must be greater than 80%
  - Flow cell (FC): the colour intensity across the flow cell must go from light to intense
  - Index: checks the indexing of samples
- FastQC checks the quality of the raw sequence reads
- Ridom SeqSphere checks the final (assembled and annotated) sequence to ensure it is of high quality.
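The Q30 figure reported by SAV can also be recomputed from the FASTQ files. In FASTQ, each quality character encodes a Phred score Q = -10 log10(P), where P is the probability that the base call is wrong. Q30 therefore corresponds to a 1 in 1,000 error rate. This sketch assumes the Phred+33 encoding used by current Illumina instruments and applies the 75% tolerance quoted above. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record: {header}")
        seq, _plus, qual = (line.rstrip("\r\n") for line in rest)
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header}")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode Phred+33 quality characters into integer Q scores."""
    return [ord(ch) - offset for ch in qual]


def fraction_at_least(lines: Iterable[str], threshold: int = 30) -> float:
    """Fraction of all base calls with quality >= threshold."""
    total = passing = 0
    for _, _, qual in read_fastq(lines):
        scores = phred_scores(qual)
        total += len(scores)
        passing += sum(q >= threshold for q in scores)
    return passing / total if total else 0.0


fastq = "@read1\nACGT\n+\nII5?\n@read2\nGGCC\n+\nIIII\n"
q30 = fraction_at_least(fastq.splitlines())
print(f"{q30:.1%} of bases >= Q30; meets 75% tolerance: {q30 >= 0.75}")
```

Here 'I' decodes to Q40, '?' to Q30 and '5' to Q20, so 7 of the 8 base calls pass.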
Hardware and Software requirements:
To analyse WGS data, the computer running the bioinformatics software must have:
• a 64-bit (or greater) operating system
• 32 GB of RAM or more
• 1 TB of hard drive storage (RAID)
The data generated from a WGS run do not contain the full (closed) genome sequence; the assembled sequence is closer to ~80% of the target genome. The assembled contigs can be ordered into scaffolds, and the gaps between them are thought to consist largely of repetitive regions such as the ribosomal RNA operons. The raw data from the sequencing run are stored in FASTQ files. These files are later analysed with different bioinformatics tools to assemble and annotate the sequence and then assign a sequence type to each sample.
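The ~80% figure is the assembled length divided by the expected genome size. The same expected size also gives the mean sequencing depth. The sketch below uses hypothetical contig lengths and read totals. The ~2.9 Mb genome size is an assumed approximation for L. monocytogenes.

```python
from typing import Dict, Sequence


def assembly_summary(contig_lengths: Sequence[int], total_read_bases: int,
                     expected_genome_bp: int) -> Dict[str, float]:
    """Summarise how much of the expected genome an assembly represents."""
    assembled_bp = sum(contig_lengths)
    return {
        "contigs": len(contig_lengths),
        "assembled_bp": assembled_bp,
        "genome_fraction": assembled_bp / expected_genome_bp,
        "mean_depth": total_read_bases / expected_genome_bp,
    }


# Hypothetical numbers; ~2.9 Mb is an assumed L. monocytogenes genome size.
summary = assembly_summary([1_200_000, 800_000, 320_000],
                           total_read_bases=290_000_000,
                           expected_genome_bp=2_900_000)
print(f"{summary['genome_fraction']:.0%} of genome, "
      f"~{summary['mean_depth']:.0f}x depth, {summary['contigs']} contigs")
```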
There are two main approaches to analysing raw WGS data:
- Single nucleotide polymorphism (SNP) calling: looks for every possible SNP across all of the sequencing data.
- Whole genome multilocus sequence typing (wgMLST, or allele calling): looks for differences within a predetermined set of target loci (coding regions).
The difference between the two is that SNP calling counts each mutation within a given locus as a separate polymorphism, whereas wgMLST treats any number of SNPs within a target locus as a single (allelic) difference.
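The distinction becomes concrete when the same pair of isolates is compared both ways. In this sketch each isolate is reduced to a few hypothetical, pre-aligned loci with no insertions or deletions. SNPs are counted only within those loci, whereas genuine SNP calling would scan the whole genome.

```python
from typing import Dict

Isolate = Dict[str, str]  # locus name -> aligned allele sequence


def snp_distance(a: Isolate, b: Isolate) -> int:
    """Count every differing nucleotide across the loci both isolates share."""
    total = 0
    for locus in a.keys() & b.keys():
        seq_a, seq_b = a[locus], b[locus]
        if len(seq_a) != len(seq_b):
            raise ValueError(f"{locus}: sequences must be aligned to equal length")
        total += sum(x != y for x, y in zip(seq_a, seq_b))
    return total


def allele_distance(a: Isolate, b: Isolate) -> int:
    """Count loci whose alleles differ, however many SNPs each contains."""
    return sum(a[locus] != b[locus] for locus in a.keys() & b.keys())


isolate_1 = {"locus1": "ACGTACGT", "locus2": "GGCCTTAA", "locus3": "TTTT"}
isolate_2 = {"locus1": "ACGAACTT", "locus2": "GGCCTTAA", "locus3": "TTTA"}
print(snp_distance(isolate_1, isolate_2))     # 3
print(allele_distance(isolate_1, isolate_2))  # 2
```

The two SNPs in locus1 count twice at the SNP level but only once at the allele level.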
Before the WGS raw data can be assembled, a decision must be taken on how the data is going to be assembled. There are two methods used to assemble the raw data:
1. De novo assembly: the raw reads are assembled without a reference, using overlaps between the reads themselves. Sequence assembler software (SPAdes or Velvet) is used to assemble the sequence data. De novo assemblies are typically analysed with the gene-by-gene approach (wgMLST or core genome MLST (cgMLST)).
2. Reference-assisted assembly: uses one reference sequence to assemble the raw data, so it detects only what is present in or absent from the reference sequence. This method is frequently used for monomorphic bacteria (e.g. M. tuberculosis), whose genomes vary little. It is typically used for SNP calling.
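SPAdes and Velvet are both built on de Bruijn graphs. In a de Bruijn graph, reads are broken into overlapping k-mers, and the assembly is read off as a path through the graph. The toy sketch below shows only that core idea. It assumes error-free reads from one strand and no repeated (k-1)-mers, all of which real assemblers must handle. Where a repeat creates a branch, this walk simply stops. This behaviour hints at why short-read assemblies end up split into several contigs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contig(reads: Iterable[str], k: int) -> str:
    """Walk the unambiguous path from the node with no incoming edges."""
    graph = build_de_bruijn(reads, k)
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for successor in successors:
            indegree[successor] += 1
    starts = [node for node in graph if indegree[node] == 0]
    if not starts:
        raise ValueError("no start node: input is circular or repetitive")
    node = starts[0]
    contig, visited = node, {node}
    # Extend while there is exactly one way forward; stop at a branch
    # (e.g. a repeat) or at a dead end.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGTG", "GCGTGCAA", "TGCAATCC", "AATCCGTA", "CCGTAC"]
print(assemble_contig(reads, k=5))  # ATGGCGTGCAATCCGTAC
```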
The assembled reads are aligned to one or more reference genomes, annotated with the target loci, to determine which loci each sequence encodes. The following tools can be used to align and annotate sequences:
• Artemis: DNA sequence viewer and annotation tool
• BRIG: BLAST Ring Image Generator
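Gene-by-gene (wgMLST/cgMLST) analysis locates each target locus in the assembly and records which known allele is present. The sketch below reduces this to exact matching on both DNA strands. Real pipelines use alignment, tolerate partial matches at contig ends and assign numbers to new alleles. The scheme, locus names and sequences are all hypothetical.

```python
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def call_alleles(contigs: Dict[str, str],
                 scheme: Dict[str, Dict[str, int]]) -> Dict[str, Optional[int]]:
    """Return locus -> allele number, or None when no known allele matches.

    scheme maps each locus to its known alleles {allele_sequence: allele_number}.
    Both strands of every contig are searched for an exact match.
    """
    strands = [seq for contig in contigs.values()
               for seq in (contig, reverse_complement(contig))]
    profile: Dict[str, Optional[int]] = {}
    for locus, alleles in scheme.items():
        profile[locus] = None  # locus missing, or carries a new allele
        for allele_seq, allele_number in alleles.items():
            if any(allele_seq in strand for strand in strands):
                profile[locus] = allele_number
                break
    return profile


# HYPOTHETICAL scheme and contigs for illustration.
scheme = {
    "locusA": {"ATGAAACCC": 1, "ATGAAACCG": 2},
    "locusB": {"ATGCCCTTT": 1},
    "locusC": {"ATGGGGGGG": 1},
}
contigs = {"contig1": "GGATGAAACCGTT", "contig2": "CCAAAGGGCATCC"}
print(call_alleles(contigs, scheme))
# {'locusA': 2, 'locusB': 1, 'locusC': None}
```

locusB is found only on the reverse strand of contig2, which is why both strands are searched.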
SeqSphere (Ridom) and BioNumerics (Applied Maths) are closed (commercial) bioinformatics pipelines, and both are used to analyse WGS data. They consist of computational algorithms that make WGS data easier for the user to analyse. The pipelines compare the sample sequence information to both public and private servers populated with WGS data. The sequence data can be compared to a number of servers populated with different target sequences, such as MLST/rMLST, cgMLST, SNP, antibiotic resistance and/or virulence targets.
Transfer of Knowledge
The information from this workshop will be used to develop a standard operating procedure (SOP) for the WGS of Listeria monocytogenes and other pathogens within the NRL and other Backweston laboratories.
Benefits of Visit
This information was extremely beneficial. It gave me a better understanding of WGS of L. monocytogenes and will aid the development of molecular characterisation methods for L. monocytogenes within the Listeria monocytogenes NRL.