Simple Sequence Repeat Marker Loci Discovery Using SSR Primer

BIOINFORMATICS APPLICATIONS NOTE
Vol. 20 no. 9 2004, pages 1475–1476
DOI: 10.1093/bioinformatics/bth104

Simple sequence repeat marker loci discovery using SSR primer

Andrew J. Robinson 1,2, Christopher G. Love 1,2, Jacqueline Batley 1, Gary Barker 3 and David Edwards 1,2,∗

1 Plant Biotechnology Centre, Primary Industries Research Victoria, La Trobe University, Bundoora 3086, Victoria, Australia
2 Victorian Bioinformatics Consortium, Plant Biotechnology Centre, Primary Industries Research Victoria, La Trobe University, Bundoora 3086, Victoria, Australia
3 School of Biological Sciences, University of Bristol, BS8 1UG, UK

Received on May 6, 2003; revised on August 10, 2003; accepted on December 17, 2003
Advance Access publication February 12, 2004

ABSTRACT

Summary: Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. With the increase in the availability of DNA sequence information, an automated process to identify and design PCR primers for amplification of SSR loci would be a useful tool in plant breeding programs. We report an application that integrates SPUTNIK, an SSR repeat finder, with Primer3, a PCR primer design program, into one pipeline tool, SSR Primer. On submission of multiple FASTA formatted sequences, the script screens each sequence for SSRs using SPUTNIK. The results are passed to Primer3 for locus-specific primer design. The script makes use of a Web-based interface, enabling remote use.

Availability: This program has been written in PERL and is freely available for non-commercial users by request from the authors. A Web-based version is also available.

Simple sequence repeats (SSRs) or microsatellites have been shown to be one of the most powerful genetic markers in biology.
Defined as runs of tandem repeated DNA, they exhibit a high degree of polymorphism due to the mutation affecting the number of repeat units (Tautz, 1989). This hypervariability among related organisms makes them excellent markers for genotype identification, analysis of genetic diversity, phenotype mapping and marker assisted selection of crop plants. Eukaryotic genomes contain a large number of SSRs. This abundance allows their use for the construction of high-density genetic maps and enables the molecular tagging of genes. The nature of SSRs gives them a number of advantages over other molecular markers: (i) multiple SSR alleles may be detected at a single locus using a simple PCR-based screen, (ii) SSRs are evenly distributed all over the genome, (iii) they are co-dominant, (iv) very small quantities of DNA are required for screening and (v) analysis may be semi-automated.

∗ To whom correspondence should be addressed.

A common method for the discovery of SSR loci is constructing genomic DNA libraries enriched for SSR sequences, followed by DNA sequencing (Edwards et al., 1996). This production of enriched libraries is time-consuming, and the specific sequencing required is expensive. Where abundant sequence data are already available, it is more economical and efficient to use computational tools to identify SSR loci. Flanking DNA sequences may then be analysed for the presence of suitable forward and reverse PCR primers to assay the SSR loci. Several computational tools are currently available for the identification of SSRs within sequence data as well as for the design of PCR primers suitable for the amplification of specific loci. We have integrated two such tools within one package, enabling the simultaneous discovery of SSRs within bulk sequence data and the design of specific PCR primers for the amplification of these marker loci.
An integrated Web interface further permits the remote use of this tool.

Sequences are initially passed to SPUTNIK (Abajian, 1994), which uses a recursive algorithm to search for repeated patterns of nucleotides of length between 2 and 5. The output of SPUTNIK is then passed to Primer3 (Rozen and Skaletsky, 2000) for primer design. Primers are designed to a defined set of constraints, such as oligonucleotide melting temperature, size, GC content, primer–dimer possibilities, PCR product size and positional constraints around the SSR, to identify the optimal forward and reverse primers for the SSR flanking region.

Bioinformatics 20(9) © Oxford University Press 2004; all rights reserved.

PROGRAM OPTIONS

SSR Primer is a Web-based tool that may also be run on the command line. The input is in the form of multiple FASTA format DNA sequences. Primer3 options are default, with the following exceptions selected to increase primer specificity. One set of primer pairs is designed at least 10 bp distant from either side of the identified SSR. The optimum size for the primers is 21 bases, with a maximum of 23 bases. The optimum melting temperature is 55 °C, with a minimum of 50 °C and a maximum of 70 °C. The optimum GC content is set to 50%, with a minimum of 30% and a maximum of 70%. While these and additional primer design options may be modified within the script, the authors suggest maintaining these strict criteria to ensure robust PCR amplification.

PROGRAM FLOW AND DEPENDENCIES

The input sequences are subdivided into groups of 10. Each group is passed to SPUTNIK and the output passed to Primer3. The results from SPUTNIK and Primer3 are combined and appended to a results file. The SSR primer discovery tool requires SPUTNIK and Primer3 as well as PERL.
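
The flow described above (batches of 10 sequences, SSR detection, then a Primer3 request per SSR) can be illustrated with a minimal Python sketch. It is not the authors' PERL script. The SSR search is a plain regular expression for perfect repeats and stands in for SPUTNIK, whose recursive algorithm is not reproduced here. The minimum copy numbers in `MIN_COPIES`, the function names and the use of `SEQUENCE_TARGET` to enforce the 10 bp distance are all assumptions made for illustration. The tag names are those of current Primer3 releases (Boulder-IO input), which may differ from the version used in 2004; the numeric values are the ones stated under PROGRAM OPTIONS.

```python
import re
from itertools import islice

# Assumed minimum copy numbers per motif length; the paper does not state thresholds.
MIN_COPIES = {2: 6, 3: 4, 4: 3, 5: 3}


def batches(records, size=10):
    """Yield successive lists of at most `size` records (groups of 10 by default)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def find_ssrs(seq):
    """Return sorted (start, length, motif, copies) tuples for perfect repeats of 2-5 bp motifs.

    A simplified stand-in for SPUTNIK: only exact tandem copies are reported.
    """
    seq = seq.upper()
    hits = []
    for unit, min_copies in MIN_COPIES.items():
        # One motif of `unit` bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (AA, ATAT, ...).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            length = match.end() - match.start()
            hits.append((match.start(), length, motif, length // unit))
    return sorted(hits)


def primer3_record(seq_id, seq, ssr_start, ssr_len, flank=10):
    """Build one Boulder-IO record asking Primer3 for a single primer pair around an SSR.

    The target is the SSR widened by `flank` bases on each side, so that primers,
    which must lie outside the target, sit at least `flank` bp away from the repeat.
    """
    start = max(0, ssr_start - flank)
    end = min(len(seq), ssr_start + ssr_len + flank)
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", seq),
        ("SEQUENCE_TARGET", "%d,%d" % (start, end - start)),  # start,length
        ("PRIMER_FIRST_BASE_INDEX", 0),  # positions above are 0-based
        ("PRIMER_NUM_RETURN", 1),        # one primer pair per SSR
        ("PRIMER_OPT_SIZE", 21),
        ("PRIMER_MAX_SIZE", 23),
        ("PRIMER_OPT_TM", 55.0),
        ("PRIMER_MIN_TM", 50.0),
        ("PRIMER_MAX_TM", 70.0),
        ("PRIMER_OPT_GC_PERCENT", 50.0),
        ("PRIMER_MIN_GC", 30.0),
        ("PRIMER_MAX_GC", 70.0),
    ]
    # Each record is a list of TAG=VALUE lines terminated by a line holding "=".
    return "".join("%s=%s\n" % tag for tag in tags) + "=\n"


if __name__ == "__main__":
    demo = "G" * 30 + "CT" * 8 + "A" * 30  # toy sequence with one (CT)8 repeat
    for start, length, motif, copies in find_ssrs(demo):
        print(primer3_record("demo", demo, start, length))
```

The printed records could be written to a file and supplied to the `primer3_core` program on standard input. In a batch setting, `batches()` would group the FASTA records before each round of SSR detection and primer design, mirroring the groups of 10 described above.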
Input of Web form data is limited to 256 KB or ~200000 bp.

PROGRAM OUTPUT

Web version: The output of SSR Primer is a combined results table in HTML format providing summary information on the SSRs identified and a listing of candidate PCR primers. The table includes the sequence ID (derived from the FASTA header), SSR repeat sequence, statistical output from SPUTNIK, forward and reverse PCR primer sequences and statistics relating to the designed primers.

Command Line Version: The command line version produces the same output as the Web version, though as a tab delimited text file.

PERFORMANCE

Web version: 270 expressed sequence tags (ESTs) representing 215437 bp of Brassica napus sequence were processed through the Web interface. SSR Primer identified and designed PCR primers for 24 loci within 25 s.

Command Line Version: The SSR Primer discovery tool was executed on a Sun Solaris Ultra-250 400 MHz with 2 GB RAM. A FASTA file containing 397673 wheat EST sequences (183 MB) was processed in 3 h 23 min, and PCR primer pairs were designed for a total of 70705 SSRs (5720 dinucleotide, 46508 trinucleotide, 10895 tetranucleotide and 7582 pentanucleotide). A further FASTA file of 300870 Brassica oleracea genomic sequences (192 MB) was processed in 2 h 42 min, and PCR primer pairs were designed for a total of 46949 SSRs (18194 dinucleotide, 14096 trinucleotide, 6252 tetranucleotide and 8407 pentanucleotide). These and further processed datasets, representing vertebrate, fungal and plant genomes, are available online.

DISCUSSION

The application of SSR Primer provides an unprecedented availability of candidate SSR molecular markers. This abundance permits selection of markers that may be most suitable for specific applications or particular organisms. Where a complete genome sequence is available for an organism, SSRs may be annotated with their physical position on the genome. Markers may then be selected either for their location within a specific region of interest or for their even distribution across regions.
Where a full genome sequence is unavailable, the location may be predicted through synteny with a sequenced genome or through previous mapping exercises. Furthermore, for species that exhibit low levels of polymorphism at SSR loci, candidate polymorphic loci may be predicted through mining large sequence datasets. The presence of SSR polymorphisms within aligned sequences of different origin would be indicative of the level of polymorphism at that locus. These selection strategies could greatly reduce the time and cost associated with the development and application of SSR markers and provide public SSR marker resources to promote sharing of associated analysis data. Integration of these SSR data with genome databases would provide further benefits to genome researchers.

ACKNOWLEDGEMENTS

D.E. and C.L. receive support from the Victorian Bioinformatics Consortium. A.R. was funded by a scholarship from La Trobe University in conjunction with Plant Biotechnology Centre and the Victorian Bioinformatics Consortium.

REFERENCES

Edwards,K.J., Barker,J.H.A., Daly,A., Jones,C. and Karp,A. (1996) Microsatellite libraries enriched for several microsatellite sequences in plants. Biotechniques, 20, 758–760.

Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for general users and for biologist programmers. In: Krawetz,S. and Misener,S. (eds), Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp. 365–386.

Tautz,D. (1989) Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res., 17, 6463–6471.