Development and Characterization of EST-SSR Markers from NCBI and cDNA Library in Cultivated Peanut (Arachis hypogaea L.)

A total of 86 132 ESTs downloaded from GenBank at NCBI and 12 501 ESTs from a cDNA library constructed from the high-oil linoleic acid accession E12 were analysed. After preprocessing, the two sources together yielded 18 051 singletons and 9 972 contigs. In total, 3 104 SSR loci were identified with the MISA software, corresponding to 11.08% of these non-redundant ESTs. The SSR loci were classified into di-, tri-, tetra-, penta-, hexa- and multi-nucleotide motifs. Tri-nucleotide motifs were the most abundant class, with frequencies of 43.0% and 56.8% in the NCBI set and the cDNA library, respectively. Di- and penta-nucleotide motifs ranked second and third, and hexa-nucleotide motifs were the least frequent in both sources. Among all repeat motifs, AG/TC was the most frequent, accounting for 8.65% and 13.42% in the NCBI set and the cDNA library, respectively. Among the tri-nucleotide repeats, CTT/GAA was the most frequent motif, accounting for 6.7% and 13.42%, respectively. The number of repeat units per SSR locus ranged from 4 to 51.


Background
Peanut, or groundnut (Arachis hypogaea L., 2n = 4x = 40), a source of oil and protein, is the second-most important grain legume crop after soybean in most tropical and subtropical areas of the world (Dwivedi et al., 2003). The seed comprises around 50% oil, of which approximately 80% consists of oleic acid (36%~67%) and linoleic acid (15%~43%) (Moore and Knauft, 1989). The largest use of the crop is for oil, with the meal being used as a high-protein dietary supplement for human and animal consumption. In China and other countries, peanut seed oil is used mainly for cooking. Peanut may also be used for fodder, and the shells for fuel or livestock feed (Savage and Keenan, 1994).
Cultivated peanut exhibits considerable variability for morphological, physiological, and agronomic traits. However, the genetic diversity detected at the DNA level by RAPD (random amplified polymorphic DNA), AFLP (amplified fragment length polymorphism) and RFLP (restriction fragment length polymorphism) markers is much lower (Kochert et al., 1996; Hilu et al., 1995; Herselman, 2003). The low level of variation in cultivated peanut has been attributed to barriers to gene flow from related diploid species to domesticated peanut as a consequence of the polyploidization event (Young et al., 1996). Another cause of the narrow genetic base is that few elite breeding lines and little exotic germplasm are used in breeding programs.
Simple sequence repeat (SSR) markers are microsatellite loci that can be amplified by polymerase chain reaction (PCR) using primers designed for unique flanking sequences. Polymorphism is based on variation in the number of repeats in different genotypes owing to polymerase slippage and point mutations (Kruglyak et al., 1998). SSR markers are (i) highly informative, (ii) locus-specific and frequently co-dominant, (iii) adaptable to high-throughput genotyping, and (iv) simple to maintain and distribute. In recent years, significant efforts have been made to develop SSR markers in groundnut, and more than 800 SSR markers have been obtained (Hopkins et al., 1999; He et al., 2003; Ferguson et al., 2004; Cuc et al., 2008). SSR markers include genomic SSR markers and EST-derived SSR markers. Genomic SSR markers have some disadvantages. First, they are derived from genomic BAC libraries, and most of them come from intergenic regions with no known gene function. Second, the procedures for developing them are difficult, complex and costly. Meanwhile, large-scale sequencing projects have produced a large number of single-pass sequences of complementary DNAs (cDNAs), and more and more EST sequences have become available for different plant species (http://www.ncbi.nlm.nih.gov/). ESTs contain SSR sequences in both coding and noncoding regions (Temnykh et al., 2001), and SSRs have been successfully developed from ESTs in many species (Thiel et al., 2003; Gao et al., 2004; Nicot et al., 2004; Serapion et al., 2004; Perez et al., 2005; Wang et al., 2005). More than 80 000 ESTs are now available for peanut in GenBank at NCBI, and the number is increasing year by year. However, the development of EST-SSR markers in peanut lags behind that in other species.
In this study, a total of 86 132 peanut ESTs from NCBI and 12 501 ESTs from a cDNA library derived from E12, which comes from a Chinese landrace with high oleic acid content, were analyzed. Here we report the successful development and characterization of EST-SSR markers in peanut. These markers enrich the molecular resources available for peanut.

EST sequence screening
A total of 86 132 ESTs were retrieved from the EST database in GenBank, and 12 501 ESTs were obtained from the cDNA library of E12. All 98 633 EST sequences were used to search for singletons and contigs (Table 1). The GenBank set yielded 14 141 singletons and 9 892 contigs, and the cDNA library derived from E12 yielded 3 910 singletons and 80 contigs. Screening all singletons and contigs with the MISA software identified 2 463 (9.52%) EST-SSRs in the NCBI set and 641 (16.4%) EST-SSRs in the cDNA library (Table 1). In the NCBI set, 2 443 ESTs contained one EST-SSR and 20 ESTs contained more than two EST-SSRs. In the cDNA library, 641 ESTs contained one SSR locus, 82 ESTs contained two EST-SSRs and four ESTs contained three EST-SSRs (Table 1).
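As a quick consistency check, the per-source counts given above can be pooled to reproduce the totals stated in the abstract (18 051 singletons, 9 972 contigs, 3 104 SSR loci, 11.08% of the non-redundant ESTs). The sketch below is only an arithmetic illustration; the variable names are ours and are not part of the original analysis.

```python
# Counts as reported in the text (see Table 1 of the source).
sources = {
    "NCBI": {"singletons": 14141, "contigs": 9892, "ssr_loci": 2463},
    "cDNA library (E12)": {"singletons": 3910, "contigs": 80, "ssr_loci": 641},
}

# Pool the two sources.
total_singletons = sum(s["singletons"] for s in sources.values())
total_contigs = sum(s["contigs"] for s in sources.values())
total_ssr = sum(s["ssr_loci"] for s in sources.values())

# Non-redundant ESTs are taken here as singletons plus contigs (our assumption).
non_redundant = total_singletons + total_contigs
pooled_share = round(100 * total_ssr / non_redundant, 2)

print(total_singletons, total_contigs, total_ssr, pooled_share)
```

Under the assumption that "non-redundant ESTs" means singletons plus contigs, the pooled share matches the 11.08% reported in the abstract.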

Distribution and frequency of EST-SSRs
These EST-SSRs could be divided into six motif classes: di-, tri-, tetra-, penta-, hexa- and multi-nucleotide. Tri-nucleotide motifs were the most frequent class in both the NCBI set and the cDNA library, with frequencies of 43.0% and 56.8%, respectively. Di- and penta-nucleotide motifs ranked second and third, and hexa-nucleotide motifs were the least frequent in both sources (Figure 1). There were 329 and 53 multi-nucleotide motifs, accounting for 13.4% and 8.3%, respectively. Among individual repeat motifs, AG/TC was the most frequent of all repeat types in both the NCBI set and the cDNA library, accounting for 8.65% and 13.42%, respectively. The next most frequent di-nucleotide motifs were AT/TC and CT/GA. Among the tri-nucleotide repeats, CTT/GAA was the most frequent motif, accounting for 6.7% and 13.42%, respectively; it was almost six and four times as frequent as the other tri-nucleotide repeats in the NCBI set and the cDNA library, respectively. Tetra-, penta- and hexa-nucleotide motifs accounted for less than 2% in both the NCBI set and the cDNA library. The AAAT/TTTA motif accounted for 1.26% in the NCBI set but was absent from the cDNA library (Table 2). The maximum numbers of repeat units for the di-nucleotide motifs AG/TC and CT/GA were 25 and 51 in the NCBI set, and 21 and 25 in the cDNA library, respectively. Some studies have found that markers developed for longer repeat motifs are more informative for detecting polymorphism in cultivated groundnut germplasm (Moretzsohn et al., 2005).

Discussion
Peanut is one of the important crops for both direct human consumption and oil production worldwide. One of the major factors influencing peanut oil quality is the composition of polyunsaturated fatty acids. Linoleic acid is a polyunsaturated fatty acid whose acyl residues are susceptible to oxidation, which adversely affects oil stability and promotes the development of off-flavors commonly associated with rancidity in stored oil (Patel et al., 2004). Elucidating the mechanisms of polyunsaturated fatty acid synthesis and metabolism is therefore central to improving peanut quality. In recent work, we used the high-oil linoleic acid accession E12 to construct a peanut cDNA library. This library contained 12 501 ESTs and 4 074 unigenes, which are involved in many biological processes, such as amino acid and carbohydrate transport and metabolism, energy metabolism, transcription, and protein translation and modification. A total of 641 SSR loci were identified in this library, and EST-SSR markers were designed for 624 of the ESTs. AG/TC, CT/GA and CTT/GAA were the most frequent of all nucleotide repeat motifs.
ESTs are currently the most widely sequenced nucleotide commodity from plant genomes in terms of both number of sequences and total nucleotide count. During the past few years, a great deal of attention has been directed towards discovering and characterizing the range of protein-coding genes in plant species with large genomes. The large size of the peanut genome results from polyploidy and the presence of regions with repeat motifs, both of which make it difficult to sequence the complete genome. cDNA sequencing is one method for investigating the coding regions of a genome and may be considered an alternative to complete genome sequencing in plants with large genomes. The availability of ESTs in public databases provides the opportunity to identify SSRs and to develop molecular markers. Consequently, the large number of peanut EST-SSRs developed from public databases is an important research resource that can be used to analyze the functional portion of the genome. In the present study, 86 132 ESTs were downloaded from GenBank at NCBI, yielding 14 141 singletons and 9 892 contigs. These sequences contained 2 463 SSR loci, from which 1 943 EST-SSR markers were developed. The motif types and their relative frequencies were consistent with those in the cDNA library.
In general, molecular markers, and microsatellites or simple sequence repeats (SSRs) in particular, have proven very useful for crop improvement in many species (Gupta and Varshney, 2000). However, breeding applications of molecular markers in groundnut have been limited by the low level of genetic variation in this species. This low level of genetic variation in cultivated groundnut is attributed to its origin from a single polyploidization event that occurred relatively recently on an evolutionary time scale (Young et al., 1996). Additional contributing factors to the low levels of molecular polymorphism observed to date could be the marker techniques used and the amount of diversity among the samples tested (Singh et al., 1998). In recent years, significant efforts have been made to develop EST-SSR markers in groundnut (Ferguson et al., 2004; Moretzsohn et al., 2005; Mace et al., 2008). Such markers should be valuable in genome mapping and population studies. EST-SSRs have two major advantages over genomic SSRs. First, as EST-SSRs are part of or adjacent to functional genes, they can be used for the mapping and functional analysis of candidate genes. Second, because ESTs are more conserved than average genomic sequences, EST-SSRs may be more stable and transferable across species. The successful development of EST-SSRs in cultivated peanut should encourage similar efforts in other Arachis species for which a large number of ESTs are available.

Identification of SSR-Containing ESTs
Peanut ESTs were downloaded from the EST database of NCBI GenBank (http://www.ncbi.nlm.nih.gov/dbEST) in April 2009. EST sequences from cDNA libraries derived from leaf tissues of the high-oil acid accession E12 had been obtained previously (data not shown). All sequences were downloaded as, or converted to, text files in FASTA format.

EST sequence assembly
All EST sequences were assembled with the CAP3 software (http://genome.cs.mtu.edu/sas.html). The ESTs were thereby divided into singletons, which did not assemble with any other EST, and contigs, which comprise ESTs related to one another by sequence overlap.
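The bookkeeping behind the singleton and contig counts can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes that the assembly output is available as two FASTA-formatted texts (one for contigs, one for singletons), and the function name `read_fasta` and the toy records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record id to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


# Toy stand-ins for assembly output (hypothetical records, not real data).
contigs_text = ">Contig1\nACGTACGT\nACGT\n>Contig2\nGGCC\n"
singletons_text = ">EST0001\nTTAGG\n"

contigs = read_fasta(contigs_text)
singletons = read_fasta(singletons_text)

# Every contig and every singleton counts as one non-redundant sequence.
non_redundant_count = len(contigs) + len(singletons)
print(len(contigs), len(singletons), non_redundant_count)
```

In practice the two texts would be read from the assembler's output files; the file names depend on how the assembler was run and are deliberately not assumed here.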

EST-SSR screening
These singletons and contigs were screened for SSRs using the MISA software (MicroSAtellite, http://pgrc.ipk-gatersleben.de/misa/). For this study, SSRs were defined as sequences with at least eight repeats for di-nucleotide motifs and at least five repeats for all other motifs (tri-, tetra-, penta-, and hexa-nucleotide).
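The repeat-number criteria above can be illustrated with a short regular-expression sketch. It is not the MISA implementation and does not handle compound (multi-nucleotide) SSRs; it only finds perfect repeats that meet the stated thresholds. The names `MIN_REPEATS`, `is_lower_order` and `find_ssrs` are ours.

```python
import re

# Minimum repeat counts per motif length, from the criteria stated above:
# at least 8 repeats for di-nucleotides, at least 5 for tri- to hexa-nucleotides.
MIN_REPEATS = {2: 8, 3: 5, 4: 5, 5: 5, 6: 5}


def is_lower_order(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AGAG', 'AAA')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return (motif, repeat_count, start, end) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_lower_order(motif):
                continue  # reported under its shorter motif instead
            repeats = len(match.group(0)) // size
            hits.append((motif, repeats, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence: (CTT)5 and (AG)8 qualify; (AT)7 is one repeat short.
example = "GG" + "CTT" * 5 + "AA" + "AG" * 8 + "C" + "AT" * 7
print(find_ssrs(example))
```

Positions are 0-based with an exclusive end. Mononucleotide runs are ignored because the criteria above do not cover them.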

Primer design
All SSR-containing ESTs were individually inspected for suitability for primer design. SSR-containing ESTs with sufficient flanking sequence of good quality (no unknown bases) were selected. Primers were designed using the PRIMER 3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi), with an optimal annealing temperature of 60 °C and a fragment size between 100 bp and 300 bp. A GC clamp was added at the 3' end of the primer when possible.
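The selection rules described above can be expressed as simple checks. The sketch below does not call PRIMER 3 and is only an illustration of the criteria. The minimum flank length of 20 bases is a hypothetical value (the text says only "sufficient"), and the GC clamp is interpreted here as a G or C at the last base of the 3' end. All function names are ours.

```python
def flanks_usable(seq, ssr_start, ssr_end, min_flank=20):
    """True if both flanks are long enough and free of unknown bases.

    min_flank is a hypothetical threshold; positions are 0-based, end exclusive.
    """
    seq = seq.upper()
    left, right = seq[:ssr_start], seq[ssr_end:]
    return all(
        len(flank) >= min_flank and set(flank) <= set("ACGT")
        for flank in (left, right)
    )


def has_gc_clamp(primer):
    """True if the primer ends in G or C at its 3' end (our reading of 'GC clamp')."""
    return primer.upper().endswith(("G", "C"))


def amplicon_in_range(size_bp, low=100, high=300):
    """True if the expected fragment size lies within the 100-300 bp window."""
    return low <= size_bp <= high


# Toy EST: 24 bases of flank, (AG)8, 24 bases of flank (hypothetical sequence).
est = "ACGT" * 6 + "AG" * 8 + "TGCA" * 6
print(flanks_usable(est, 24, 40), has_gc_clamp("ACGTTAGC"), amplicon_in_range(150))
```

A sequence with an `N` in either flank, or with a flank shorter than the chosen threshold, would be rejected before primer design.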