Bioinformatics Analysis on Ribulose-1,5-bisphosphate Carboxylase/Oxygenase Large Subunits in Different Plants

Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCo) is a crucial enzyme in plant photosynthesis. Elucidating the characteristics of RuBisCo is therefore important for improving photosynthetic efficiency, especially in staple crops, where it relates directly to biomass and yield. To reveal the characteristics of RuBisCos from different higher plants, we used bioinformatics tools to analyze the nucleotide sequences and deduced amino acid sequences of RuBisCo large subunits (rbcL) from Zea mays, Arabidopsis thaliana, Pisum sativum, Citrus sinensis and Phalaenopsis aphrodite subsp. formosana, with emphasis on Oryza sativa subsp. japonica. The sequence data were collected from GenBank at the National Center for Biotechnology Information (NCBI). The analysis covered the following aspects: the composition and the physical and chemical characteristics of the nucleotide sequences and deduced amino acid sequences; the signal peptide, transmembrane topology, hydrophobicity or hydrophilicity, and secondary structure of the polypeptide; nucleotide and amino acid sequence comparisons; and the molecular phylogeny of rbcL DNA sequences. The amino acid compositions of the rbcLs show few differences, and the physical and chemical characteristics are approximately identical among the different higher plants. No signal peptide or transmembrane topology was detected in the rbcLs. We classified the rbcLs as hydrophilic proteins on the basis of the distribution of amino acid residues in the polypeptides. The rbcLs are composed mainly of α-helix and random coil, interspersed with extended strand and β-turn elements. The nucleotide sequences and deduced amino acid sequences are highly homologous among the different higher plants. The rbcL DNA sequences clearly reflect the evolutionary relationships among the various higher plants.


Background
Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCo, EC 4.1.1.39) catalyzes the first reaction of carbon dioxide fixation in the dark reactions of photosynthesis, converting carbon dioxide and ribulose-1,5-bisphosphate (RuBP) into two molecules of 3-phosphoglyceric acid. RuBisCo also catalyzes the reaction of oxygen with RuBP to form phosphoglyceric acid and phosphoglycolic acid, which is the first reaction of photorespiration. RuBisCo is therefore the key enzyme determining photosynthetic efficiency, because it regulates both photosynthesis and photorespiration. Based on differences in primary and quaternary structure, RuBisCos can be divided into three types. Form I exists in higher plants and most prokaryotes; it consists of eight large subunits (50-60 kD) and eight small subunits (12-18 kD) and has a square-symmetric structure (L8S8) (Andersson et al., 1989). Form II was discovered in purple non-sulfur photosynthetic bacteria and is composed of only two large subunits (L2). Form III was found more recently in Thermococcus kodakaraensis (Kitano et al., 2001); it likewise consists of large subunits only, with no small subunit, and has an (L2)5 structure. The RuBisCo large subunit (rbcL) gene of higher plants is located in the chloroplast DNA and is translated by chloroplast ribosomes. In contrast, the RuBisCo small subunit (rbcS) is encoded in the nuclear genome and synthesized on cytoplasmic 80S ribosomes; it is then transported into the chloroplast as a precursor protein and, after processing, assembles with the large subunit (Ellis, 1987; Roy, 1989). To date, the rbcL gene has been cloned from a great many plants, such as Oryza sativa subsp. japonica (Hiratsuka et al., 1989), Zea mays (Maier et al., 1995), Nicotiana tabacum (Shinozaki et al., 1986), Arabidopsis thaliana (Sato et al., 1999), Citrus sinensis (Bausher et al., 2006), Phalaenopsis aphrodite subsp. formosana (Chang et al., 2006), Astragalus mongholicus (Guo et al., 2010), Marchantia polymorpha (Ohyama et al., 1986) and Picea abies (Relle et al., 1995).
In this study, the rbcL nucleotide sequences and deduced amino acid (AA) sequences from various higher plants were analyzed with bioinformatics tools, with the aim of providing a theoretical reference for further studies on plant RuBisCo. The plants include Zea mays, Arabidopsis thaliana, Pisum sativum, Citrus sinensis and Phalaenopsis aphrodite subsp. formosana, among others, with emphasis on Oryza sativa subsp. japonica. The analysis covered the following aspects: composition and physical and chemical characteristics, signal peptide, transmembrane topology, hydrophobicity or hydrophilicity, secondary structure, sequence comparison, and molecular phylogeny.

Analysis of the composition and the physical and chemical characteristics of rbcL nucleotide sequences and deduced amino acid sequences from plants
The composition and the physical and chemical characteristics of the rbcL nucleotide sequences and deduced AA sequences were described with ORF Finder, DNAstar, ProtParam and pI/Mw. The analyzed rbcL data were derived from Oryza sativa subsp. japonica, Zea mays, Arabidopsis thaliana, Pisum sativum, Citrus sinensis and Phalaenopsis aphrodite subsp. formosana. The initiation codon of every rbcL gene is ATG, and the termination codon is TAG or TAA. The ORFs are about 1434 bp long and encode proteins of approximately 477 AA residues. The molecular weights and theoretical isoelectric points of the polypeptides are similar among the different plants. The proportions of acidic, basic, total charged, polar and hydrophobic AA in the total AA residues of the rbcLs differ only slightly. Overall, the most abundant AA residues are Gly, Ala, Leu, Glu and Val. The rbcLs of Pisum sativum and Citrus sinensis are classified as stable proteins, whereas those of the other three plants are classified as unstable, although the instability indexes of all rbcLs are close to the threshold value of 40 (Table 1).
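The relationship between ORF length and protein length stated above (1434 bp including the stop codon gives 477 residues) and the composition statistics can be illustrated with a minimal Python sketch. It is not the ORF Finder, DNAstar or ProtParam implementation; the function names, the residue classes and the toy sequence are assumptions made for illustration only.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Residue classes are an assumption of this sketch; tools differ, for
# example in whether His is counted as a basic residue.
ACIDIC = set("DE")
BASIC = set("KRH")


def check_orf(dna):
    """Return (starts_with_atg, stop_codon_or_None, encoded_residue_count)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    stop = dna[-3:] if dna[-3:] in STOP_CODONS else None
    n_codons = len(dna) // 3
    # The stop codon is not translated, so it does not count as a residue.
    n_residues = n_codons - 1 if stop else n_codons
    return dna.startswith("ATG"), stop, n_residues


def composition(protein):
    """Fraction of each residue type in a protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def class_fraction(protein, residue_class):
    """Fraction of residues that belong to the given set of one-letter codes."""
    protein = protein.upper()
    return sum(aa in residue_class for aa in protein) / len(protein)


# Hypothetical 1434-nt ORF (not a real rbcL sequence): ATG, 476 Ala codons, TAG.
toy_orf = "ATG" + "GCT" * 476 + "TAG"
print(check_orf(toy_orf))  # (True, 'TAG', 477)
```

Applied to a real rbcL ORF retrieved from GenBank, `check_orf` would confirm the ATG start and the TAG or TAA stop, and `composition` on the translated sequence would give the per-residue fractions of the kind summarized in Table 1.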

The signal peptide analysis of plant rbcLs
The signal peptide of the rbcL AA sequence of Oryza sativa subsp. japonica was predicted with the SignalP 3.0 online server (Nielsen et al., 1997; Bendtsen et al., 2004). The analysis was first performed with the neural network (NN) method. The maximum values of the raw cleavage site score (C score), the signal peptide score (S score) and the combined cleavage site score (Y score) are 0.059, 0.133 and 0.021, located at the 24th, 4th and 8th AA residues, respectively (Figure 1). All of these scores are far below the critical thresholds. Moreover, the probability that a signal peptide is present, as predicted with the hidden Markov model (HMM) method, is zero. These results indicate that no signal peptide cleavage site exists in the rbcL polypeptide. Similar results were obtained for the rbcL AA sequences of Zea mays, Triticum aestivum, Arabidopsis thaliana and Citrus sinensis. Accordingly, we infer that rbcL polypeptides, which are synthesized in the chloroplasts of higher plants, do not need to be translocated across a membrane.

The hydrophobicity and hydrophilicity analysis of plant rbcLs
The hydrophobicity and hydrophilicity of the rbcL AA sequence of Oryza sativa subsp. japonica were analyzed with the ProtScale program (Kyte and Doolittle, 1982). The most hydrophilic position in the polypeptide is Asn at residue 306, which has the lowest score of -2.644, and the most hydrophobic position is Ala at residue 378, which has the highest score of 1.778. Over the whole polypeptide, hydrophobic and hydrophilic AA residues are distributed uniformly, but hydrophilic residues outnumber hydrophobic ones, and no obvious region enriched in hydrophobic residues can be detected (Figure 3). A similar distribution of hydrophobic and hydrophilic AA residues was found in the rbcL AA sequences of Nicotiana tabacum, Lolium perenne, Medicago truncatula, Pisum sativum and Citrus sinensis. These results imply that the rbcLs of higher plants are hydrophilic proteins, which agrees with the earlier conclusion that transmembrane topology is absent from the rbcLs of higher plants.
Figure 3 Hydrophobicity and hydrophilicity analysis of rice rbcL
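The per-position scores reported above are sliding-window averages of the Kyte and Doolittle hydropathy scale, which is why a single residue can carry a value such as -2.644 that is not itself on the scale. The following Python sketch shows the calculation in its simplest form. The window size of 9 and the equal weighting of all window positions are assumptions of this sketch, since the text does not state the ProtScale settings used, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return [(position, score)] with 1-based positions of window centres.

    Each score is the unweighted mean hydropathy of the residues in a
    window centred on that position; positive means hydrophobic.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = seq.upper()
    half = window // 2
    profile = []
    for i in range(half, len(seq) - half):
        residues = seq[i - half:i + half + 1]
        score = sum(KD[aa] for aa in residues) / window
        profile.append((i + 1, score))
    return profile


def hydrophilic_fraction(profile):
    """Fraction of window centres whose mean hydropathy is negative."""
    return sum(score < 0 for _, score in profile) / len(profile)


# Hypothetical toy peptide, not an rbcL sequence.
toy = hydropathy_profile("IIIIIRRRRR", window=3)
print(min(toy, key=lambda p: p[1]))  # (7, -4.5)
```

Taking the minimum and maximum of the profile for a full-length rbcL sequence would give the most hydrophilic and most hydrophobic positions in the sense used above, and `hydrophilic_fraction` gives a rough measure of how much of the polypeptide scores as hydrophilic.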

The rbcL secondary structure analysis of plants
The secondary structure of the rbcL polypeptide of Oryza sativa subsp. japonica was predicted with SOPMA (Geourjon and Deléage, 1995). α-Helix and random coil are the principal structural elements of the polypeptide, while extended strand and β-turn make up a small proportion and are interspersed throughout the protein (Figure 4). According to the statistics, the proportions of α-helix, extended strand, β-turn and random coil in the rbcL secondary structural components of [...]
[...] (Higgins and Sharp, 1988; 1989; Thompson et al., 1997; Jeanmougin et al., 1998) and DNAMAN software. The rbcL AA sequences showed very high identity among higher plants, whether between gymnosperms and angiosperms or between dicotyledons and monocotyledons (Figure 5). This demonstrates that the rbcLs of higher plants are highly conserved and homologous.
[...] (Saitou and Nei, 1987; Tamura et al., 2004), using the Neighbor-Joining (NJ) method. The rbcL DNA sequences, derived from thirteen higher plants, were grouped into two large clusters. One group comprises Cycas taitungensis, Cathaya argyrophylla and Pinus thunbergii, and the other ten plants constitute the other group. All thirteen plants originate from a common ancestor. In the phylogenetic tree, the branching pattern clearly reflects the evolutionary relationships of the rbcLs of the different plants. The three gymnosperms, Cycas taitungensis, Cathaya argyrophylla and Pinus thunbergii, form one branch, which is distinct from the other branch containing the ten angiosperms. The ten angiosperms can be further divided into dicotyledon and monocotyledon branches. In addition, the Solanaceae plants (Nicotiana tabacum and Solanum lycopersicum) and the Poaceae plants (Oryza sativa subsp. japonica, Zea mays and Saccharum officinarum) each form a small branch, on account of their close relationships (Figure 6).
Evolutionary relationships at the molecular level have been widely applied in biological systematics since the "molecular evolutionary clock" and the "neutral theory" were proposed in the 1960s. Some disagreement remains over the application of molecular evolution to biological taxonomy, owing to the academic dispute between a "constant rate of sequence evolution" and "Darwinian positive selection". Nevertheless, it is acknowledged that evolutionary units above the family level can be distinguished accurately by phylogenetic analysis of DNA and AA sequences, which was adequately confirmed in this study. Moreover, Zea mays and Saccharum officinarum are correctly separated from Oryza sativa subsp. japonica (Figure 6) because of their closer relationship to each other, even though all three plants belong to Gramineae.
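The Neighbor-Joining procedure of Saitou and Nei (1987) used for the tree can be sketched in a few lines of Python. This is an illustration of the algorithm only and does not reproduce the authors' software or settings. The use of the uncorrected p-distance, the function names and the five-taxon distance matrix are assumptions of this sketch; the matrix is a hypothetical textbook-style example, not rbcL data.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Build a tree by Neighbor-Joining and return it as a Newick string.

    names: list of taxon labels.
    dist:  dict of dicts with symmetric pairwise distances, dist[a][b].
    """
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        parent = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distances from the new parent node to every remaining node.
        d[parent] = {}
        for c in nodes:
            if c not in (a, b):
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[parent][c] = dc
                d[c][parent] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
    a, b = nodes
    # Join the last two nodes; the whole remaining length is put on one side.
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


# Hypothetical distance matrix for five taxa (not rbcL data).
taxa = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_dist = {x: dict(zip(taxa, row)) for x, row in zip(taxa, rows)}
print(neighbor_joining(taxa, toy_dist))
```

For aligned rbcL DNA sequences, the distance matrix could be filled with `p_distance` for every pair before calling `neighbor_joining`. The resulting tree is unrooted, so identifying a gymnosperm branch and an angiosperm branch, as described above, depends on how the tree is rooted or drawn.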

Discussion
In this study, we demonstrated that the rbcLs from different higher plants possess neither a signal peptide nor a transmembrane topology, and do not have the traits of hydrophobic proteins. The principal secondary structural elements are α-helix and random coil. The composition and the physical and chemical characteristics are similar, and extremely high homology was found among the different higher plants. The evolutionary relationships reflected by the DNA sequences correspond to traditional botanical taxonomy.
It is known that the sequences and structures of rbcLs from different higher plants are highly homologous, with similarities above 80%, whereas the similarities of rbcSs are much lower, at less than 50%. All the analyzed rbcL ORFs from higher plants are about 1434 bp long and are translated into polypeptides of nearly 477 AA residues (Table 1). The similarities of the rbcL AA sequences from different higher plants exceed 97%, and the less conserved region of the rbcL polypeptide is located mainly at the C-terminus (Figure 5). The high homology of rbcLs indicates the importance of structural stability in maintaining high catalytic efficiency. It also implies that the overwhelming majority of rbcL AA residues play a crucial role in maintaining structural stability, consistent with reports that the catalytic efficiency of RuBisCo can be altered markedly when certain AA residues of rbcL are substituted (Chen et al., 1988; 1993; Seokjoo and Robert, 1997; Bainbridge et al., 1998; Pippa et al., 1998).
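How a pairwise similarity figure and a "less conserved region" can be read off a multiple alignment is sketched below in Python. The identity definition (identical residues over aligned columns, ignoring columns where both sequences have a gap), the window size and the toy alignment are assumptions of this sketch; alignment programs differ in how they count gaps, so the values need not match those reported by DNAMAN.

```python
from collections import Counter


def percent_identity(a, b):
    """Percentage of aligned columns in which two sequences are identical.

    Columns where both sequences have a gap are ignored; a gap against a
    residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    same = sum(x == y and x != "-" for x, y in cols)
    return 100.0 * same / len(cols)


def column_conservation(alignment):
    """For each column, the fraction of sequences sharing the commonest symbol.

    Gap characters are counted like any other symbol in this sketch.
    """
    return [
        Counter(col).most_common(1)[0][1] / len(col)
        for col in zip(*alignment)
    ]


def least_conserved_window(alignment, window=10):
    """Return (1-based start, mean conservation) of the most variable window."""
    cons = column_conservation(alignment)
    scores = [
        sum(cons[i:i + window]) / window
        for i in range(len(cons) - window + 1)
    ]
    start = min(range(len(scores)), key=scores.__getitem__)
    return start + 1, scores[start]


# Hypothetical toy alignment whose variable columns lie at the C-terminal end.
toy_alignment = ["AAAAAAAA", "AAAAAAGG", "AAAAAATT"]
print(percent_identity(toy_alignment[0], toy_alignment[1]))  # 75.0
print(least_conserved_window(toy_alignment, window=2)[0])    # 7
```

On a real rbcL alignment, a least conserved window starting near the end of the alignment would correspond to the C-terminal variability noted above.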
As a bifunctional enzyme, RuBisCo catalyzes the oxygenation of RuBP as well as its carboxylation. Because of this property, the plant loses not only energy but also about 20-50% of the organic carbon fixed by the carboxylation reaction (Li et al., 2001). In theory, therefore, improving crop RuBisCo is a promising breakthrough point for crop variety improvement by modern biotechnology (Mann, 1999; Parry et al., 2007). Rapid progress has been made in studies on the structure, biological functions, regulation and enzymatic characteristics of RuBisCo, but improving crop photosynthetic efficiency and increasing yield by modifying RuBisCo remains theoretical. Further exploration of the nature and molecular characteristics of RuBisCo, for instance the diversity of RuBisCo structures and functions among different plants, its regulation by the environment and its activation mechanisms, and the relationship between protein structure and function, is therefore indispensable for laying a solid foundation for enhancing the catalytic efficiency of crop RuBisCo and increasing photosynthetic output.

Author Contributions
BJZ and LGL completed the paper. XXZ, SY, RLL, DWL, YYN, YBZ, QGL and YHW read and revised the manuscript. All authors have read and approved the final text.