Genome size estimation of Chinese cultured Artemisia annua L.

Cite this article: Liu Z, Guo S, Xu J, Zhang Y, Dong L, et al. Genome size estimation of Chinese cultured Artemisia annua L. J Plant Biol Crop Res. 2018; 1: 1002.

Zhixiang Liu1; Shuai Guo1,2; Jiang Xu1*; Yujun Zhang1; Linlin Dong1; Shuiming Xiao1; Rui Bai3; Baosheng Liao1; He Su1,4; Ruiyang Cheng1; Shilin Chen1

1Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
2College of Agriculture/College of Peony, Henan University of Science and Technology, Luoyang, Henan 471023, China
3College of Pharmacy and Chemistry, Dali University, Dali, Yunnan 671000, China
4Guangdong Provincial Hospital of Chinese Medicine, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, and China Academy of Chinese Medical Sciences Guangdong Branch, China Academy of Chinese Medical Sciences, Guangzhou, 510006, China


Introduction
Artemisiae annuae herba, the dried aerial part of the annual herbaceous plant Artemisia annua L. [1], characteristically synthesizes and accumulates artemisinin, a unique sesquiterpene endoperoxide lactone used as an antimalarial drug. Artemisinin-based combination therapies (ACTs) are recommended by the WHO as the best choice for the rapid and reliable treatment of acute malaria [2][3][4]. Moreover, the isolation of artemisinin earned its discoverer, Professor Tu, the 2011 Lasker-DeBakey Clinical Medical Research Award and the 2015 Nobel Prize in Physiology or Medicine [5]. Artemisinin has also been confirmed to have other activities, including anticancer [6,7], antiviral [8,9], and antischistosomal activities [10].
A. annua remains the main source of artemisinin. It is a cosmopolitan species (found, for example, in Viet Nam and India) but is most widespread in China, where it occurs in every province, with an artemisinin content ranging from 0.1% to 1.5% of dried leaf weight depending on the ecological environment and varietal differences [11]. However, A. annua strains containing less than 0.5% artemisinin cannot be used as raw material for artemisinin production, especially the strains from Northern China, Viet Nam and India (less than 0.1% artemisinin content) [11]. There is therefore an urgent need to increase the artemisinin yield, and numerous attempts over the last two decades have focused on genetic modification and bioengineering of artemisinin biosynthesis in plants [5,12,13]. Unfortunately, because genomic information and genetic background are insufficient, the regulatory mechanism of the artemisinin biosynthetic pathway has not yet been clarified. Artemisinin can also be semi-synthesized from artemisinic acid or dihydroartemisinic acid obtained from genetically modified yeast [14][15][16][17], but this route has not yet reached high commercial value. An accurate evaluation of the genome size of Chinese A. annua is therefore an essential basis for subsequent genetic studies.
Therefore, seven wild A. annua sample strains identified by DNA barcoding were chosen from five different areas of China (Shandong, Hunan, Chongqing, Sichuan, and Hainan). Their genome sizes were estimated by flow cytometry (FCM) with Nipponbare, which has a well-defined genome, as the benchmark calibration standard. Because soybean and maize both have genome sizes reasonably close to that of A. annua (soybean, 889.33-1118.34 Mbp [38-42]; maize, 2300-3360 Mbp [34,49]; A. annua, possibly 1710 Mbp [50]), they were adopted as two typical internal standards in FCM. An accurate genome size could facilitate the planning of the A. annua whole genome sequencing project and may give further insight into genomic studies aimed at improving artemisinin production.

Identification of sample strains with A. annua and other species
Seven ITS2 and seven psbA-trnH sequences were obtained from the seven collected wild samples. Two neighbor-joining (NJ) trees were constructed separately from the ITS2 and psbA-trnH sequences of the seven wild samples, A. annua, and its closely related species and counterfeits. All the ITS2 and psbA-trnH control sequences of A. annua and its adulterants were generated in our previous studies [51,52] (Supplementary Table S1).
The sequence length, GC content, and K2P genetic distance of the ITS2 and psbA-trnH regions of the samples, A. annua and its adulterants were analyzed and summarized (Supplementary Table S2). The ITS2 sequence length of the sample strains gathered from five provinces was 225 bp, while the psbA-trnH sequence length was 353 bp. No variable site was found in either the ITS2 (Figure 1b) or the psbA-trnH sequences of A. annua from the five provinces. The average GC content of the A. annua ITS2 sequences was 56.40%, and that of the psbA-trnH sequences was 25.20%. On the basis of the ITS2 and psbA-trnH sequences, the intraspecific divergence of A. annua calculated using the K2P model was zero, which was far lower than the minimum interspecific distance between A. annua and 23 other closely related species and counterfeits (Supplementary Table S2).
All our ITS2 sequences were consistent with A. annua (Figure 1b) and separated from those of other closely related species in the NJ tree (Figure 1a). In agreement with our previous studies [51,52], A. annua and its closely related species and counterfeits could be distinguished from each other on the basis of the ITS2 sequences (but not psbA-trnH). Thus, all strains were identified as genuine A. annua, with no variation in their ITS2 and psbA-trnH sequences.
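The two sequence statistics reported above can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and uses the standard Kimura two-parameter (K2P) formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names and the toy sequences are hypothetical and are not taken from the study, which used its own barcoding protocol [51,80].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc_content(seq):
    """Return the GC percentage, ignoring gaps and ambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    transitions = 0
    transversions = 0
    for a, b in pairs:
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (hypothetical sequences): identical sequences give a distance of zero,
# as reported for the intraspecific divergence of the A. annua samples.
same = k2p_distance("ACGTACGTAC", "ACGTACGTAC")
one_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

With identical sequences both P and Q are zero, so the distance is zero regardless of sequence length.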

A. annua genome size analysis by FCM
In our study, the peaks of Nipponbare, soybean, maize, and the seven A. annua strains were first determined individually. No peak overlap was observed, confirming that the peaks of the A. annua strains were well separated from those of the internal standards. Using the high-quality sequenced Nipponbare [53] as the benchmark calibration standard, mixtures of Nipponbare with soybean and of Nipponbare with maize were each measured three times. According to the formula: sample 2C DNA content = (sample peak mean / standard peak mean) × standard 2C DNA content, the genome sizes of soybean and maize were measured as 0.92 ± 0.00 Gb and 2.17 ± 0.02 Gb, respectively. Then, each A. annua strain was mixed with the internal standards (A. annua with soybean, A. annua with maize, and A. annua with both soybean and maize) and measured in two parallel runs with three technical replicates each. The flow cytometric results of the seven A. annua strains collected from five provinces are shown in Figure 2 and Table 1. The estimated A. annua genome sizes ranged from 1.31 Gb to 1.54 Gb, with Nipponbare as the primary control and with soybean and maize as controls, individually and simultaneously.
Across the flow cytometric data (two parallel runs per strain, each with three replicates, using three different controls individually and simultaneously), the maximum coefficient of variation (CV) was 2.96%, observed for the HK strain with Z. mays as the control. With the same control, the differences among the genome sizes of the seven A. annua strains were 45 Mb (Nipponbare), 88 Mb (G. max) and 41 Mb (Z. mays), respectively. The CV of the seven strains ranged from 0.29% to 2.94% with Nipponbare as the control, from 0.67% to 2.66% with G. max, and from 0.86% to 2.96% with Z. mays. This suggested that the flow cytometer (BD Accuri™ C6) was quite stable. However, the G. max and Z. mays genome sizes assessed by FCM using Nipponbare as the primary control were slightly larger than those in previous reports [49,54]. Differences in variety and growth conditions may be the main reasons [55]. In these high-quality data with a low CV (≤ 2.96%), the estimates for the seven strains differed little whether Nipponbare, G. max or Z. mays was used as the control. This also illustrated that the diversity of lines within A. annua had little influence on genome size.
The variation in genome size of the same A. annua strain across the three different controls ranged from 38 Mb (the HK strain) to 96 Mb (the SC strain). The differences for the other five strains were similar (57 Mb for the LQ strain, 55 Mb for the YJ strain, 64 Mb for the YY-1 strain, 67 Mb for the YY-2 strain, and 67 Mb for the JX strain). These flow cytometric results indicated that estimates for the same A. annua strain with different standards differed little.
The largest genome size, that of the A. annua YY-2 strain, was 1.49 ± 0.07 Gb, whereas the smallest, that of the YY-1 strain, was 1.38 ± 0.06 Gb, a margin of 110 Mb (Table 1). The flow cytometric values of the A. annua strains differed from their mean value (1.44 Gb) by 73 Mb, showing that there was only slight genome size variation within A. annua.
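The internal-standard formula above can be sketched as a short calculation. The function name and the peak values below are hypothetical placeholders chosen for illustration; only the soybean value of 0.92 Gb is taken from the text.

```python
def estimate_dna_content(sample_peak_mean, standard_peak_mean, standard_dna_content):
    """Sample DNA content = (sample peak mean / standard peak mean) x standard DNA content.

    The result is in the same unit as standard_dna_content (here Gb).
    """
    if standard_peak_mean <= 0:
        raise ValueError("standard peak mean must be positive")
    return sample_peak_mean / standard_peak_mean * standard_dna_content


# Hypothetical fluorescence peak means (arbitrary units), not measured values.
sample_peak = 300_000.0
soybean_peak = 200_000.0
soybean_size_gb = 0.92  # soybean size calibrated against Nipponbare, from the text

size_gb = estimate_dna_content(sample_peak, soybean_peak, soybean_size_gb)
print(f"{size_gb:.2f} Gb")  # prints 1.38 Gb
```

Because the estimate scales linearly with the value assigned to the standard, any error in the standard's genome size carries over proportionally to the sample estimate, which is why the standards were first calibrated against Nipponbare.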
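As an illustration of how a coefficient of variation is obtained from replicate measurements, the following minimal sketch assumes that the CV is the sample standard deviation divided by the mean, expressed as a percentage. Whether the reported values were computed this way or taken from the instrument's peak statistics is not stated in the text, and the replicate values below are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean x 100."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, needs >= 2 values
    return sd / mean * 100.0


# Hypothetical replicate genome size estimates (Gb) for one strain and one control.
replicates_gb = [1.40, 1.44, 1.48]
cv_percent = coefficient_of_variation(replicates_gb)
print(f"mean = {statistics.mean(replicates_gb):.2f} Gb, CV = {cv_percent:.2f}%")
```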

Estimation of A. annua genome size
The genome size of A. annua was originally estimated as 4.1 pg by Nagl and Ehrendorfer in 1974 [56] and as 3.8 pg by Geber and Hasibeder in 1980 [57], using microdensitometry measurements based on Feulgen staining. Using Pisum sativum cv. Express Long (2C = 8.37 pg) [58] as the only internal standard, Torrell and Valles assessed the DNA content per haploid genome of A. annua (1.75 pg, 1.71 Gb) by flow cytometry (FCM) [50]. In that study, the A. annua samples had been deposited in the Herbarium of the Laboratory of Botany, Faculty of Pharmacy, University of Barcelona (BCF) since 1997 and had not been identified by molecular methods. The half peak coefficient of variation (HPCV) of A. annua reached 3.01, and its DNA amount was very different from that of other taxa.
In this paper, seven A. annua strains collected from five provinces were identified on the basis of their ITS2 sequences. Their flow cytometric analysis, with a low CV (≤ 2.96%), indicated that their genome sizes ranged from 1.38 Gb to 1.49 Gb, using Nipponbare as the benchmark calibration standard and G. max and Z. mays as two internal standards, individually and simultaneously. However, the largest genome size (1.49 Gb, YY-2 strain) was 12.87% (220 Mb) smaller than the estimate (1.71 Gb) for A. annua in Spain obtained by Torrell and Valles with Pisum sativum as the internal standard [50]. This discrepancy may have arisen because P. sativum, although reported to lack significant genome size variation [59,60], is unsuitable as an internal standard unless it is checked against a benchmark calibration standard in FCM. A benchmark calibration standard is therefore required in the FCM method.
Within the same area, the genome size of the YY-1 strain was about 110 Mb smaller than that of YY-2, while that of JX was 54 Mb smaller than that of HK. Both YY-1 and YY-2 differed little from the strains from other areas. Moreover, the variation between the genome sizes of all seven strains and their mean value (1.44 Gb) was merely 73 Mb, showing no significant discrepancy. This indicated that there was only minor genome size variation within A. annua, which may result predominantly from transposable element accumulation, expansion or contraction of tandem repeats, variation in intron length and so on [61].
In addition, we attempted a genome-wide survey with low-depth (<30X) high-throughput sequencing data. The estimated value was slightly larger than the flow cytometric value (unreported), and the A. annua genome was rich in highly repetitive sequences. This is comparable to the case of Dendrobium officinale (a traditional Chinese orchid herb), whose genome size is about 1.27 Gb based on flow cytometric data and 1.35 Gb as assembled by combining second-generation and third-generation PacBio sequencing technologies [62]. It also accords with the genome of Eriobotrya Lindl. 'Jiefangzhong' (654.40 Mb estimated by FCM and 773.00 Mb by 17-mer spectrum) [63]. These inflated genome size estimates are attributed to high repeat content and heterozygosity [36]. However, flow cytometric data with the internal standards Caenorhabditis elegans (~100 Mb) and Drosophila melanogaster (~175 Mb) showed that the genome size of A. thaliana, the first sequenced plant (157 Mb), was 25% larger than the initial estimate of 125 Mb, which was partly attributed to sequences missing from the centromeric and ribosomal DNA regions [64]. Although such discrepancies between the two kinds of data exist for many sequenced plant genomes, both determine the same order of magnitude of plant genome size, and the flow cytometric results are quite credible.
Hence, the unreported k-mer analysis results supported the conclusion that the A. annua genome size is close to 1.38-1.49 Gb and that the genome is complex, with a high repeat content.
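The correspondence between 1.75 pg and 1.71 Gb quoted above follows from the commonly used conversion of 1 pg of DNA to about 978 Mbp. A minimal sketch, in which the function name is hypothetical and the conversion factor is assumed to be the one applied:

```python
MBP_PER_PG = 978.0  # widely used conversion factor: 1 pg DNA is about 978 Mbp


def pg_to_mbp(picograms):
    """Convert a DNA amount in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


haploid_pg = 1.75  # haploid DNA content of A. annua reported in [50]
print(f"{pg_to_mbp(haploid_pg) / 1000:.2f} Gb")  # prints 1.71 Gb
```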
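The 12.87% (220 Mb) figure above is the difference between the two estimates expressed relative to the earlier, larger value. A short check of that arithmetic, with hypothetical variable and function names:

```python
def relative_shortfall(smaller_gb, larger_gb):
    """Return (absolute difference in Mb, difference as a percentage of the larger value)."""
    diff_gb = larger_gb - smaller_gb
    return diff_gb * 1000.0, diff_gb / larger_gb * 100.0


# 1.49 Gb (YY-2 strain, this study) versus 1.71 Gb (Torrell and Valles [50]).
diff_mb, diff_percent = relative_shortfall(1.49, 1.71)
print(f"{diff_mb:.0f} Mb, {diff_percent:.2f}%")  # prints 220 Mb, 12.87%
```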

Performance of FCM for A. annua strains
At present, Feulgen spectrophotometry and FCM are the commonly used methods for estimating genome size. FCM is a powerful method for the qualitative and quantitative analysis of animal, plant, and microbial single cells and other microscopic particles in liquid suspension [65,66]. As a traditional standard technology for estimating genome size, FCM can determine nuclear DNA content accurately [67,68]. Bennett et al. called for a precise angiosperm C-value to serve as a benchmark calibration standard for plant genome size estimations [64]. So far, Arabidopsis (A. thaliana) [69] and rice (O. sativa L.) [28][29][30][31][32] are two plants with high-quality genome sequences. The rice genome sequence is considered a "gold standard" in plants. Moreover, the Nipponbare RefSeq has the best-quality information among known crop genome sequences [28][29][30][31][32]. It is quite important for sequence comparison of herbaceous plants, and Nipponbare can be used as a benchmark calibration standard. In addition, the genomes of soybean (G. max) and maize (Z. mays), whose sizes are both close to that of A. annua L., have been estimated by FCM [43][44][45][46][48,70,71] and also sequenced [38,49]. Therefore, soybean and maize recalibrated against Nipponbare are suitable for the accurate determination of the A. annua genome size.
The FCM method is simple and convenient, which has broadened its range of applications. FCM offers high throughput per batch of operations, but it depends on an internal reference and is easily affected by endogenous DNA and secondary metabolites. In general, the high content of cytosolic compounds, including proteins and secondary metabolites, in medicinal plant leaves is liable to bias nuclear DNA content estimations by FCM, and this cannot be completely overcome [72,73]. Recent research has focused mainly on the most appropriate buffers and procedures for sample preparation. Otto I buffer can precipitate nuclei and also remove, to a certain degree, cytosolic compounds in medicinal plant leaves and nuclear debris [74]. Our flow cytometric data show that the Otto buffer was also applicable to A. annua samples.
Estimates for the same species with different standards have sometimes been surprisingly divergent, but this was not the case for our A. annua flow cytometric results. Comparing genome size estimates of various species obtained by FCM and by genome survey, we found that some sequenced model plants, although characterized by stable DNA content and ease of preparation, were inadequate to serve as internal standards. Where the flow cytometric estimates were in conformity with genome assembly results, the genome size discrepancies between Moso bamboo (P. pubescens) and its internal standard soybean (G. max) [75][76][77], and between barley (Hordeum vulgare) and its standard P. sativum [43,78], were 1.70-fold and 1.49-fold, respectively. This indicated that the optimum genome size discrepancy between the unknown and the internal standard should be approximately 0.5- or 2-fold [79]. However, in our study (Table 2), with Nipponbare as the benchmark calibration standard for G. max and Z. mays, the A. annua genome size estimates obtained with Nipponbare (3.76-fold discrepancy), G. max (1.56-fold discrepancy) and Z. mays (0.66-fold discrepancy) as controls were all suitable.
All flow cytometric data for A. annua (CV ≤ 2.96%) were essentially stable; however, the estimates for different strains with the three different controls were not proportional. The results suggested that differences in the content of cellular inclusions among strains would influence the intensity of fluorescent staining in A. annua plants. This may be a valuable reference for the quality evaluation of the different compounds in A. annua plants.
With the high-quality sequenced Nipponbare as the benchmark calibration standard, the genome size of genuine A. annua identified by ITS2 was estimated to be approximately 1.38-1.49 Gb by FCM with Nipponbare, G. max and Z. mays as different internal standards. Furthermore, genome size did not show significant variation among seven wild A. annua strains from five provinces, indicating that no rapid expansion or contraction of the A. annua genome was found. It is therefore necessary to conduct further studies on the relationship among environmental factors, genetic information and artemisinin content variation. The assessment of the A. annua genome size provides a deeper understanding of its genome. It facilitates the planning of its whole genome sequencing project and provides a reference for subsequent studies of its genetics and evolution.
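The fold discrepancies discussed above appear to be the ratio of the sample genome size to that of the standard. The sketch below assumes this definition and uses the rounded values given in the text (mean A. annua size 1.44 Gb, soybean 0.92 Gb, maize 2.17 Gb); with these rounded inputs the ratios come out close to, though not necessarily identical to, the reported 1.56-fold and 0.66-fold. The function name is hypothetical, and the Nipponbare ratio is omitted because the rice genome size used is not stated in this section.

```python
def fold_discrepancy(sample_size_gb, standard_size_gb):
    """Ratio of the sample genome size to the internal standard genome size."""
    return sample_size_gb / standard_size_gb


A_ANNUA_MEAN_GB = 1.44  # mean of the seven strains, from the text
STANDARDS_GB = {"G. max": 0.92, "Z. mays": 2.17}  # calibrated against Nipponbare

for name, size_gb in STANDARDS_GB.items():
    fold = fold_discrepancy(A_ANNUA_MEAN_GB, size_gb)
    print(f"A. annua vs {name}: {fold:.3f}-fold")
```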

Plant materials
In this study (Table 1), seven wild sample plants or seeds were gathered from five provinces (Shandong, Hunan, Chongqing, Sichuan, and Hainan). Some wild seedlings were transplanted into our greenhouse, and others were germinated from wild seeds in soil or on Murashige and Skoog (MS) medium.
Rice (Nipponbare) seeds, whose genome size is well defined, came from the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences. The seeds of soybean and maize were purchased from the market, and their varieties were not determined. Nipponbare, soybean and maize were grown in water culture.

DNA extraction and cloning of DNA barcoding sequences
We adopted ITS2 and psbA-trnH sequences as barcodes to identify the seven samples, A. annua, and the other species. Genomic DNA was isolated from the samples, and their ITS2 and psbA-trnH sequences were obtained, assembled and analyzed according to the protocol of "Standard DNA Barcodes of Chinese Materia Medica in Chinese Pharmacopoeia" [51,80].

Flow cytometric measurement
A total of 80 mg of fully developed fresh leaves of A. annua strains, Nipponbare, soybean and maize, or 200 mg of callus and adventitious buds of cultured strains, was collected in clean Petri dishes. The samples were rapidly chopped with a sharp razor blade in 2 mL of cold nuclei extraction buffer (Otto I buffer [55,81]), filtered through a 40 µm nylon cell strainer, and kept on ice. The filtrate was transferred to a new 2 mL tube and centrifuged in the cold for 3 min at low speed (1844 × g, 5000 rpm), and the supernatant was removed. The pellet was then resuspended in 600 µL of fresh ice-cold Otto I solution and centrifuged in the cold for 30 s at 500 × g; this was done twice. Before flow cytometric analysis, the nuclei of all samples were stained in the same way with propidium iodide (PI, with RNase; BD Biosciences Pharmingen™, San Diego, US) [43,45,82] in a mixture of Otto I and Otto II buffers (1:2) for 15 min.
Nuclear DNA content was measured and analyzed on a flow cytometer (BD Accuri™ C6, USA) at a low flow rate (14 µL/min) with more than 100,000 cells. Forward scatter (FSC), side scatter (SSC), and blue (488 nm) and red (640 nm) fluorescence for PI were acquired. Two parallel runs with three technical replicates each were measured per sample to check the stability of the instrument. Mean values and standard deviations of all flow cytometric data were calculated. The composition and storage of the Otto I and II solutions followed Dolezel's protocol [55].

Figure 2 :
Figure 1: A. annua ITS2 sequence analysis. a) The NJ phylogenetic tree based on ITS2 sequences of A. annua and counterfeits. b) Multiple alignment of the ITS2 sequences of the samples and A. annua.

Table 2 :
Flow cytometric data of seven A. annua strains.