2 Institute of Chinese Materia Medica, Academy of Chinese Medical Sciences, Beijing 100700, China
Many poisonous plants, such as Croton tiglium, Datura metel and Strychnos nux-vomica, have been used as medicinal materials since ancient times and are recorded in both Traditional Chinese Medicine (TCM) and the Ayurveda of India . The Chinese Pharmacopoeia (2015) contains more than 75 poisonous herbal materials whose source plants cover 34 families, 54 genera and 77 species, including Pinellia ternata (Araceae) and Sinopodophyllum hexandrum (Berberidaceae). The rhizome of Pinellia ternata is currently used for its anti-atherosclerotic and hypolipidemic effects . Podophyllotoxin, extracted from Sinopodophyllum hexandrum and Podophyllum peltatum, has been used in anticancer therapy because it arrests the cell cycle and suppresses the formation of mitotic spindle microtubules .
Concern has been growing about the safety of poisonous herbal materials, which can be misidentified, misused or mistaken for other herbal materials. Most commercial herbal products on the market are sold as particles, slices or even powders, and poisonous materials in particular must undergo additional processing; as a result, they are often misused or mixed with similar related species, intentionally or accidentally. Species identification is therefore especially important for poisonous medicinal plants. Misidentification, misuse and admixture of poisonous herbal medicines or toxic substances are the main causes of poisoning accidents. In the aristolochic acid (AA) incident, for example, Aristolochia fangchi was used in place of Stephania tetrandra, and a number of women went on to develop upper tract urothelial carcinomas (UTUC) . Brugmansia arborea mixed into Datura stramonium can cause scopolamine poisoning . Persicae Semen and Armeniacae Semen Amarum have similar appearance and texture but entirely different clinical applications, and Armeniacae Semen Amarum is even mildly toxic, so its admixture into Persicae Semen carries a certain risk . Among poisonous herbal materials, Euphorbia pekinensis and Stellera chamaejasme are also easily confused with each other . Furthermore, substitution and fillers also affect the quality of herbal materials on the market .
Traditional methods for identifying poisonous herbal materials rely on differences in botanical morphology, microscopic characteristics and physicochemical properties. Morphological diagnosis has four significant limitations: phenotypic plasticity and genetic variability can lead to misidentification; morphologically cryptic taxa are common; morphological keys are often effective only at the flowering and fruiting stage; and professional taxonomists are in extremely short supply. In addition, microscopic and physicochemical identification require a high level of expertise, yet their repeatability is limited [9-10]. Geographical and climatic factors also cause compositional differences among herbal materials, which make objective identification by physicochemical methods difficult . Compared with traditional methods, molecular methods can reflect genetic diversity, species specificity and population differentiation. Building on molecular biological techniques, Paul Hebert proposed the term DNA barcode, in which a standard, relatively short DNA sequence fragment serves as a molecular marker for species identification . In other words, DNA barcoding is a microgenomic identification system with good repeatability and high universality. The method is easy to popularize and standardize through a unified database and identification platform, and its identification efficiency is affected neither by the examiner's experience nor by environmental factors . Meanwhile, specific barcodes can be used to make accurate identifications at the species level and even at the population level . Among currently applied DNA barcodes, the Consortium for the Barcode of Life (CBOL) adopted the combination of the large subunit of the ribulose-bisphosphate carboxylase gene (rbcL) and the maturase K gene (matK) as the standard plant DNA barcode [15-16].
In fact, Ka-Lok Wong found that rbcL and matK had no significant advantage over trnH-psbA, trnL-F, rpl36-rps8, ITS and 5S rRNA in distinguishing Gentiana species from their adulterants . CHEN et al. proposed the internal transcribed spacer 2 (ITS2) sequence as the universal DNA barcode for medicinal plants [18-19]. Moreover, the psbA-trnH spacer was suggested as a complementary barcode to ITS2. The identification rate of psbA-trnH + ITS2 was significantly higher than that of matK + rbcL in 18 families and 21 genera . JIA et al. analyzed the biological composition of Yimu Wan and compared it with its four prescribed herbal materials using ITS2 + psbA-trnH and single-molecule real-time (SMRT) sequencing. The results showed that SMRT sequencing has strong potential for the quality control of traditional Chinese patent medicines .
Although many DNA barcoding studies have been published, few have addressed species identification of poisonous herbal medicines, and no universal DNA barcode for identifying poisonous medicinal plants has yet been established. In this study, we collected all poisonous medicinal plants in the Chinese Pharmacopoeia (2015), together with their adulterants or related species, to select a universal DNA barcode. Our findings show that the ITS2 region should serve as a standard DNA barcode for identifying the poisonous medicinal plants in the Chinese Pharmacopoeia and their adulterants.
Materials and Methods
Poisonous plant species and candidate barcodes
We chose all poisonous medicinal plants in the Chinese Pharmacopoeia (2015) and collected their adulterants or related species, giving 106 species from 27 families and 65 genera and covering all medicinal parts used, including roots, whole herbs, flowers, fruits, seeds and cortex. Each species was represented by 2-4 replicate samples. Based on previous studies, we chose four candidate barcodes (ITS2, psbA-trnH, matK and rbcL) to evaluate which region could serve as the standard barcode for poisonous medicinal plants.
PCR amplification and DNA barcoding analysis
Total DNA was isolated, amplified and sequenced at the China Academy of Chinese Medical Sciences (CACMS). Sequences were assembled and then submitted to GenBank. Species names and GenBank accession numbers are listed in Supplementary Note 1, where the sequences uploaded to GenBank are shown in bold (SN 1 and 2).
The primers and PCR reaction conditions followed CHEN et al. . PCR products were sequenced bidirectionally on an ABI 3730XL automated sequencer (Applied Biosystems Inc.). Genetic differences among species were analyzed by calculating inter- and intra-specific Kimura 2-parameter (K2P) distances and assessed with six distance values in total: average inter-specific distance, average theta prime, average smallest inter-specific distance, average intra-specific distance, average theta, and average coalescent depth. All barcode sequences were aligned with MUSCLE (default options) and ClustalW in MEGA 6.0 and gaps were deleted; the alignments were then used to obtain site information, to generate K2P distance matrices for each locus, and to construct Neighbor-Joining (NJ) phylogenetic trees with 1000 bootstrap replications (MEGA 6.0; Tamura, Stecher, Peterson, Filipski and Kumar, 2013).
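For readers unfamiliar with the K2P model: it separates transitions (A<->G, C<->T) from transversions. With P and Q the proportions of compared sites that differ by a transition and by a transversion, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The minimal sketch below computes it for one pair of aligned sequences. It is an illustration, not MEGA's code, and skipping any site with a gap or ambiguous base in either sequence (pairwise deletion) is an assumption; under complete deletion the gapped columns would be removed from the whole alignment first.

```python
import math

NUCLEOTIDES = frozenset("ACGT")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or a non-ACGT character in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0  # identical over all compared sites
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        return math.inf  # too divergent: the K2P formula is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

For example, a single transition among ten comparable sites gives -(1/2) ln 0.8, about 0.1116. Applying the function to every pair of sequences of one locus yields the K2P distance matrix used for the summaries and trees.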
The identification efficiency of the four candidate regions was evaluated using the BLAST1 and nearest-distance methods according to Ross et al. .
Results
Sequence alignment and analysis
The aligned lengths of the four regions were 218 bp, 388 bp, 700 bp and 910 bp, respectively; ITS2 was considerably shorter than psbA-trnH, matK and rbcL. ITS2 and psbA-trnH had high percentages of variable sites (88.99% and 84.79%, respectively) and of singleton sites (0.92% and 2.57%). ITS2 also had the highest C + G content (62.90%), followed by rbcL (43.60%); rbcL, however, had a higher percentage of conserved sites (51.32%), indicating that it is relatively conserved. MatK had a high percentage of variable sites (78.14%) but fewer singleton sites (only 0.03%) than ITS2 and psbA-trnH.
Sequence inter/intra-specific genetic distance and data distribution
We calculated inter- and intra-specific genetic distances from Kimura 2-parameter sequence divergences in MEGA 6.0 (Tamura et al., 2013). The results are shown in Table 2.
All inter-specific distance values were higher than the intra-specific distance values. Among the four barcodes tested, rbcL had the lowest values for all six genetic distance measures, and its average intra-specific distance was only 0.0010 ± 0.0029, consistent with rbcL being relatively conserved. ITS2 showed higher variation, with the largest average inter-specific distance (0.6170 ± 0.1958) and average theta prime (0.6417 ± 0.0998). We further analyzed the distributions of inter- and intra-specific genetic distances (Fig. 1).
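The six summaries are named in the methods but not defined in this section. A common set of definitions in earlier ITS2 barcoding work, assumed here and worth checking against CHEN et al. before relying on it, is: inter-specific measures use only pairs of different species within the same genus, and intra-specific measures use only species with at least two samples. Theta prime and theta average per-genus and per-species means, the smallest inter-specific distance averages per-genus minima, and coalescent depth averages per-species maxima. The sketch computes these from any precomputed distance matrix (for example K2P distances); the function and argument names are hypothetical, and only means are reported, not the ± values.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def distance_summaries(species, genera, dist):
    """Six inter-/intra-specific summaries from a pairwise distance matrix.

    species[i] and genera[i] label sample i; dist[i][j] is the distance between
    samples i and j. Inter-specific pairs are counted only within a genus
    (assumed definition); single-species genera and single-sample species
    contribute nothing.
    """
    intra = defaultdict(list)  # species -> conspecific pairwise distances
    inter = defaultdict(list)  # genus -> distances between its different species
    for i, j in combinations(range(len(species)), 2):
        if species[i] == species[j]:
            intra[species[i]].append(dist[i][j])
        elif genera[i] == genera[j]:
            inter[genera[i]].append(dist[i][j])
    if not intra or not inter:
        raise ValueError("need a species with >=2 samples and a genus with >=2 species")
    return {
        # pooled over all congeneric, heterospecific pairs
        "average_inter_specific": mean(d for ds in inter.values() for d in ds),
        # per-genus means, then averaged over genera
        "theta_prime": mean(mean(ds) for ds in inter.values()),
        "average_smallest_inter_specific": mean(min(ds) for ds in inter.values()),
        # pooled over all conspecific pairs
        "average_intra_specific": mean(d for ds in intra.values() for d in ds),
        # per-species means, then averaged over species
        "theta": mean(mean(ds) for ds in intra.values()),
        "coalescent_depth": mean(max(ds) for ds in intra.values()),
    }
```

Pooled and per-taxon averages differ whenever taxa contribute unequal numbers of pairs, which is why average intra-specific distance and theta (and likewise the inter-specific pair) are reported separately.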
The distance values of ITS2, psbA-trnH and matK spanned a wider range than those of rbcL, especially the inter-specific distances. Compared with matK and rbcL, ITS2 and psbA-trnH showed similar and more uniform distributions of inter- and intra-specific distances. Most intra-specific distances fell between 0 and 0.01, and the inter-specific distances of ITS2, psbA-trnH and matK fell mainly between 0.5 and 1.0, whereas all genetic distances of rbcL were below 0.5. Overall, the distance values of ITS2 and psbA-trnH were relatively uniformly distributed, whereas those of matK and rbcL were more dispersed, especially the intra-specific distances of matK.
Identification efficiency for ITS2, psbA-trnH, matK and rbcL
To select the most suitable barcode, we compared the identification efficiency of the four regions. All results are shown in Table 3.
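The distance-based identification mentioned in the methods (after Ross et al.) can be illustrated with a leave-one-out nearest-neighbour rule: each sample in turn is treated as a query and compared with its closest other samples. The three outcome classes and the tie handling below are assumptions of this sketch, not rules stated in the study, and the function name is hypothetical; the matrix could hold K2P distances.

```python
from typing import Sequence


def nearest_distance_identification(
    labels: Sequence[str],
    dist: Sequence[Sequence[float]],
    tol: float = 1e-12,
) -> dict[str, int]:
    """Leave-one-out nearest-distance identification.

    For each query i, the nearest other samples (within `tol` of the minimum
    distance) are collected, and the query is scored as:
      correct   - every nearest sample has the query's label
      ambiguous - the nearest samples mix the query's label with others
      incorrect - no nearest sample has the query's label
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = {"correct": 0, "ambiguous": 0, "incorrect": 0}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        best = min(dist[i][j] for j in others)
        nearest = {labels[j] for j in others if dist[i][j] - best <= tol}
        if nearest == {labels[i]}:
            counts["correct"] += 1
        elif labels[i] in nearest:
            counts["ambiguous"] += 1
        else:
            counts["incorrect"] += 1
    return counts
```

The correct identification efficiency is then 100 x correct / number of queries. Under leave-one-out, a species represented by a single sample can never be scored correct, which is one reason to sample several individuals per species.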
At all taxonomic levels and with every analysis method, the correct identification efficiency of rbcL was significantly lower than that of ITS2, psbA-trnH and matK. At the species level, ITS2 achieved the same correct identification efficiency (92.59%) with both the BLAST and distance methods, close to its efficiency with the distance method at the genus level (98.15%). Both psbA-trnH and matK achieved 100% correct identification at the genus level but were lower than ITS2 at the species level. Overall, the correct identification efficiency ranked ITS2 > psbA-trnH > matK > rbcL at the species level and psbA-trnH = matK > ITS2 > rbcL at the genus level.
Neighbor-Joining (NJ) phylogenetic trees
We constructed NJ phylogenetic trees based on the four regions and further compared their discriminating power according to tree topology.
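For reference, the Neighbor-Joining algorithm (Saitou and Nei) repeatedly joins the pair of nodes that minimises the Q-criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of i's distances to the other active nodes, until three nodes remain. The sketch below returns an unrooted tree in Newick format from a distance matrix such as K2P distances. It is a plain illustration rather than MEGA's implementation: ties are broken by input order, negative branch lengths are not corrected, and the 1000-replicate bootstrap (resampling alignment columns and rebuilding the tree) is not shown.

```python
from itertools import combinations


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted Neighbor-Joining tree and return it as Newick text."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # active node id -> Newick text of the subtree hanging from that node
    subtree = {i: name for i, name in enumerate(names)}
    d = {i: {j: dist[i][j] for j in range(len(names))} for i in range(len(names))}
    next_id = len(names)
    while len(subtree) > 3:
        active = list(subtree)
        n = len(active)
        r = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # Q-criterion: join the pair with the smallest (n-2)d(i,j) - r(i) - r(j)
        i, j = min(
            combinations(active, 2),
            key=lambda pair: (n - 2) * d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]],
        )
        # branch lengths from the new internal node u to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        subtree[u] = f"({subtree[i]}:{li:.4f},{subtree[j]}:{lj:.4f})"
        d[u] = {}
        for k in active:
            if k not in (i, j):
                # distance from the new node to every remaining node
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        del subtree[i], subtree[j]
    # join the last three nodes at a single (unrooted) centre
    x, y, z = subtree
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({subtree[x]}:{lx:.4f},{subtree[y]}:{ly:.4f},{subtree[z]}:{lz:.4f});"
```

For example, the five-taxon textbook matrix [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]] with names a to e gives (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);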
Among the four NJ phylogenetic trees, most branches in the ITS2 tree were independent at either the species or the subspecies level. In Liliaceae, Paris polyphylla var. chinensis and Paris polyphylla var. yunnanensis formed separate branches, each showing monophyletic characteristics. Species belonging to the same genus clustered into one group at the family level. In the psbA-trnH, matK and rbcL trees, more single species fell outside their groups at the family level than in the ITS2 tree. At the species level, Prunus (Rosaceae) diverged more readily in our samples. Additionally, matK and rbcL showed diagnostic shortcomings at the species level: closely related species of the same genus clustered into one group, namely Prunus dulcis and Prunus persica and Actaea dahurica and Actaea rubra (matK), and Uncaria macrophylla and Uncaria rhynchophylla (rbcL).
Discussion and Conclusion
Plant poisoning is a common accident in daily life. At present, highly toxic plants can be distinguished from nontoxic ones using a simple toxidromic classification system . DNA barcoding identifies species mainly from differences among particular sequences, and it has been used in differential diagnosis and to guide early management of potentially serious plant ingestions. For both source plants and their products, DNA barcoding has been used to confirm identity or purity .
This research aimed to find the most suitable DNA barcode for identifying poisonous medicinal plants. We found that both ITS2 and psbA-trnH showed higher genetic variability, and that ITS2 had the shortest average length (218 bp) of all candidate barcodes. RbcL was more conserved and less able to distinguish closely related species, a result also reflected in all six distance values of the genetic distance analysis (Table 2). Previous studies likewise reported that ITS2 and psbA-trnH have the advantages of shorter sequences and higher variation .
With the BLAST and genetic distance methods, the correct identification efficiency ranked ITS2 > psbA-trnH > matK > rbcL at the species level and psbA-trnH = matK > ITS2 > rbcL at the genus level. Among all candidate barcodes, ITS2 showed the highest correct identification efficiency at the species level with either the BLAST or the distance method (92.15%; Table 3), consistent with the 92.7% reported in the study of CHEN et al. .
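Why genus-level rates can exceed species-level rates for the same barcode follows from how the calls are scored. In the sketch below, each query is paired with a single best-matching reference name, such as its top BLAST hit (an assumption; how ties or multiple hits were handled is not stated here), and the genus is taken as the first word of the binomial. The function name is hypothetical.

```python
from typing import Iterable


def identification_rates(calls: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Percent correct at species and genus level.

    Each call is (true_binomial, predicted_binomial), e.g. the query's
    species and the species of its top BLAST hit.
    """
    calls = list(calls)
    if not calls:
        raise ValueError("no identification calls")
    species_ok = sum(true == pred for true, pred in calls)
    # genus = first word of the binomial name
    genus_ok = sum(true.split()[0] == pred.split()[0] for true, pred in calls)
    return {
        "species_pct": 100.0 * species_ok / len(calls),
        "genus_pct": 100.0 * genus_ok / len(calls),
    }
```

Because every correct species call is also a correct genus call under this scoring, the genus-level rate can never fall below the species-level rate for the same barcode and method, which is consistent with the rates quoted in the results.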
MatK was previously considered a suitable plant barcode because of its high evolutionary rate, suitable length, clear inter-specific divergence and low transition/transversion ratio. However, matK has a high substitution rate at the primer sites, which makes amplification difficult [26-27]. Compared with ITS2 and psbA-trnH, neither matK nor rbcL is suitable as a universal barcode: matK suffers from low PCR efficiency, while rbcL is easy to amplify but the locus is relatively conserved . Osathanunkul et al. found GC contents of 34.6% for matK and 43.4% for rbcL in Tinospora species , similar to our results (31.85% and 43.60%) (Table 1). Rodriguez-Trelles et al. found that GC content was similar among related species , whereas in our results the GC contents of the four sequences differed. ITS2 is a non-coding region whose secondary structure has a conserved core, which facilitates the establishment of data-handling systems . Ka-Lok Wong found that matK and rbcL could be used to identify related Gentiana species but showed no advantage over the other five sequences, including ITS and psbA-trnH . In recent research on identifying poisonous medicinal materials, ITS2 combined with TLC and HPLC identified Marsdenia tenacissima on the market . The psbA-trnH spacer shows high rates of insertion/deletion and the greatest sequence divergence among non-coding intergenic regions . In our alignments of the four barcodes, psbA-trnH had the highest percentage of singleton sites (2.58%), followed by ITS2 (0.92%), matK (0.03%) and rbcL (0.02%) (Table 1).
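The Table 1 statistics compared here (GC content and singleton sites, alongside the variable and conserved sites reported in the results) can be computed directly from an alignment. The sketch below follows the definitions commonly used for such summaries, which we assume rather than take from the study: a site is conserved if all sequences share one base and variable otherwise, and a variable site is a singleton unless at least two bases each occur in two or more sequences (in which case it is parsimony-informative). Dropping every column that contains a gap or ambiguous base is also an assumption of this sketch.

```python
from collections import Counter


def site_statistics(alignment: list[str]) -> dict:
    """Percent conserved, variable, singleton and parsimony-informative sites, and GC%.

    Columns containing a gap or any non-ACGT character are dropped first
    (complete deletion, an assumption of this sketch).
    """
    if not alignment or any(len(s) != len(alignment[0]) for s in alignment):
        raise ValueError("need a non-empty alignment of equal-length sequences")
    sites = conserved = singleton = informative = gc = bases = 0
    for column in zip(*(s.upper() for s in alignment)):
        if any(base not in "ACGT" for base in column):
            continue
        sites += 1
        counts = Counter(column)
        gc += counts["G"] + counts["C"]
        bases += len(column)
        if len(counts) == 1:
            conserved += 1
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1  # at least two bases each seen in 2+ sequences
        else:
            singleton += 1
    if sites == 0:
        raise ValueError("no sites left after removing gapped columns")
    return {
        "sites": sites,
        "conserved_pct": 100.0 * conserved / sites,
        "variable_pct": 100.0 * (singleton + informative) / sites,
        "singleton_pct": 100.0 * singleton / sites,
        "parsimony_informative_pct": 100.0 * informative / sites,
        "gc_pct": 100.0 * gc / bases,
    }
```

Under these definitions variable sites are exactly singleton plus parsimony-informative sites, so a locus such as matK can combine a high variable-site percentage with very few singletons.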
Although DNA barcoding is not intended for phylogenetic reconstruction, the topology of NJ trees can be used to evaluate the identification efficiency of candidate barcodes. T. Orihara et al. clarified the phylogeographic relationships and evolutionary history of Rossbeevera (Boletaceae) using three nuclear (ITS, nLSU, EF-1α) and two mitochondrial DNA sequences (ATP6 and mtSSU) together with precise morphological observation , and highlighted the utility of ITS for molecular identification. In this study, at the family level, the more closely related the species within a genus, the more readily psbA-trnH, matK and rbcL produced single-species divergences, for example in Solanaceae, Araceae, Rutaceae and Ranunculaceae (SFig. 2-4). At the species level, matK and rbcL showed lower rates of sequence divergence; their percentages of incorrect monophyletic branches were 1.25% and 0.62%, respectively. Comparing the cohesion of taxonomic groups across the four candidate barcodes, only the ITS2 tree clustered all species of each family into one group. In the Solanaceae branch, samples of Brugmansia arborea formed an independent branch outside the Datura group, consistent with the results of HAN et al. . ITS2 therefore showed high cohesion and identification ability (SFig. 1).
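The monophyly comparisons in this paragraph reduce to a simple test on a rooted tree: a species is monophyletic if some clade contains exactly its samples and nothing else. The sketch below represents a rooted tree as nested Python tuples of sample names, a hypothetical format chosen for brevity. An NJ tree is unrooted and would first need rooting, for example on an outgroup; that step is an assumption not described in the study, and how such calls were converted into the percentages above is not shown.

```python
from collections import defaultdict


def clade_leaf_sets(tree) -> list[frozenset]:
    """Leaf sets of all clades in a rooted tree of nested tuples.

    A leaf is a string; an internal node is a tuple of child subtrees. The
    last element of the returned list is always the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    clades = []
    leaves = set()
    for child in tree:
        child_clades = clade_leaf_sets(child)
        clades.extend(child_clades)
        leaves |= child_clades[-1]  # the child's own full leaf set
    clades.append(frozenset(leaves))
    return clades


def monophyly_by_species(tree, species_of: dict[str, str]) -> dict[str, bool]:
    """Map each species to True if its samples form exactly one clade."""
    samples = defaultdict(set)
    for leaf, species in species_of.items():
        samples[species].add(leaf)
    clades = set(clade_leaf_sets(tree))
    return {sp: frozenset(leaves) in clades for sp, leaves in samples.items()}
```

For instance, in the tree (((Pd1, Pd2), Pp1), (Pp2, (Ur1, Ur2))) the two Prunus dulcis samples form a clade, while the two Prunus persica samples do not.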
A universal DNA barcode should show sufficient variability among closely related species. Across the taxon profiles of the four NJ trees, the total percentage of single-species divergence ranked rbcL (51.09%) > psbA-trnH (49.82%) > matK (36.99%) > ITS2 (18.31%). The conservation of rbcL may make it difficult to identify species with large intra-specific population variation accurately, in line with the conclusion that rbcL is not a suitable barcode at the species level . For example, overlapping characters have affected the morphological identification of subdivisions of Paris . We found that rbcL and matK could not resolve Paris polyphylla var. chinensis or Paris polyphylla var. yunnanensis as monophyletic branches. By contrast, ITS2 produced distinct monophyletic assemblages for Paris, indicating stronger diagnostic capability at the subspecies level (SFig. 1), consistent with the findings of ZHU et al. . Rosaceae was represented only by the genus Prunus in our samples, yet it showed more single-species divergences at the species level (SFig. 1-4). We infer that this may be because these species have been bred through artificial cultivation into different cultivars (lines) or hybrids. In the NJ trees of all four barcodes, some single species fell outside their groups at the species level; the single-species divergence percentage was lowest for ITS2 (2.03%), nearly 60% lower than that of matK (5.02%). XIE et al. analyzed the identification of poisonous plants by DNA barcoding and suggested that combining ITS with matK and rbcL as a composite DNA barcode could increase discrimination rates with the BLAST method . However, that study did not provide a universal barcode for poisonous plants. In addition, ITS is not an ideal plant DNA barcode because of amplification problems .
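The "nearly 60% lower" comparison is a relative difference between the two single-species divergence percentages reported for ITS2 and matK:

```latex
\[
\frac{5.02\% - 2.03\%}{5.02\%} = \frac{2.99}{5.02} \approx 0.596 \approx 60\%
\]
```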
By comparison, ITS2 showed good PCR efficiency and satisfactory identification efficiency at different taxonomic levels. Moreover, ITS2 has rich secondary structures that provide additional variation for distinguishing closely related species. We therefore believe that the ITS2 region should be considered the standard barcode for poisonous medicinal plants, with psbA-trnH as its complementary barcode. On the other hand, medicinal materials are easily contaminated by fungi: WANG et al. found a high fungal contamination rate (95%) in traditional Chinese medicines . In practice, therefore, fungal contamination must be prevented during DNA extraction and PCR amplification when ITS2 is used for DNA barcoding .
Yaman S, Indrajit SK, Sudhaldev M. Review of certain herbal poisonous materials frequently used in Ayurvedic formulation[J]. World J Ayurveda Sci, 2017, 4(7): 283-289.
Yang G, Jiang W, Zhang MZ, et al. Anti-atherosclerosis effect and mechanism of phlegm-removing herbs of Rhizoma Pinelliae and Pseudobulbus Cremastrae seu Pleione[J]. Trad Chin Drug Res Clin Pharmacol, 2013, 24(3): 230-233.
Ardalani H, Avan A, Ghayour-Mobarhan M. Podophyllotoxin: a novel potential natural anticancer agent[J]. Avicenna J Phytomed, 2017, 7(4): 285-29.
Vanherweghem JL, Tielemans C, Abramowicz D, et al. Rapidly progressive interstitial renal fibrosis in young women: association with slimming regimen including Chinese herbs[J]. Lancet, 1993, 341(8842): 387-391. DOI:10.1016/0140-6736(93)92984-2
Han JP, Li MN, Luo K, et al. Identification of Daturae Flos and its adulterants based on DNA barcoding technique[J]. Acta Pharm Sin, 2011, 46(11): 1408-1412.
Ye HZ, Cai JZ. Identification and application of Armeniacae Semen Amarum and Persicae Semen[J]. Strait Pharm J, 2008, 20(11): 85-86.
Chen SL, Lin Y. The Latest Identification of the Traditional Chinese Medicine[M]. China Electronic Audio and Video Publishing Company, 2013: 71-72.
Newmaster SG, Grguric M, Shanmughanandhan D, et al. DNA barcoding detects contamination and substitution in North American herbal products[J]. BMC Med, 2013, 11(1): 222-234. DOI:10.1186/1741-7015-11-222
Knowlton N. Sibling species in the sea[J]. Annu Rev Ecol Syst, 1993, 24: 189-216. DOI:10.1146/annurev.es.24.110193.001201
Jarman SN, Elliott NG. DNA evidence for morphological and cryptic Cenozoic speciations in the Anaspididae, 'living fossils' from the Triassic[J]. J Evol Biol, 2000, 13: 624-633. DOI:10.1046/j.1420-9101.2000.00207.x
Wang L. Applied research of identification of traditional Chinese medicine and Chinese patent medicine based on DNA barcoding[D]. Peking Union Medical College, 2016.
Hebert PDN, Cywinska A, Ball SL, et al. Biological identifications through DNA barcodes[J]. Proc R Soc Lond B Biol Sci, 2003, 270(1512): 313-321.
Chen SL, Guo BL, Zhang GJ, et al. Advances of studies on new technology and method for identifying traditional Chinese medicinal materials[J]. China J Chin Mat Med, 2012, 37(8): 1043-1955.
Li XW, Yang Y, Henry RJ, et al. Plant DNA barcoding: from gene to genome[J]. Biol Rev Camb Philos Soc, 2015, 90(1): 157-166. DOI:10.1111/brv.12104
CBOL Plant Working Group. A DNA barcode for land plants[J]. Proc Natl Acad Sci USA, 2009, 106(31): 12794-12797. DOI:10.1073/pnas.0905845106
Hollingsworth PM, Graham SW, Little DP. Choosing and using a plant DNA barcode[J]. PLoS One, 2011, 6(5): e19254. DOI:10.1371/journal.pone.0019254
Wong KL, But PPH, Shaw PC. Evaluation of seven DNA barcodes for differentiating closely related medicinal Gentiana species and their adulterants[J]. Chin Med, 2013, 8(1): 1-12.
Chen SL, Yao H, Han JP, et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species[J]. PLoS One, 2010, 5(1): e8613. DOI:10.1371/journal.pone.0008613
Yao H, Song JY, Liu C, et al. Use of ITS2 region as the universal DNA barcode for plants and animals[J]. PLoS One, 2010, 5(10): e13102. DOI:10.1371/journal.pone.0013102
Pang XH, Liu C, Shi LC, et al. Utility of the trnH-psbA intergenic spacer region and its combinations as plant DNA barcodes: A meta-analysis[J]. PLoS One, 2012, 7(11): e48833. DOI:10.1371/journal.pone.0048833
Jia J, Xu ZC, Xin TY, et al. Quality control of the traditional patent medicine Yimu Wan based on SMRT sequencing and DNA barcoding[J]. Front Plant Sci, 2017, 8(5): 1-11.
Meyer CP, Paulay G. DNA barcoding: error rates based on comprehensive sampling[J]. PLoS Biol, 2005, 3(12): e422. DOI:10.1371/journal.pbio.0030422
Ross HA, Murugan S, Li WLS. Testing the reliability of genetic methods of species identification via simulation[J]. Syst Biol, 2008, 57(2): 216-230. DOI:10.1080/10635150802032990
Diaz JH. Poisoning by herbs and plants: rapid toxidromic classification and diagnosis[J]. Wilderness Environ Med, 2016, 27(1): 136-152. DOI:10.1016/j.wem.2015.11.006
Wallace LJ, Boilard SMAL, Eagle SHC, et al. DNA barcodes for everyday life: routine authentication of natural health products[J]. Food Res Int, 2012, 49: 446-452. DOI:10.1016/j.foodres.2012.07.048
Selvaraj D, Sarma RK, Sathishkumar R. Phylogenetic analysis of chloroplast matK gene from Zingiberaceae for plant DNA barcoding[J]. Bioinformation, 2008, 3(1): 24-27.
Hollingsworth PM, Graham SW, Little DP. Choosing and using a plant DNA barcode[J]. PLoS One, 2011, 6(5): e19254. DOI:10.1371/journal.pone.0019254
Kress WJ, Erickson DL. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region[J]. PLoS One, 2007, 2(6): e508. DOI:10.1371/journal.pone.0000508
Osathanunkul M, Osathanunkul R, Madesis P. Species identification approach for both raw materials and end products of herbal supplements from Tinospora species[J]. BMC Complement Altern Med, 2018, 18(1): 111. DOI:10.1186/s12906-018-2174-0
Rodriguez-Trelles F, Tarrio R, Ayala FJ. Evidence for a high ancestral GC content in Drosophila[J]. Mol Biol Evol, 2000, 17(11): 1710-1717. DOI:10.1093/oxfordjournals.molbev.a026269
Schultz J, Wolf M. ITS2 sequence-structure analysis in phylogenetics: a how-to manual for molecular systematics[J]. Mol Phylogenet Evol, 2009, 52(2): 520-523. DOI:10.1016/j.ympev.2009.01.008
Yu N, Wei YL, Zhu Y. Integrated approach for identifying and evaluating the quality of Marsdenia tenacissima in the medicine market[J]. PLoS One, 2018, 13(4): e0195240. DOI:10.1371/journal.pone.0195240
Orihara T, Lebel T, Ge ZW, et al. Evolutionary history of the sequestrate genus Rossbeevera (Boletaceae) reveals a new genus Turmalinea and highlights the utility of ITS minisatellite-like insertions for molecular identification[J]. Persoonia, 2016, 37(1): 173-198. DOI:10.3767/003158516X691212
Ji Y, Fritsch PW, Li H, et al. Phylogeny and classification of Paris (Melanthiaceae) inferred from DNA sequence data[J]. Ann Bot, 2006, 98(1): 245-256. DOI:10.1093/aob/mcl095
Zhu YJ, Chen SL, Yao H, et al. DNA barcoding the medicinal plants of the genus Paris[J]. Acta Pharm Sin, 2010, 45(3): 376-382.
Xie L, Wang YW, Guan SY, et al. Prospects and problems for identification of poisonous plants in China using DNA barcodes[J]. Biomed Environ Sci, 2014, 27(10): 794-806.
Wang WL. Analysis of the Separation and Identification of Fungal Contamination and Their Mycotoxin-Producing Potential on the Surface of Fifteen Traditional Herbal Medicines[D]. Guangzhou University of Chinese Medicine, 2013.
Chen SL, Yao H, Han JP, et al. Principles for molecular identification of traditional Chinese materia medica using DNA barcoding[J]. China J Chin Mat Med, 2013, 38(2): 141-148.