Ninety-three compounds known to permeate the placenta barrier were collected as a data set for constructing a support vector regression (SVR) model. In addition, 140 compounds with reproductive toxicity and 170 compounds without reproductive toxicity were collected as a second data set for constructing a support vector classification (SVC) model. A total of 1481 molecular descriptors were calculated with Dragon2.1 to represent the structural characteristics of all these compounds. The CfsSubsetEval evaluation method and the BestFirst-D1-N5 search method were used to select an optimal subset of molecular descriptors. Based on these data, an SVR model for predicting placenta barrier permeability (PBP) and an SVC model for predicting reproductive toxicity were built with the LibSVM program. Both models showed good predictive ability. The correlation coefficient (R) values of the optimal SVR model were 0.990 for the training set and 0.780 for the test set, and the accuracy, sensitivity and specificity of the optimal SVC model were all above 80%. The SVR model was then used to predict the PBP of compounds collected from 13 commonly used tocolytic Chinese herbs. The compounds with higher permeability were further analysed with the SVC model, and 15 of them were classified as positive, i.e., reproductively toxic. The two models constructed in this study may help guide the clinical application of tocolytic Chinese herbs.
Introduction
Traditional Chinese medicine (TCM), with a history of 5000 years, has been regarded as the main medical system for treating diseases throughout East Asia. It has long been used to promote health and treat illness and, in recent years, has been accepted as a major form of complementary and alternative medicine in the Western world.
According to the theory of TCM [1], miscarriage is described as "fetal irritability", "fetal restlessness" or "stirring fetus", and TCM has specific herbs to reduce the risk of miscarriage. However, information about the use of tocolytic Chinese herbs for miscarriage is limited [2] [3]. Moreover, compounds from tocolytic Chinese herbs may cross the placenta barrier and exert toxic effects on the fetus. For the safe application of tocolytic Chinese herbs, it is therefore urgent to determine whether their active compounds with high placenta barrier permeability (PBP) are reproductively toxic.
Support vector machine (SVM), which builds on concepts such as the maximum-margin hyperplane and Mercer kernels, is a relatively new artificial intelligence technique [4]. In practice, it has two modes: SVC and SVR. The former is a qualitative method focused on pattern recognition and classification; the latter is a quantitative method for prediction and forecasting [4]. SVM-based prediction can also reduce the time and cost of traditional toxicological experiments [5]. In this paper, a drug molecule-based SVR model was built to evaluate the PBP of compounds from tocolytic Chinese herbs, taken from the Traditional Chinese Medicine Database (TCMD, version 2009). On this basis, a drug molecule-based SVC model was then constructed to predict the reproductive toxicity of the active compounds with relatively high PBP. The SVR and SVC models were thus used together to identify reproductively toxic compounds in tocolytic Chinese herbs, which may provide guidance for the safe clinical application of these herbs.
International Conference on Materials Engineering and Information Technology Applications (MEITA 2015) © 2015. The authors. Published by Atlantis Press.
Materials and methods
Training and test set splitting.
The PBP of a compound is typically expressed as relative permeability (RP). To make data gathered in different laboratories comparable, antipyrine, a small lipophilic compound known to cross the placenta by passive diffusion [6], was used as the reference compound, i.e., the permeability of each compound was expressed relative to that of antipyrine. For the SVR model, 93 compounds with reported PBP values were collected from references [7-12] as the SVR data set. To ensure that the training set was representative, the Kennard-Stone (KS) program [13] was used to split the SVR data set into a training set of 73 compounds and a test set of 20 compounds (SVR-TS1). Note that KS is not a random split: it deterministically selects training samples that span the data space uniformly. For the SVC model, a data set of 140 reproductively toxic compounds (positives) and 170 non-reproductively toxic compounds (negatives) was collected from references [14-17]. This data set was also split with KS: 125 positives and 125 negatives were chosen as the training set, and 15 positives and 45 negatives formed the internal test set (SVC-TS1). To further evaluate the SVC models, two additional independent external test sets, SVC-TS2 and SVC-TS3, were constructed. SVC-TS2 consisted of 455 negatives derived from the "approved" drug molecules in DrugBank (http://www.drugbank.ca/); SVC-TS3 consisted of 5 Chinese herbs that had been reported to have reproductive toxicity [18-22]. None of these sets overlap, and the partitioning results are detailed in Table 1.
Table 1. The data structure partitioning results for each model.
Model | Data set | Number of compounds
PBP SVR model | Training set | 73
PBP SVR model | SVR-TS1 | 20
Reproductive toxicity SVC model | Training set (+) | 125
Reproductive toxicity SVC model | Training set (-) | 125
Reproductive toxicity SVC model | SVC-TS1 (+) | 15
Reproductive toxicity SVC model | SVC-TS1 (-) | 45
Reproductive toxicity SVC model | SVC-TS2 (-) | 455
Reproductive toxicity SVC model | SVC-TS3 (+) | 5
(+) positive compounds. (-) negative compounds.
Molecular descriptors.
Dragon2.1 was used to calculate molecular descriptors for all compounds in all data sets, yielding 1481 descriptors per compound.
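The Kennard-Stone split used for the training and test sets can be sketched as follows. This is a minimal illustration of the algorithm as it is commonly described (greedy max-min selection on pairwise Euclidean distances), not the KS program of [13]; the function name kennard_stone and the use of unscaled Euclidean distance on the input vectors are assumptions. Because every step is deterministic, the same data always yield the same split.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone(X: Sequence[Sequence[float]], n_train: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (train_indices, test_indices) chosen by Kennard-Stone.

    Assumes Euclidean distance between the input vectors; the original KS program
    may differ in details such as scaling or the distance metric.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

    # Step 1: start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # min_dist[k] is the distance from sample k to its nearest selected sample.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    # Step 2: repeatedly add the remaining sample farthest from the selected set.
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    # Selected samples form the training set; the rest form the test set.
    return selected, remaining
```

For example, kennard_stone([[0.0], [1.0], [2.0], [3.0], [10.0]], 3) returns ([0, 4, 3], [1, 2]): the two extreme points are taken first, then the point farthest from both, so the training set covers the range of the data while the test set falls inside it.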
Feature selection was performed in Weka3.6.10 [23] using the CfsSubsetEval evaluation method, the BestFirst-D1-N5 search method and 10-fold cross-validation. Three data normalization schemes (no normalization, scaling to [0, 1] and scaling to [-1, 1]) were then applied to each data set, and the three resulting versions of each data set were used separately in the subsequent modelling.
Development of SVM.
SVM is an important machine learning method that is widely used in both bioinformatics and chemoinformatics. In this study, LibSVM-Faruto-Ultimate (Version2.0), developed by Faruto, was used to run both SVC and SVR. Choosing an appropriate kernel function and parameters is crucial when SVM is applied to practical problems. To map input vectors nonlinearly into a higher-dimensional space, the radial basis function (RBF) kernel was chosen for both SVC and SVR. The penalty parameter C and the RBF kernel parameter γ strongly affect the predictive performance of the resulting model. To obtain models that predict the test sets accurately, three parameter optimization methods, grid search, genetic algorithm (GA) and particle swarm optimization (PSO), were used to identify appropriate values of (C, γ). Models built with the default parameters were also retained for comparison. Thus, 4 parameter selection methods were combined with the 3 data processing results, giving 12 combinations for model construction.
Data analysis and model validation.
First, SVR-TS1 was used to validate the SVR models so that the relationship between structural variables and placenta barrier RP could be