Background The major difficulty in analyzing DNA microarray data is the large number of genes relative to the small number of samples, together with the complex data structure. Random forest has received much attention recently; its primary strength is that it can build a classification model from high-dimensional data. However, it does not yield optimal results for gene selection because it is still affected by undifferentiated genes. We proposed recursive random forest analysis and applied it to gene selection. Methods Recursive random forest, an improvement on random forest, obtains the optimal set of differentiated genes by dropping, step by step, the genes that a given algorithm identifies as having no effect on classification. The method retains the advantages of random forest and also provides a gene importance scale. The area under the receiver operating characteristic (ROC) curve (AUC), which combines the information on sensitivity and specificity, is adopted as the key criterion for evaluating the performance of the method. The focus of the paper is to validate the effectiveness of gene selection using recursive random forest through the analysis of five microarray datasets: colon, prostate, leukemia, breast, and skin data. Results Five microarray datasets were analyzed, and better classification results were attained using only a few genes after gene selection. The biological information on the genes selected from the breast and skin data was confirmed against the National Center for Biotechnology Information (NCBI). The results show that genes associated with disease can be effectively retained by recursive random forest. Conclusions Recursive random forest can be effectively applied to microarray data analysis and gene selection. The genes retained in the optimal model provide important information for clinical diagnosis and for research into the biological mechanisms of disease.
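The elimination loop described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the dropping rule, so removing a fixed fraction of the lowest-ranked genes per step is an assumption, and the names `recursive_elimination`, `importance_fn`, `evaluate_fn`, `toy_importance`, and `toy_evaluate` are hypothetical. In practice `importance_fn` would return random forest variable importances and `evaluate_fn` the AUC of the refitted forest; the toy stand-ins below only make the sketch runnable without external libraries.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recursive_elimination(
    genes: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    evaluate_fn: Callable[[List[str]], float],
    drop_fraction: float = 0.2,  # assumption: the source does not give the dropping rule
    min_genes: int = 1,
) -> Tuple[List[str], float]:
    """Drop the least important genes step by step; return the best-scoring subset."""
    current = list(genes)
    best_subset, best_score = list(current), evaluate_fn(current)
    while len(current) > min_genes:
        importance = importance_fn(current)
        ranked = sorted(current, key=lambda g: importance[g], reverse=True)
        n_keep = max(min_genes, int(len(ranked) * (1 - drop_fraction)))
        n_keep = min(n_keep, len(ranked) - 1)  # always drop at least one gene
        current = ranked[:n_keep]
        score = evaluate_fn(current)
        if score >= best_score:  # on ties, prefer the smaller gene set
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy data (hypothetical): expression values per gene for six samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [5, 6, 7, 1, 2, 3],  # separates the classes perfectly
    "g2": [1, 5, 2, 6, 3, 4],  # weak, noisy signal
    "g3": [3, 3, 3, 3, 3, 3],  # constant, carries no information
}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    # Stand-in for random forest importance: distance of single-gene AUC from 0.5.
    return {g: abs(auc(expr[g], labels) - 0.5) for g in subset}


def toy_evaluate(subset: List[str]) -> float:
    # Stand-in for the forest's AUC: AUC of the summed expression of the subset.
    sums = [sum(expr[g][i] for g in subset) for i in range(len(labels))]
    return auc(sums, labels)


selected, score = recursive_elimination(list(expr), toy_importance, toy_evaluate)
print(selected, score)
```

Recomputing the importances on every pass, rather than ranking once, is what makes the procedure recursive: a gene's importance can change once uninformative genes have been removed.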
Objective The determination of the non-inferiority margin is an important and often confusing issue that directly influences the acceptability of a new medication. We reviewed the published literature, the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) guidelines, and Committee for Proprietary Medicinal Products (CPMP) papers on the selection of the non-inferiority margin and the corresponding statistical tests in clinical trials, in order to provide practical recommendations and suggestions for establishing reference criteria for the non-inferiority margin in China. Data sources The literature on the selection of a non-inferiority margin and statistical tests was mainly extracted from relevant English articles on non-inferiority clinical trials published from 1990 to 2007. The starting point (1990) was chosen because of the lack of such papers published before 1990. The literature was searched via PubMed, Medline and Chinese Knowledge Information (CNKI). ICH guidelines and CPMP papers were downloaded from their official websites. The keywords "clinical trial", "non-inferiority" and "non-inferiority margin" were used. Study selection Forty-three original articles and critical reviews, the ICH E10 guideline and CPMP papers were selected. Results Non-inferiority tests based on the treatment difference and on the treatment ratio are commonly used, with the non-inferiority margin determined either with or without historical data. Traditionally, this margin is treated as a fixed value, whereas more recently developed methods take its variation into account when determining the margin, and a test that depends on such a margin is more convincing. The mixed margin, which consists of a margin based on the treatment difference and a margin based on the treatment ratio, can exactly control the type I error at the desired level and achieve better power.
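As an illustration of the traditional case mentioned above (a test on the treatment difference with a fixed margin), the following is a minimal sketch using a normal approximation. The function name `noninferiority_z_test`, the one-sided level of 0.025, and all numbers are assumptions chosen for illustration, not values taken from the reviewed guidelines; the sketch assumes that higher outcomes are better and that the standard error of the estimated difference is already available.

```python
from statistics import NormalDist


def noninferiority_z_test(diff: float, se: float, margin: float,
                          alpha: float = 0.025) -> dict:
    """One-sided test of H0: true difference <= -margin (new minus control).

    diff   : estimated treatment difference, new minus active control
    se     : standard error of that estimate
    margin : fixed non-inferiority margin, given as a positive number
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    std_normal = NormalDist()
    z = (diff + margin) / se
    p_value = 1.0 - std_normal.cdf(z)
    # Equivalent confidence-interval view: non-inferiority is concluded when the
    # lower (1 - alpha) confidence bound for the difference lies above -margin.
    ci_lower = diff - std_normal.inv_cdf(1 - alpha) * se
    return {
        "z": z,
        "p_value": p_value,
        "ci_lower": ci_lower,
        "non_inferior": ci_lower > -margin,
    }


# Hypothetical numbers: response rate 2 points lower on the new drug, SE of 3 points.
wide = noninferiority_z_test(diff=-0.02, se=0.03, margin=0.10)
narrow = noninferiority_z_test(diff=-0.02, se=0.03, margin=0.05)
print(wide["non_inferior"], narrow["non_inferior"])
```

The same data lead to opposite conclusions under the two margins, which is why the choice of margin directly affects whether a new medication is judged acceptable.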
In this review, we also provide some recommendations and suggestions for the selection of the non-inferiority margin in Western countries and China. C