Base substitution is one of the raw sources of genetic variation that drive evolution. Recent studies have shown that genome composition affects mutation patterns to some extent. To assess the correlation between the transition/transversion ratio (Ts/Tv) and the number of immediately adjacent A&T nucleotides, we investigated 3,611,007 Oryza sativa SNPs (including 45,462 coding SNPs and 242,811 intronic SNPs) and 32,019 Arabidopsis SNPs. The results show that Ts/Tv is negatively correlated with the number of immediately adjacent A&T nucleotides in both O. sativa and Arabidopsis. We further calculated AT2 (the number of SNPs whose two immediately adjacent nucleotides are both A or T) and AT0 (the number of SNPs whose two immediately adjacent nucleotides are both C or G) for all 6 types of SNPs. In both O. sativa and Arabidopsis, C/G SNPs have the highest AT2/AT0 ratio, which suggests that C/G SNPs may be the type most strongly influenced by adjacent A&T nucleotides. For SNPs in O. sativa, the neighboring effect of A&T nucleotides is limited to 2 nucleotides on each side; for SNPs in Arabidopsis, the effect extends no more than 4 nucleotides on each side.
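The quantities named in this abstract (Ts/Tv grouped by the number of adjacent A&T nucleotides, and AT2/AT0 per SNP type) can be tallied roughly as follows. This is a minimal sketch, not the authors' pipeline: the input layout `(left, allele1, allele2, right)`, the function names, and the reading of AT2/AT0 as "both neighbors A/T" versus "both neighbors C/G" are assumptions made for illustration.

```python
from collections import defaultdict

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T);
# the other four SNP types (A/C, A/T, C/G, G/T) are transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def count_adjacent_at(left, right):
    """Number (0, 1 or 2) of immediately adjacent bases that are A or T."""
    return sum(base in "AT" for base in (left.upper(), right.upper()))


def ts_tv_by_at_neighbors(snps):
    """Ts/Tv ratio for each count of adjacent A&T nucleotides.

    `snps` is an iterable of (left, allele1, allele2, right) tuples
    (a hypothetical input layout). Returns {n_at: ratio or None}.
    """
    counts = defaultdict(lambda: {"ts": 0, "tv": 0})
    for left, a1, a2, right in snps:
        pair = frozenset((a1.upper(), a2.upper()))
        kind = "ts" if pair in TRANSITIONS else "tv"
        counts[count_adjacent_at(left, right)][kind] += 1
    # The ratio is undefined when a group has no transversions.
    return {n: (c["ts"] / c["tv"] if c["tv"] else None)
            for n, c in counts.items()}


def at2_over_at0(snps):
    """AT2/AT0 for each SNP type, e.g. 'C/G'; None if AT0 is zero."""
    counts = defaultdict(lambda: [0, 0, 0])  # indexed by adjacent A&T count
    for left, a1, a2, right in snps:
        snp_type = "/".join(sorted((a1.upper(), a2.upper())))
        counts[snp_type][count_adjacent_at(left, right)] += 1
    return {t: (c[2] / c[0] if c[0] else None) for t, c in counts.items()}
```

Grouping by the neighbor count first keeps the Ts/Tv comparison across the AT0, AT1, and AT2 classes a single pass over the SNP list; extending the window to 2 or 4 nucleotides on each side would only require widening the flanking input.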
ZHAO Hui1,3*, LI Qizhai1,2,3*, LI Jun1, ZENG Changqing1, HU Songnian1 & YU Jun1 1. Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100030, China
DNA composition dynamics across genomes of diverse taxonomy is a major subject of genome analyses. DNA composition changes are characteristics of both replication and repair machineries. We investigated 3,611,007 single nucleotide polymorphisms (SNPs) generated by comparing two sequenced rice genomes from distant inbred lines (subspecies), including those from 242,811 introns and 45,462 protein-coding sequences (CDSs). Neighboring-nucleotide effects (NNEs) of these SNPs are diverse, depending on structural content-based classifications (genome-wide, intronic, and CDS) and sequence context-based categories (A/C, A/G, A/T, C/G, C/T, and G/T substitutions) of the analyzed SNPs. Strong and evident NNEs and nucleotide proportion biases surrounding the analyzed SNPs were observed in 1–3 bp sequences on both sides of an SNP. Strong biases were observed around neighboring nucleotides of protein-coding SNPs, which exhibit a periodicity of three in nucleotide content, constrained by a combined effect of codon-related rules and DNA repair mechanisms. Unlike a previous finding in the human genome, we found negative correlation between GC contents of chromosomes and the magnitude of corresponding bias of nucleotide C at −1 site and G at +1 site. These results will further our understanding of the mutation mechanism in rice as well as its evolutionary implications.
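The nucleotide proportions at the −3 to +3 sites around an SNP, on which the biases described above are measured, can be tabulated along these lines. This is an illustrative sketch only: the `(upstream, downstream)` input layout and the function name `flank_proportions` are assumptions, and the abstract does not specify how the bias itself is normalized, so only raw proportions are computed here.

```python
from collections import Counter


def flank_proportions(flanks, width=3):
    """Proportion of A, C, G, T at each offset -width..-1 and +1..+width.

    `flanks` is an iterable of (upstream, downstream) strings written
    5'->3', so the last upstream base is site -1 and the first
    downstream base is site +1 (a hypothetical input layout).
    Flanks shorter than `width` are skipped.
    """
    offsets = list(range(-width, 0)) + list(range(1, width + 1))
    per_site = {off: Counter() for off in offsets}
    for up, down in flanks:
        if len(up) < width or len(down) < width:
            continue
        up, down = up.upper(), down.upper()
        for i in range(1, width + 1):
            per_site[-i][up[-i]] += 1       # site -i, counted back from the SNP
            per_site[i][down[i - 1]] += 1   # site +i, counted forward
    result = {}
    for off, counter in per_site.items():
        total = sum(counter.values()) or 1  # avoid dividing by zero
        result[off] = {base: counter[base] / total for base in "ACGT"}
    return result
```

Running this separately on genome-wide, intronic, and CDS SNP sets, and again per substitution category, would give the per-site tables that the two classifications in the abstract compare.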
This paper discusses inference for ordered parameters of multinomial distributions. We first show that the asymptotic distributions of their maximum likelihood estimators (MLEs) are not always normal and that the bootstrap distribution estimators of the MLEs can be inconsistent. We then propose a class of weighted sum estimators (WSEs) of the ordered parameters and study their properties, including asymptotic normality. Based on these results, large-sample inference for smooth functions of the ordered parameters can be made. In particular, confidence intervals for the maximum cell probability are constructed. Simulation results indicate that this interval estimation performs much better than the bootstrap approaches in the literature. Finally, the results for ordered parameters of multinomial distributions are extended to more general distribution models.
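One way to see why the MLE of an ordered parameter need not be asymptotically normal is to simulate the largest sample proportion when all cell probabilities are equal. The sketch below is an illustration under stated assumptions, not the paper's analysis or its weighted sum estimator: the function name `simulate_max_mle` and the default seed are hypothetical. With k equal cells, the largest sample proportion can never fall below 1/k, so its sampling distribution lies entirely on one side of the true value and cannot be approximately normal around it.

```python
import random
from collections import Counter


def simulate_max_mle(probs, n, reps, seed=0):
    """Simulate the MLE of the largest cell probability.

    For each of `reps` replications, draws `n` observations from a
    multinomial with cell probabilities `probs` and records the largest
    sample proportion, which is the MLE of the largest ordered parameter.
    """
    rng = random.Random(seed)
    cells = range(len(probs))
    estimates = []
    for _ in range(reps):
        draws = rng.choices(cells, weights=probs, k=n)
        counts = Counter(draws)
        estimates.append(max(counts.values()) / n)
    return estimates


if __name__ == "__main__":
    k = 3
    est = simulate_max_mle([1 / k] * k, n=300, reps=2000)
    # Every estimate is at least 1/k, so the estimator is biased upward
    # when the true cell probabilities are tied.
    print("true value:", 1 / k)
    print("smallest estimate:", min(est))
    print("mean estimate:", sum(est) / len(est))
```

When the cell probabilities are well separated, the same simulation gives estimates scattered on both sides of the true maximum, which is the regular case in which normal approximations behave as expected.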
This paper investigates one-sided hypothesis testing for p[1], the largest cell probability of a multinomial distribution. The small-sample test of Ethier (1982) is extended to the general case. Based on an estimator of p[1], a class of large-sample tests is proposed. The asymptotic power of these tests under local alternatives is derived. An example is presented at the end of the paper.