The accurate identification of potential poly(A) sites underpins many studies of alternative polyadenylation. The aim of this study was to develop a machine-learning methodology that helps discriminate real polyadenylation signals from randomly occurring signals in genomic sequence. Since previous studies have revealed that RNA secondary structure in certain genes has a significant impact on polyadenylation, the authors sought to computationally pinpoint common structural patterns around poly(A) sites and to investigate how RNA secondary structure may influence polyadenylation. An initial study of the impact of RNA structure, carried out with motif search tools, indicated that hairpin structures might be important. It is therefore proposed that, in addition to the sequence pattern around poly(A) sites, there exists a widespread structural pattern that is also employed during human mRNA polyadenylation. In this study, the authors present a computational model that uses support vector machines to predict human poly(A) sites. The results show that this predictive model performs comparably to the current prediction tool. In addition, common structural patterns associated with polyadenylation were identified using several motif-finding programs, which provides new insight into the role that RNA secondary structure plays in polyadenylation.
- Data mining
- Polyadenylation (poly(A))
- Support vector machines (SVMs)