The accurate prediction of disordered regions in protein sequences using machine learning approaches

Pengfei Han

School of Computer Science and IT, RMIT University

Date and time: 11.30am - 12.30pm, Friday 28th August, 2009

Venue: 10.08.03 (Building 10, Level 08, Room 03)

Abstract:

A major challenge in the post-genome era is to determine the function of proteins. Many proteins contain intrinsically unstructured or disordered regions (DRs) under physiological conditions, yet these regions still carry out important functions. Computational approaches to predicting DRs can greatly assist biologists studying the structures and functions of proteins.

We propose a novel application of machine learning models and physicochemical features extracted from protein sequences to predict long, short and global disorder in proteins. We investigate the database of numerical indices representing physicochemical properties of amino acids and select the indices most correlated with disorder. To achieve high prediction accuracy, novel feature transformations, including autocorrelation and the wavelet transform, are applied to the selected physicochemical properties of amino acids. Several disorder prediction models are built based on Decision Tree, Random Forest and Support Vector Machine classifiers. Our experiments on benchmark datasets show that our predictors achieve accuracy that compares favorably with state-of-the-art disorder predictors, while also significantly improving the understandability and efficiency of prediction.
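The abstract does not give the exact feature definitions, so what follows is only a minimal Python sketch under stated assumptions. It uses the Kyte-Doolittle hydropathy scale as a stand-in physicochemical index (not necessarily one of the indices the talk selected), a normalised lag-d autocorrelation, and a single-level Haar transform as the simplest wavelet. The helper `window_features`, the window half-width of 7 and `max_lag=3` are hypothetical choices for illustration, not the speaker's method.

```python
import math

# Kyte-Doolittle hydropathy scale, used here only as an illustrative
# physicochemical index (an assumption; not necessarily one the talk selected).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(seq, index=KD_HYDROPATHY):
    """Map each residue to its index value; unknown residues get 0.0."""
    return [index.get(aa, 0.0) for aa in seq.upper()]


def autocorrelation(values, max_lag):
    """Normalised autocorrelation r(d) for lags d = 1..max_lag.

    r(d) = [sum_i (x_i - m)(x_{i+d} - m) / (N - d)] / var(x)
    Lags that do not fit in the sequence, or a constant profile, give 0.0.
    """
    n = len(values)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(values) / n
    centred = [v - mean for v in values]
    var = sum(c * c for c in centred) / n
    result = []
    for d in range(1, max_lag + 1):
        if d >= n or var == 0.0:
            result.append(0.0)
            continue
        cov = sum(centred[i] * centred[i + d] for i in range(n - d)) / (n - d)
        result.append(cov / var)
    return result


def haar_level(values):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficient lists; an odd-length input
    is padded by repeating its last value.
    """
    if len(values) % 2:
        values = values + values[-1:]
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def window_features(seq, pos, half_width=7, max_lag=3):
    """Hypothetical per-residue feature vector for 0 <= pos < len(seq).

    Takes the window of residues centred on pos (clipped at the sequence
    ends) and returns its autocorrelation lags, followed by the mean Haar
    approximation coefficient and the Haar detail energy.
    """
    lo, hi = max(0, pos - half_width), min(len(seq), pos + half_width + 1)
    profile = property_profile(seq[lo:hi])
    approx, detail = haar_level(profile)
    return (autocorrelation(profile, max_lag)
            + [sum(approx) / len(approx), sum(d * d for d in detail)])
```

For example, `[window_features(seq, i) for i in range(len(seq))]` yields one vector per residue. Labelled as ordered or disordered, such vectors could then be passed to a classifier such as scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier` or `SVC`.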

About the speaker:

Pengfei is a PhD student at the School of CS&IT at RMIT University. This is a PhD completion seminar.


Seminar Organisation

Seminars are free and open to the general public. No booking is necessary. If you are interested in giving a presentation in this seminar series, or would like to suggest speakers, please contact Xiaodong Li, the seminar co-ordinator.