Introduction-2

In ligand-based drug design, a model of the biological target can be built from data obtained in wet-lab experiments involving that target. A suitable classification algorithm applied to this model can support computer-aided drug design by predicting the effect on the target of a new ligand from chemical space. Such an algorithm has to operate on a large number of training examples, each described by a very large number of descriptors. To run it efficiently and with a good speed-up, the algorithm can be developed for a GPU-based parallel computing environment.

Among machine learning classification algorithms, random forest appears to give particularly good results in medicinal chemistry. This can be attributed to its random feature selection and repeated feature evaluation. A random forest is a collection of decision trees that are first trained on examples, with appropriate weights assigned to each descriptor. The sparse nature of biological data can be exploited by modifying the existing random forest algorithm. A GPU-based implementation makes the enormous computing power of the GPU available for large data sets. The aim of this project is to develop a GPU-based random forest classifier for virtual screening that represents and considers all the descriptors.
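
To make the idea of random feature selection over sparse descriptors concrete, here is a minimal CPU-only sketch in plain Python. It is not the GPU algorithm this project aims for, only an illustration under stated assumptions: each compound is stored as the set of its active descriptor indices (a sparse binary fingerprint), splits are scored with Gini impurity, and every name (`gini`, `build_tree`, `train_forest`, `predict_forest`) as well as the toy data is hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common class label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, n_features, rng, depth=0, max_depth=5):
    """Grow one tree. rows are sets of active descriptor indices (assumed sparse binary data)."""
    if depth >= max_depth or len(set(labels)) <= 1:
        return majority(labels)  # leaf node
    # Random feature selection: evaluate only about sqrt(n_features) descriptors per node.
    k = max(1, int(n_features ** 0.5))
    best = None
    for f in rng.sample(range(n_features), k):
        present = [y for r, y in zip(rows, labels) if f in r]
        absent = [y for r, y in zip(rows, labels) if f not in r]
        if not present or not absent:
            continue  # this descriptor does not split the node
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(labels)
    f = best[1]
    has = [(r, y) for r, y in zip(rows, labels) if f in r]
    lacks = [(r, y) for r, y in zip(rows, labels) if f not in r]
    return (
        f,
        build_tree([r for r, _ in has], [y for _, y in has], n_features, rng, depth + 1, max_depth),
        build_tree([r for r, _ in lacks], [y for _, y in lacks], n_features, rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, present_branch, absent_branch)."""
    while isinstance(node, tuple):
        f, present_branch, absent_branch = node
        node = present_branch if f in row else absent_branch
    return node


def train_forest(rows, labels, n_features, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement) of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Toy data (made up for illustration): actives carry descriptors 0-3, inactives 4-7.
rows = [{0, 1, 2, 3}] * 5 + [{4, 5, 6, 7}] * 5
labels = [1] * 5 + [0] * 5
forest = train_forest(rows, labels, n_features=8)
print(predict_forest(forest, {0, 1, 2, 3}))
```

Storing only the active indices keeps memory proportional to the number of non-zero descriptors rather than to the full descriptor count. Because the trees are trained independently of one another, this kind of work is a natural candidate for parallel execution.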
