CAVE-CL: An OpenCL version of the package for detection and quantitative analysis of internal cavities in a system of overlapping balls: Application to proteins

Ján Buša, Ján Buša, Shura Hayryan, Chin Kun Hu, Ming Chya Wu

Research output: Contribution to journal, Article, peer-reviewed


Here we present a revised and newly rewritten version of our previously published CAVE package (Buša et al., 2010), which was originally written in FORTRAN. The package has been rewritten in C, and the algorithm has been parallelized and implemented using OpenCL. This makes the program convenient to run on platforms with Graphics Processing Units (GPUs). The improvements also include some modifications and optimizations of the original algorithm. A considerable improvement in the performance of the code has been achieved. A new tool called input_structure has been added, which makes data input and conversion easier and more universal.

New version program summary
Program Title: CAVE-CL, CAVE C
Catalogue identifier: AEHC_v2_0
Program summary URL:
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license
No. of lines in distributed program, including test data, etc.: 32646
No. of bytes in distributed program, including test data, etc.: 444248
Distribution format: tar.gz
Programming language: C, C++, OpenCL.
Computer: PC with GPU.
Operating system: OpenCL compatible systems.
Has the code been vectorized or parallelized?: Parallelized using GPUs. A revised serial (non-GPU) version is included in the package as well.
Keywords: Proteins, Solvent accessible area, Excluded volume, Cavities, Analytic method, Stereographic projection, GPGPU, OpenCL.
PACS: 82.20.Wt, 02.60.Cb, 02.70.Ns.
Classification: 16.1.
Catalogue identifier of previous version: AEHC_v1_0.
Journal reference of previous version: Comput. Phys. Commun. 181 (2010) 2116.
Does the new version supersede the previous version?: Yes
Nature of problem: Molecular structure analysis.
Solution method: An analytical method that uses the stereographic transformation for exact detection of internal cavities in a system of overlapping balls, and a numerical algorithm for calculating the volume and the surface area of the cavities.
Reasons for the new version: This work is in line with our overall efforts to modernize the protein-structure-related algorithms and software packages developed in our research group over the last several years [1–8]. These tools continue to receive considerable attention from researchers and have been used to solve many interesting research problems [9,10]. Among many others, one important application has been found by members of our team [11]. Therefore, we think that there is a demand to revise and modernize these tools and to make them more efficient. Here we follow the approach used earlier in [8] to develop a new version of the CAVE package [7]. The original CAVE package was written in FORTRAN. One reason for the new version is to rewrite it in C in order to make it more accessible to young researchers who are not familiar with FORTRAN. Another, more important reason is to exploit contemporary hardware (for example, modern graphics cards) to improve the performance of the package. We also want to spare the user from recompiling the program for every molecule when a set of molecules is processed. For this purpose we provide the possibility of using general pdb files as input. After being compiled once, the program can receive any number of input files in succession. We also found it necessary to go through the algorithm, to optimize the memory usage where possible, and to make the algorithm more efficient.
Summary of revisions:
1. Memory usage and language. The whole code has been ported to C, and the static arrays have been replaced with dynamic memory allocation. This makes it possible to load and handle proteins of arbitrary size.
2. Changes in the algorithm. As in [8], the original method of the North Pole test and molecule rotation [4] has been changed.
The details of the implementation and the benefits of this change are described in [8], so we do not repeat them here.
3. New tool. A module called input_structure, which takes as input a protein structure file in a format compatible with the Protein Data Bank (pdb) [12], has been adopted from [8]. Using an external tool allows users to create their own mappings of atoms and radii without recompiling either the module input_structure or CAVE. It is the user's responsibility to assign proper radii to each type of atom. One can use any of the published standard sets of radii (see, for example, [13–17]). Alternatively, users can assign their own values for the radii directly in the module input_structure. The radii are assigned in a special file with the extension pds (see the documentation), which consists of lines like this:
ATOM CA ALA 2.0
which is read as “the Cα atom of alanine has radius 2.0 Å”.
4. Some computational tricks. In several parts of the program, square roots were replaced by second powers, and calls to the sin and cos functions were replaced by calls to sincos, allowing for a further speed-up (in comparison with the original FORTRAN version). The typical relative error between the results obtained by the original (FORTRAN), C, and OpenCL versions was between 10^-8 and 10^-10, and it never exceeded 10^-5. Small differences in the results can be due to the compiler implementation and, especially in the case of OpenCL, to the implementation of arithmetic by the GPU vendor. [Table presented]
5. OpenCL implementation and testing results. OpenCL [18] is an open standard for parallel programming in heterogeneous systems. It is becoming increasingly popular and has proved to be an efficient tool for computations in different fields (see, for example, the most recent [19,20] and the references therein). Table 1 shows the speedup of the C and OpenCL implementations of CAVE compared with the FORTRAN version.
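As an illustration of the radii mapping described in item 3, the following is a minimal sketch of how a line such as ATOM CA ALA 2.0 might be read in C. It is not the code of input_structure. The type RadiusEntry, the function parse_radius_line, the 7-character field limit, and the assumption that a line holds exactly four whitespace-separated fields are all hypothetical; the actual pds layout is defined in the package documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one radius assignment (not from the CAVE sources). */
typedef struct {
    char atom[8];     /* atom name, e.g. "CA" */
    char residue[8];  /* residue name, e.g. "ALA" */
    double radius;    /* radius in angstroms */
} RadiusEntry;

/* Parse one line of the assumed form "ATOM <atom> <residue> <radius>".
   Returns 1 on success and 0 if the line does not match that form. */
int parse_radius_line(const char *line, RadiusEntry *out)
{
    char keyword[8];

    assert(line != NULL && out != NULL);

    /* Field widths of 7 keep each token inside its 8-byte buffer. */
    if (sscanf(line, "%7s %7s %7s %lf",
               keyword, out->atom, out->residue, &out->radius) != 4)
        return 0;

    /* Only lines starting with the ATOM keyword are accepted here. */
    if (strcmp(keyword, "ATOM") != 0)
        return 0;

    /* A physical radius must be positive. */
    return out->radius > 0.0;
}
```

With this sketch, parsing the example line from the text fills the entry with atom "CA", residue "ALA", and radius 2.0, while a line with a different leading keyword or a missing field is rejected.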
We compare results obtained using both the free GNU FORTRAN compiler (g77) and the commercial (and faster) ifort. Speedup is calculated as the ratio of the execution time of the original FORTRAN version to that of the C or OpenCL version of the program. Execution times are measured in seconds. [Figure presented] One could expect greater speed-ups, but the problem is that not the whole algorithm could be parallelized. Only about 1/3 of the whole program was parallelized, and the effect is visible for proteins with 2000 atoms or more, if the calculation time of the FORTRAN version is higher than approximately 10 s. The rest of the code is sequential, and its parallelization would require an entirely new algorithm, which might be a subject of future work. Fig. 1 shows the speed-up as a function of the number of neighbors. It clearly indicates that the effect of parallelization is stronger for proteins with many neighbors. This is also the reason why the effect is not so strong for proteins with a testing sphere radius of 0. In that case most of the cavities are enclosed by only a few (around 4–8) spheres, while with a testing sphere radius of 1.2 Å there are easily 35 or more enclosing spheres. Overall, the C version is a good choice for general proteins (and a testing sphere radius of 0), while OpenCL is suited to larger proteins and longer computation times. A 0 in the name of a protein means that no probe radius has been added to the atomic radii. In the other cases 1.2 Å was added to all atomic radii. All results were obtained on a computer with an Intel Core 2 Duo E8500 CPU running at 3.16 GHz with 4 GB RAM and an NVIDIA GTX470 GPU, and on a computer with an Intel Xeon X5450 CPU running at 3.00 GHz with 32 GB RAM and a dedicated NVIDIA C1060 GPU card. When considering which GPU to use, it is important to check its double-precision performance. Consumer-oriented GPUs usually have intentionally reduced double-precision performance, and because of that the results can be similar even if a newer generation of GPUs is used.
For instance, in 2010 the double-precision performance of NVIDIA GPUs (except for highly specialized GPUs for scientific computing) was 1/8 of their single-precision performance. Nowadays (2014) this ratio is 1/24, meaning that in double precision GPUs from 2010 are as fast as current GPUs (except for special editions of GPUs or dedicated cards).
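The remark that only about 1/3 of the program was parallelized can be made quantitative with Amdahl's law, a standard model that is not part of the original text. The sketch below assumes that the parallelized part accounts for about 1/3 of the serial run time (the text does not say whether the fraction refers to run time or to code), and the function names amdahl_speedup and amdahl_limit are hypothetical. Under that assumption it bounds only the gain of the OpenCL version over the serial C version, not the total speedup over FORTRAN.

```c
#include <assert.h>

/* Amdahl's law: overall speed-up when a fraction p of the run time
   is accelerated by a factor s and the remainder stays sequential. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound on the speed-up as s grows without limit (requires p < 1). */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

For p = 1/3, amdahl_limit gives 1.5, and a tenfold acceleration of the parallel part (s = 10) gives about 1.43. Under the stated assumption, this is why even a much faster GPU cannot raise the gain from parallelization far beyond a factor of 1.5 without a new algorithm for the sequential part.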

Pages (from-to): 224-227
Journal: Computer Physics Communications
Publication status: Published - 1 May 2015

