1
Alkan M, Pham BQ, Del Angel Cruz D, Hammond JR, Barnes TA, Gordon MS. LibERI-A portable and performant multi-GPU accelerated library for electron repulsion integrals via OpenMP offloading and standard language parallelism. J Chem Phys 2024; 161:082501. [PMID: 39171700 DOI: 10.1063/5.0215352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Accepted: 07/16/2024] [Indexed: 08/23/2024]
Abstract
A portable and performant graphics processing unit (GPU)-accelerated library for electron repulsion integral (ERI) evaluation, named LibERI, has been developed and implemented via directive-based (e.g., OpenMP and OpenACC) and standard language parallelism (e.g., Fortran DO CONCURRENT). Offloaded ERIs consist of integrals over low and high contraction s, p, and d functions using the rotated-axis and Rys quadrature methods. GPU codes are factorized based on previous developments [Pham et al., J. Chem. Theory Comput. 19(8), 2213-2221 (2023)] with two layers of integral screening and quartet presorting. In this work, the density screening is moved to the GPU to enhance the computational efficacy for large molecular systems. The L-shells in the Pople basis set are also separated into pure S and P shells to increase the ERI homogeneity and reduce atomic operations and the memory footprint. LibERI is compatible with any quantum chemistry drivers supporting the MolSSI Driver Interface. Benchmark calculations of LibERI interfaced with the GAMESS software package were carried out on various GPU architectures and molecular systems. The results show that the LibERI performance is comparable to other state-of-the-art GPU-accelerated codes (e.g., TeraChem and GMSHPC) and, in some cases, outperforms conventionally developed ERI CUDA kernels (e.g., QUICK) while fully maintaining portability.
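The L-shell separation described in the abstract can be pictured with a small data-structure sketch. A Pople-style L shell holds s and p functions that share exponents but carry different contraction coefficients, so splitting it yields one pure S shell and one pure P shell. The `Shell` and `LShell` classes and the `split_l_shell` function are hypothetical names chosen for illustration, not LibERI's API, and the numbers are placeholders rather than values from a published basis set.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Shell:
    """A contracted Gaussian shell with a single angular momentum."""
    angular_momentum: int            # 0 = s, 1 = p
    exponents: Tuple[float, ...]
    coefficients: Tuple[float, ...]


@dataclass(frozen=True)
class LShell:
    """A combined sp shell: s and p functions share the same exponents."""
    exponents: Tuple[float, ...]
    s_coefficients: Tuple[float, ...]
    p_coefficients: Tuple[float, ...]


def split_l_shell(shell: LShell) -> List[Shell]:
    """Return a pure S shell and a pure P shell built from one L shell."""
    return [
        Shell(0, shell.exponents, shell.s_coefficients),
        Shell(1, shell.exponents, shell.p_coefficients),
    ]


# Placeholder numbers for illustration only (not a real basis set).
l_shell = LShell(
    exponents=(5.0, 1.2, 0.3),
    s_coefficients=(-0.1, 0.4, 0.7),
    p_coefficients=(0.2, 0.5, 0.6),
)
s_shell, p_shell = split_l_shell(l_shell)
```

After the split, every shell has a single angular momentum, which is the homogeneity the abstract refers to: a kernel handling a given shell quartet no longer has to branch between s and p components.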
Affiliation(s)
- Melisa Alkan
  - Department of Chemistry, Iowa State University and Ames National Laboratory, Ames, Iowa 50011, USA
  - Department of Chemistry, Stanford University, Palo Alto, California 94305, USA
- Buu Q Pham
  - Department of Chemistry, Iowa State University and Ames National Laboratory, Ames, Iowa 50011, USA
- Daniel Del Angel Cruz
  - Department of Chemistry, Iowa State University and Ames National Laboratory, Ames, Iowa 50011, USA
- Taylor A Barnes
  - Molecular Sciences Software Institute, Blacksburg, Virginia 24060, USA
- Mark S Gordon
  - Department of Chemistry, Iowa State University and Ames National Laboratory, Ames, Iowa 50011, USA
2
Kriebel MH, Tecmer P, Gałyńska M, Leszczyk A, Boguslawski K. Accelerating Pythonic Coupled-Cluster Implementations: A Comparison Between CPUs and GPUs. J Chem Theory Comput 2024; 20:1130-1142. [PMID: 38306601 PMCID: PMC10867805 DOI: 10.1021/acs.jctc.3c01110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/12/2024] [Accepted: 01/13/2024] [Indexed: 02/04/2024]
Abstract
In this work, we benchmark several Python routines for time and memory requirements to identify the optimal choice of the tensor contraction operations available. We scrutinize how to accelerate the bottleneck tensor operations of Pythonic coupled-cluster implementations in the Cholesky linear algebra domain, utilizing an NVIDIA Tesla V100S PCIe 32GB (rev 1a) graphics processing unit (GPU). The NVIDIA compute unified device architecture API interacts with CuPy, an open-source library for Python, designed as a NumPy drop-in replacement for GPUs. Due to the limitations of video memory, the GPU calculations must be performed batch-wise. Timing results of some contractions containing large tensors are presented. The CuPy implementation leads to a factor of 10-16 speed-up of the bottleneck tensor contractions compared to computations on 36 central processing unit (CPU) cores. Finally, we compare example CCSD and pCCD-LCCSD calculations performed solely on CPUs to their CPU-GPU hybrid implementation, which leads to a speed-up of a factor of 3-4 compared to the CPU-only variant.
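The batch-wise evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python lists stand in for NumPy/CuPy arrays so the example runs without a GPU, and the function names `contract_full` and `contract_batched` are hypothetical. The idea shown is that a contraction over a long summed index can be accumulated slice by slice, so only one slice of each operand needs to reside in limited device memory at a time.

```python
def contract_full(A, B):
    """C[i][j] = sum_k A[i][k] * B[k][j], computed in a single pass."""
    n, m, p = len(A), len(B), len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
        for i in range(n)
    ]


def contract_batched(A, B, batch_size):
    """Same contraction, accumulated over slices of the summed index k."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for start in range(0, m, batch_size):
        stop = min(start + batch_size, m)
        # In a GPU version, only columns start:stop of A and rows
        # start:stop of B would be copied to the device for this step.
        for i in range(n):
            for j in range(p):
                C[i][j] += sum(A[i][k] * B[k][j] for k in range(start, stop))
    return C


A = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
B = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2]]
C_full = contract_full(A, B)
C_batched = contract_batched(A, B, batch_size=2)
```

Here the five-element summed index is processed in slices of two, two, and one, and the accumulated result matches the single-pass contraction. The batch size would in practice be chosen from the available device memory.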
Affiliation(s)
- Maximilian H. Kriebel
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, Grudziadzka 5, 87-100 Toruń, Poland
- Paweł Tecmer
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, Grudziadzka 5, 87-100 Toruń, Poland
- Marta Gałyńska
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, Grudziadzka 5, 87-100 Toruń, Poland
- Aleksandra Leszczyk
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, Grudziadzka 5, 87-100 Toruń, Poland
- Katharina Boguslawski
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, Grudziadzka 5, 87-100 Toruń, Poland
3
Straatsma TP, Windus TL, Nakajima T. Special Topic on High Performance Computing in Chemical Physics. J Chem Phys 2023; 159:210401. [PMID: 38038196 DOI: 10.1063/5.0185894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 11/08/2023] [Indexed: 12/02/2023]
Abstract
Computational modeling and simulation have become indispensable scientific tools in virtually all areas of chemical, biomolecular, and materials systems research. Computation can provide unique and detailed atomic level information that is difficult or impossible to obtain through analytical theories and experimental investigations. In addition, recent advances in micro-electronics have resulted in computer architectures with unprecedented computational capabilities, from the largest supercomputers to common desktop computers. Combined with the development of new computational domain science methodologies and novel programming models and techniques, this has resulted in modeling and simulation resources capable of providing results at or better than experimental chemical accuracy and for systems in increasingly realistic chemical environments.
Affiliation(s)
- Tjerk P Straatsma
  - National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6373, USA
  - Department of Chemistry and Biochemistry, University of Alabama, Tuscaloosa, Alabama 35487-0336, USA
- Theresa L Windus
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011-2416, USA
  - Chemical and Biological Sciences Division, Ames National Laboratory, Ames, Iowa 50011-2416, USA
4
Datta D, Gordon MS. Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives. J Chem Theory Comput 2023; 19:7640-7657. [PMID: 37878756 DOI: 10.1021/acs.jctc.3c00876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2023]
Abstract
An algorithm is presented for the coupled-cluster singles, doubles, and perturbative triples correction [CCSD(T)] method based on the density fitting or the resolution-of-the-identity (RI) approximation for performing calculations on heterogeneous computing platforms composed of multicore CPUs and graphics processing units (GPUs). The directive-based approach to GPU offloading offered by the OpenMP application programming interface has been employed to adapt the most compute-intensive terms in the RI-CCSD amplitude equations with computational costs scaling as $O(N_O^2 N_V^4)$, $O(N_O^3 N_V^3)$, and $O(N_O^4 N_V^2)$ (where $N_O$ and $N_V$ denote the numbers of correlated occupied and virtual orbitals, respectively) and the perturbative triples correction to execute on GPU architectures. The pertinent tensor contractions are performed using an accelerated math library such as cuBLAS or hipBLAS. Optimal strategies are discussed for splitting large data arrays into tiles to fit them into the relatively small memory space of the GPUs, while also minimizing the low-bandwidth CPU-GPU data transfers. The performance of the hybrid CPU-GPU RI-CCSD(T) code is demonstrated on pre-exascale supercomputers composed of heterogeneous nodes equipped with NVIDIA Tesla V100 and A100 GPUs and on the world's first exascale supercomputer named "Frontier", the nodes of which consist of AMD MI250X GPUs. Speedups within the range 4-8× relative to the recently reported CPU-only algorithm are obtained for the GPU-offloaded terms in the RI-CCSD amplitude equations. Applications to polycyclic aromatic hydrocarbons containing 16-66 carbon atoms demonstrate that the acceleration of the hybrid CPU-GPU code for the perturbative triples correction relative to the CPU-only code increases with the molecule size, attaining a speedup of 5.7× for the largest circumovalene molecule (C66H20). The GPU-offloaded code enables the computation of the perturbative triples correction for the C60 molecule using the cc-pVDZ/aug-cc-pVTZ-RI basis sets in 7 min on Frontier when using 12,288 AMD GPUs with a parallel efficiency of 83.1%.
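To get a feel for the three cost scalings quoted in the abstract, the following sketch evaluates them for one arbitrary pair of orbital counts. It ignores all prefactors, so the numbers are only relative operation-count estimates; the function name `ri_ccsd_term_costs` and the chosen orbital counts are assumptions made for illustration and do not come from the paper.

```python
def ri_ccsd_term_costs(n_occ: int, n_virt: int) -> dict:
    """Prefactor-free operation-count estimates for the three scalings."""
    return {
        "NO^2 NV^4": n_occ**2 * n_virt**4,
        "NO^3 NV^3": n_occ**3 * n_virt**3,
        "NO^4 NV^2": n_occ**4 * n_virt**2,
    }


# Arbitrary illustrative sizes: 10 occupied and 100 virtual orbitals.
costs = ri_ccsd_term_costs(n_occ=10, n_virt=100)
largest = max(costs, key=costs.get)
```

With ten times more virtual than occupied orbitals, each successive term in this sketch is ten times cheaper than the previous one, and the term with four virtual indices dominates.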
Affiliation(s)
- Dipayan Datta
  - Department of Chemistry and Ames Laboratory, Iowa State University, 2416 Pammel Drive, Ames, Iowa 50011-2416, United States
- Mark S Gordon
  - Department of Chemistry and Ames Laboratory, Iowa State University, 2416 Pammel Drive, Ames, Iowa 50011-2416, United States
5
Zahariev F, Xu P, Westheimer BM, Webb S, Galvez Vallejo J, Tiwari A, Sundriyal V, Sosonkina M, Shen J, Schoendorff G, Schlinsog M, Sattasathuchana T, Ruedenberg K, Roskop LB, Rendell AP, Poole D, Piecuch P, Pham BQ, Mironov V, Mato J, Leonard S, Leang SS, Ivanic J, Hayes J, Harville T, Gururangan K, Guidez E, Gerasimov IS, Friedl C, Ferreras KN, Elliott G, Datta D, Cruz DDA, Carrington L, Bertoni C, Barca GMJ, Alkan M, Gordon MS. The General Atomic and Molecular Electronic Structure System (GAMESS): Novel Methods on Novel Architectures. J Chem Theory Comput 2023; 19:7031-7055. [PMID: 37793073 DOI: 10.1021/acs.jctc.3c00379] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/06/2023]
Abstract
The primary focus of GAMESS over the last 5 years has been the development of new high-performance codes that are able to take effective and efficient advantage of the most advanced computer architectures, both CPU and accelerators. These efforts include employing density fitting and fragmentation methods to reduce the high scaling of well-correlated (e.g., coupled-cluster) methods as well as developing novel codes that can take optimal advantage of graphical processing units and other modern accelerators. Because accurate wave functions can be very complex, an important new functionality in GAMESS is the quasi-atomic orbital analysis, an unbiased approach to the understanding of covalent bonds embedded in the wave function. Best practices for the maintenance and distribution of GAMESS are also discussed.
Affiliation(s)
- Federico Zahariev
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Peng Xu
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Bryce M Westheimer
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Simon Webb
  - VeraChem LLC, 12850 Middlebrook Road, Suite 205, Germantown, Maryland 20874-5244, United States
- Jorge Galvez Vallejo
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
  - Research School of Computer Science, Australian National University, Canberra, ACT 2601, Australia
- Ananta Tiwari
  - EP Analytics, Inc., 9909 Mira Mesa Boulevard, Suite 230, San Diego, California 92131, United States
- Vaibhav Sundriyal
  - Department of Computational Modeling and Simulation Engineering, Old Dominion University, Norfolk, Virginia 23529, United States
- Masha Sosonkina
  - Department of Computational Modeling and Simulation Engineering, Old Dominion University, Norfolk, Virginia 23529, United States
- Jun Shen
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- George Schoendorff
  - Propellants Branch, Rocket Propulsion Division, Aerospace Systems Directorate, Air Force Research Laboratory, AFRL/RQRP, Edwards Air Force Base, California 93524, United States
- Megan Schlinsog
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Tosaporn Sattasathuchana
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Klaus Ruedenberg
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Luke B Roskop
  - Hewlett-Packard Enterprise, 2131 Lindau Lane #1000, Bloomington, Minnesota 55425, United States
- David Poole
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
  - School of Chemistry & Biochemistry, Georgia Institute of Technology, Athens, Georgia 30332, United States
- Piotr Piecuch
  - Department of Chemistry and Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan 48824, United States
- Buu Q Pham
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Vladimir Mironov
  - Department of Chemistry, Kyungpook National University, Daegu 41566, South Korea
- Joani Mato
  - Physical Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, MS K1-83, Richland, Washington 99352, United States
- Sam Leonard
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Sarom S Leang
  - EP Analytics, Inc., 9909 Mira Mesa Boulevard, Suite 230, San Diego, California 92131, United States
- Joe Ivanic
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Jackson Hayes
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Taylor Harville
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Karthik Gururangan
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Emilie Guidez
  - Department of Chemistry, University of Colorado Denver, Denver, Colorado 80217, United States
- Igor S Gerasimov
  - Department of Chemistry, Kyungpook National University, Daegu 41566, South Korea
- Christian Friedl
  - Institut für Theoretische Physik, Johannes Kepler Universität Linz, Altenberger Str. 69, 4040 Linz, Austria
- Katherine N Ferreras
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- George Elliott
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Dipayan Datta
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Daniel Del Angel Cruz
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Laura Carrington
  - EP Analytics, Inc., 9909 Mira Mesa Boulevard, Suite 230, San Diego, California 92131, United States
- Colleen Bertoni
  - Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Giuseppe M J Barca
  - Research School of Computer Science, Australian National University, Canberra, ACT 2601, Australia
- Melisa Alkan
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
- Mark S Gordon
  - Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50014, United States
6
Galvez Vallejo JL, Snowdon C, Stocks R, Kazemian F, Yan Yu FC, Seidl C, Seeger Z, Alkan M, Poole D, Westheimer BM, Basha M, De La Pierre M, Rendell A, Izgorodina EI, Gordon MS, Barca GMJ. Toward an extreme-scale electronic structure system. J Chem Phys 2023; 159:044112. [PMID: 37497819 DOI: 10.1063/5.0156399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 07/03/2023] [Indexed: 07/28/2023]
Abstract
Electronic structure calculations have the potential to predict key matter transformations for applications of strategic technological importance, from drug discovery to material science and catalysis. However, a predictive physicochemical characterization of these processes often requires accurate quantum chemical modeling of complex molecular systems with hundreds to thousands of atoms. Due to the computationally demanding nature of electronic structure calculations and the complexity of modern high-performance computing hardware, quantum chemistry software has historically failed to operate at such large molecular scales with accuracy and speed that are useful in practice. In this paper, novel algorithms and software are presented that enable extreme-scale quantum chemistry capabilities with particular emphasis on exascale calculations. This includes the development and application of the multi-Graphics Processing Unit (GPU) library LibCChem 2.0 as part of the General Atomic and Molecular Electronic Structure System package and of the standalone Extreme-scale Electronic Structure System (EXESS), designed from the ground up for scaling on thousands of GPUs to perform high-performance accurate quantum chemistry calculations at unprecedented speed and molecular scales. Among various results, we report that the EXESS implementation enables Hartree-Fock/cc-pVDZ plus RI-MP2/cc-pVDZ/cc-pVDZ-RIFIT calculations on an ionic liquid system with 623 016 electrons and 146 592 atoms in less than 45 min using 27 600 GPUs on the Summit supercomputer with a 94.6% parallel efficiency.
Affiliation(s)
- Calum Snowdon
  - School of Computing, Australian National University, Canberra 2601, ACT, Australia
- Ryan Stocks
  - School of Computing, Australian National University, Canberra 2601, ACT, Australia
- Fazeleh Kazemian
  - School of Computing, Australian National University, Canberra 2601, ACT, Australia
- Fiona Chuo Yan Yu
  - School of Computing, Australian National University, Canberra 2601, ACT, Australia
- Christopher Seidl
  - School of Computing, Australian National University, Canberra 2601, ACT, Australia
- Zoe Seeger
  - School of Chemistry, Monash University, Clayton 3800, VIC, Australia
- Melisa Alkan
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011-3111, USA
- David Poole
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- Bryce M Westheimer
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011-3111, USA
- Mehaboob Basha
  - Pawsey Supercomputing Research Centre, Kensington, WA 6151, Australia
- Alistair Rendell
  - College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Giuseppe M J Barca
  - School of Computing, Australian National University, Canberra 2601, ACT, Australia