Installing GROMACS 2025.2 on CentOS 7 (CUDA 12.3; TITAN RTX)
$ wget https://ftp.gromacs.org/gromacs/gromacs-2025.2.tar.gz
$ tar xvzf gromacs-2025.2.tar.gz
$ cd gromacs-2025.2
$ mkdir build
$ cd build
$ scl enable devtoolset-11 bash
$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-12.3 /usr/local/cuda
$ ccmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_API=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/2025.2_cuda -DGMX_GPU=CUDA -DCMAKE_CUDA_ARCHITECTURES=native
$ make -j 18
$ make check -j 18
$ sudo make install
$ source /usr/local/gromacs/2025.2_cuda/bin/GMXRC
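After sourcing GMXRC it can be useful to confirm which gmx binary the shell will actually pick up. The following is a minimal sketch using only the Python standard library; the helper name find_gmx is hypothetical and not part of GROMACS.

```python
import shutil
import subprocess


def find_gmx(name="gmx"):
    """Return the absolute path of the named executable on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_gmx()
    if path is None:
        print("gmx not found on PATH; was GMXRC sourced in this shell?")
    else:
        # `gmx --version` prints the build configuration (GPU support, FFT library, ...)
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        print(path)
        print(result.stdout)
```

If the printed path does not start with /usr/local/gromacs/2025.2_cuda, another GROMACS installation is probably earlier on PATH.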
Installing and testing GROMACS 2025.3 with support for Neural Network Potentials on AlmaLinux 8 (CUDA 12.8; TITAN RTX)
Reference: Gromacs 2025.2 with CUDA support – 計算科学研究センター (Research Center for Computational Science)
ANI is an abbreviation for ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies), a neural network potential framework designed to provide quantum chemistry accuracy (close to DFT) for molecular energies and forces, but with the speed of classical force fields. ANI-2x can be used to simulate organic molecules containing seven elements (H, C, N, O, F, Cl, and S).
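Because ANI-2x covers only these seven elements, it can help to check the atoms of the NNP region before exporting a model. The sketch below is illustrative only; the function name unsupported_elements is hypothetical, and the atomic-number list is the one used for N-acetyl-L-alanine methylamide in the export script on this page.

```python
# Atomic numbers of the seven elements covered by ANI-2x: H, C, N, O, F, S, Cl
ANI2X_ELEMENTS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 16: "S", 17: "Cl"}


def unsupported_elements(atomic_numbers):
    """Return the sorted atomic numbers that fall outside the ANI-2x element set."""
    return sorted(set(atomic_numbers) - set(ANI2X_ELEMENTS))


# N-acetyl-L-alanine methylamide (22 atoms)
dipeptide = [1, 6, 1, 1, 6, 8, 7, 1, 6, 1, 6, 1, 1, 1, 6, 8, 7, 1, 6, 1, 1, 1]

print(unsupported_elements(dipeptide))       # empty list: every atom is covered
print(unsupported_elements([1, 6, 15, 35]))  # P (15) and Br (35) are not covered
```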
Test results
System: N-Acetyl-L-alanine methylamide (22 atoms) in water (6765 atoms)
dt = 0.001 ; (1 fs)
tcoupl = V-rescale
pcoupl = C-rescale
GPU, NVIDIA TITAN-RTX
command: gmx mdrun -deffnm md -ntmpi 1 -ntomp 1 -resethway -noconfout
Conditions | Performance (ns/day) | Speed up |
NNP/MM (TorchANI) with GMX_NN_DEVICE=cpu (1 CPU) | 2.482 | x 1 |
NNP/MM (TorchANI) with GMX_NN_DEVICE=cuda | 3.901 | x 1.6 |
NNP/MM (TorchANI/NNPOps) | 57.793 | x 23 |
MM with 1 CPU | 16.515 | (x 6.7) |
MM with 1 CPU and 1 GPU | 635.914 | (x 256) |
MM with 2 CPU and 1 GPU | 706.947 | (x 285) |
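The speed-up column is each ns/day value divided by the first row (TorchANI on 1 CPU). A short sketch that recomputes it from the table values; the dictionary keys are abbreviated labels chosen here for illustration.

```python
# ns/day values copied from the table above
performance = {
    "NNP/MM TorchANI, cpu": 2.482,
    "NNP/MM TorchANI, cuda": 3.901,
    "NNP/MM TorchANI/NNPOps": 57.793,
    "MM, 1 CPU": 16.515,
    "MM, 1 CPU + 1 GPU": 635.914,
    "MM, 2 CPU + 1 GPU": 706.947,
}

BASELINE = performance["NNP/MM TorchANI, cpu"]


def speedup(ns_per_day, reference=BASELINE):
    """Speed-up relative to the reference run (same units for both values)."""
    return ns_per_day / reference


for label, value in performance.items():
    print(f"{label:26s} {value:9.3f} ns/day   x {speedup(value):.1f}")
```

Read this way, NNPOps closes most of the gap to classical MM on a single CPU core, while GPU-accelerated MM remains roughly an order of magnitude faster than the fastest NNP/MM run.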
Installing libtorch
$ sudo dnf install gcc-toolset-13 ninja-build cudnn-cuda-12 cudss-cuda-12 cusparselt-cuda-12 libnccl libnccl-devel openblas
$ conda create -n libtorch271 python=3.13
$ conda activate libtorch271
(libtorch271) $ conda install numpy pyyaml typing_extensions
(libtorch271) $ scl enable gcc-toolset-13 bash
(libtorch271) $ wget https://github.com/pytorch/pytorch/releases/download/v2.7.1/pytorch-v2.7.1.tar.gz
(libtorch271) $ tar xvzf pytorch-v2.7.1.tar.gz
(libtorch271) $ cd pytorch-v2.7.1
(libtorch271) $ cd third_party
(libtorch271) $ git clone https://github.com/NVIDIA/nccl.git
(libtorch271) $ cd nccl
(libtorch271) $ git checkout v2.21.5-1
(libtorch271) $ cd ../../
(libtorch271) $ mkdir build && cd build
(libtorch271) $ cmake .. -GNinja -DBLAS=OpenBLAS -DBUILD_FUNCTORCH=OFF -DBUILD_PYTHON=False \
    -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/libtorch/2.7.1/cu128 \
    -DPython_EXECUTABLE=/home/user/miniforge3/envs/libtorch271/bin/python -DTORCH_BUILD_VERSION=2.7.1 \
    -DCMAKE_PREFIX_PATH=/usr/local/cuda-12.8 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_CUDSS=ON -DUSE_CUSPARSELT=ON \
    -DUSE_NUMPY=True -DCUDA_NVRTC_SHORTHASH=c3430e8b
(libtorch271) $ ninja -j2
(libtorch271) $ sudo ninja install
(libtorch271) $ cd ../../
(libtorch271) $ conda deactivate
Installing GROMACS 2025.3
$ wget https://ftp.gromacs.org/gromacs/gromacs-2025.3.tar.gz
$ wget https://www.fftw.org/fftw-3.3.10.tar.gz
$ export BASEDIR=$(pwd)   # assumption: the directory holding the downloaded tarballs; not set in the original notes
$ tar xvzf gromacs-2025.3.tar.gz
$ cd gromacs-2025.3
$ mkdir build && cd build
$ export FFTW_PATH=${BASEDIR}/fftw-3.3.10.tar.gz
$ export Torch_DIR=/opt/libtorch/2.7.1/cu128
$ /usr/local/cmake/4.1.0/bin/cmake .. -DCMAKE_PREFIX_PATH="${Torch_DIR};/usr/local/cuda-12.8" \
    -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/2025.3_cuda_torchcu128 -DGMX_GPU=CUDA -DGMX_USE_CUFFTMP=OFF \
    -DGMX_NNPOT=TORCH -DCAFFE2_USE_CUDNN=ON -DCAFFE2_USE_CUSPARSELT=ON -DUSE_CUDSS=ON \
    -DPython_EXECUTABLE=/home/user/miniforge3/envs/libtorch271/bin/python -DGMX_BUILD_OWN_FFTW=ON \
    -DGMX_BUILD_OWN_FFTW_URL=${FFTW_PATH} -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc \
    -DCMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES=/usr/local/cuda-12.8/targets/x86_64-linux/lib \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.8 -DGMX_INSTALL_NBLIB_API=OFF \
    -DCUDA_NVRTC_SHORTHASH=c3430e8b -DREGRESSIONTEST_DOWNLOAD=ON \
    -Dnvtx3_dir=/usr/local/cuda-12.8/targets/x86_64-linux/include/nvtx3
$ make -j37
$ make -j37 check
$ sudo make install
Exporting the pretrained model
Reference: Neural Network Potentials – GROMACS Reference Manual
$ conda activate libtorch271
(libtorch271) $ pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
(libtorch271) $ conda install jupyterlab ipywidgets
(libtorch271) $ conda install cuda-version==12.8
(libtorch271) $ conda install -c conda-forge torchani nnpops
Export scripts: lmuellender/gmx-nnpot-wrapper – GitHub
(libtorch271) $ cd gmx-nnpot-wrapper-main
Create the model files with the following Python script:
import os
import torch
import torchani
from models import GmxANIModel

# Select the device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# site-packages directory of TorchANI
ta_dir = os.path.dirname(torchani.__file__)

# Search for the cuAEV .so file
def find_cuaev_so(base):
    for fname in os.listdir(base):
        if "cuaev" in fname and fname.endswith(".so"):
            return os.path.join(base, fname)
    return None

cuaev_so = find_cuaev_so(ta_dir)
assert cuaev_so, f"cuAEV .so not found: {ta_dir}"

# --- ani2x.pt: plain TorchANI model ---
model = GmxANIModel(version=2, device=device)
# Embed the extension library path when saving the TorchScript model
save_path = 'models/ani2x.pt'
extra_files = {"extension_libs": cuaev_so}
torch.jit.script(model).save(save_path, _extra_files=extra_files)
print(f"Saved model to {save_path} with cuAEV library: {cuaev_so}")

# --- ani2x_cuaev.pt: model using the cuAEV extension ---
model = GmxANIModel(use_opt='cuaev', version=2, device=device)
# Embed the extension library path when saving the TorchScript model
save_path = 'models/ani2x_cuaev.pt'
extra_files = {"extension_libs": cuaev_so}
torch.jit.script(model).save(save_path, _extra_files=extra_files)
print(f"Saved model to {save_path} with cuAEV library: {cuaev_so}")

# --- ani2x_nnpops.pt: model using NNPOps ---
save_path = 'models/ani2x_nnpops.pt'
# example atomic number tensor for N-acetyl-L-alanine methylamide
atomic_numbers = torch.tensor([1,6,1,1,6,8,7,1,6,1,6,1,1,1,6,8,7,1,6,1,1,1], device=device)
model = GmxANIModel(use_opt='nnpops', atomic_numbers=atomic_numbers, version=2, device=device)

# nnpops can be found by checking for torch extension library
ext_lib = []
for lib in torch.ops.loaded_libraries:
    if lib:
        ext_lib.append(lib)

# if multiple extensions are found, they are separated by ':'
ext_lib = ":".join(ext_lib)
print("loaded extension libraries: ", ext_lib)
extra_files = {}
extra_files['extension_libs'] = ext_lib
torch.jit.script(model).save(save_path, _extra_files=extra_files)
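The export script stores the extension library paths in an extra file named extension_libs inside the saved TorchScript archive. The sketch below reads that entry back without importing torch, as a quick check of what was embedded. It rests on an assumption about the archive layout (a zip file with extra files stored under <archive name>/extra/), and the function name read_extension_libs and the stand-in archive are hypothetical.

```python
import io
import zipfile


def read_extension_libs(pt_file):
    """Return the library paths stored in the 'extension_libs' extra file.

    pt_file may be a path or a binary file-like object.
    Assumed layout: <archive name>/extra/extension_libs inside a zip archive.
    """
    with zipfile.ZipFile(pt_file) as archive:
        for name in archive.namelist():
            if name.endswith("extra/extension_libs"):
                text = archive.read(name).decode("utf-8")
                # The export script joins multiple libraries with ':'
                return [p for p in text.split(":") if p]
    return []


# Stand-in archive so the sketch runs without torch or a real model file
fake_model = io.BytesIO()
with zipfile.ZipFile(fake_model, "w") as zf:
    zf.writestr("ani2x_nnpops/extra/extension_libs", "/opt/a/libfirst.so:/opt/b/libsecond.so")
fake_model.seek(0)

demo_libs = read_extension_libs(fake_model)
print(demo_libs)
```

For a real model, pass the file path instead, for example read_extension_libs("models/ani2x_nnpops.pt"). With PyTorch available, torch.jit.load accepts an _extra_files dictionary and fills in the same entry.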
(libtorch271) $ cd examples
(libtorch271) $ source /usr/local/gromacs/2025.3_cuda_torchcu128/bin/GMXRC
(libtorch271) $ export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
Testing a Neural Network Potential (NNP)/MM simulation (N-acetyl-L-alanine methylamide in water)
Using ani2x.pt with CPU
(libtorch271) $ export GMX_NN_DEVICE=cpu
(libtorch271) $ gmx grompp -f md.mdp -c conf.gro -p topol.top -o md.tpr
Neural network potential Interface is active, topology was modified!
Number of NN input atoms: 22
Number of regular atoms: 6765
Bonds removed: 62
Angles removed: 36
Dihedrals removed: 42
Connection-only (type 5) bonds added: 21
(libtorch271) $ gmx mdrun -deffnm md -ntmpi 1 -ntomp 1
:-) GROMACS - gmx mdrun, 2025.3 (-:
Reading file md.tpr, VERSION 2025.3 (single precision)
Changing nstlist from 15 to 100, rlist from 1.2 to 1.302
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 1 OpenMP threads
               Core t (s)   Wall t (s)        (%)
       Time:      174.068      174.069      100.0
                 (ns/day)    (hour/ns)
Performance:        2.482        9.669
Using ani2x.pt with CUDA
(libtorch271) $ export GMX_NN_DEVICE=cuda
(libtorch271) $ gmx mdrun -deffnm md -ntmpi 1 -ntomp 1
               Core t (s)   Wall t (s)        (%)
       Time:      110.756      110.756      100.0
                 (ns/day)    (hour/ns)
Performance:        3.901        6.152
GROMACS reminds you: "Go back to the rock from under which you came" (Fiona Apple)
Segmentation fault (core dumped)
Using ani2x_cuaev.pt
(libtorch271) $ sed 's/ani2x.pt/ani2x_cuaev.pt/g' md.mdp > md_cuaev.mdp
(libtorch271) $ gmx grompp -f md_cuaev.mdp -c conf.gro -p topol.top -o md_cuaev.tpr
RuntimeError: AssertionError: cuaev currently does not support PBC
Using ani2x_nnpops.pt
(libtorch271) $ sed 's/ani2x.pt/ani2x_nnpops.pt/g' md.mdp > md_nnpops.mdp
(libtorch271) $ gmx grompp -f md_nnpops.mdp -c conf.gro -p topol.top -o md_nnpops.tpr -maxwarn 1
(libtorch271) $ gmx mdrun -v -deffnm md_nnpops -ntmpi 1 -ntomp 1

      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank

 Activity:              Num   Num      Call    Wall time         Giga-Cycles
                        Ranks Threads  Count      (s)         total sum    %
--------------------------------------------------------------------------------
 Neighbor search           1    1         51       0.149          0.447   2.0
 Launch PP GPU ops.        1    1       9951       0.206          0.618   2.8
 Force                     1    1       5001       0.019          0.058   0.3
 NN potential              1    1       5001       6.717         20.153  89.8
 PME GPU mesh              1    1       5001       0.144          0.432   1.9
 Wait GPU NB local         1    1       5001       0.001          0.004   0.0
 Wait GPU state copy       1    1      10256       0.012          0.036   0.2
 NB X/F buffer ops.        1    1         51       0.001          0.003   0.0
 Write traj.               1    1         51       0.063          0.190   0.8
 Kinetic energy            1    1        101       0.004          0.013   0.1
 Rest                                              0.159          0.477   2.1
--------------------------------------------------------------------------------
 Total                                             7.476         22.430 100.0
--------------------------------------------------------------------------------
 Breakdown of PME mesh activities
--------------------------------------------------------------------------------
 Wait PME GPU gather       1    1       5001       0.001          0.003   0.0
 Reduce GPU PME F          1    1       5001       0.001          0.002   0.0
 Launch PME GPU ops.       1    1      45009       0.133          0.400   1.8
--------------------------------------------------------------------------------
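The accounting table shows where the wall time goes in the NNPOps run. A small sketch, using only numbers copied from that table, that recomputes the share of each activity and the average cost of one NN potential call; the variable and function names are chosen here for illustration.

```python
# Wall times in seconds, copied from the cycle accounting table above
wall_time = {
    "Neighbor search": 0.149,
    "Launch PP GPU ops.": 0.206,
    "Force": 0.019,
    "NN potential": 6.717,
    "PME GPU mesh": 0.144,
    "Wait GPU NB local": 0.001,
    "Wait GPU state copy": 0.012,
    "NB X/F buffer ops.": 0.001,
    "Write traj.": 0.063,
    "Kinetic energy": 0.004,
    "Rest": 0.159,
}
TOTAL_WALL = 7.476      # "Total" row of the table
NN_CALLS = 5001         # call count of the "NN potential" row


def share(activity):
    """Percentage of the total wall time spent in one activity."""
    return 100.0 * wall_time[activity] / TOTAL_WALL


for name in sorted(wall_time, key=wall_time.get, reverse=True)[:3]:
    print(f"{name:22s} {share(name):5.1f} %")

ms_per_nn_call = 1000.0 * wall_time["NN potential"] / NN_CALLS
print(f"NN potential: {ms_per_nn_call:.2f} ms per call")
```

Even with NNPOps, the NN potential evaluation accounts for about 90% of the wall time, so it remains the bottleneck of the NNP/MM run; all other activities together take well under one second.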
Performance comparison
GROMACS 2025.2
The ADH cubic benchmark (134,177 atoms) was used as input. CPU: Intel Core i9-9980XE 3.00 GHz; GPU: NVIDIA TITAN RTX.
CPU only (8 cores)
$ gmx mdrun -deffnm adh_cubic_test2025.2 -pin on -resethway -noconfout -ntmpi 1 -ntomp 8 -bonded cpu -update cpu -nb cpu -pme cpu
starting mdrun 'NADP-DEPENDENT ALCOHOL DEHYDROGENASE in water'
10000 steps, 20.0 ps.
step 5000: resetting all time and cycle counters
               Core t (s)   Wall t (s)        (%)
       Time:      446.183       55.773      800.0
                 (ns/day)    (hour/ns)
Performance:       15.494        1.549
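The Performance line can be cross-checked from the step count and the wall time. The sketch below assumes that, because of -resethway, the timed part covers roughly the last 5000 of the 10000 steps, and that the time step is 2 fs (20.0 ps / 10000 steps); the function names are hypothetical.

```python
def ns_per_day(n_steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall time."""
    simulated_ns = n_steps * dt_ps / 1000.0
    return simulated_ns * 86400.0 / wall_s


def hours_per_ns(ns_day):
    """Wall-clock hours needed to simulate one nanosecond."""
    return 24.0 / ns_day


# CPU-only run: about 5000 timed steps of 2 fs in 55.773 s of wall time
cpu_only = ns_per_day(n_steps=5000, dt_ps=0.002, wall_s=55.773)
print(f"{cpu_only:.2f} ns/day, {hours_per_ns(cpu_only):.3f} hour/ns")
```

The result agrees with the reported 15.494 ns/day to within about 0.01; the small difference comes from how many steps are actually counted after the reset, which is an assumption here.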
1 GPU + 8 CPU
$ gmx mdrun -deffnm adh_cubic_test2025.2 -pin on -resethway -noconfout -ntmpi 1 -ntomp 8 -bonded gpu -update gpu -nb gpu -pme gpu -nsteps 20000
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 8 OpenMP threads
starting mdrun 'NADP-DEPENDENT ALCOHOL DEHYDROGENASE in water'
20000 steps, 40.0 ps.
step 10000: resetting all time and cycle counters
               Core t (s)   Wall t (s)        (%)
       Time:       71.585        8.950      799.8
                 (ns/day)    (hour/ns)
Performance:      193.089        0.124