Installing GROMACS 2025.2 on CentOS 7 (CUDA 12.3; TITAN RTX)
$ wget https://ftp.gromacs.org/gromacs/gromacs-2025.2.tar.gz
$ tar xvzf gromacs-2025.2.tar.gz
$ cd gromacs-2025.2
$ mkdir build
$ cd build
$ scl enable devtoolset-11 bash
$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-12.3 /usr/local/cuda
$ ccmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_API=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/2025.2_cuda -DGMX_GPU=CUDA -DCMAKE_CUDA_ARCHITECTURES=native
$ make -j 18
$ make check -j 18
$ sudo make install
$ source /usr/local/gromacs/2025.2_cuda/bin/GMXRC
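After sourcing GMXRC, it can be useful to confirm that the freshly installed binary is the one found on PATH. The following is a minimal sketch, not part of the original procedure: the helper name `gmx_version_header` is hypothetical, and it only assumes that `gmx -version` prints something when the executable is available.

```python
import shutil
import subprocess

def gmx_version_header(executable="gmx"):
    """Return the first non-empty output line of `<executable> -version`,
    or None when the executable is not on PATH (hypothetical helper)."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True,
                            text=True, check=False)
    # GROMACS may write parts of its banner to stderr, so look at both streams
    lines = [ln.strip() for ln in (result.stdout + result.stderr).splitlines()
             if ln.strip()]
    return lines[0] if lines else None

header = gmx_version_header()
print(header if header else "gmx not found: source GMXRC in this shell first")
```

Because `source` only changes the current shell, the script has to be started from the same shell in which GMXRC was sourced.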
Installing and testing GROMACS 2025.3 with support for Neural Network Potentials on AlmaLinux 8 (CUDA 12.8; TITAN RTX)
Reference: Gromacs 2025.2 with CUDA support – 計算科学研究センター (Research Center for Computational Science)
ANI is an abbreviation for ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies), a neural network potential framework designed to provide quantum chemistry accuracy (close to DFT) for molecular energies and forces, but with the speed of classical force fields. ANI-2x can be used to simulate organic molecules containing seven elements (H, C, N, O, F, Cl, and S).
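Since ANI-2x covers only these seven elements, it is worth checking the atoms of the NNP region before exporting a model. The sketch below is illustrative only: `unsupported_elements` is a hypothetical helper, not part of TorchANI or GROMACS, and the atomic numbers are those of the 22-atom N-acetyl-L-alanine methylamide used in the tests on this page.

```python
from collections import Counter

# Atomic numbers of the seven elements covered by ANI-2x: H, C, N, O, F, S, Cl
ANI2X_ELEMENTS = {1, 6, 7, 8, 9, 16, 17}

def unsupported_elements(atomic_numbers):
    """Return the sorted atomic numbers that ANI-2x does not cover."""
    return sorted(set(atomic_numbers) - ANI2X_ELEMENTS)

# N-acetyl-L-alanine methylamide, C6H12N2O2 (22 atoms)
nnp_atoms = [1, 6, 1, 1, 6, 8, 7, 1, 6, 1, 6, 1, 1, 1, 6, 8, 7, 1, 6, 1, 1, 1]

composition = Counter(nnp_atoms)
print(len(nnp_atoms), dict(sorted(composition.items())))
print(unsupported_elements(nnp_atoms))          # empty list: all elements are covered
print(unsupported_elements(nnp_atoms + [15]))   # phosphorus (Z = 15) is not covered
```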
Test results
System: N-Acetyl-L-alanine methylamide (22 atoms) in water (6765 atoms)
dt = 0.001 ; (1 fs)
tcoupl = V-rescale
pcoupl = C-rescale
GPU, NVIDIA TITAN-RTX
command: gmx mdrun -deffnm md -ntmpi 1 -ntomp 1 -resethway -noconfout
| Conditions | Performance (ns/day) | Speed-up |
|---|---|---|
| NNP/MM (TorchANI) with GMX_NN_DEVICE=cpu (1 CPU) | 2.482 | x 1 |
| NNP/MM (TorchANI) with GMX_NN_DEVICE=cuda | 3.901 | x 1.6 |
| NNP/MM (TorchANI/NNPOps) | 57.793 | x 23 |
| MM with 1 CPU | 16.515 | (x 6.7) |
| MM with 1 CPU and 1 GPU | 635.914 | (x 256) |
| MM with 2 CPU and 1 GPU | 706.947 | (x 285) |
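The speed-up column is each performance value divided by the NNP/MM CPU baseline (2.482 ns/day). A short sketch that reproduces the column from the ns/day values in the table; the dictionary keys are just labels chosen for this example.

```python
# Performance values (ns/day) copied from the table above
performance = {
    "NNP/MM (TorchANI), GMX_NN_DEVICE=cpu (1 CPU)": 2.482,
    "NNP/MM (TorchANI), GMX_NN_DEVICE=cuda": 3.901,
    "NNP/MM (TorchANI/NNPOps)": 57.793,
    "MM with 1 CPU": 16.515,
    "MM with 1 CPU and 1 GPU": 635.914,
    "MM with 2 CPU and 1 GPU": 706.947,
}

baseline_key = "NNP/MM (TorchANI), GMX_NN_DEVICE=cpu (1 CPU)"
baseline = performance[baseline_key]

# Speed-up relative to the NNP/MM run on one CPU core
speedup = {name: value / baseline for name, value in performance.items()}

for name, factor in speedup.items():
    print(f"{name:<46s} {performance[name]:>8.3f} ns/day   x {factor:.1f}")

# How much slower NNP/MM with NNPOps is than pure MM on the same GPU
mm_vs_nnpops = performance["MM with 1 CPU and 1 GPU"] / performance["NNP/MM (TorchANI/NNPOps)"]
print(f"MM (1 CPU + 1 GPU) / NNPOps = {mm_vs_nnpops:.1f}")
```

By the same arithmetic, pure MM on one CPU core and one GPU is about 11 times faster than NNP/MM with NNPOps for this system.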
Installing libtorch
$ sudo dnf install gcc-toolset-13 ninja-build cudnn-cuda-12 cudss-cuda-12 cusparselt-cuda-12 libnccl libnccl-devel openblas
$ conda create -n libtorch271 python=3.13
$ conda activate libtorch271
(libtorch271) $ conda install numpy pyyaml typing_extensions
(libtorch271) $ scl enable gcc-toolset-13 bash
(libtorch271) $ wget https://github.com/pytorch/pytorch/releases/download/v2.7.1/pytorch-v2.7.1.tar.gz
(libtorch271) $ tar xvzf pytorch-v2.7.1.tar.gz
(libtorch271) $ cd pytorch-v2.7.1
(libtorch271) $ cd third_party
(libtorch271) $ git clone https://github.com/NVIDIA/nccl.git
(libtorch271) $ cd nccl
(libtorch271) $ git checkout v2.21.5-1
(libtorch271) $ cd ../../
(libtorch271) $ mkdir build && cd build
(libtorch271) $ cmake .. -GNinja -DBLAS=OpenBLAS -DBUILD_FUNCTORCH=OFF -DBUILD_PYTHON=False \
  -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/libtorch/2.7.1/cu128 \
  -DPython_EXECUTABLE=/home/user/miniforge3/envs/libtorch271/bin/python -DTORCH_BUILD_VERSION=2.7.1 \
  -DCMAKE_PREFIX_PATH=/usr/local/cuda-12.8 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_CUDSS=ON -DUSE_CUSPARSELT=ON \
  -DUSE_NUMPY=True -DCUDA_NVRTC_SHORTHASH=c3430e8b
(libtorch271) $ ninja -j2
(libtorch271) $ sudo ninja install
(libtorch271) $ cd ../../
(libtorch271) $ conda deactivate
Installing GROMACS 2025.3
$ wget https://ftp.gromacs.org/gromacs/gromacs-2025.3.tar.gz
$ wget https://www.fftw.org/fftw-3.3.10.tar.gz
$ export BASEDIR=$(pwd)
$ tar xvzf gromacs-2025.3.tar.gz
$ cd gromacs-2025.3
$ mkdir build && cd build
$ export FFTW_PATH=${BASEDIR}/fftw-3.3.10.tar.gz
$ export Torch_DIR=/opt/libtorch/2.7.1/cu128
$ /usr/local/cmake/4.1.0/bin/cmake .. -DCMAKE_PREFIX_PATH="${Torch_DIR};/usr/local/cuda-12.8" \
  -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/2025.3_cuda_torchcu128 -DGMX_GPU=CUDA -DGMX_USE_CUFFTMP=OFF \
  -DGMX_NNPOT=TORCH -DCAFFE2_USE_CUDNN=ON -DCAFFE2_USE_CUSPARSELT=ON -DUSE_CUDSS=ON \
  -DPython_EXECUTABLE=/home/user/miniforge3/envs/libtorch271/bin/python -DGMX_BUILD_OWN_FFTW=ON \
  -DGMX_BUILD_OWN_FFTW_URL=${FFTW_PATH} -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc \
  -DCMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES=/usr/local/cuda-12.8/targets/x86_64-linux/lib \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.8 -DGMX_INSTALL_NBLIB_API=OFF \
  -DCUDA_NVRTC_SHORTHASH=c3430e8b -DREGRESSIONTEST_DOWNLOAD=ON \
  -Dnvtx3_dir=/usr/local/cuda-12.8/targets/x86_64-linux/include/nvtx3
$ make -j37
$ make -j37 check
$ sudo make install
Exporting the pretrained model
Reference: Neural Network Potentials – GROMACS Reference Manual
$ conda activate libtorch271
(libtorch271) $ pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
(libtorch271) $ conda install jupyterlab ipywidgets
(libtorch271) $ conda install cuda-version==12.8
(libtorch271) $ conda install -c conda-forge torchani nnpops
Export scripts: lmuellender/gmx-nnpot-wrapper – GitHub
(libtorch271) $ cd gmx-nnpot-wrapper-main
Create the model files with the following Python scripts.
import os
import torch
from models import GmxANIModel
import torchani

# Select the device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# site-packages directory of TorchANI
ta_dir = os.path.dirname(torchani.__file__)

# Search for the cuAEV .so file
def find_cuaev_so(base):
    for fname in os.listdir(base):
        if "cuaev" in fname and fname.endswith(".so"):
            return os.path.join(base, fname)
    return None

cuaev_so = find_cuaev_so(ta_dir)
assert cuaev_so, f"cuAEV .so not found in: {ta_dir}"

# Create the model (plain TorchANI)
model = GmxANIModel(version=2, device=device)
# Embed the extension library path when saving the TorchScript model
save_path = 'models/ani2x.pt'
extra_files = {"extension_libs": cuaev_so}
torch.jit.script(model).save(save_path, _extra_files=extra_files)
print(f"Saved model to {save_path} with cuAEV library: {cuaev_so}")

# Create the model (cuAEV-accelerated)
model = GmxANIModel(use_opt='cuaev', version=2, device=device)
# Embed the extension library path when saving the TorchScript model
save_path = 'models/ani2x_cuaev.pt'
extra_files = {"extension_libs": cuaev_so}
torch.jit.script(model).save(save_path, _extra_files=extra_files)
print(f"Saved model to {save_path} with cuAEV library: {cuaev_so}")
import torch
from models import GmxANIModel

save_path = 'models/ani2x_nnpops.pt'

# Select the device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# example atomic number tensor for N-acetyl-L-alanine methylamide
atomic_numbers = torch.tensor([1,6,1,1,6,8,7,1,6,1,6,1,1,1,6,8,7,1,6,1,1,1], device=device)
model = GmxANIModel(use_opt='nnpops', atomic_numbers=atomic_numbers, version=2, device=device)

# nnpops can be found by checking for torch extension library
ext_lib = []
for lib in torch.ops.loaded_libraries:
    if lib:
        ext_lib.append(lib)

# if multiple extensions are found, they are separated by ':'
ext_lib = ":".join(ext_lib)
print("loaded extension libraries: ", ext_lib)

extra_files = {}
extra_files['extension_libs'] = ext_lib
torch.jit.script(model).save(save_path, _extra_files=extra_files)
(libtorch271) $ cd examples
(libtorch271) $ source /usr/local/gromacs/2025.3_cuda_torchcu128/bin/GMXRC
(libtorch271) $ export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
Testing Neural Network Potential (NNP)/MM simulations (N-acetyl-L-alanine methylamide in water)
Using ani2x.pt with CPU
(libtorch271) $ export GMX_NN_DEVICE=cpu
(libtorch271) $ gmx grompp -f md.mdp -c conf.gro -p topol.top -o md.tpr
Neural network potential Interface is active, topology was modified!
Number of NN input atoms: 22
Number of regular atoms: 6765
Bonds removed: 62
Angles removed: 36
Dihedrals removed: 42
Connection-only (type 5) bonds added: 21
(libtorch271) $ gmx mdrun -deffnm md -ntmpi 1 -ntomp 1
:-) GROMACS - gmx mdrun, 2025.3 (-:
Reading file md.tpr, VERSION 2025.3 (single precision)
Changing nstlist from 15 to 100, rlist from 1.2 to 1.302
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 1 OpenMP threads
Core t (s) Wall t (s) (%)
Time: 174.068 174.069 100.0
(ns/day) (hour/ns)
Performance: 2.482 9.669
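The two numbers on the Performance line are reciprocal views of the same quantity: hour/ns is 24 divided by ns/day. A small sketch of the conversion (the function name is chosen for this example only); because the log prints ns/day rounded to three decimals, the last digit of the recomputed hour/ns can differ from the logged value.

```python
def hours_per_ns(ns_per_day):
    """Convert a throughput in ns/day into the cost in hours per nanosecond."""
    return 24.0 / ns_per_day

# ns/day values reported by mdrun for the CPU and CUDA runs of ani2x.pt
for label, ns_day in [("GMX_NN_DEVICE=cpu", 2.482), ("GMX_NN_DEVICE=cuda", 3.901)]:
    print(f"{label}: {ns_day} ns/day is about {hours_per_ns(ns_day):.2f} hour/ns")
```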
Using ani2x.pt with CUDA
(libtorch271) $ export GMX_NN_DEVICE=cuda
(libtorch271) $ gmx mdrun -deffnm md -ntmpi 1 -ntomp 1
Core t (s) Wall t (s) (%)
Time: 110.756 110.756 100.0
(ns/day) (hour/ns)
Performance: 3.901 6.152
GROMACS reminds you: "Go back to the rock from under which you came" (Fiona Apple)
Segmentation fault (core dumped)
Using ani2x_cuaev.pt
(libtorch271) $ sed 's/ani2x.pt/ani2x_cuaev.pt/g' md.mdp > md_cuaev.mdp
(libtorch271) $ gmx grompp -f md_cuaev.mdp -c conf.gro -p topol.top -o md_cuaev.tpr
RuntimeError: AssertionError: cuaev currently does not support PBC
Using ani2x_nnpops.pt
(libtorch271) $ sed 's/ani2x.pt/ani2x_nnpops.pt/g' md.mdp > md_nnpops.mdp
(libtorch271) $ gmx grompp -f md_nnpops.mdp -c conf.gro -p topol.top -o md_nnpops.tpr -maxwarn 1
(libtorch271) $ gmx mdrun -v -deffnm md_nnpops -ntmpi 1 -ntomp 1
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 1 MPI rank
Activity:               Num     Num      Call   Wall time Giga-Cycles
                      Ranks Threads     Count         (s)   total sum      %
--------------------------------------------------------------------------------
Neighbor search           1       1        51       0.149       0.447    2.0
Launch PP GPU ops.        1       1      9951       0.206       0.618    2.8
Force                     1       1      5001       0.019       0.058    0.3
NN potential              1       1      5001       6.717      20.153   89.8
PME GPU mesh              1       1      5001       0.144       0.432    1.9
Wait GPU NB local         1       1      5001       0.001       0.004    0.0
Wait GPU state copy       1       1     10256       0.012       0.036    0.2
NB X/F buffer ops.        1       1        51       0.001       0.003    0.0
Write traj.               1       1        51       0.063       0.190    0.8
Kinetic energy            1       1       101       0.004       0.013    0.1
Rest                                                0.159       0.477    2.1
--------------------------------------------------------------------------------
Total                                               7.476      22.430  100.0
--------------------------------------------------------------------------------
Breakdown of PME mesh activities
--------------------------------------------------------------------------------
Wait PME GPU gather       1       1      5001       0.001       0.003    0.0
Reduce GPU PME F          1       1      5001       0.001       0.002    0.0
Launch PME GPU ops.       1       1     45009       0.133       0.400    1.8
--------------------------------------------------------------------------------
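The accounting above shows where the time goes in the NNPOps run: the neural network evaluation dominates even with the optimized kernels. The sketch below recomputes the share from the wall times in the table and derives a throughput estimate. The throughput part rests on an assumption that is not stated in the log: that the 5001 "NN potential" calls correspond to 5001 MD steps of 1 fs each.

```python
# Wall times (s) copied from the cycle and time accounting table above
wall_time = {
    "Neighbor search": 0.149,
    "Launch PP GPU ops.": 0.206,
    "Force": 0.019,
    "NN potential": 6.717,
    "PME GPU mesh": 0.144,
    "Wait GPU NB local": 0.001,
    "Wait GPU state copy": 0.012,
    "NB X/F buffer ops.": 0.001,
    "Write traj.": 0.063,
    "Kinetic energy": 0.004,
    "Rest": 0.159,
}
total_reported = 7.476  # "Total" row of the table

# Share of the wall time spent evaluating the neural network potential
nn_share = 100.0 * wall_time["NN potential"] / total_reported
everything_else = total_reported - wall_time["NN potential"]

# Assumption: 5001 NN potential calls = 5001 MD steps with dt = 1 fs
simulated_ns = 5001 * 1.0e-6
ns_per_day = simulated_ns * 86400.0 / total_reported

print(f"sum of rows      : {sum(wall_time.values()):.3f} s (reported total {total_reported} s)")
print(f"NN potential     : {nn_share:.1f} % of wall time")
print(f"everything else  : {everything_else:.3f} s")
print(f"estimated speed  : {ns_per_day:.1f} ns/day")
```

Under that assumption the estimate lands close to the 57.793 ns/day listed for TorchANI/NNPOps in the test results, and everything other than the NN potential accounts for well under one second of the run.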
Performance comparison
GROMACS 2025.2
The ADH cubic test data (134,177 atoms) was used as input. CPU: Intel Core i9-9980XE 3.00GHz; GPU: NVIDIA TITAN-RTX.
CPU only (8 cores)
$ gmx mdrun -deffnm adh_cubic_test2025.2 -pin on -resethway -noconfout -ntmpi 1 -ntomp 8 -bonded cpu -update cpu -nb cpu -pme cpu
starting mdrun 'NADP-DEPENDENT ALCOHOL DEHYDROGENASE in water'
10000 steps, 20.0 ps.
step 5000: resetting all time and cycle counters
Core t (s) Wall t (s) (%)
Time: 446.183 55.773 800.0
(ns/day) (hour/ns)
Performance: 15.494 1.549
1 GPU + 8 CPU
$ gmx mdrun -deffnm adh_cubic_test2025.2 -pin on -resethway -noconfout -ntmpi 1 -ntomp 8 -bonded gpu -update gpu -nb gpu -pme gpu -nsteps 20000
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 8 OpenMP threads
starting mdrun 'NADP-DEPENDENT ALCOHOL DEHYDROGENASE in water'
20000 steps, 40.0 ps.
step 10000: resetting all time and cycle counters
Core t (s) Wall t (s) (%)
Time: 71.585 8.950 799.8
(ns/day) (hour/ns)
Performance: 193.089 0.124
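The reported ns/day values can be cross-checked from the logs. With -resethway the counters are reset at the halfway step, so only the second half of each run is timed, and the logs imply a 2 fs time step (10000 steps for 20.0 ps). The sketch below is a rough check under those readings of the log; `ns_per_day` is a helper name chosen for this example, and small deviations from the logged values are expected because the wall times are rounded.

```python
def ns_per_day(n_steps, dt_fs, wall_s):
    """Throughput in ns/day for n_steps of dt_fs femtoseconds timed over wall_s seconds."""
    simulated_ns = n_steps * dt_fs * 1.0e-6
    return simulated_ns * 86400.0 / wall_s

# Timed halves of the two ADH cubic runs (dt = 2 fs)
cpu_only = ns_per_day(5000, 2.0, 55.773)    # 10000-step run, reset at step 5000
gpu_run = ns_per_day(10000, 2.0, 8.950)     # 20000-step run, reset at step 10000

# Speed-up computed from the Performance lines reported by mdrun
reported_speedup = 193.089 / 15.494

print(f"CPU only (8 cores): about {cpu_only:.1f} ns/day (reported 15.494)")
print(f"1 GPU + 8 CPU     : about {gpu_run:.1f} ns/day (reported 193.089)")
print(f"GPU speed-up      : x {reported_speedup:.1f}")
```

For this 134,177-atom system, offloading bonded, nonbonded, PME, and update work to the TITAN RTX gives roughly a 12.5-fold gain over the 8-core CPU-only run.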