The nequip workflow

The nequip workflow consists of the following steps:

1. Training: nequip-train
2. Testing: nequip-train
3. Deploying: nequip-deploy
4. Using deployed models: Integrations
Training
The core command in nequip is nequip-train, which takes in a YAML config file defining the dataset(s), model, and training hyperparameters, and then runs (or restarts) a training session. Hydra is used to manage the config files, and so many of the features and tricks from Hydra can be used if desired. nequip-train can be called as follows:
$ nequip-train -cp full/path/to/config/directory -cn config_name.yaml
Note that the flags -cp and -cn refer to the “config path” and “config name” respectively, and are Hydra command line flags. If one runs nequip-train in the same directory where the config file is located, the -cp part may be omitted. Note also that the full path is usually required when using -cp. Users who seek further configurability (e.g. using relative paths, multiple config files located in different directories, etc.) are directed to the “command line flags” link to learn more.
Under the hood, the Hydra config utilities and the PyTorch Lightning framework are used to facilitate training and testing in the NequIP infrastructure. One can think of the config as a set of classes to be instantiated with user-given parameters to construct the objects required for training and testing. Hence, the API of these classes forms the central source of truth for what configurable parameters exist. These classes could come from

- torch, in the case of optimizers and learning rate schedulers,
- Lightning, such as Lightning's Trainer or Lightning's native callbacks, or
- nequip itself, such as the various DataModules, custom callbacks, etc.
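For illustration, the instantiation pattern might look roughly like the following sketch: Hydra instantiates the class named by _target_ and passes the remaining keys as constructor arguments. The top-level key names and the placement of these blocks are assumptions made only for this example; the authoritative layout is given by configs/tutorial.yaml and the linked documentation.

# Illustrative sketch only: the top-level key names here are assumptions, not the real schema.
trainer:
  _target_: lightning.pytorch.Trainer   # a Lightning class instantiated by Hydra
  max_epochs: 100                        # an argument of Lightning's Trainer
optimizer:
  _target_: torch.optim.Adam             # a torch class instantiated by Hydra
  lr: 0.001                              # an argument of torch.optim.Adam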
Users are advised to look at configs/tutorial.yaml to understand how the config file is structured, and then to look up what each of the classes does and what parameters it takes (be it in torch's, Lightning's, or nequip's docs). The documentation for nequip's native classes can be found under Python API.
Checkpointing behavior is controlled by Lightning, and configuring it is the responsibility of the user. Checkpointing can be controlled by flags in Lightning's Trainer and can be specified even further with Lightning's ModelCheckpoint callback.
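As a sketch, a ModelCheckpoint callback might be attached under the Trainer section roughly as follows. The surrounding key layout and the monitored metric name are assumptions of this example; the full argument list is documented by Lightning's ModelCheckpoint.

trainer:
  _target_: lightning.pytorch.Trainer
  callbacks:
    - _target_: lightning.pytorch.callbacks.ModelCheckpoint
      save_last: true            # always keep the most recent checkpoint (useful for restarts)
      save_top_k: 1              # also keep the single best checkpoint according to "monitor"
      monitor: validation_loss   # hypothetical metric name; use a metric actually logged in your config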
One can continue training from a checkpoint file with the following command
nequip-train -cp full/path/to/config/directory -cn config_name.yaml ++ckpt_path='path/to/ckpt_file'
where we have used Hydra's override syntax (++). Note how one must still specify the config file. Training from a checkpoint will always use the model from the checkpoint file, but the other training hyperparameters (dataset, loss, metrics, callbacks, etc.) are determined by the config file passed to the restarted nequip-train (and can therefore differ from those in the original config used to generate the checkpoint).
Note that the working directories are managed by Hydra, and users can configure how these directories behave, as well as pass these directories to Lightning objects (e.g. so that model checkpoints are saved in the Hydra-generated directories). Visit Hydra's output/working directory page to learn more.
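As a sketch of how the two can be tied together (reusing the same assumed Trainer/callback layout as above), one could override Hydra's run directory and point Lightning's ModelCheckpoint at it via Hydra's runtime resolver:

hydra:
  run:
    dir: outputs/${now:%Y-%m-%d_%H-%M-%S}     # where Hydra creates this run's working directory
trainer:
  _target_: lightning.pytorch.Trainer
  callbacks:
    - _target_: lightning.pytorch.callbacks.ModelCheckpoint
      dirpath: ${hydra:runtime.output_dir}    # write checkpoints into the Hydra run directory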
Testing
Testing is also performed with nequip-train by adding test to the list of run parameters in the config. Testing requires test dataset(s) to be defined by the DataModule specified under the data key in the config.
There are two main ways users can use test.

- One can have testing be done automatically after training in the same nequip-train session by specifying run: [train, test] in the config. The test phase will use the best model checkpoint from the train phase.
- One can run tests from a checkpoint file by having run: [test] in the config and using the same command as for restarts on the command line, that is,

nequip-train -cp full/path/to/config/directory -cn config_name.yaml ++ckpt_path='path/to/ckpt_file'
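The relevant pieces of such a config might look roughly like the sketch below; the datamodule class path and its arguments are placeholders for illustration, and the real options should be taken from configs/tutorial.yaml and the DataModule documentation.

run: [train, test]            # test automatically after training, using the best checkpoint
data:
  _target_: nequip.data.SomeDataModule   # placeholder name -- use an actual nequip DataModule here
  # dataset file paths, train/val/test splits, etc. are arguments of the chosen DataModule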
One can use nequip.train.callbacks.TestTimeXYZFileWriter (see API) as a callback to have .xyz files written with the predictions of the model on the test dataset(s). (This replaces the role that nequip-evaluate served before nequip version 0.7.0.)
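In config terms, this callback would be added to the Trainer's callback list, roughly as in the sketch below; its constructor arguments are deliberately omitted here and should be taken from the Python API documentation.

trainer:
  _target_: lightning.pytorch.Trainer
  callbacks:
    - _target_: nequip.train.callbacks.TestTimeXYZFileWriter
      # constructor arguments omitted -- see the Python API docs for the accepted parameters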
Deploying
Once you have trained a model, you must deploy it to create an archive of its trained parameters and metadata that can be used for simulations and other calculations:
nequip-deploy build -ckpt_path path/to/ckpt_file -out_file path/to/deployed_model
One can inspect the deployed model with the following command
nequip-deploy info path/to/deployed_model
Using deployed models
…to run simulations and other calculations
There are many ways a deployed model can be used. Most often it is used for molecular dynamics and other calculations in LAMMPS. For integrations with other codes and simulation engines, see Integrations.
…at a low level in your own code
While LAMMPS or other integrations should be sufficient for the vast majority of use cases, deployed models can also be loaded as PyTorch TorchScript models and called from your own code:
import torch
import ase.io

from nequip.data import AtomicData, AtomicDataDict
from nequip.scripts.deploy import load_deployed_model, R_MAX_KEY

device = "cpu"  # or "cuda", etc.

# Load the deployed TorchScript model and its metadata:
model, metadata = load_deployed_model(
    "path_to_deployed_model.pth",
    device=device,
)

# Load some input structure from an XYZ or other ASE-readable file,
# using the cutoff radius stored in the model's metadata:
data = AtomicData.from_ase(ase.io.read("example_input_structure.xyz"), r_max=metadata[R_MAX_KEY])
data = data.to(device)

# Run the model and read out its predictions:
out = model(AtomicData.to_AtomicDataDict(data))
print(f"Total energy: {out[AtomicDataDict.TOTAL_ENERGY_KEY]}")
print(f"Force on atom 0: {out[AtomicDataDict.FORCE_KEY][0]}")