Allegro Model

Figure: Allegro architecture (../_images/allegro_arch.png)

Allegro hyperparameters

The core hyperparameters of the Allegro model are:

  • r_max is the cutoff radius used for the strictly local Allegro model.

  • l_max governs the angular resolution of the tensor features. Reasonable l_max values to try are 1, 2, and 3; raising l_max may improve accuracy at the cost of speed. Note that the computational cost grows much faster than linearly with l_max, because the Clebsch-Gordan tensor products scale as \(O(\ell_\text{max}^6)\). Raising l_max also increases the number of tensor paths, and the growth in paths is also tied to the choice of num_layers.

  • num_layers is the number of Allegro layers, which corresponds to the body-ordering (num_layers=1 corresponds to three-body tensor features, num_layers=2 corresponds to four-body tensor features, and so on). A Clebsch-Gordan tensor product occurs at each Allegro layer. It is usually appropriate to use 1, 2, or 3 layers.

  • num_scalar_features and num_tensor_features are the numbers of scalar-track and tensor-track channels, respectively. They are set separately because of Allegro's two-track design. It is often useful to keep num_tensor_features small and raise num_scalar_features to improve the learning capacity of the model. Depending on the dataset, good values to try are 16, 32, 64, 128, or 256 for num_scalar_features and 8, 16, 32, or 64 for num_tensor_features.

  • Each Allegro layer contains a multilayer perceptron (MLP), configured by the allegro_mlp parameters. allegro_mlp_hidden_layers_depth is the depth and defaults to 1. One could try reducing it to 0 (making the MLP a linear layer) or raising it to 2 or 3 (going beyond 3 is unhelpful). allegro_mlp_hidden_layers_width is the width; a good starting value is a multiple of 16 or 32 (for performance) that is larger than num_scalar_features (perhaps 2 to 4 times as large). allegro_mlp_nonlinearity is silu by default (which is recommended).
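To get a feel for how l_max and num_layers interact, the short sketch below computes the body order implied by num_layers and counts the (l1, l2, l_out) combinations allowed by the Clebsch-Gordan triangle rule when every order is capped at l_max. Both helper names are hypothetical, and the count is only an illustration: it ignores parity and any path filtering the actual implementation may apply.

```python
def body_order(num_layers: int) -> int:
    """Body order of the tensor features: 1 layer -> three-body, 2 -> four-body, ..."""
    return num_layers + 2


def count_tensor_product_paths(l_max: int) -> int:
    """Count (l1, l2, l_out) triples with all orders <= l_max that satisfy
    the triangle rule |l1 - l2| <= l_out <= l1 + l2.

    Illustrative only: parity and implementation-specific pruning are ignored.
    """
    count = 0
    for l1 in range(l_max + 1):
        for l2 in range(l_max + 1):
            lowest = abs(l1 - l2)
            highest = min(l1 + l2, l_max)
            count += highest - lowest + 1
    return count


for l_max in (1, 2, 3):
    print(f"l_max={l_max}: paths={count_tensor_product_paths(l_max)}, l_max^6={l_max ** 6}")

for num_layers in (1, 2, 3):
    print(f"num_layers={num_layers}: {body_order(num_layers)}-body features")
```

Under these assumptions the count is 5, 15, and 34 paths for l_max = 1, 2, and 3, which shows why moving from l_max = 2 to l_max = 3 is noticeably more expensive than moving from 1 to 2.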

The core hyperparameters above are the most important to set correctly for most use cases. The following advanced hyperparameters can also be tuned, but doing so is discouraged unless one is comfortable with the process of hyperparameter tuning.

  • The initial scalar embedding is configurable to some extent. For radial_chemical_embed, refer to TwoBodyBesselScalarEmbed() below as a starting point (it’s usually fine to just use the defaults). The output features of the radial-chemical embedding module are then passed through a scalar embedding MLP. radial_chemical_embed_dim is the dimensionality of the feature vector output by the radial-chemical module, which is used as input to the scalar embedding MLP. radial_chemical_embed_dim typically defaults to num_scalar_features, but it can be tuned to other values. scalar_embed_mlp_hidden_layers_depth defaults to 1. scalar_embed_mlp_hidden_layers_width can also default to num_scalar_features and be tuned as desired. scalar_embed_mlp_nonlinearity defaults to silu.

  • parity determines whether to use the full set of allowed irreps (i.e. the default behavior of true), or to use a set restricted to spherical harmonic irreps (i.e. the false option).

  • tp_path_channel_coupling determines whether the tensor product weights couple paths and channels or not. The default of true is expected to be more expressive.

  • After all the Allegro layers, a readout MLP maps the features to energy predictions. It is configured by the readout_mlp parameters. The following defaults are recommended as a starting point and can be tuned as desired: readout_mlp_hidden_layers_depth defaults to 1, readout_mlp_hidden_layers_width can default to num_scalar_features, and readout_mlp_nonlinearity defaults to silu.
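The starting values suggested in the two lists above can be collected in one place. The following is a minimal sketch, not part of the Allegro package: the helper names starting_hparams and round_up are hypothetical, the example values (such as r_max=5.0) are arbitrary, and the dictionary keys simply follow the names used in the text, so they should be checked against the parameter list of the installed version.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (e.g. 100 -> 112 for multiple=16)."""
    return ((value + multiple - 1) // multiple) * multiple


def starting_hparams(r_max, l_max, num_layers, num_scalar_features,
                     num_tensor_features, width_factor=2):
    """Assemble the recommended starting point described in the text.

    width_factor is the ratio of Allegro MLP width to num_scalar_features
    (the text suggests roughly 2 to 4), rounded up to a multiple of 16.
    """
    return {
        # core hyperparameters
        "r_max": r_max,
        "l_max": l_max,
        "num_layers": num_layers,
        "num_scalar_features": num_scalar_features,
        "num_tensor_features": num_tensor_features,
        "allegro_mlp_hidden_layers_depth": 1,
        "allegro_mlp_hidden_layers_width": round_up(width_factor * num_scalar_features, 16),
        "allegro_mlp_nonlinearity": "silu",
        # advanced hyperparameters, left at their described defaults
        "radial_chemical_embed_dim": num_scalar_features,
        "scalar_embed_mlp_hidden_layers_depth": 1,
        "scalar_embed_mlp_hidden_layers_width": num_scalar_features,
        "scalar_embed_mlp_nonlinearity": "silu",
        "parity": True,
        "tp_path_channel_coupling": True,
        "readout_mlp_hidden_layers_depth": 1,
        "readout_mlp_hidden_layers_width": num_scalar_features,
        "readout_mlp_nonlinearity": "silu",
    }


# Example values are arbitrary and chosen only for illustration.
hparams = starting_hparams(r_max=5.0, l_max=2, num_layers=2,
                           num_scalar_features=64, num_tensor_features=16)
for key, value in hparams.items():
    print(f"{key}: {value}")
```

Keeping the derived widths in a single function makes it easy to sweep num_scalar_features while the dependent settings follow automatically.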

API

allegro.model.AllegroModel(l_max: int, parity: bool = True, **kwargs)

Allegro model that predicts energies and forces (and stresses if cell is provided).

Parameters:
  • seed (int) – seed for reproducibility

  • model_dtype (str) – float32 or float64

  • r_max (float) – cutoff radius

  • per_edge_type_cutoff (Dict) – optional cutoffs for each edge type; each must be smaller than r_max (default None)

  • type_names (Sequence[str]) – list of atom type names

  • l_max (int) – maximum order \(\ell\) to use in the spherical harmonics embedding; 1 is the baseline (fast), 2 is more accurate but slower, and 3 is highly accurate but slow

  • parity (bool) – whether to include features with odd mirror parity (default True)

  • radial_chemical_embed – an Allegro-compatible two-body radial-chemical embedding module, e.g. allegro.nn.TwoBodyBesselScalarEmbed

  • two_body_mlp_hidden_layers_depth (int) – number of hidden layers of two-body MLP (default 1)

  • two_body_mlp_hidden_layers_width (int) – width of hidden layers of two-body MLP

  • two_body_mlp_nonlinearity (str) – silu, mish, gelu, or None (default silu)

  • scalar_embed_output_dim (int) – output dimension of the scalar embedding module (default None will use num_scalar_features)

  • num_layers (int) – number of Allegro layers

  • num_scalar_features (int) – multiplicity of scalar features in the Allegro layers

  • num_tensor_features (int) – multiplicity of tensor features in the Allegro layers

  • allegro_mlp_hidden_layers_depth (int) – number of hidden layers in the Allegro scalar MLPs (default 1)

  • allegro_mlp_hidden_layers_width (int) – width of hidden layers in the Allegro scalar MLPs (reasonable to set it to be the same as num_scalar_features)

  • allegro_mlp_nonlinearity (str) – silu, mish, gelu, or None (default silu)

  • tp_path_channel_coupling (bool) – whether Allegro tensor product weights couple the paths with the channels or not, True is expected to be more expressive than False (default True)

  • readout_mlp_hidden_layers_depth (int) – number of hidden layers in the readout MLP (default 1)

  • readout_mlp_hidden_layers_width (int) – width of hidden layers in the readout MLP (reasonable to set it to be the same as num_scalar_features)

  • readout_mlp_nonlinearity (str) – silu, mish, gelu, or None (default silu)

  • avg_num_neighbors (float/Dict[str, float]) – used to normalize edge sums for better numerics (default None)

  • per_type_energy_scales (float/List[float]) – per-atom energy scales, which could be derived from the force RMS of the data (default None)

  • per_type_energy_shifts (float/List[float]) – per-atom energy shifts, which should generally be isolated atom reference energies or estimated from average per-atom energies of the data (default None)

  • per_type_energy_scales_trainable (bool) – whether the per-atom energy scales are trainable (default False)

  • per_type_energy_shifts_trainable (bool) – whether the per-atom energy shifts are trainable (default False)

  • pair_potential (torch.nn.Module) – additional pair potential term, e.g. nequip.nn.pair_potential.ZBL (default None)

  • do_derivatives (bool) – whether to compute forces and stresses via autograd (default True)
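The role of per_type_energy_scales and per_type_energy_shifts can be illustrated with a small sketch. It assumes the common convention that each atom's raw network output is multiplied by the scale of its type and then offset by the shift of its type before the per-atom energies are summed; the function name and all numbers are hypothetical and are not taken from the library.

```python
def total_energy(raw_atomic_outputs, scales, shifts):
    """Sum per-atom energies after a per-type affine transform.

    raw_atomic_outputs: list of (type_name, raw_value) pairs, one per atom
    scales, shifts: dicts mapping type name -> per-type scale / shift
    """
    energy = 0.0
    for type_name, raw in raw_atomic_outputs:
        energy += scales[type_name] * raw + shifts[type_name]
    return energy


# Hypothetical water-like example: two H atoms and one O atom.
scales = {"H": 2.0, "O": 0.5}
shifts = {"H": -1.0, "O": -4.0}
raw = [("H", 0.25), ("H", 0.5), ("O", 2.0)]

print(total_energy(raw, scales, shifts))
```

With shifts set to isolated-atom reference energies, the network only has to learn the comparatively small deviation from those references, which is why the shifts are usually fixed rather than trained.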

allegro.nn.TwoBodyBesselScalarEmbed(type_names: Sequence[str], num_bessels: int = 8, bessel_trainable: bool = False, polynomial_cutoff_p: int = 6, module_output_dim: int = 64, forward_weight_init: bool = True, scalar_embed_field: str = 'edge_embedding', irreps_in=None) → SequentialGraphNetwork

Two-body Bessel scalar embedding.

The radial edge lengths are encoded with a Bessel basis, which is then projected to two_body_embedding_dim. The center-neighbor atom types are embedded with weights to the same two_body_embedding_dim. The radial embedding and center-neighbor type embedding are multiplied.
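The three steps in this description (Bessel encoding, projection, multiplication with a type-pair embedding) can be sketched in plain Python. This is an illustration under stated assumptions rather than the module's implementation: it uses the DimeNet-style Bessel form sqrt(2/r_max) · sin(nπr/r_max)/r, random untrained weights, and leaves out the cutoff envelope; all function and variable names are hypothetical.

```python
import math
import random


def bessel_basis(r, r_max, num_bessels=8):
    """DimeNet-style radial Bessel functions at distance r (0 < r <= r_max)."""
    return [
        math.sqrt(2.0 / r_max) * math.sin(n * math.pi * r / r_max) / r
        for n in range(1, num_bessels + 1)
    ]


def two_body_embed(r, center, neighbor, r_max, radial_weights, type_table):
    """Project the Bessel encoding to the embedding dimension and multiply it
    elementwise with the embedding of the (center, neighbor) type pair."""
    basis = bessel_basis(r, r_max, num_bessels=len(radial_weights[0]))
    radial = [sum(w * b for w, b in zip(row, basis)) for row in radial_weights]
    type_embed = type_table[(center, neighbor)]
    return [x * t for x, t in zip(radial, type_embed)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
type_names = ["H", "O"]
embed_dim, num_bessels, r_max = 4, 8, 5.0

# Untrained stand-ins for the learnable projection and type-pair weights.
radial_weights = [[rng.gauss(0, 1) for _ in range(num_bessels)] for _ in range(embed_dim)]
type_table = {
    (a, b): [rng.gauss(0, 1) for _ in range(embed_dim)]
    for a in type_names
    for b in type_names
}

edge_features = two_body_embed(0.96, "O", "H", r_max, radial_weights, type_table)
print(len(edge_features))
```

Because the type table is keyed by ordered (center, neighbor) pairs, an O-H edge and an H-O edge can receive different embeddings even at the same distance.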

This module can be used for the scalar_embed argument of the AllegroModel in the config as follows.

model:
  _target_: allegro.model.AllegroModel
  # other Allegro model parameters
  scalar_embed:
    _target_: allegro.nn.TwoBodyBesselScalarEmbed
    num_bessels: 8
    bessel_trainable: false
    polynomial_cutoff_p: 6

Parameters:
  • num_bessels (int) – number of Bessel basis functions (default 8)

  • bessel_trainable (bool) – whether Bessel roots are trainable (default False)

  • polynomial_cutoff_p (int) – p-exponent used in polynomial cutoff function, smaller p corresponds to stronger decay with distance (default 6)
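The effect of polynomial_cutoff_p is easiest to see numerically. The sketch below assumes the DimeNet-style polynomial envelope, which goes smoothly from 1 at r = 0 to 0 at r = r_max; whether the module uses exactly this form is an assumption here, and the function name is hypothetical.

```python
def polynomial_cutoff(r, r_max, p=6):
    """DimeNet-style polynomial envelope u(x) with x = r / r_max.

    u(x) = 1 - (p+1)(p+2)/2 * x^p + p(p+2) * x^(p+1) - p(p+1)/2 * x^(p+2)
    for x < 1, and 0 beyond the cutoff.
    """
    x = r / r_max
    if x >= 1.0:
        return 0.0
    return (
        1.0
        - (p + 1) * (p + 2) / 2.0 * x**p
        + p * (p + 2) * x ** (p + 1)
        - p * (p + 1) / 2.0 * x ** (p + 2)
    )


r_max = 5.0
for p in (2, 6):
    values = [round(polynomial_cutoff(r, r_max, p), 4) for r in (0.0, 1.25, 2.5, 3.75, 5.0)]
    print(f"p={p}: {values}")
```

At half the cutoff radius the envelope is much lower for p = 2 than for p = 6, which matches the statement that a smaller p gives a stronger decay with distance.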

class allegro.nn.TwoBodySplineScalarEmbed(type_names: Sequence[str], num_splines: int = 16, spline_span: int = 12, module_output_dim: int = 64, forward_weight_init: bool = True, scalar_embed_field: str = 'edge_embedding', edge_type_field: str = 'edge_type_flat', norm_length_field: str = 'normed_edge_lengths', irreps_in=None)

Two-body spline scalar embedding.

This module can be used for the scalar_embed argument of the AllegroModel in the config as follows.

model:
  _target_: allegro.model.AllegroModel
  # other Allegro model parameters
  scalar_embed:
    _target_: allegro.nn.TwoBodySplineScalarEmbed
    num_splines: 16
    spline_span: 12

Parameters:
  • num_splines (int) – number of spline basis functions

  • spline_span (int) – number of spline basis functions that overlap on spline grid centers