OMNI-P2x

OMNI-P2x is the first universal excited-state potential. Details are in the preprint (see References below).

Note

We are preparing a release of MLatom with OMNI-P2x support; it is currently available via add-ons.

Model weights and additional scripts are available at https://github.com/dralgroup/omni-p2x.

Tutorials below are prepared by Mikołaj Martyka with modifications by Pavlo O. Dral.

Contents:

UV/vis spectra simulations with OMNI-P2x

Single-point convolution


Nuclear-ensemble approximation via fine-tuning


Active Transfer Learning for Nonadiabatic Molecular Dynamics with OMNI-P2x


Overview

This tutorial demonstrates how to set up and run an active learning (AL) loop for nonadiabatic molecular dynamics (NAMD) simulations in MLatom, using the pre-trained universal neural network potential OMNI-P2x as a starting point. By fine-tuning OMNI-P2x on system-specific reference data generated on-the-fly, you can obtain a production-ready machine learning potential for trajectory surface hopping (TSH) simulations with significantly less data and wall-clock time than training from scratch.

We use fulvene with CASSCF(6,6) as a working example, but the procedure is general: you can use any excited-state electronic structure method available in MLatom (or interface your own) as the reference.

Background

Nonadiabatic molecular dynamics (NAMD) via trajectory surface hopping (TSH) is the most widely used technique for simulating photochemical processes in real time. However, it requires computing excited-state energies and gradients at every time step for hundreds of trajectories, often amounting to hundreds of thousands of electronic structure calculations per simulation. Machine learning potentials (MLPs) can break through this bottleneck, but learning the dense manifold of excited-state PESs, with small interstate gaps and complex topography near conical intersections, remains a fundamental challenge. Previous active learning strategies for ML-TSH often required manual adjustment of sampling criteria or additional interventions, limiting their practical adoption.

The ATL-NAMD protocol in MLatom addresses this with a fully automated active learning loop that combines a physics-informed multi-state ML model, gap-driven dynamics that bias sampling towards conical intersection regions, and uncertainty quantification that monitors the quality of each adiabatic surface and of the hopping probabilities. This tutorial demonstrates how to further accelerate convergence by fine-tuning from the pre-trained OMNI-P2x universal potential, whose weights encode excited-state PES knowledge across chemical space, leading to up to 3x faster convergence and improved stability compared to training from scratch.
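The sampler used later in this tutorial ('lzsh') performs Landau-Zener surface hopping, in which hops are decided from the behavior of the adiabatic gap alone. As an illustrative sketch (the Belyaev-Lebedev form of the hopping probability at a gap minimum, in atomic units; not MLatom's actual implementation):

```python
import math

def lz_hop_probability(gap_min, gap_dd):
    """Belyaev-Lebedev Landau-Zener hopping probability at a local minimum
    of the adiabatic gap (atomic units, hbar = 1).

    gap_min: minimum energy gap Delta E (Hartree)
    gap_dd:  second time derivative of the gap at the minimum
    """
    return math.exp(-(math.pi / 2.0) * math.sqrt(gap_min ** 3 / gap_dd))

# A smaller gap at the closest approach gives a larger hopping probability,
# which is why sampling near conical intersections matters so much:
p_near_ci = lz_hop_probability(0.001, 1e-6)   # small gap: p close to 1
p_far     = lz_hop_probability(0.01, 1e-6)    # larger gap: p much smaller
```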

Prerequisites

  • MLatom >= 3.22.0 (install or upgrade with pip install --upgrade mlatom)

  • PyTorch with CUDA support (recommended; CPU-only is possible but much slower)

  • A reference electronic structure program interfaced with MLatom. In this example, we use COLUMBUS for CASSCF calculations. For a list of available excited-state methods and their interfaces, see the MLatom excited-state tutorials. If you need to interface your own code, contact us and we will be happy to help.

  • Download the zipped archive with all input/output files for this tutorial (fulvene example) and extract it to your working directory:

    • Sample input script (see also below) AL_ft_fulvene.py.

    • Sample output log AL_fulvene.out from a run of the script above, which you can analyze and compare with your own runs.

    • An equilibrium geometry, with computed frequencies, of your molecule saved as eqmol.json in MLatom’s JSON format. See the tutorial on frequencies and thermochemistry.

    • Input files for your reference method (here, the contents of a columbus/ directory with CASSCF inputs).

    • The OMNI-P2x pre-trained weights file (e.g. OMNIP2x_CV_1.pt), taken from the OMNI-P2x repository.

Input script

Below is the complete input script used in this tutorial. We will walk through each section.

import mlatom as ml
import os
import torch

torch.cuda.empty_cache()
root_dir = os.getcwd()

# 1. Reference method setup
col = ml.methods(
        program='columbus',
        command_line_arguments=['-m', '1700'],
        directory_with_input_files=f'{root_dir}/columbus',
        save_files_in_current_directory=False
)

model_predict_kwargs = {
        'nstates': 2,
        'current_state': 0,
        'calculate_energy_gradients': [True] * 2
}

# 2. Load equilibrium molecule
eqmol = ml.molecule.load('eqmol.json', format='json')

# 3. Sampler settings for NAMD trajectories
init_sampler_kwargs = {
        'eqmol': eqmol,
        'initial_temperature': 0,
        'number_of_initial_conditions': 50,
}

sampler_kwargs = {
        'maximum_propagation_time': 60.0,
        'time_step': 0.1,
        'nstates': 2,
        'initial_state': 1,
        'prevent_back_hop': False,
        'dump_trajectory_interval': 40,
        'reduce_memory_usage': True,
        'min_surfuq_points': 2,
        'ngapmd_trajs': 50,
        'max_hopprobuq_points': 15,
        'prob_uq_threshold': 0.1,
        'plot_populations': False,
        'nthreads': 16,
        'stop_uncertain_md': True,
        'initcond_sampler_kwargs': init_sampler_kwargs,
}

# 4. Initial dataset sampling
init_dataset_sampler_kwargs = {
        'molecule': eqmol,
        'initial_temperature': 0,
        'number_of_initial_conditions': 50,
}

# 5. ML model settings with pre-trained OMNI-P2x weights
ml_model_kwargs = {
        'nstates': 2,
        'gap_weight': 1.0,
        'use_pretrained_model': 'OMNIP2x_CV_1.pt',
}

# 6. Run active learning
ml.al(
        molecule=eqmol,
        reference_method=col,
        ml_model='vecmsani',
        sampler='lzsh',
        initdata_sampler='wigner',
        initdata_sampler_kwargs=init_dataset_sampler_kwargs,
        sampler_kwargs=sampler_kwargs,
        refmethod_kwargs=model_predict_kwargs,
        label_nthreads=16,
        ml_model_kwargs=ml_model_kwargs,
        new_points=300,
)

Walkthrough

1. Reference method

The ml.methods object wraps your electronic structure program. Here we call COLUMBUS for CASSCF(6,6) calculations, with 1700 MB of memory. The directory_with_input_files should point to a directory containing all the COLUMBUS input files. MLatom will call COLUMBUS automatically whenever it needs to label a new geometry.

You can replace this with any method available in MLatom for excited states, for example, ORCA for TD-DFT, Turbomole for ADC2, or the semiempirical AIQM1/MRCI method. The rest of the script stays the same.

The model_predict_kwargs tell MLatom how many states to compute and that energy gradients are needed for both states.

2. Equilibrium geometry

The eqmol.json file contains your molecule’s equilibrium geometry in MLatom’s JSON format. This is used to generate initial conditions (via Wigner sampling) and to define the molecular system.

3. NAMD sampler settings

These parameters control the trajectory surface hopping simulations used during the active learning loop:

  • maximum_propagation_time: 60 fs. Trajectories run for this long, or until the model becomes too uncertain (if stop_uncertain_md is True).

  • time_step: 0.1 fs. A small time step appropriate for fulvene.

  • nstates / initial_state: Two electronic states (S0 and S1), starting in S1.

  • ngapmd_trajs: 50 gap-driven MD trajectories per iteration. These bias sampling towards regions of small energy gaps (near conical intersections), the critical regions for the model to learn properly.

  • stop_uncertain_md: When True, trajectories are stopped early if the model’s uncertainty exceeds the threshold. It can reduce sampling time, but the downside is that trajectories will be cut short and you won’t be able to plot populations for the entire desired run time.

  • min_surfuq_points: The crucial parameter controlling convergence: the maximum number of trajectories that may remain uncertain for the run to count as converged. Here it is set to 2 out of 50, corresponding to a convergence rate of 96%.

  • nthreads: Number of threads for propagation of trajectories in parallel.
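The convergence bookkeeping implied by these settings is simple arithmetic; a minimal sketch (convergence_rate is a hypothetical helper for illustration, not an MLatom function):

```python
def convergence_rate(n_uncertain, ntraj):
    """Percentage of trajectories that finished without being flagged
    as uncertain (hypothetical helper, not an MLatom function)."""
    return 100.0 * (ntraj - n_uncertain) / ntraj

# With min_surfuq_points = 2 and 50 trajectories, convergence corresponds
# to at most 2 uncertain trajectories, i.e. a rate of 96 % or above:
target = convergence_rate(2, 50)   # 96.0
```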

4. Initial dataset

Before the first AL iteration, an initial training set must be generated to give the model a reasonable starting point. The initial data are sampled using the statistically-driven procedure described in Hou et al., JCTC. Starting from the equilibrium geometry, conformations are sampled from a harmonic-oscillator Wigner distribution. A ground-state energy-only model is trained using 5-fold cross-validation, and the validation error is monitored. Sampling continues, adding more Wigner-sampled conformations, until the projected accuracy improvement falls below 10%, as estimated by fitting the learning curve. This ensures the initial dataset is large enough to be statistically meaningful while avoiding wasteful oversampling. Other options are available, such as 'one-shot' sampling of a specified number of points in a single batch.
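The learning-curve projection behind this stopping rule can be sketched as follows (a simplified illustration of the idea; the exact criterion used by MLatom may differ, and projected_improvement is a hypothetical helper):

```python
import numpy as np

def projected_improvement(sizes, errors, factor=2.0):
    """Fit a power-law learning curve err ~ a * N**slope in log-log space
    and estimate the relative error reduction expected from growing the
    dataset by `factor`. Sketch of the idea, not MLatom's exact rule."""
    slope, _ = np.polyfit(np.log(sizes), np.log(errors), 1)
    return 1.0 - factor ** slope

# Sampling would stop once this projection falls below 0.1 (10 %).
```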

The init_dataset_sampler_kwargs control the Wigner sampling parameters: the equilibrium molecule, temperature (0 K here, meaning only zero-point energy contributions), and the number of initial conditions per sampling batch.
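At 0 K, the Wigner distribution of a harmonic mode reduces to independent Gaussians in position and momentum whose widths are set by the zero-point energy. A minimal one-mode sketch in atomic units (hbar = 1; wigner_sample_mode is a hypothetical helper for illustration, not MLatom's sampler):

```python
import numpy as np

def wigner_sample_mode(omega, mass, n, rng):
    """Draw (x, p) samples for one harmonic mode from the T = 0 K Wigner
    distribution: independent Gaussians whose widths encode the zero-point
    energy (atomic units, hbar = 1)."""
    sigma_x = np.sqrt(1.0 / (2.0 * mass * omega))   # position spread
    sigma_p = np.sqrt(mass * omega / 2.0)           # momentum spread
    return rng.normal(0.0, sigma_x, n), rng.normal(0.0, sigma_p, n)
```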

5. ML model with transfer learning

This is the key part of the setup. The ml_model_kwargs dictionary configures the neural network potential:

  • 'nstates': 2: the model learns two electronic states.

  • 'gap_weight': 1.0: adds a penalty term to the loss function, with the specified weight, that encourages accurate prediction of the S1–S0 energy gap, which is critical for correct hopping dynamics.

  • 'use_pretrained_model': 'OMNIP2x_CV_1.pt': this is the transfer learning option. Instead of random initial weights, the model starts from the pre-trained OMNI-P2x weights. You provide the path to the weights file. You could also try weights from a previous fine-tuning run (e.g., CASSCF weights from a related molecule), though how well this transfers will depend on the similarity of the systems.

6. The active learning call

ml.al() runs the full active learning loop. At each iteration, it trains (fine-tunes) the ML model on the current dataset, runs ML-NAMD trajectories to probe the dynamics, identifies uncertain geometries, labels them with the reference method, adds them to the training set, and repeats. The loop terminates when the ML-NAMD trajectories are converged, meaning that almost no new uncertain structures are sampled at the latest iterations, which typically translates to the electronic state populations being stable across consecutive iterations.
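The loop just described can be sketched generically (this mirrors the logic of ml.al(), not its actual internals; train, sample_uncertain and label stand in for the ML training, ML-NAMD sampling, and reference-method labeling steps):

```python
def active_learning_loop(train, sample_uncertain, label, dataset,
                         max_iterations=20):
    """Generic sketch of an AL loop: (re)train on the current dataset,
    collect uncertain geometries from ML-NAMD, label them with the
    reference method, and repeat until nothing uncertain remains."""
    model = None
    for iteration in range(max_iterations):
        model = train(dataset)                # fine-tune on current data
        uncertain = sample_uncertain(model)   # run trajectories, flag points
        if not uncertain:                     # converged: no uncertain points
            return model, dataset, iteration
        dataset = dataset + label(uncertain)  # label and extend training set
    return model, dataset, max_iterations
```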

The new_points=300 parameter sets the maximum number of new training points added per iteration, pooled from the different sampling sources.

Understanding the output

The output log reports the progress of each AL iteration. Here is what to look for.

Iteration header and dataset size

Each iteration begins with a header showing the current dataset size:

Active learning iteration 5
Number of points in the labeled data set: 1800

Model accuracy

After training, the RMSE and correlation coefficients for both the main and auxiliary models are printed for the subtraining and validation sets. Look for the main model's validation-set RMSE: in this run, it stabilizes around 0.002 Ha (about 1.3 kcal/mol) after a few iterations.
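For reference, the unit conversion behind that estimate (1 Hartree = 627.5095 kcal/mol):

```python
HARTREE_TO_KCAL_MOL = 627.5095   # 1 Hartree in kcal/mol

rmse_ha = 0.002                  # validation RMSE reported in the log
rmse_kcal = rmse_ha * HARTREE_TO_KCAL_MOL   # ~1.26 kcal/mol
```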

Trajectory propagation and surface stopping times

During each iteration, 50 ML-TSH trajectories are propagated. For each trajectory, the output reports the number of time steps and the time at which the model’s uncertainty triggered a stop (the “surface stopping time”). Early in the AL loop, trajectories crash almost immediately (stopping times of a few fs), because the model has not yet learned the relevant parts of the PES. As the model improves, stopping times increase towards the maximum_propagation_time of 60 fs.

Convergence check

After each batch of trajectories, the convergence criterion is evaluated:

The model is 88.0 % converged

This measures the fraction of trajectories that reached the maximum propagation time without encountering excessive uncertainty. When no more than min_surfuq_points trajectories are flagged as uncertain, the model is considered converged. Here the threshold is 2 points out of 50, i.e. a convergence rate of 96% or above.

The model is 100.0 % converged
The ML-NAMD trajectories are converged, AL-NAMD converged.

Timing breakdown

Each iteration ends with a timing summary:

Iteration 10 takes 4916.52 s
    Labeling time: 1132.77 s
    Training time: 3552.81 s
    Sampling time: 230.94 s

In this example, the total wall-clock time across all 11 iterations was approximately 10.5 hours on a single 4090 GPU with 16 threads, with labeling (CASSCF calculations) and training (GPU) being the dominant costs.
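As a quick sanity check, the component timings reported above add up to the iteration total, and their fractions show where the time goes (numbers taken from the log excerpt; the fraction bookkeeping is just for illustration):

```python
# Component timings for iteration 10 as reported in the provided log (seconds)
labeling, training, sampling = 1132.77, 3552.81, 230.94
total = labeling + training + sampling   # matches the reported 4916.52 s

fractions = {'labeling': labeling / total,
             'training': training / total,
             'sampling': sampling / total}
# Training dominates this iteration (~72 %), labeling ~23 %, sampling ~5 %.
```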

Example convergence behavior

In this fulvene run, the AL loop converged at iteration 10 (11 iterations in total, numbered 0–10), using a total of 3300 CASSCF-labeled geometries. The convergence was non-monotonic: iterations 5–6 reached about 88% convergence, but iterations 7–8 dropped back to 0% as the newly added data shifted the model’s behavior, requiring further refinement. This is normal and expected: the AL procedure is designed to recover from such regressions by sampling more data in the problematic regions.

Analyzing the output

The following script parses the AL output log and produces two diagnostic plots:

  • the maximum convergence rate reached at each iteration

  • the average surface stopping time, which shows how far trajectories propagate before the model becomes uncertain

"""
Parse an AL-NAMD output log and plot:
    1. Maximum convergence rate (%) per AL iteration
    2. Average surface stopping time (fs) per AL iteration
Usage: python plot_al_convergence.py AL_fulvene.out
"""
import re, sys, matplotlib.pyplot as plt

def parse_al_log(path, ntraj=50, max_time=60.0):
        with open(path) as f:
                text = f.read()
        bounds = [(int(m.group(1)), m.start())
                            for m in re.finditer(r'Active learning iteration (\d+)', text)]
        bounds.append((-1, len(text)))

        iters, max_convs, avg_stops = [], [], []
        for i in range(len(bounds) - 1):
                block = text[bounds[i][1]:bounds[i+1][1]]
                convs = [float(m.group(1))
                                   for m in re.finditer(r'The model is ([\d.]+) % converged', block)]
                # Each convergence check has a stopping-times block listing ONLY uncertain stops.
                # Converged trajectories (reaching max_time) are not listed.
                st_blocks = re.findall(
                        r'The surface stopping times are:\n(.*?)The model is', block, re.DOTALL)
                check_avgs = []
                for sb in st_blocks:
                        uncertain = [float(x) for x in sb.strip().split('\n') if x.strip()]
                        n_uncertain = len(uncertain)
                        n_converged = ntraj - n_uncertain
                        all_times = uncertain + [max_time] * n_converged
                        check_avgs.append(sum(all_times) / len(all_times))
                iters.append(bounds[i][0])
                max_convs.append(max(convs) if convs else 0)
                avg_stops.append(sum(check_avgs) / len(check_avgs) if check_avgs else 0)
        return iters, max_convs, avg_stops

if __name__ == '__main__':
        path = sys.argv[1] if len(sys.argv) > 1 else 'AL_fulvene.out'
        iters, max_convs, avg_stops = parse_al_log(path)

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
        ax1.plot(iters, max_convs, 'o-', color='#2563eb')
        ax1.set_xlabel('AL iteration')
        ax1.set_ylabel('Max convergence (%)')
        ax1.set_ylim(0, 105)
        ax1.set_title('Convergence rate per iteration')

        ax2.plot(iters, avg_stops, 'o-', color='#dc2626')
        ax2.set_xlabel('AL iteration')
        ax2.set_ylabel('Avg. surface stopping time (fs)')
        ax2.set_title('Average surface stopping time')
        ax2.axhline(60, ls='--', color='gray', label='max propagation time')
        ax2.legend()

        plt.tight_layout()
        plt.savefig('al_convergence.png', dpi=150)
        plt.show()
        print('Saved al_convergence.png')

You can run the script above to generate al_convergence.png from the provided AL_fulvene.out log file; an example is shown below:

[Figure: AL convergence diagnostics for fulvene]

Additional features

Pre-defined uncertainty thresholds

By default, the uncertainty thresholds are calculated automatically from statistics on the initial data pool. You can also provide them yourself within sampler_kwargs, either as a per-state list or as a single scalar applied to all states.

sampler_kwargs = {
        ...
        'uq_threshold': [0.1, 0.1],  # threshold in Hartree for [S0, S1]
}

This can be useful for tightening the convergence after several iterations.
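Normalizing a scalar threshold to a per-state list amounts to the following (per_state_thresholds is a hypothetical helper for illustration, not MLatom API):

```python
def per_state_thresholds(uq_threshold, nstates):
    """Expand a scalar uncertainty threshold to one value per state, or
    validate a per-state list (hypothetical helper, not MLatom API)."""
    if isinstance(uq_threshold, (int, float)):
        return [float(uq_threshold)] * nstates
    if len(uq_threshold) != nstates:
        raise ValueError('need one threshold per electronic state')
    return list(uq_threshold)
```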

Providing an initial dataset

Instead of sampling the initial data pool, you can provide it yourself with the following code. Be careful: this dataset will also be used for calculating the uncertainty thresholds, so it can have a critical impact on accuracy and convergence.

data = ml.data.molecular_database.load('my_existing_data.json', format='json')

ml.al(
        molecule=eqmol,
        initial_dataset=data,
        reference_method=col,
        ...
)

Per-state accuracy breakdown

In recent MLatom versions, the training output includes a breakdown of model accuracy (RMSE, correlation coefficient) for each electronic state. This helps diagnose whether the model struggles with a particular state.

Tips and practical advice

  • GPU memory: Since MLatom 3.22, the ML models are moved from GPU to CPU during NAMD propagation, preventing VRAM exhaustion from multiple copies of the model weights across trajectories. If you were previously hitting out-of-memory errors, upgrading should resolve this.

  • Choosing new_points: 300 points per iteration is a reasonable default. Fewer points (e.g., 50–100) lead to more iterations but faster individual iterations; more points may be wasteful if many sampled geometries are redundant.

  • Convergence is non-monotonic: Do not be alarmed if convergence drops between iterations. The AL procedure adds data specifically in regions where the model fails, which can temporarily destabilize predictions in other regions. This is by design and the model will recover.

  • Wall-clock time: For this fulvene example with CASSCF(6,6), the total AL procedure took about 10.5 hours. The dominant cost was training on GPU (iterations with larger datasets take longer). Labeling time with the reference method is influenced by label_nthreads. However, fulvene is a very simple photophysical system. More complicated, photoreactive molecules can take a lot longer.

  • Using different pre-trained weights: You are not limited to the OMNI-P2x weights. You can provide any compatible .pt file. For example, if you have already fine-tuned a model for a similar molecule at the same level of theory, those weights might transfer even better.

References

  • OMNI-P2x: Mikołaj Martyka, Xin-Yu Tong, Joanna Jankowska, Pavlo O. Dral. OMNI-P2x: A Universal Neural Network Potential for Excited-State Simulations. ChemRxiv. First posted: 16 April 2025. DOI: https://doi.org/10.26434/chemrxiv-2025-j207x-v2

  • Active learning for NAMD: Mikołaj Martyka, Lina Zhang, Fuchun Ge, Yi-Fan Hou, Joanna Jankowska, Mario Barbatti, Pavlo O. Dral. Charting electronic-state manifolds across molecules with multi-state learning and gap-driven dynamics via efficient and robust active learning. npj Comput. Mater. 2025, 11, 132. DOI: https://doi.org/10.1038/s41524-025-01636-z

  • MLatom surface hopping: Lina Zhang, Sebastian V. Pios, Mikołaj Martyka, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, Lipeng Chen, Joanna Jankowska, Mario Barbatti, and Pavlo O. Dral. MLatom Software Ecosystem for Surface Hopping Dynamics in Python with Quantum Mechanical and Machine Learning Methods. J. Chem. Theory Comput. 2024, 20 (12), 5043–5057. DOI: https://doi.org/10.1021/acs.jctc.4c00468