.. _tutorial_krr_julia:
Kernel ridge regression in Julia
================================
Here we show how to train and use a kernel ridge regression (KRR) model in MLatom. This functionality is based on our implementation in `Julia `_.
Prerequisites
---------------
- ``MLatom 3.18.0`` or later
- ``Julia 1.10.2`` (any Julia v1.4+ should work)
- ``julia 0.6.2`` (this is PyJulia, a Python package; not Julia itself)
.. warning::
PyCall.jl in Julia does not support Python 3.12 (as of June 28th, 2025); support may come later. Use Python 3.11 instead.
First of all, you should have Julia installed in your local environment. You can download it from `its website `_ and install it following the `instructions `_.
After you have successfully downloaded and configured Julia, you can enter ``julia`` in the command line to open an interactive session.
.. code-block:: text
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.10.2 (2024-03-01)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia>
.. note::
As you can see, we are using Julia 1.10.2, but any Julia v1.4+ should work with PyJulia.
If you have no knowledge of Julia, that is totally fine, because we call Julia functions from Python. Before doing that, we need to install the bridge package in Python:
.. code-block:: bash
pip install julia==0.6.2
Then, in Python, run the following code:
.. code-block:: python
import julia
julia.install()
But we are not done yet: if your Python interpreter is statically linked to libpython, PyJulia will not fully support it. In that case, we recommend using ``python-jl`` to run your script:
.. code-block:: bash
python-jl your_script.py
If everything is set up correctly, you will be able to run the following code:
.. code-block:: python
from julia import Base
print(Base.sind(90))  # prints 1.0
Data and models
-------------------
Here you can download the data files that will be used in the following tutorial:
- :download:`E_FCI_451.dat `
- :download:`E_UHF_451.dat `
- :download:`R_451.dat `
ML database
--------------
We introduce a new data class called ``ml_database``, in which you can store x and y values. ``ml.models.krr`` uses ``ml_database.x`` and ``ml_database.y`` to train the model.
Here is an example of how to work with this class.
.. code-block:: python
import mlatom as ml
mlDB = ml.data.ml_database()
mlDB.read_from_x_file(filename="R_451.dat") # This will be saved in mlDB.x
mlDB.read_from_y_file(filename="E_FCI_451.dat",property_name="fci")
mlDB.read_from_y_file(filename="E_UHF_451.dat",property_name="uhf")
mlDB.y = mlDB.get_property("fci")
Train a KRR model
-------------------
After loading the ML database, we can train a model.
.. code-block:: python
model = ml.models.krr(model_file="E_FCI.npz",kernel_function='Gaussian',ml_program='mlatom.jl') # The model will be saved in E_FCI.npz. Gaussian function is used as the kernel function.
model.hyperparameters['sigma'].value = 4.0
model.hyperparameters['lambda'].value = 0.00001
model.train(ml_database=mlDB,prior=-1.0) # Set the prior value. The prior will be subtracted from each y value before training (and added back after prediction).
# Predict values with the model
model.predict(ml_database=mlDB,property_to_predict='fci_yest')
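Under the hood, KRR with a prior amounts to solving a regularized linear system: the prior is subtracted from the labels before solving and added back at prediction time. A minimal NumPy sketch of this logic (with toy data standing in for the tutorial files, independent of MLatom's Julia implementation):

```python
import numpy as np

# Toy stand-ins for the tutorial data (hypothetical values, not R_451.dat)
X = np.linspace(0.6, 4.0, 20).reshape(-1, 1)   # 1-D "geometries"
y = np.exp(-X[:, 0]) - 1.0                     # 1-D "energies"
sigma, lam, prior = 0.5, 1e-5, -1.0

def kernel_matrix(A, B, sigma):
    # Gaussian kernel k(x1, x2) = exp(-|x1 - x2|^2 / (2 sigma^2))
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

K = kernel_matrix(X, X, sigma)
# Training: subtract the prior, then solve (K + lambda*I) alpha = y - prior
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y - prior)
# Prediction: evaluate the kernel expansion and add the prior back
y_pred = kernel_matrix(X, X, sigma) @ alpha + prior
```

This is only a conceptual sketch; MLatom's ``mlatom.jl`` backend performs the equivalent computation in Julia.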
Several kernel functions are currently supported:
- Gaussian kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \exp(-\lvert \textbf{x}_1-\textbf{x}_2 \rvert ^{2} / (2\sigma ^{2}))`
- Periodic Gaussian kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \exp(-2\sin^2 (\pi \lvert \textbf{x}_1-\textbf{x}_2 \rvert / p) / \sigma ^2)`, where :math:`p` is the period
- Decaying periodic Gaussian kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \exp(-\lvert \textbf{x}_1-\textbf{x}_2 \rvert ^{2} / (2\sigma ^{2}) - 2\sin^2 (\pi \lvert \textbf{x}_1-\textbf{x}_2 \rvert / p) / \sigma_{p} ^2)`
- Matern kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \exp(-\lvert \textbf{x}_1-\textbf{x}_2 \rvert / \sigma) \sum \limits _{k=0} ^{n} \frac {(n+k)!} {(2n)!} \begin{pmatrix} n \\ k \end{pmatrix} (2\lvert \textbf{x}_1-\textbf{x}_2 \rvert / \sigma)^{n-k}`
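To make the Gaussian formula concrete, it can be evaluated directly for two 1-D points (a plain NumPy sketch, independent of MLatom's Julia implementation):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    # k(x1, x2) = exp(-|x1 - x2|^2 / (2 sigma^2))
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

x1, x2 = np.array([0.7]), np.array([0.9])
print(gaussian_kernel(x1, x1))             # identical points give exactly 1.0
print(gaussian_kernel(x1, x2, sigma=4.0))  # nearby points give values close to 1.0
```

Larger ``sigma`` makes the kernel decay more slowly with distance, i.e. points further apart are still considered similar.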
Hyperparameters of these kernel functions:
.. table::
:align: center
============================= ================================== ====================================
kernel function how to call in MLatom hyperparameters
============================= ================================== ====================================
Gaussian ``'Gaussian'`` ``sigma``
Periodic Gaussian ``'periodic_Gaussian'`` ``sigma``, ``period``
Decaying periodic Gaussian ``'decaying_periodic_Gaussian'`` ``sigma``, ``period``, ``sigmap``
Matern ``'Matern'`` ``sigma``, ``nn``
============================= ================================== ====================================
Default values of hyperparameters:
.. table::
:align: center
================== =============== ======================================================
hyperparameters    default value   description
================== =============== ======================================================
``sigma``          ``1.0``         kernel width (length scale)
``period``         ``2E-35``       period of the periodic kernels
``sigmap``         ``100.0``       width of the periodic part (decaying periodic kernel)
``nn``             ``1.0``         order of the Matern kernel
``lambda``         ``2``           regularization parameter of KRR
================== =============== ======================================================
Load a KRR model
------------------
If the model file exists, it will be loaded automatically during the initialization of the instance.
.. code-block:: python
import mlatom as ml
mlDB = ml.data.ml_database()
mlDB.read_from_x_file(filename="R_451.dat")
model = ml.models.krr(model_file='E_FCI.npz',ml_program='mlatom.jl')
model.predict(ml_database=mlDB,property_to_predict='fci_yest')
Optimize hyperparameters
-------------------------
Below, we show how to optimize hyperparameters using grid search.
.. code-block:: python
import mlatom as ml
mlDB = ml.data.ml_database()
mlDB.read_from_x_file(filename="R_451.dat") # This will be saved in mlDB.x
mlDB.read_from_y_file(filename="E_FCI_451.dat",property_name="fci")
mlDB.y = mlDB.get_property("fci")
# Optimize hyperparameters
model = ml.models.krr(model_file="E_FCI_grid_search.npz",kernel_function='Gaussian',ml_program='mlatom.jl')
model.hyperparameters['sigma'].minval = 2**-5 # Set the minimum value of sigma
model.hyperparameters['sigma'].maxval = 2**9 # Set the maximum value of sigma
model.hyperparameters['sigma'].optimization_space = 'log'
model.hyperparameters['lambda'].minval = 2**-35 # Set the minimum value of lambda
model.hyperparameters['lambda'].maxval = 1.0 # Set the maximum value of lambda
model.hyperparameters['lambda'].optimization_space = 'log'
sub,val = mlDB.split(number_of_splits=2, fraction_of_points_in_splits=[0.9, 0.1], sampling='none')
model.optimize_hyperparameters(subtraining_ml_database=sub,validation_ml_database=val,
optimization_algorithm='grid',
hyperparameters=['lambda','sigma'],
training_kwargs={'prior':-1.0},
prediction_kwargs={'property_to_predict':'yest'},debug=False)
model.predict(ml_database=mlDB,property_to_predict='fci_yest')
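The grid search above scans ``sigma`` and ``lambda`` on logarithmic grids and keeps the combination with the lowest validation error. A self-contained NumPy sketch of the same idea, using toy data and a 90/10 split without shuffling (as with ``sampling='none'``):

```python
import numpy as np

# Toy stand-ins for the tutorial data (hypothetical values)
X = np.linspace(0.6, 4.0, 40).reshape(-1, 1)
y = np.exp(-X[:, 0]) - 1.0

def K(A, B, sigma):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# 90/10 subtraining/validation split without shuffling
n_sub = int(0.9 * len(X))
Xs, ys, Xv, yv = X[:n_sub], y[:n_sub], X[n_sub:], y[n_sub:]

best = (None, None, np.inf)  # (sigma, lambda, validation RMSE)
for log2s in range(-5, 10):          # sigma on a log grid: 2^-5 .. 2^9
    for log2l in range(-35, 1, 5):   # lambda on a log grid: 2^-35 .. 1
        sigma, lam = 2.0 ** log2s, 2.0 ** log2l
        alpha = np.linalg.solve(K(Xs, Xs, sigma) + lam * np.eye(n_sub), ys)
        rmse = np.sqrt(np.mean((K(Xv, Xs, sigma) @ alpha - yv) ** 2))
        if rmse < best[2]:
            best = (sigma, lam, rmse)
print(best)
```

MLatom's grid search additionally handles the prior and model saving for you; this sketch only illustrates the log-space scan itself.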
Multiple outputs
----------------
The model can be trained on multiple y values at the same time. Below we show how to train a model on both FCI and UHF energies.
.. code-block:: python
import mlatom as ml
import numpy as np
mlDB = ml.data.ml_database()
mlDB.read_from_x_file(filename="R_451.dat")
mlDB.read_from_y_file(filename="E_FCI_451.dat",property_name="fci")
mlDB.read_from_y_file(filename="E_UHF_451.dat",property_name="uhf")
Ntrain = len(mlDB.get_property('fci'))
mlDB.y = np.concatenate((mlDB.get_property('fci').reshape((Ntrain,1)),mlDB.get_property('uhf').reshape((Ntrain,1))),axis=1)
model = ml.models.krr(model_file="E_FCI_multi_output.npz",kernel_function='Gaussian',ml_program='mlatom.jl')
# Optimize hyperparameters
model.hyperparameters['sigma'].minval = 2**-5 # Set the minimum value of sigma
model.hyperparameters['sigma'].maxval = 2**9 # Set the maximum value of sigma
model.hyperparameters['sigma'].optimization_space = 'log'
model.hyperparameters['lambda'].minval = 2**-35 # Set the minimum value of lambda
model.hyperparameters['lambda'].maxval = 1.0 # Set the maximum value of lambda
model.hyperparameters['lambda'].optimization_space = 'log'
sub,val = mlDB.split(number_of_splits=2, fraction_of_points_in_splits=[0.9, 0.1], sampling='none')
model.optimize_hyperparameters(subtraining_ml_database=sub,validation_ml_database=val,
optimization_algorithm='grid',
hyperparameters=['lambda','sigma'],
training_kwargs={'prior':-1.0},
prediction_kwargs={'property_to_predict':'yest'},debug=False)
model.predict(ml_database=mlDB,property_to_predict='fci_uhf_yest')
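With multiple outputs, the same kernel system is solved once with a matrix of right-hand sides, yielding one column of regression coefficients per property. A toy NumPy sketch of this idea (hypothetical values standing in for the FCI and UHF energies):

```python
import numpy as np

X = np.linspace(0.6, 4.0, 20).reshape(-1, 1)
# Two toy property columns standing in for FCI and UHF energies
Y = np.column_stack([np.exp(-X[:, 0]) - 1.0,
                     np.exp(-0.5 * X[:, 0]) - 1.0])

d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-d2 / (2.0 * 0.5 ** 2))  # Gaussian kernel, sigma = 0.5
# One linear solve handles both columns of Y at once
Alpha = np.linalg.solve(K + 1e-5 * np.eye(len(X)), Y)
Y_pred = K @ Alpha
print(Alpha.shape, Y_pred.shape)  # (20, 2) (20, 2)
```

Training on both properties together therefore costs essentially the same as training on one, since the expensive kernel factorization is shared.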