Example: Parameter Estimation 1

Estimation under square loss

To ease the estimation process when given data, a separate module, ode_loss, has been constructed for observations coming from a single state. We demonstrate how to use it via two examples: first a standard SIR model, then the Legrand SEIHFR model from [Legrand2007], used for Ebola, in estimate2.

SIR Model

We set up an SIR model as seen previously in sir.

In [1]: from pygom import SquareLoss, common_models

In [2]: import numpy

In [3]: import scipy.integrate

In [4]: import matplotlib.pyplot

In [5]: # Again, the standard SIR model with 2 parameters.  See the first script!

In [6]: # define the parameters

In [7]: paramEval = [('beta',0.5), ('gamma',1.0/3.0)]

In [8]: # initialize the model

In [9]: ode = common_models.SIR(paramEval)

and we assume that we have perfect information about the \(I\) and \(R\) compartments.

In [10]: x0 = [1, 1.27e-6, 0]

In [11]: # Time, including the initial time t0 at t=0

In [12]: t = numpy.linspace(0, 150, 1000)

In [13]: # Standard.  Find the solution.

In [14]: solution = scipy.integrate.odeint(ode.ode, x0, t)

In [15]: y = solution[:,1:3].copy()

Initialize the class with some initial guess of the parameters.

In [16]: # our initial guess

In [17]: theta = [0.2, 0.2]

In [18]: objSIR = SquareLoss(theta, ode, x0, t[0], t[1::], y[1::,:], ['I','R'])

Note that the initial values \(x_{0}\) and \(t_{0}\) are provided separately from the observations \(y\) and the corresponding times \(t\). Additionally, the states in which the observations lie need to be specified; either a single state or multiple states are allowed, as seen above.
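For instance, a single-state version of the same setup, observing only the \(R\) compartment, could be sketched as follows. This is only an illustration of the constructor convention described above; the names yR and objSIR_R are not part of the original example.

# observations of R alone, taken from the third column of the solution
yR = solution[1::, 2].copy()

# same constructor, but with a one-column observation array and a single state name
objSIR_R = SquareLoss(theta, ode, x0, t[0], t[1::], yR, ['R'])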

Difference in gradient

We have provided two different ways of obtaining the gradient; these are explained in gradient in a bit more detail. First, let's see how similar the outputs of the two methods are

In [19]: objSIR.sensitivity()
Out[19]: array([-0.95621274,  0.87448359])

In [20]: objSIR.adjoint()
Out[20]: array([-0.95498053,  0.87325191])

and the time required to obtain the gradient for the SIR model under \(\theta = (0.2,0.2)\), previously entered.

In [21]: %timeit objSIR.sensitivity()
18.2 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [22]: %timeit objSIR.adjoint()
653 ms ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Obviously, the amount of time taken by both methods depends on the number of observations as well as the number of states. The effect on the adjoint method as the number of observations changes can be quite evident. This is because the adjoint method relies on a discretization that loops in Python, whereas the forward sensitivity equations are solved simply via an integration. As the number of observations gets larger, the effect of the Python loop becomes more obvious.

The difference between the gradients is larger when there are fewer observations. This is because the adjoint method uses interpolation of the ODE output between consecutive time points. Given a solution over the same length of time, a coarser discretization naturally leads to a less accurate interpolation. Note that the interpolation is currently performed using a univariate spline, due to the limitations of available Python packages; ideally, one would prefer an (adaptive) Hermite or Chebyshev interpolation. Also note that we ran the two gradient functions once before timing them, because the properties of the ODE (Jacobian, gradient) are only computed at runtime.
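As a sanity check, both gradients can also be compared against a simple numerical approximation of the gradient of the cost function. The sketch below uses scipy.optimize.approx_fprime; the step size and the variable name numerical_grad are illustrative choices rather than part of the original example.

from scipy.optimize import approx_fprime

# finite-difference approximation of the gradient of the cost at theta = (0.2, 0.2)
numerical_grad = approx_fprime(theta, objSIR.cost, 1e-8)

# both analytic gradients should be close to the numerical approximation
print(numerical_grad)
print(objSIR.sensitivity())
print(objSIR.adjoint())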

Optimized result

Standard optimization procedures with a suitable initial guess should then yield the correct result. It is important to set the bounds for compartmental models, as we know that all the parameters are strictly positive. We use a less restrictive inequality here for demonstration purposes.

In [23]: # what we think the bounds are

In [24]: boxBounds = [(0.0,2.0),(0.0,2.0)]

Then we use the optimization routines in scipy.optimize, for example the SLSQP method, with the gradient obtained by forward sensitivity.

In [25]: from scipy.optimize import minimize

In [26]: res = minimize(fun=objSIR.cost,
   ....:                jac=objSIR.sensitivity,
   ....:                x0=theta,
   ....:                bounds=boxBounds,
   ....:                method='SLSQP')
   ....: 

In [27]: print(res)
     fun: 6.107148148593866e-07
     jac: array([ 0.16105372, -0.17475277])
 message: 'Optimization terminated successfully.'
    nfev: 18
     nit: 10
    njev: 10
  status: 0
 success: True
       x: array([0.5000679 , 0.33337421])

Other methods available in scipy.optimize.minimize() can also be used, such as L-BFGS-B and TNC. We can also use methods that accept the exact Hessian, such as trust-ncg, but that should not be necessary most of the time.
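As a sketch, the same fit can be repeated with L-BFGS-B simply by changing the method argument, reusing the cost, gradient and bounds defined above. The name res_lbfgsb is illustrative; the optimized parameters should again be close to \(\beta = 0.5\) and \(\gamma = 1/3\).

# same objective and forward-sensitivity gradient, different optimizer
res_lbfgsb = minimize(fun=objSIR.cost,
                      jac=objSIR.sensitivity,
                      x0=theta,
                      bounds=boxBounds,
                      method='L-BFGS-B')

print(res_lbfgsb.x)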