Parameter Estimation NRTL Using Unit Model Solution

Parameter Estimation Using Flash Unit Model

In this module, we will use Pyomo's parmest tool in conjunction with IDAES models for parameter estimation. We demonstrate these tools by estimating the parameters associated with the NRTL property model for a benzene-toluene mixture. The NRTL model has 2 sets of parameters: the non-randomness parameter (alpha_ij) and the binary interaction parameter (tau_ij), where i and j are the pure component species. In this example, we will only estimate the binary interaction parameter (tau_ij) for a given dataset. When estimating parameters associated with the property package, IDAES provides the flexibility of doing the parameter estimation using just the state block or using a unit model with a specified property package. This module demonstrates parameter estimation using the flash unit model with the NRTL property package.
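To make the role of alpha_ij and tau_ij concrete, the following is a minimal sketch of the textbook binary NRTL equations for the activity coefficients. It is not the IDAES implementation: the function name `nrtl_binary_gammas` is hypothetical, and mapping `tau12` to tau["benzene", "toluene"] is an assumption about the index convention.

```python
import math


def nrtl_binary_gammas(x1, tau12, tau21, alpha=0.3):
    """Return (gamma1, gamma2) from the textbook binary NRTL equations.

    Assumption: component 1 is benzene, component 2 is toluene, and a single
    symmetric non-randomness parameter alpha is used for the pair.
    """
    x2 = 1.0 - x1
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    ln_gamma1 = x2**2 * (
        tau21 * (G21 / (x1 + x2 * G21)) ** 2
        + tau12 * G12 / (x2 + x1 * G12) ** 2
    )
    ln_gamma2 = x1**2 * (
        tau12 * (G12 / (x2 + x1 * G12)) ** 2
        + tau21 * G21 / (x1 + x2 * G21) ** 2
    )
    return math.exp(ln_gamma1), math.exp(ln_gamma2)


# Equimolar mixture with the initial tau guesses used later in this module
gamma_benzene, gamma_toluene = nrtl_binary_gammas(0.5, tau12=-0.9, tau21=1.4)
print(gamma_benzene, gamma_toluene)
```

With both tau values set to zero the mixture is ideal and both activity coefficients equal 1, which is a quick sanity check on the expressions.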

We will complete the following tasks:

Set up a method to return an initialized model
Set up the parameter estimation problem using parmest
Analyze the results
Demonstrate advanced features from parmest

Key links to documentation:

Inline Exercise: import `ConcreteModel` from Pyomo, `FlowsheetBlock` and `Flash` from IDAES.

In [1]:

# Todo: import ConcreteModel from pyomo.environ
from pyomo.environ import ConcreteModel, value

# Todo: import FlowsheetBlock from idaes.core
from idaes.core import FlowsheetBlock

# Todo: import Flash unit model from idaes.models.unit_models
from idaes.models.unit_models import Flash

In the next cell, we import the parameter block used in this module and the IDAES logger.

In [2]:

from idaes.models.properties.activity_coeff_models.BTX_activity_coeff_VLE import (
    BTXParameterBlock,
)
import idaes.logger as idaeslog

In the next cell, we import parmest from Pyomo and the pandas package. We need pandas because parmest uses pandas.DataFrame objects for handling the input data and the results.

In [3]:

import pyomo.contrib.parmest.parmest as parmest
import pandas as pd

Setting up an initialized model

We need to provide a method that returns an initialized model to the parmest tool in Pyomo.

Inline Exercise: Using what you have learned from previous modules, fill in the missing code below to return an initialized IDAES model.

In [4]:

def NRTL_model(data):
    
    #Todo: Create a ConcreteModel object
    m = ConcreteModel()
    
    #Todo: Create FlowsheetBlock object
    m.fs = FlowsheetBlock(dynamic=False)
    

    #Todo: Create a properties parameter object with the following options:
    # "valid_phase": ('Liq', 'Vap')
    # "activity_coeff_model": 'NRTL'
    m.fs.properties = BTXParameterBlock(valid_phase=('Liq', 'Vap'),
                                        activity_coeff_model='NRTL')
    m.fs.flash = Flash(property_package=m.fs.properties)

    # Initialize at a certain inlet condition
    m.fs.flash.inlet.flow_mol.fix(1)
    m.fs.flash.inlet.temperature.fix(368)
    m.fs.flash.inlet.pressure.fix(101325)
    m.fs.flash.inlet.mole_frac_comp[0, "benzene"].fix(0.5)
    m.fs.flash.inlet.mole_frac_comp[0, "toluene"].fix(0.5)

    # Set Flash unit specifications
    m.fs.flash.heat_duty.fix(0)
    m.fs.flash.deltaP.fix(0)

    # Fix NRTL specific variables
    # alpha values (set at 0.3)
    m.fs.properties.alpha["benzene", "benzene"].fix(0)
    m.fs.properties.alpha["benzene", "toluene"].fix(0.3)
    m.fs.properties.alpha["toluene", "toluene"].fix(0)
    m.fs.properties.alpha["toluene", "benzene"].fix(0.3)

    # initial tau values
    m.fs.properties.tau["benzene", "benzene"].fix(0)
    m.fs.properties.tau["benzene", "toluene"].fix(-0.9)
    m.fs.properties.tau["toluene", "toluene"].fix(0)
    m.fs.properties.tau["toluene", "benzene"].fix(1.4)

    # Initialize the flash unit
    m.fs.flash.initialize(outlvl=idaeslog.INFO_LOW)

    # Fix at actual temperature
    m.fs.flash.inlet.temperature.fix(float(data["temperature"]))

    # Set bounds on variables to be estimated
    m.fs.properties.tau["benzene", "toluene"].setlb(-5)
    m.fs.properties.tau["benzene", "toluene"].setub(5)

    m.fs.properties.tau["toluene", "benzene"].setlb(-5)
    m.fs.properties.tau["toluene", "benzene"].setub(5)

    # Return initialized flash model
    return m

Parameter estimation using parmest

In addition to providing a method to return an initialized model, the parmest tool needs the following:

List of variable names to be estimated
Dataset with multiple scenarios
Expression to compute the sum of squared errors

In this example, we only estimate the binary interaction parameter (tau_ij). Given that this variable is usually indexed as tau_ij = Var(component_list, component_list), there are 2*2=4 degrees of freedom. However, when i=j, the binary interaction parameter is 0. Therefore, in this problem, we estimate the binary interaction parameter for the following variables only:

fs.properties.tau['benzene', 'toluene']
fs.properties.tau['toluene', 'benzene']

Inline Exercise: Create a list called `variable_name` with the above-mentioned variables declared as strings.

In [5]:

# Todo: Create a list of vars to estimate
variable_name = ["fs.properties.tau['benzene', 'toluene']",
                 "fs.properties.tau['toluene', 'benzene']"]

Pyomo's parmest tool supports the following data formats:

pandas dataframe
list of dictionaries
list of json file names.

Please see the documentation for more details.
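
As an illustration of the list-of-dictionaries format, the sketch below builds one dictionary per data point using only the standard library. It assumes a comma-separated layout with the column names temperature, liq_benzene, and vap_benzene; the three rows are the first three points of the dataset used in this module, embedded inline so the snippet is self-contained.

```python
import csv
import io

# Inline stand-in for the first rows of the dataset (comma-separated layout assumed)
csv_text = """temperature,liq_benzene,vap_benzene
365.500000,0.480953,0.692110
365.617647,0.462444,0.667699
365.735294,0.477984,0.692441
"""

# One dictionary per scenario, with every field converted to float
records = [
    {key: float(val) for key, val in row.items()}
    for row in csv.DictReader(io.StringIO(csv_text))
]
print(records[0])
```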

For this example, we load data from the csv file BT_NRTL_dataset.csv. The dataset consists of fifty data points which provide the mole fraction of benzene in the vapor and liquid phase as a function of temperature.

In [6]:

# Load data from csv
data = pd.read_csv('BT_NRTL_dataset.csv')

# Display the dataset
display(data)

	temperature	liq_benzene	vap_benzene
0	365.500000	0.480953	0.692110
1	365.617647	0.462444	0.667699
2	365.735294	0.477984	0.692441
3	365.852941	0.440547	0.640336
4	365.970588	0.427421	0.623328
5	366.088235	0.442725	0.647796
6	366.205882	0.434374	0.637691
7	366.323529	0.444642	0.654933
8	366.441176	0.427132	0.631229
9	366.558824	0.446301	0.661743
10	366.676471	0.438004	0.651591
11	366.794118	0.425320	0.634814
12	366.911765	0.439435	0.658047
13	367.029412	0.435655	0.654539
14	367.147059	0.401350	0.604987
15	367.264706	0.397862	0.601703
16	367.382353	0.415821	0.630930
17	367.500000	0.420667	0.640380
18	367.617647	0.391683	0.598214
19	367.735294	0.404903	0.620432
20	367.852941	0.409563	0.629626
21	367.970588	0.389488	0.600722
22	368.000000	0.396789	0.612483
23	368.088235	0.398162	0.616106
24	368.205882	0.362340	0.562505
25	368.323529	0.386958	0.602680
26	368.441176	0.363643	0.568210
27	368.558824	0.368118	0.577072
28	368.676471	0.384098	0.604078
29	368.794118	0.353605	0.557925
30	368.911765	0.346474	0.548445
31	369.029412	0.350741	0.556996
32	369.147059	0.362347	0.577286
33	369.264706	0.362578	0.579519
34	369.382353	0.340765	0.546411
35	369.500000	0.337462	0.542857
36	369.617647	0.355729	0.574083
37	369.735294	0.348679	0.564513
38	369.852941	0.338187	0.549284
39	369.970588	0.324360	0.528514
40	370.088235	0.310753	0.507964
41	370.205882	0.311037	0.510055
42	370.323529	0.311263	0.512055
43	370.441176	0.308081	0.508437
44	370.558824	0.308224	0.510293
45	370.676471	0.318148	0.528399
46	370.794118	0.308334	0.513728
47	370.911765	0.317937	0.531410
48	371.029412	0.289149	0.484824
49	371.147059	0.298637	0.502318

We need to provide a method to return an expression to compute the sum of squared errors that will be used as the objective in solving the parameter estimation problem. For this problem, the error will be computed for the mole fraction of benzene in the vapor and liquid phase between the model prediction and data.

Inline Exercise: Complete the following cell by adding an expression to compute the sum of squared errors.

In [7]:

# Create method to return an expression that computes the sum of squared error
def SSE(m, data):
    # Todo: Add expression for computing the sum of squared errors in mole fraction of benzene in the liquid
    # and vapor phase. For example, the squared error for the vapor phase is:
    # (float(data["vap_benzene"]) - m.fs.flash.vap_outlet.mole_frac_comp[0, "benzene"])**2
    expr = ((float(data["vap_benzene"]) -
             m.fs.flash.vap_outlet.mole_frac_comp[0, "benzene"])**2 +
            (float(data["liq_benzene"]) -
             m.fs.flash.liq_outlet.mole_frac_comp[0, "benzene"])**2)
    return expr*1E4

Note: We have scaled the expression up by a factor of 10000 because the SSE computed here is an extremely small number, given that it is built from differences in mole fraction. A well-scaled objective helps improve solve robustness when using IPOPT.
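
The effect of the scaling can be seen with plain numbers. In the sketch below, the measured values are the first data point of the dataset, while the predicted values are hypothetical numbers chosen only for illustration; they are not model output.

```python
def squared_error(measured, predicted):
    """Squared difference between a measurement and a prediction."""
    return (measured - predicted) ** 2


# First data point of the dataset
measured = {"liq_benzene": 0.480953, "vap_benzene": 0.692110}
# Hypothetical model predictions, for illustration only
predicted = {"liq_benzene": 0.47, "vap_benzene": 0.70}

unscaled = sum(squared_error(measured[k], predicted[k]) for k in measured)
scaled = unscaled * 1e4

print(f"unscaled SSE contribution: {unscaled:.3e}")
print(f"scaled SSE contribution:   {scaled:.3f}")
```

Even a mismatch of about 0.01 in mole fraction contributes only on the order of 1e-4 before scaling, which is why the objective is multiplied by 1e4.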

We are now ready to set up the parameter estimation problem. We will create a parameter estimation object called pest. As shown below, we pass the method that returns an initialized model, dataset, list of variable names to estimate, and the SSE expression to the Estimator object. tee=True will print the solver output after solving the parameter estimation problem.

In [8]:

# Initialize a parameter estimation object
pest = parmest.Estimator(NRTL_model, data, variable_name, SSE, tee=True)

# Run parameter estimation using all data
obj_value, parameters = pest.theta_est()

Ipopt 3.13.2: 

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computation. See http://www.hsl.rl.ac.uk.
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver ma27.

Number of nonzeros in equality constraint Jacobian...:    10946
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:     6600

Total number of variables............................:     2950
                     variables with only lower bounds:      150
                variables with lower and upper bounds:      600
                     variables with only upper bounds:        0
Total number of equality constraints.................:     2948
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  6.0671019e+01 5.63e+02 1.08e-04  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  5.0339335e+00 1.57e+03 7.47e+01  -1.0 1.37e+04    -  9.45e-01 1.00e+00h  1
   2  5.1535704e+00 1.93e+02 4.59e+02  -1.0 5.54e+02  -4.0 9.90e-01 1.00e+00h  1
   3  5.1392848e+00 1.07e+00 3.40e+01  -1.0 6.17e+01  -4.5 9.92e-01 1.00e+00h  1
   4  5.1359488e+00 3.65e+02 2.24e+01  -1.0 8.41e+02    -  1.00e+00 1.00e+00h  1
   5  5.1198699e+00 1.64e+00 1.32e-01  -1.0 3.65e+02    -  1.00e+00 1.00e+00h  1
   6  5.0735545e+00 1.54e+02 1.83e-01  -2.5 3.80e+02    -  9.96e-01 1.00e+00h  1
   7  5.0752210e+00 1.03e+01 5.00e-02  -2.5 9.51e+01    -  1.00e+00 1.00e+00h  1
   8  5.0750012e+00 5.57e-03 2.07e-05  -2.5 2.09e-01    -  1.00e+00 1.00e+00h  1
   9  5.0749679e+00 5.85e-02 7.21e-04  -3.8 8.43e+00    -  1.00e+00 1.00e+00h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  5.0749686e+00 5.59e-04 1.05e-05  -5.7 9.63e-01    -  1.00e+00 1.00e+00h  1
  11  5.0749686e+00 3.98e-08 1.56e-09  -8.6 7.56e-03    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 11

                                   (scaled)                 (unscaled)
Objective...............:   5.0749685783045084e+00    5.0749685783045084e+00
Dual infeasibility......:   1.5648775501801708e-09    1.5648775501801708e-09
Constraint violation....:   1.3843631310512158e-10    3.9843143895268440e-08
Complementarity.........:   2.5074825419922871e-09    2.5074825419922871e-09
Overall NLP error.......:   2.5074825419922871e-09    3.9843143895268440e-08


Number of objective function evaluations             = 12
Number of objective gradient evaluations             = 12
Number of equality constraint evaluations            = 12
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 12
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 11
Total CPU secs in IPOPT (w/o function evaluations)   =      0.107
Total CPU secs in NLP function evaluations           =      0.025

EXIT: Optimal Solution Found.

You will notice that the resulting parameter estimation problem, when using the flash unit model, has 2950 variables and 2948 equality constraints, as reported in the IPOPT output above. This is because unit models in IDAES use control volume blocks, which have two state blocks attached: one at the inlet and one at the outlet. Even though there are two state blocks, they both use the same parameter block (m.fs.properties in our example), which is where the parameters to be estimated exist.

Let us display the results by running the next cell.

In [9]:

print("The SSE at the optimal solution is %0.6f" % (obj_value*1e-4))
print()
print("The values for the parameters are as follows:")
for k,v in parameters.items():
    print(k, "=", v)

The SSE at the optimal solution is 0.000507

The values for the parameters are as follows:
fs.properties.tau[benzene,toluene] = -0.8987624039723903
fs.properties.tau[toluene,benzene] = 1.410486110660486

Using the data that was provided, we have estimated the binary interaction parameters in the NRTL model for a benzene-toluene mixture. Although the dataset that was provided was temperature dependent, in this example we have estimated a single value that fits best for all temperatures.
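
As a rough check on the quality of the fit, the scaled objective reported by IPOPT can be converted back to an SSE and to a root-mean-square residual. The RMSE is an extra diagnostic added here, not something parmest reports; it assumes 50 data points with two residuals each (liquid and vapor mole fraction of benzene).

```python
import math

scaled_objective = 5.0749685783045084  # objective value reported by IPOPT
n_points = 50                          # data points in the dataset
residuals_per_point = 2                # liquid and vapor mole fraction of benzene

sse = scaled_objective * 1e-4          # undo the 1e4 scaling applied in SSE()
rmse = math.sqrt(sse / (n_points * residuals_per_point))

print(f"SSE = {sse:.6f}")
print(f"RMSE per mole-fraction residual = {rmse:.5f}")
```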

Advanced options for parmest: bootstrapping

Pyomo's parmest tool allows for bootstrapping where the parameter estimation is repeated over n samples with resampling from the original data set. Parameter estimation with bootstrap resampling can be used to identify confidence regions around each parameter estimate. This analysis can be slow given the increased number of model instances that need to be solved. Please refer to https://pyomo.readthedocs.io/en/stable/contributed_packages/parmest/driver.html for more details.

For the example above, the bootstrapping can be run by uncommenting the code in the following cell:

In [10]:

# Run parameter estimation using bootstrap resampling of the data
# (4 bootstrap samples in this call) and display the resulting estimates

# Uncomment the following lines

# bootstrap_theta = pest.theta_est_bootstrap(4)
# display(bootstrap_theta)
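
The resampling idea behind bootstrapping can be sketched in plain Python. This is not how parmest implements it: the helper `bootstrap_estimates` is hypothetical, and the sample mean stands in for the full parameter estimation solve that parmest would repeat on each resample. The values are the first six liquid mole fractions from the dataset.

```python
import random
import statistics

# First six liq_benzene values from the dataset
liq_benzene = [0.480953, 0.462444, 0.477984, 0.440547, 0.427421, 0.442725]


def bootstrap_estimates(values, n_samples, estimator, seed=0):
    """Apply `estimator` to n_samples resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_samples):
        resample = rng.choices(values, k=len(values))
        estimates.append(estimator(resample))
    return estimates


boot_means = bootstrap_estimates(liq_benzene, n_samples=4, estimator=statistics.mean)
print(boot_means)
```

The spread of the repeated estimates is what gives an indication of the confidence region around each parameter.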