Data Management Framework¶
Data Management Framework Tutorial¶
This tutorial will introduce you to the use of the IDAES Data Management Framework (DMF).
The purpose of the DMF is to give IDAES users an easy and consistent way to manage the data they use and create, with a particular focus on (a) the ability to annotate all data items with a combination of pre-defined and user-defined metadata, and (b) the ability to link between data items to record relationships between them.
In this tutorial, you will learn:
- Basic DMF terms and concepts
- How to create a new DMF instance
- How to add your data to the DMF
- How to set up relations to show dependencies and provenance of resources
- How to find, load, and remove data from the DMF
Conventions¶
The typographic conventions used in this tutorial are as follows.
Names of files will be in "double quotes", new terms will be in italics, and keywords and source code snippets will be in fixed-width type.
Sections that offer additional detail that may be skipped are set off in blocks like this:
Basic concepts¶
This section describes the DMF conceptually, introducing standard terms used in this context. The terms are summarized at the end of the section.
Workspaces¶
The DMF is designed to support multiple users, or multiple different projects for the same user. Different users or projects are isolated from each other through the concept of a DMF workspace. Each DMF instance is initialized with a given workspace, and all its DMF operations are confined to that workspace.
For the default (and only, at this time) storage engine, a directory in the filesystem is used for each workspace. As long as file permissions allow it, multiple users can share the same workspace, but note that there is no authentication required to modify DMF contents, so users must take care to coordinate their actions.
The DMF also keeps a global configuration file, ".dmf", in the user's home directory. This file records, among other things, the 'current' workspace, which avoids having to specify the workspace with every command from the command-line interface.
Resources and relations¶
The DMF is designed to store resources, which can be any sort of digital data, and to allow users to add structured metadata to describe the author, date, and origin of those resources. Users can explicitly register the relations, or named connections, between resources in the DMF, and use these to navigate between resources.
Files¶
Usually, but not always, each resource will be associated with one or more files, such as a data file, an image, or a spreadsheet. These files are by default copied into the DMF's workspace at the time the resource is added. After this, the file may be changed or deleted without affecting the DMF. There is an option to instead refer to the location of the file (in the filesystem). This has the advantage of efficiency -- no data is copied -- and will also, for better or worse, let the file's contents change even as the resource ID stays the same. In this case, it is considered the user's problem to protect the file from being moved or deleted.
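The difference between copying a file and referring to its location can be seen with a small standard-library sketch. It does not use the DMF at all: the scratch directory, the "files" folder, and the file name "measurements.csv" are hypothetical stand-ins chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stand-ins: a scratch directory playing the role of a workspace,
# and a user's data file that lives outside the workspace's file store.
scratch = Path(tempfile.mkdtemp())
store = scratch / "files"
store.mkdir()
original = scratch / "measurements.csv"
original.write_text("T,P\n368,101325\n")

# Copy semantics: the stored copy is independent of the original.
stored_copy = Path(shutil.copy2(original, store))

# Reference semantics: only the location is recorded, no data is copied.
reference = original.resolve()

# The user later edits the original file.
original.write_text("T,P\n400,101325\n")

copy_text = stored_copy.read_text()      # unaffected by the edit
reference_text = reference.read_text()   # follows the edit

print("copy:", copy_text.splitlines()[1])
print("reference:", reference_text.splitlines()[1])

shutil.rmtree(scratch)  # clean up the scratch directory
```

The copied file keeps the contents it had when it was stored, while the referenced file reflects whatever is on disk at read time. That is the trade-off described above.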
Terms¶
- resource: Digital data, with attached metadata, that is managed by the DMF
- relation: Connection between two resources, which has a named type. Connections are directional in their sense -- resource A "derived from" resource B -- but can be navigated in both the "in" and "out" direction.
- JSON: Short for JavaScript Object Notation, this is a standard way to format structured data. See https://json.org for more information.
Initialization¶
Physically, the default (and currently only) storage engine of the DMF is a simple directory on the disk. This directory corresponds to a workspace, and it contains:
- "config.yaml": Workspace configuration information.
- "resourcedb.json": Metadata for all resources (JSON format)
- "files/": Data files associated with resources
from idaes.core.dmf import DMF # import main class
import shutil, os
workspace_dir = "my_workspace"
# clean slate: remove anything from previous runs
if os.path.exists(workspace_dir):
    shutil.rmtree(workspace_dir)
# create our workspace
dmf = DMF(workspace_dir, create=True) # create means 'create the directory'
2023-03-04 01:47:29,386 [WARNING] idaes.core.dmf.dmfbase: Unable to open global DMF configuration file for reading: File not found: /home/runner/.dmf. Using default configuration values.
2023-03-04 01:47:29,387 [INFO] idaes.core.dmf.workspace: Create new configuration at '/tmp/tmpr11nwzke/my_workspace/config.yaml'
2023-03-04 01:47:29,388 [INFO] idaes.core.dmf.dmfbase: Saving configuration location to: /home/runner/.dmf
IDAES model setup¶
In this example, we will work with the Flash unit model introduced in the 'Flash_Unit' tutorial. We will use the DMF to store, and connect, the inputs, model, and results. The following block of code creates the Flash unit model and its containing flowsheet block. Refer to the tutorial for details. Note that we add the level of indirection to make this a Python string, so we can save the code easily as a resource.
model_setup = """
from pyomo.environ import ConcreteModel, SolverFactory, Constraint, value
from idaes.core import FlowsheetBlock
from idaes.models.properties.activity_coeff_models.BTX_activity_coeff_VLE \
    import BTXParameterBlock
from idaes.models.unit_models import Flash
m = ConcreteModel()
m.fs = FlowsheetBlock(dynamic=False)
m.fs.properties = BTXParameterBlock(valid_phase=('Liq', 'Vap'),
                                    activity_coeff_model="Ideal",
                                    state_vars="FTPz")
m.fs.flash = Flash(property_package=m.fs.properties)
"""
exec(model_setup)
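The "level of indirection" used above, keeping code in a string and running it with exec(), can be tried out without IDAES installed. The sketch below uses a hypothetical stand-in snippet (setup_code) and runs it in its own dictionary, which is one way to see exactly which names the code defines.

```python
# Hypothetical stand-in for a setup script kept as a string;
# it is NOT the Flash model code from the tutorial.
setup_code = """
feed = {"benzene": 0.5, "toluene": 0.5}
total = sum(feed.values())
"""

# Running the string in its own dict keeps its names out of the caller's scope
# and makes them easy to inspect afterwards.
namespace = {}
exec(setup_code, namespace)

print(namespace["feed"], namespace["total"])
```

The tutorial instead calls exec(model_setup) with no namespace argument, so the names it defines (such as m) land in the notebook's own namespace, where the later cells use them.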
Adding resources¶
Using the Python API, the user adds resources to the DMF by either creating an instance of the Resource class, and adding it, or implicitly creating and adding a resource in one step by importing a file. One resource can also refer to multiple files. There is standard metadata that describes who, what, when, and how a resource was created. In addition, arbitrary structured data can be added in a "data" section of each resource.
When adding, the basic workflow is:
- Initialize the DMF
- Create/add resources
- Create relations between these resources (more detail later)
- Update the DMF to save all the relations
We have already initialized the DMF. So, now we create and add some resources. First, the Flash code:
from idaes.core.dmf.resource import Resource, ResourceTypes
# Add the Flash code block as a resource
flash_code = Resource(type_=ResourceTypes.code, value={
    "codes": [{
        "type": "block",
        "language": "python",
        "desc": "Create Flash unit model and containing flowsheet",
        "inline": model_setup
    }],
    "name": "flash_code"
})
dmf.add(flash_code)
'ad990aa1680c4233b1250f695ba7ac9a'
Now we put the values that will be fed to the Flash unit model inlet into another resource.
# Put inlet values into a dict
inlet_values = {
    "flow_mol": 1,
    "temperature": 368,
    "pressure": 101325,
    "mole_frac_comp": {
        "benzene": 0.5,
        "toluene": 0.5
    }
}
# Save the inlet values as another resource
inlet_params = Resource(type_=ResourceTypes.data, value={"data": {"unit": "Flash", "inlet_values": inlet_values}})
dmf.add(inlet_params)
'3a24e00189d0402486cf70b6afee7cc9'
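Because resource metadata is kept in "resourcedb.json", it seems reasonable to assume (this is an assumption, not something stated above) that anything placed in the "data" section should survive a JSON round trip. The following standard-library check, which does not involve the DMF, shows one way to confirm that for the inlet values before adding them.

```python
import json

# Same values as in the tutorial cell above.
inlet_values = {
    "flow_mol": 1,
    "temperature": 368,
    "pressure": 101325,
    "mole_frac_comp": {"benzene": 0.5, "toluene": 0.5},
}

# Serialize and parse back; json.dumps raises TypeError on non-JSON values.
encoded = json.dumps({"unit": "Flash", "inlet_values": inlet_values})
decoded = json.loads(encoded)

print(decoded["inlet_values"] == inlet_values)
```

Plain dicts, lists, strings, and numbers pass this check. Objects such as Pyomo variables would need to be converted to plain values first.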
Create a relation¶
Once created, resources are connected to each other, where needed, by creating relations between them. The DMF defines a few types of possible relations:
- derived: object is derived from subject
- contains: subject contains the object
- uses: subject uses the object
- version: object is a (new) version of the subject
The terms subject and object, above, describe the direction of the relation with respect to the two resources. To take a simple example, imagine I have resources that represent a shoebox and a pair of shoes. To represent the relationship of the shoebox containing the shoes, then the shoebox is the subject and the shoes are the object of the relation "contains".
Relations, unlike resources, are not added directly to the DMF. Instead they are created with a method that also requires both resources to be provided, and then you call update() to save them.
The relation is represented in both the subject and object of the relation. For example, given the shoes resource from the example above, one can find that they are the object of the "contains" relation with a shoebox resource.
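To picture how one directed relation can be visible from both ends, here is a toy model of the shoebox example. ToyResource and toy_relate are hypothetical names invented for this sketch. They are not the DMF's Resource class or create_relation() function, and the DMF's internal bookkeeping may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResource:
    """Hypothetical stand-in for a resource; not the DMF Resource class."""
    name: str
    relations: list = field(default_factory=list)


def toy_relate(subject, predicate, obj):
    """Record one directed relation on both of its endpoints."""
    subject.relations.append(
        {"predicate": predicate, "identifier": obj.name, "role": "subject"}
    )
    obj.relations.append(
        {"predicate": predicate, "identifier": subject.name, "role": "object"}
    )


shoebox = ToyResource("shoebox")
shoes = ToyResource("shoes")
toy_relate(shoebox, "contains", shoes)

print(shoebox.relations)
print(shoes.relations)
```

Each side stores the predicate, the other resource, and its own role, so the relation can be followed starting from either the shoebox or the shoes.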
We can connect the two resources we just created with the relation "uses", in the sense of the Flash unit model using the inlet values. The Flash unit code is the subject, and the inlet values are the object, of this relation.
from idaes.core.dmf.resource import create_relation, Predicates
create_relation(flash_code, Predicates.uses, inlet_params)
dmf.update() # sync to DMF storage
2
Now, we finish constructing the Flash unit with the parameter values, then solve the model.
For good measure, we save the status of the solution in another resource, and connect that to the flash model with the derived relation.
import idaes.logger as idaeslog
# Set inlet values using dict created above
for key, value in inlet_values.items():
    if key == "mole_frac_comp":
        for chem, frac in value.items():
            m.fs.flash.inlet.mole_frac_comp[0, chem].fix(frac)
    else:
        getattr(m.fs.flash.inlet, key).fix(value)
# Finish Flash unit setup
m.fs.flash.heat_duty.fix(0)
m.fs.flash.deltaP.fix(0)
m.fs.flash.initialize(outlvl=idaeslog.WARNING) # quiet
# Solve the model
solver = SolverFactory('ipopt')
status = solver.solve(m, tee=False) # also, quiet
print(status)
# Save the result in the DMF
result_rsrc = Resource(type_=ResourceTypes.data, value={"data":{"status": str(status), "solver": "ipopt"}})
dmf.add(result_rsrc)
create_relation(flash_code, Predicates.derived, result_rsrc)
dmf.update()
Problem:
- Lower bound: -inf
  Upper bound: inf
  Number of objectives: 1
  Number of constraints: 41
  Number of variables: 41
  Sense: unknown
Solver:
- Status: ok
  Message: Ipopt 3.13.2\x3a Optimal Solution Found
  Termination condition: optimal
  Id: 0
  Error rc: 0
  Time: 0.00811314582824707
Solution:
- number of solutions: 0
  number of solutions displayed: 0
3
Searching for resources¶
To retrieve and use previously added items from the DMF, you need to first locate them. There are four primary ways to locate a resource:
- Find by resource ID: Of course, this is fast, but you first need to know the identifier.
- Find by query: You can search by any of the fields in the resource.
- Find by name: A special case of find by query, you can search on the 'name' you gave the resource.
- Find by relations: Given one resource, you can search for other resources to which it is related.
Finding resources by ID¶
We first show how to retrieve a resource by its ID. The method is a generator because an ID prefix is allowed, so you need to look at the first (and only) result explicitly.
# use id from resource above
rsrc = list(dmf.find_by_id(flash_code.id))[0]
print(rsrc.v["codes"][0]["inline"])
from pyomo.environ import ConcreteModel, SolverFactory, Constraint, value
from idaes.core import FlowsheetBlock
from idaes.models.properties.activity_coeff_models.BTX_activity_coeff_VLE import BTXParameterBlock
from idaes.models.unit_models import Flash
m = ConcreteModel()
m.fs = FlowsheetBlock(dynamic=False)
m.fs.properties = BTXParameterBlock(valid_phase=('Liq', 'Vap'),
                                    activity_coeff_model="Ideal",
                                    state_vars="FTPz")
m.fs.flash = Flash(property_package=m.fs.properties)
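The reason a lookup by ID prefix returns a generator rather than a single item can be illustrated with a toy in-memory store. The function toy_find_by_id and the dictionary toy_store are hypothetical and only mimic the behavior described above. They say nothing about how the DMF implements find_by_id().

```python
def toy_find_by_id(store, prefix):
    """Yield every record whose ID starts with the given prefix."""
    for identifier, record in store.items():
        if identifier.startswith(prefix):
            yield record


# Hypothetical store keyed by the resource IDs shown in this tutorial.
toy_store = {
    "ad990aa1680c4233b1250f695ba7ac9a": {"type": "code"},
    "3a24e00189d0402486cf70b6afee7cc9": {"type": "data"},
}

matches = list(toy_find_by_id(toy_store, "ad99"))
first = next(toy_find_by_id(toy_store, "3a24"), None)   # None if nothing matches
missing = next(toy_find_by_id(toy_store, "ffff"), None)

print(matches, first, missing)
```

Using next(generator, None) is a general Python idiom for taking the first result. Unlike list(...)[0], it does not raise an IndexError when nothing matches.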
Finding resources by query¶
The query syntax that is supported is a filter that is in the style of the filters used by the MongoDB database engine. To simplify the common case, a name parameter is also provided to find resources by name. For the purposes of this tutorial, we will show the use of both methods.
First, let's find the Flash inlet parameters by looking for the value of "unit" in the "data" section. Note that this type of search requires that you know something about the structure of the metadata inside a resource.
For the purposes of understanding this query, you need to know 3 things:
- The query itself is contained in a Python dict given to the filter_dict parameter of the find_one() method.
- You can search for nested fields, such as the "unit" field in the "data" section, by using dots between the field names.
- By putting a tilde ("~") before the value, the value will be treated like a regular expression. Here, we use this to find anything with the word "flash" in it, without worrying about upper and lower case.
import re
rsrc = dmf.find_one(filter_dict={"type": "data", "data.unit": "~.*flash.*"}, re_flags=re.IGNORECASE)
display(rsrc.v)
{'id_': '3a24e00189d0402486cf70b6afee7cc9', 'type': 'data', 'aliases': [], 'collaborators': [], 'created': 1677894449.474633, 'modified': 1677894449.474633, 'creator': {'name': 'runner'}, 'data': {'unit': 'Flash', 'inlet_values': {'flow_mol': 1, 'temperature': 368, 'pressure': 101325, 'mole_frac_comp': {'benzene': 0.5, 'toluene': 0.5}}}, 'datafiles': [], 'datafiles_dir': '', 'desc': '', 'relations': [{'predicate': 'uses', 'identifier': 'ad990aa1680c4233b1250f695ba7ac9a', 'role': 'object'}], 'tags': [], 'version_info': {'created': 1677894449.474633, 'version': [0, 0, 0], 'name': ''}, 'doc_id': 2}
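To make the three query rules concrete, here is a toy matcher that applies them to a plain dictionary. The functions toy_lookup and toy_matches are hypothetical and written only for this illustration. The DMF's actual query engine is not shown in this tutorial and may behave differently in edge cases.

```python
import re


def toy_lookup(record, dotted_key):
    """Follow a dotted key such as 'data.unit' through nested dicts."""
    current = record
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def toy_matches(record, filter_dict, re_flags=0):
    """Return True when every entry of filter_dict matches the record."""
    for key, expected in filter_dict.items():
        actual = toy_lookup(record, key)
        if isinstance(expected, str) and expected.startswith("~"):
            # Leading tilde: treat the rest of the value as a regular expression.
            if not isinstance(actual, str):
                return False
            if not re.match(expected[1:], actual, re_flags):
                return False
        elif actual != expected:
            return False
    return True


record = {"type": "data", "data": {"unit": "Flash"}}
query = {"type": "data", "data.unit": "~.*flash.*"}

with_flag = toy_matches(record, query, re_flags=re.IGNORECASE)
without_flag = toy_matches(record, query)
print(with_flag, without_flag)
```

In this toy version the query matches only when re.IGNORECASE is passed, because the stored value is "Flash" with a capital F. That is why the tutorial query sets re_flags.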
Finding resources by name¶
Finding by name is relatively simple: you just pass the name parameter to the find() or find_one() method. Note that the name must match exactly what was placed there.
# Find the Flash code resource by its name
rsrc = list(dmf.find(name="flash_code"))[0]
print("Result metadata:")
display(rsrc.v)
Result metadata:
{'id_': 'ad990aa1680c4233b1250f695ba7ac9a', 'type': 'code', 'aliases': ['flash_code'], 'collaborators': [], 'created': 1677894449.466004, 'modified': 1677894449.466004, 'creator': {'name': 'runner'}, 'data': {}, 'datafiles': [], 'datafiles_dir': '', 'desc': '', 'relations': [{'predicate': 'uses', 'identifier': '3a24e00189d0402486cf70b6afee7cc9', 'role': 'subject'}, {'predicate': 'derived', 'identifier': '0fd9c8ea78714b8cb0833ea5287e4fd3', 'role': 'subject'}], 'tags': [], 'version_info': {'created': 1677894449.466004, 'version': [0, 0, 0], 'name': ''}, 'codes': [{'type': 'block', 'language': 'python', 'desc': 'Create Flash unit model and containing flowsheet', 'inline': '\nfrom pyomo.environ import ConcreteModel, SolverFactory, Constraint, value\nfrom idaes.core import FlowsheetBlock\nfrom idaes.models.properties.activity_coeff_models.BTX_activity_coeff_VLE import BTXParameterBlock\nfrom idaes.models.unit_models import Flash\nm = ConcreteModel()\nm.fs = FlowsheetBlock(dynamic=False)\nm.fs.properties = BTXParameterBlock(valid_phase=(\'Liq\', \'Vap\'),\n activity_coeff_model="Ideal",\n state_vars="FTPz")\nm.fs.flash = Flash(property_package=m.fs.properties)\n'}], 'doc_id': 1}
Finding resources by relations¶
Finally, you can also find a resource by navigating to it from an existing resource. For example, we can find all the resources that are connected to our Flash unit model. This will discover:
- Flash code ─ uses → Flash inlet parameters
- Flash code ─ derived → Status of solve
for depth, triple, meta in dmf.find_related(rsrc,  # <-- this was the flash code we found above
                                            outgoing=True,  # <-- look at outgoing edges
                                            meta=["type", "data"]):  # <-- extract this metadata
    if meta["type"] == ResourceTypes.data:
        data = meta["data"]
        if "status" in data:
            print(f"{triple.predicate} --> Status:")
            print(data["status"])
        elif "inlet_values" in data:
            print(f"{triple.predicate} --> inlet values:")
            display(data["inlet_values"])
uses --> inlet values:
{'flow_mol': 1, 'temperature': 368, 'pressure': 101325, 'mole_frac_comp': {'benzene': 0.5, 'toluene': 0.5}}
derived --> Status:
Problem:
- Lower bound: -inf
  Upper bound: inf
  Number of objectives: 1
  Number of constraints: 41
  Number of variables: 41
  Sense: unknown
Solver:
- Status: ok
  Message: Ipopt 3.13.2\x3a Optimal Solution Found
  Termination condition: optimal
  Id: 0
  Error rc: 0
  Time: 0.00811314582824707
Solution:
- number of solutions: 0
  number of solutions displayed: 0
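Navigating by relations amounts to walking a directed graph. The sketch below does a breadth-first walk over outgoing edges in a toy graph and yields a depth and a triple for each edge, loosely mirroring the loop above. ToyTriple, toy_find_related, and the node names are hypothetical, and only the predicate attribute is taken from the tutorial code. This is not how find_related() is implemented.

```python
from collections import deque, namedtuple

# Hypothetical triple type; only the 'predicate' attribute appears in the tutorial.
ToyTriple = namedtuple("ToyTriple", "subject predicate object")


def toy_find_related(edges, start):
    """Breadth-first walk over outgoing edges, yielding (depth, triple)."""
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        node, depth = queue.popleft()
        for predicate, target in edges.get(node, []):
            yield depth, ToyTriple(node, predicate, target)
            if target not in seen:  # avoid revisiting nodes in cyclic graphs
                seen.add(target)
                queue.append((target, depth + 1))


# Toy graph with the same shape as the relations created in this tutorial.
toy_edges = {
    "flash_code": [("uses", "inlet_params"), ("derived", "solve_status")],
}

found = list(toy_find_related(toy_edges, "flash_code"))
for depth, triple in found:
    print(depth, triple.subject, triple.predicate, triple.object)
```

Both edges are reported at depth 1 because they leave the starting node directly. Edges leaving inlet_params or solve_status, if there were any, would be reported at depth 2.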
Working with files¶
A resource provides a relatively simple way to access any related files, with the get_datafiles() method.
In the simplest case, this returns a list of pathlib.Path objects that can be used however the user desires.
If you pass in a "mode", then the method will attempt to open files in that mode and return a Python file object instead.
To illustrate this, let's create a resource for this notebook itself, as a file. Then, we can find a special code block in the notebook, showing that we can actually read the resulting file.
# Add a new resource for the notebook file
notebook = dmf.new(file="data_management_framework.ipynb")
# Special:Start
for fp in notebook.get_datafiles(mode="r"):
    text = fp.read()
    start, end = text.find("Special:Start"), text.rfind("Special:End")
    block = text[start:end]
    print(block[block.find("for"):])
# Special:End
for fp in notebook.get_datafiles(mode=\"r\"):\n", " text = fp.read()\n", " start, end = text.find(\"Special:Start\"), text.rfind(\"Special:End\")\n", " block = text[start:end]\n", " print(block[block.find(\"for\"):])\n", "#
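The two calling styles described in this section, getting paths back versus getting open file objects back, can be imitated with a small standard-library helper. The function toy_get_files and the sample file "notes.txt" are hypothetical. The helper only mirrors the behavior described above and is not the DMF's get_datafiles() method.

```python
import shutil
import tempfile
from pathlib import Path


def toy_get_files(paths, mode=None):
    """Yield Path objects, or open file objects when a mode is given."""
    for path in paths:
        if mode is None:
            yield path
        else:
            # The file is closed when the caller asks for the next item.
            with open(path, mode) as handle:
                yield handle


# Hypothetical sample file standing in for a resource's data file.
scratch = Path(tempfile.mkdtemp())
sample = scratch / "notes.txt"
sample.write_text("first line\nsecond line\n")

as_paths = list(toy_get_files([sample]))
as_text = [fp.read() for fp in toy_get_files([sample], mode="r")]

print(as_paths[0].name)
print(as_text[0].splitlines())

shutil.rmtree(scratch)  # clean up the scratch directory
```

Getting paths back leaves the choice of how to open each file to the caller, for example with a CSV reader or an image library. Passing a mode is a shortcut for the common case of reading the contents directly.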
Removing resources¶
Resources can be removed using either their unique identifier (found in the ".id" attribute), or en masse using a search expression, as you would give to the find() method. The latter is not recommended unless you know what you are doing.
# uncomment and run this to remove the notebook resource
# dmf.remove(notebook.id)
Jupyter "magics"¶
To make working with the DMF in Jupyter Notebooks a little easier, a few Jupyter "magics" have been defined. Magics are special keywords, prefixed with a '%', that call underlying Python code. There are also a number of built-in magics, which are described in the IPython documentation.
The magics defined for the DMF are:
- %dmf workspaces: List all possible workspaces (does not require %dmf init)
- %dmf init: Call this to set the DMF workspace used by all other magics
- %dmf status: Show information about current DMF status
- %dmf list: List contents of current workspace
- %dmf help: Show help on an object or method for IDAES
Below are some examples of using these magics.
# Need to import this to register the magics with your Jupyter Notebook
from idaes.core.dmf import magics
%dmf workspaces
Path | Name | Description |
---|---|---|
./my_workspace | none | none |
'1 workspace(s) found'
%dmf init my_workspace
2023-03-04 01:47:29,832 [INFO] idaes.core.dmf.dmfbase: Saving configuration location to: /home/runner/.dmf
Success! Using workspace at "my_workspace"
%dmf status
Configuration¶
- _id: bafee5c7bdaf4a9b9f742089eb6e6b45
- created: 2023-03-04T01:47:29.832220
- modified: 2023-03-04T01:47:29.832220
- name: my_workspace
%dmf list
ID | Name(s) | Type | Modified | Description |
---|---|---|---|---|
ad990aa1680c4233b1250f695ba7ac9a | flash_code | code | 1677894449.466004 | |
3a24e00189d0402486cf70b6afee7cc9 | | data | 1677894449.474633 | |
0fd9c8ea78714b8cb0833ea5287e4fd3 | | data | 1677894449.715906 | |
48da48d6ca87457ea49c9c552d9aae69 | | notebook | 1677894449.761714 | data_management_framework.ipynb |
True
Thank you!¶
That's the end of the tutorial. Thanks for your interest.
For more information on the DMF APIs and command-line interfaces, see the DMF section of the IDAES documentation.