Semantics for Extra Large Banks (video) - Configurable Ontology to Data model Transformation

The Configurable Ontology to Data Model Transformation (CODT) technology created the 3,173-entity FIBO Data Model. The US Patent and Trademark Office issued patent #12,038,939, which opens CODT for Financial Institutions.

The PowerPoint for the FIB-DM Education lesson for large banks has 68 pages and takes several hours to deliver in class. Therefore, the YouTube Training Course has two videos:

Part 1 on the FIB-DM website addresses challenges for Semantic Centers of Excellence in leveraging ontologies for Data Management. Semantic and conventional data management use the same language rather than creating silos. The vision is Semantic Enterprise Information Architecture with ontologies at the apex and derived data, message, process, and object models. The video outlines goals for a “perfect” ontology-derived data model and shows in detail the outcome of the transformation: the FIBO Data Model.

Part 2 (this video) explains how CODT works: the Extract, Transform, and Load (ETL) approach and the patented Metadata Sets, end to end. We briefly discuss the Reverse Mode (transforming data models into RDF/OWL), as well as Patent, Licensing, Pricing, and Proof of Concept.

You can review the PowerPoint presentation and download a PDF copy here.

Transcript of the video lecture

Hello, welcome back to the second part of Semantics for Extra Large Banks.
This is Jurgen.
If you remember, in the first part, I spoke about the challenges for Semantic Centers of Excellence and the vision of Semantic Enterprise Information Architecture, with the FIBO at the apex and derived data, object, and message models.
I showed you the source ontologies, the FIBO and OMG Commons.
We looked at the Financial Industry Business Data Model.
How is it structured?
How does it preserve the extensive FIBO/OMG documentation?
Now, in the second part, we will look at how the Configurable Ontology to Data model Transformation, CODT, achieves its results.
“Atlantic” was the nickname for the first version, which used MS Excel Power Query and the M language.

The way to SEIA’s Model-Driven Development.
We have relational databases, and now also semantic Triple Stores or RDF Stores.
We have the industry-standard FIBO in RDF/OWL, and we have always had data models that deploy on RDBMS.
And with CODT, the Configurable Ontology to Data model Transformation, we have the sibling of the FIBO, FIB-DM as a Conceptual Data Model.
Atlantic is the way to Semantic Enterprise Information Architecture and Model-Driven Development.
The FIB-DM Full version, as of Q4 last year, has over 3,000 entities. It’s the world’s largest data model.
And Atlantic CODT, the Configurable Ontology to Data model Transformation, created FIB-DM from the industry-standard ontologies.

CODT is the patented technology that created the FIBO Data Model, and that’s US patent 12,038,939.
And in the first part, we learned that the old file-parsing approach doesn’t produce usable data models, and it cannot cope with very large ontologies like the FIBO.
The new ETL approach in CODT creates high-quality models.
The technology is fully scalable and configurable.
And what it really is, is what we’ve done for 30 years.
From the RDF/OWL, it’s ETL: Extract, Transform, and Load into the data modeling tool. Metadata sets are keyed records that hold properties for all objects in a model. Ontology metadata sets hold the records extracted from the ontology platform.
Entity-Relationship metadata sets transform ontology into entity-relationship representation.
PowerDesigner (or other tools) metadata sets are ready to load into the data modeling tool.

Metadata sets are a radically different new approach.
They are metadata stored in data sets. That’s similar to the system tables on your relational database, which store metadata about tables, columns, and foreign keys.
CODT metadata sets are isomorphic representations of Ontology, Entity-Relationship, and data-modeling tool-specific metadata.
And then the transformation is a simple two-step process.
From the ontology metadata set, in step one, we transform into generic entity-relationship metadata sets, and then, in the second step, we transform the generic ER into tool-specific metadata, such as here for PowerDesigner.
So, the same generic ER metadata set is the source for both PowerDesigner and Sparx EA metadata sets.
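
As an illustration only, the two-step process can be sketched in Python. CODT Atlantic itself uses Excel workbooks and Power Query, so the record types, function names, and the Sparx EA column names below are assumptions made for this sketch, and the naming-convention step is left out.

```python
from dataclasses import dataclass


@dataclass
class OntologyClass:
    """One record of a hypothetical ontology metadata set."""
    qualified_name: str
    local_name: str
    definition: str


@dataclass
class Entity:
    """One record of a hypothetical generic entity-relationship metadata set."""
    code: str
    name: str
    comment: str


def to_entity(cls: OntologyClass) -> Entity:
    """Step one: ontology record -> generic E/R record."""
    return Entity(code=cls.qualified_name, name=cls.local_name, comment=cls.definition)


def to_powerdesigner_row(entity: Entity) -> dict:
    """Step two: generic E/R record -> row for a PowerDesigner import sheet."""
    return {"Code": entity.code, "Name": entity.name, "Comment": entity.comment}


def to_sparx_row(entity: Entity) -> dict:
    """Step two for another tool; these column names are assumed, not from the source."""
    return {"Alias": entity.code, "Name": entity.name, "Notes": entity.comment}


# Hypothetical example record; "ex" is a made-up prefix.
entity = to_entity(OntologyClass("ex:OrderBook", "OrderBook", "A list of orders."))
powerdesigner_row = to_powerdesigner_row(entity)
sparx_row = to_sparx_row(entity)
```

Both tool-specific rows come from the same `entity` record, which is the point of keeping the entity-relationship layer generic.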

Let’s look at CODT as a system.
This is a UML component diagram and is actually Figure 2 of the CODT Patent drawings.
In the box is the Configurable Ontology to Data model Transformation. It interfaces with two external systems: the Ontology Platform, an ontology editor, or an RDF or Triple Store.
And the Data Modeling Tool, for example, PowerDesigner.
Then we have three core components: Extraction, which interfaces with the ontology platform; Transformation; and finally, Load into the data modeling tool.
For the Atlantic version, MS Excel is the tool of choice to view and analyze tabular data. Every data architect has Excel and knows how to use it.
Therefore, MS Excel is a fast prototyping tool for the CODT Metadata Sets, and it makes it easy to deploy the transformation. Here, this little table shows the Components and the Metadata Sets. In CODT Atlantic, there are different Excel workbooks for ontology, entity-relationship, and PowerDesigner.
But any platform and programming language can implement the CODT system, the metadata sets, and the method.

From ontology class to data model entity, we follow the journey.
Here at the top, we have the Ontology MDS in Excel. It shows us the Classes tab.
At the bottom, we have the outcome, the PowerDesigner MDS for Entities.
PowerDesigner can import this Excel spreadsheet directly.
We will see how classes transform into entity Code, and how the SKOS Definition becomes a Comment on an entity, and how the Local Name transforms to the Entity Name.

The Extraction works with SPARQL queries.
That is the novelty. CODT doesn’t parse RDF/OWL files; instead, it uses SPARQL to extract the ontology metadata.
Here’s an example query that selects the class, qualified name, namespace, and definition, and filters out unnamed classes. We don’t need “owl:Nothing,” and we don’t need “owl:Thing.”
The result of that query is a CSV file.
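
A minimal sketch of this extraction step, in Python for illustration. The query text is an approximation of the one described, not the actual CODT query, and the prefix table, function names, and column labels are assumptions. The sketch derives the Namespace and Qualified Name from the class IRI after the query has run and parses a result in the standard SPARQL CSV format, where the header row carries the variable names.

```python
import csv
import io

# Approximation of the class query: named classes only, without owl:Thing and owl:Nothing.
CLASS_QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?class ?definition
WHERE {
  ?class a owl:Class .
  OPTIONAL { ?class skos:definition ?definition }
  FILTER (isIRI(?class) && ?class NOT IN (owl:Thing, owl:Nothing))
}
"""

# Hypothetical namespace-to-prefix table; a real one would list the FIBO prefixes.
PREFIXES = {"https://example.org/bank/": "bank"}


def split_iri(iri: str) -> tuple:
    """Split an IRI into (namespace, local name) at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def load_class_extract(csv_text: str, prefixes: dict) -> list:
    """Turn the raw CSV result set into records for a Classes metadata set."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        namespace, local_name = split_iri(row["class"])
        prefix = prefixes.get(namespace, "")
        records.append({
            "Class": row["class"],
            "Qualified Name": f"{prefix}:{local_name}",
            "Namespace": namespace,
            "SKOS Definition": row["definition"],
        })
    return records


# Made-up result set standing in for the file the ontology platform would export.
sample_csv = "class,definition\r\nhttps://example.org/bank/OrderBook,A list of orders\r\n"
class_records = load_class_extract(sample_csv, PREFIXES)
```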

Extraction loads the CSV result set into the Ontology MDS. The Ontology Metadata workbook imports the raw extract and performs simple format conversions from the raw result.
So, we have the Class, we have the Qualified Name, the Namespace, and the SKOS Definition. Other tabs load Ontology Metadata Sets for Superclasses, Subclasses, Equivalent Classes, Data-, and Object Properties.

Throughout CODT, Excel Power Queries extract into the Metadata Sets.
Here is our Classes Metadata Set again. We see here a comment from the Commons Ontology, Authorization, Code Elements, Code sets, Classifiers, and so on.
These are all ontology classes. If we select “Get Data,” the Excel Power Query ribbon opens, and the Metadata Sets self-populate.
So, every worksheet has a query. We see the Queries & Connections for Class, Superclass, and Subclass here. We can “refresh,” meaning load individual or all Metadata Sets. The Queries & Connections pane shows the load status, any errors, and the number of records in the Metadata Set. For example, FIBO/OMG has 2,436 classes.

Excel Power Query makes the transformation rules very transparent.
Here, we have the Power Query editor, and on the left-hand side, we see all the Ontology Metadata Sets. And here, for the Class Metadata Set, it shows us the steps and transformation rules, and gives us a preview of the result set.

It shows us the editor of the Power Query M language. That’s a fourth-generation language (4GL).
We see the data source is the raw SPARQL query result set. It shows us the code for the steps that it applies.

Transformation step one into the Entity-Relationship MDS
We see the Entity tab, and the Code is populated from the Class Qualified Name. The URI is the Uniform Resource Identifier of the FIBO class.
A VBA function transforms the local name into an entity name as per the naming convention. It is basically an “uncamel”: it changes camel case into Logical Data Model names, and you can configure that.
Power Query, using the Ontology MDS as a source, populates the Entity tab in the Entity-Relationship MDS.
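
The “uncamel” idea can be sketched as follows. CODT Atlantic does this in a VBA function; the Python function below is only an illustration of one possible rule set, and how acronyms are treated is an assumption.

```python
import re


def uncamel(local_name: str) -> str:
    """Split a camel-case local name into space-separated words."""
    # Space between a lowercase letter or digit and a following capital: "OrderBook" -> "Order Book".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name)
    # Space between an acronym and a following capitalized word: "USDollar" -> "US Dollar".
    return re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)


entity_name = uncamel("SecurityStatus")
```

A configurable version could take the separator or a list of protected abbreviations as parameters.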

Transformation step two into the Tool-specific Metadata Set. There, we convert the generic E/R into Data Modeling tool-specific metadata, in this case, for PowerDesigner, and PowerDesigner can directly import this spreadsheet.
For entities, the transformation is a simple copy of the record in the E/R Metadata Set.

Load into the data modeling tool.
Here we have PowerDesigner, and PowerDesigner imports the Metadata Set.
On the left-hand side, we see all the Excel imports defined in PowerDesigner. We have 25 of them.
And we see the Metadata Set Column mapping to PowerDesigner model objects.

Stacked queries and ETL master the complexity of the transformation.
We can see the query dependencies in MS Power Query.
On the left-hand side, we have the Ontology Metadata Sets, and on the right-hand side, the Entity-Relationship Metadata Sets. They are implemented in two Excel workbooks.
For example, if we look at Entity Subtypes, we see that this Excel worksheet is populated from the Ontology Metadata Set for Subclasses.
Just a note here about Intermediate Metadata Sets. How do we populate Associative entities? For example, in the OMG Commons and in the FIBO, an Object Property can have an Inverse Property. For example, OMG Commons Business Authorizations have “authorizedBy” and “authorizes.” We don’t want to have two Associative Entities in the data model. So, we configure transformation rules to merge these two, and then we derive Association Subtypes from both the Ontology Object Property and its Inverse.
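
To illustrate the merge rule, here is a small Python sketch. It assumes that the first property encountered names the associative entity and that the inverse is recorded as a second source; the actual CODT rule is configurable and may differ. The property “hasParty” is a made-up example.

```python
def merge_inverse_properties(properties, inverse_pairs):
    """Return one associative-entity record per property or inverse pair."""
    # Make the inverse relation symmetric so either side finds its partner.
    inverse_of = {}
    for first, second in inverse_pairs:
        inverse_of[first] = second
        inverse_of[second] = first

    merged, seen = [], set()
    for prop in properties:
        if prop in seen:
            continue  # already covered by its inverse
        sources = [prop]
        inverse = inverse_of.get(prop)
        if inverse is not None:
            sources.append(inverse)
        seen.update(sources)
        merged.append({"associative_entity": prop, "sources": sources})
    return merged


associative_entities = merge_inverse_properties(
    ["authorizedBy", "authorizes", "hasParty"],
    [("authorizedBy", "authorizes")],
)
```

Here “authorizedBy” and “authorizes” collapse into one record, while “hasParty”, which has no inverse, stays on its own.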

Some statistics about CODT Atlantic.
The MDS folders contain the queries that provide the interface for Metadata Sets in the next transformation step. We can see the biggest piece is the entity relationship Metadata Sets, which has 80 tabs and 84 Power Queries.
In total, across the three CODT workbooks, we have 142 worksheets and 158 Power Queries. We have 18 SQL queries and 25 Excel imports into PowerDesigner.
So, CODT is a white box, an open book. Also, in the Excel version, the software fully discloses all worksheets, queries, and VBA code.
New users and operators can generate with a single click using the default configuration settings.
As a data architect, you use CODT as an ETL development platform, diagnosing results and tweaking transformation rules for your modeling and naming standards. And if you want, VBA developers can secure the data sheets and fully automate the ETL. Or you can port CODT to your ETL environment.

Let’s take a look at the CODT embodiments.
Here is the patent table 14. The shaded area in blue, that is Atlantic CODT on MS Excel.
The license includes the Patent Rights to use the Intellectual Property, to use the Metadata Sets and algorithm for full production Semantic Enterprise Information Architecture. You can automate interfaces and encode the patented embodiment below.
For example, you can create a connection to your RDF store and run the queries in a batch.
You can move the CODT server-side. You can migrate the ETL to your environment, or use your ETL language rather than M, and store the Metadata Sets on your relational database.
You can create a user interface for operators. You can create configuration wizards. And you can generate other models, such as object, message, or physical models. And finally, you can load directly using your data modeling tool or model repository API.

The Reverse Mode means transforming your Data Models into RDF/OWL to leverage your design of Enterprise Ontologies and Knowledge Graphs.
I said before, the CODT Metadata Sets are bi-directional. No matter what direction we go, it’s the same Excel tab. With that, we can reverse-engineer ontologies from data models.
Again, it’s ETL. Only here on the left-hand side, we have the Data Model, for example, in PowerDesigner.
We Extract, Transform, and Load into our RDF store. The first step is to generate list reports matching the Data Modeling Tool-specific Metadata Sets.
Power Query populates the Metadata Sets, performs basic data cleansing, and then the Tool-specific MDS populates the Entity-Relationship Metadata Sets; finally, the Ontology Metadata Sets populate from the Entity-Relationship Metadata Sets. Power Queries and formulas break the dataset down into Triples, and we load the Triples into the Ontology Platform using SPARQL CONSTRUCT or a bulk insert.

Here’s an example with a very simple model. The New York Stock Exchange OpenMAMA messaging API. This data model has some tables for Auction, Order Book, Quotes, and Trades.
The PowerDesigner entity list report has a Code, Name, and Comment. The PowerDesigner MDS sources this list report. We create this report in the modeling tool, then load the result set into our PowerDesigner MDS.

And then, we transform into the Entity-Relationship MDS. The Metadata Set here populates from the PowerDesigner Entity MDS. We have the same columns for the entity: Code, Name, Comment, Prefix, Local Name, and URI.
Prefix and URI are configuration settings matching the designated Prefix and Namespace in the ontology. In other words, I tell CODT that I want to use this prefix here for my generated ontology.
The Entity Name transforms into a Local Name with a “camel case” string function. So, from the logical naming standard “Order Book”, we get the camel case “OrderBook”. The Resource Name is simply a concatenation of Prefix, the colon as a delimiter, and the Local Name.
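
The two string rules can be sketched in a few lines of Python. The function names are illustrative, and the upper-camel-case result is an assumption based on class names such as SecurityStatus.

```python
def to_camel(entity_name: str) -> str:
    """Join the words of a logical name, capitalizing the first letter of each word."""
    return "".join(word[:1].upper() + word[1:] for word in entity_name.split())


def resource_name(prefix: str, local_name: str) -> str:
    """Concatenate prefix, colon delimiter, and local name."""
    return f"{prefix}:{local_name}"


local_name = to_camel("Order Book")
resource = resource_name("fib-omds", to_camel("Security Status"))
```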

And then we can load into the Ontology Metadata Set.
There we have Class, Namespace, SKOS Definition, and the other properties.
This helper sheet, Triple Metadata Set, breaks down the Class record set into Triples: Subject, Predicate, and Object.
The triples that we assert in the ontology actually match the SPARQL SELECT joins.
We have triples that we want to construct, assert in the ontology: Subject, Predicate, Object – auction, RDF type, owl class. And that matches what we did to create FIB-DM: we queried Class, Qualified Name, Namespace, and SKOS Definition.
And then the Definition here is the same. Here we want to CONSTRUCT fib-omds:SecurityStatus, OMDs Security Status as SKOS Definition, and here’s the definition text.
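
As a sketch of what the Triple Metadata Set does, the following Python breaks one class record into Subject, Predicate, Object rows and renders them as prefixed statements. The function names and the escaping rules are assumptions for illustration, and the rendered lines still need the matching prefix declarations before an RDF platform can load them.

```python
def class_to_triples(resource: str, definition: str) -> list:
    """Break one class record into (subject, predicate, object) triples."""
    triples = [(resource, "rdf:type", "owl:Class")]
    if definition:
        # Escape backslashes, quotes, and line breaks so the literal stays on one line.
        escaped = (definition.replace("\\", "\\\\")
                             .replace('"', '\\"')
                             .replace("\n", "\\n"))
        triples.append((resource, "skos:definition", f'"{escaped}"'))
    return triples


def to_statements(triples: list) -> list:
    """Render triples as 'subject predicate object .' lines."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Made-up definition text for the Auction class.
auction_triples = class_to_triples("fib-omds:Auction", "An auction.")
auction_statements = to_statements(auction_triples)
```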

Yeah, and finally, we assert the triples on the ontology platform.
Here we are in TopBraid, with several classes and our SPARQL CONSTRUCT statement. We see the Auction class with its SKOS Definition.

The United States Patent and Trademark Office has granted the CODT patent and published it with its 23 drawings, 19 tables, and 35 pages of specification, which fully disclose the invention.
You can read about the patent on the codt.net website.
16 claims comprehensively cover the method, system, non-transitory storage medium, and all embodiments. The patent protects CODT licenses and generated models, including FIB-DM.

About licensing CODT.
FIB-DM licensees can purchase CODT as an add-on, and new users can license CODT and FIB-DM as a bundle. There is no standalone CODT license, because Jayzed already holds the copyright to the FIBO Data Model.
Software deliverables are the MS Excel CODT workbooks. The license doesn’t limit the number of users. You’re free to modify the software and to create new models for internal use. Just as with your FIB-DM license, you must keep derived models confidential. So you cannot use CODT and publish your own FIB-DM.
Educational resources are included, and you’re free to modify, translate, add, lift off images and diagrams as long as they remain within your organization.
And the license fully covers the Intellectual Property, the patent rights. So you’re free to leverage Metadata Sets, queries, formulas, and algorithms disclosed in the source code and the specification for internal development. But you must not share CODT embodiments outside your organization.

As with FIB-DM, licenses are priced by institution size.
The pricing segment follows your EDM Council membership tier.
The add-on price for existing FIB-DM licensees is 2/3 of the price of your data model license. For example, around $40,000 for a Tier B bank.
The bundle price for new users is 1.5 times the standalone FIB-DM.
And, as with FIB-DM, central banks, multilateral lenders, and other qualifying financial institutions receive the Tier-C price regardless of asset size.

CODT Proof of Concept.
That’s an offer to try, test, and evaluate CODT. The scope of Semantic Enterprise Information Architecture is quite a big undertaking.
FIB-DM already proves that CODT creates a superior data model. So, the objective for the PoC is simply to show that CODT works for your FIBO extensions, to test the application, and to evaluate the intellectual property.
The materials for the PoC are the Excel workbooks, education material, and the Patent for Legal & Compliance to assess. The PoC includes two days of training and three additional days of offline support (in emails and calls).

Who should be in a PoC team?
Management, Finance, your business sponsor who’s authorized to sign nondisclosure and license agreements.
The Ontologist with an in-depth understanding of the FIBO and in-house ontologies, because you are the one responsible for adapting the queries to your SPARQL dialect. You produce the raw ontology metadata.
A Data Architect with experience in enterprise reference models. You configure CODT to match your naming standards, and you load Metadata Sets into the modeling tool.
The developer or MS Excel power user with some experience in VBA, Power Query, and the M language. You can troubleshoot complex formulas and queries and explore other technical embodiments.

Technical preparation for the PoC.
You should have a powerful PC. I recommend Windows 11 64-bit with some 32 GB of RAM. MS Excel, Power Query, the Data Modeling Tool, and an interface to the RDF Platform should be installed.
You need the platform where you have the FIBO ontology, the platform or ontology editor with a SPARQL query user interface.
That can be TopBraid, Protégé, or any RDF Store semantic endpoint.
You should use the SAP PowerDesigner data modeling tool. If you have ERWin or other modeling tools, use the PowerDesigner trial first, and import the data model. Later, you can customize CODT to import into your tool.
You need the FIBO loaded into your ontology platform. You should try entity queries and reproduce the raw metadata extract. Your proprietary ontology should be an extension of the FIBO. Make sure to include FIBO modules and to define a prefix for your namespace. Here, for example, is the Bank ontology. There, we just have another TTL (Turtle) file. The entity query must return FIBO alongside your classes with the prefix. So, this all means your FIBO extension or other in-house ontologies must be well-formed.

The Proof of Concept would typically take six weeks.
We have two weeks for the introduction to CODT and the transformation of the FIBO as a PoC, and then we repeat that exercise, adding your proprietary ontologies. You can explore configuration changes and other embodiments.

Next steps: if you want to discuss a CODT PoC, you can find further resources on the FIB-DM and CODT websites and on this video on the YouTube Education channel. You’re welcome to send me an email to schedule an overview and discussions with your questions and answers.

Summary and conclusion.
The Semantic Center of Excellence must not become another silo.
Our vision is Semantic Enterprise Information Architecture with the ontology at the apex.
FIBO is the industry standard, and FIB-DM is the superior industry standard data model.
CODT leverages the ontology for data management.
Copyrights and Patents protect your investment.
Well, thanks for watching.