The Configurable Ontology to Data Model Transformation (CODT) technology created the 3,074-entity FIBO Data Model. The US Patent and Trademark Office issued patent #12,038,939, opening CODT to Financial Institutions.
The first video, focused on the model content and structure, is available on the FIB-DM website.
This second part of the tutorial is a deep dive into the ETL (Extract, Transform, and Load) inspired approach. Patented Metadata Sets for the Ontology, the Entity-Relationship model, and the specific Data Modeling Tool are self-populating. We extract ontology metadata using SPARQL queries and use MS Power Query, Excel formulas, and Visual Basic for Applications code for the transformations. CODT also operates in reverse mode, transforming Data Models into RDF/OWL.
Presentation
The updated PowerPoint deck reflects the granted patent.

I recommend watching the video first and then downloading a PDF of the PowerPoint for your reference. The video demos the screens in PowerDesigner and MS Excel, and the voiceover has twice the slides' word count because it explains the diagrams.
Transcript

Welcome back, this is the second introduction.
It is where version 1.0, “Atlantic,” meets Microsoft Power Query. In this deck, we show how the actual transformation works. You will see it in MS-Excel using Power Query and the “M” language: the patent-pending technology that created the FIBO Data Model.
As we have seen in the first part of the webinar video, the old OWL-file parsing approach doesn’t produce usable data models, and it cannot cope with very large ontologies. The new approach is inspired by ETL. It creates high-quality models, and the technology is fully scalable and configurable. With ETL, we start with RDF/OWL, the ontology: we Extract, Transform, and Load into the data modeling tool.
Behind the scenes are Metadata Sets, MDS. They are keyed records holding properties for all objects of the model. For example, we have a metadata set for entities that has 4,568 records. There are three types of metadata sets: the Ontology metadata sets hold the records extracted from the ontology platform; the Entity-Relationship metadata sets hold the ontology metadata transformed into entity-relationship metadata; and the PowerDesigner (or any other tool) metadata sets are ready to load into the modeling/development platform.
The metadata sets are a novel approach to model transformation. Let me define it: a Metadata Set is metadata stored in a data set. We have all seen that before; think of System Tables in a relational database. They are data sets that hold metadata, let’s say about a database table or database columns. The CODT metadata sets are an isomorphic representation of the ontology, entity-relationship, and data modeling tool-specific metadata.
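To make the idea concrete, here is a minimal Python sketch of a metadata set as keyed records, analogous to a row in a system table. The field names are illustrative, not the actual CODT column layout:

```python
# One record of an illustrative "classes" ontology metadata set,
# keyed by the class QName -- like a row in a database system table.
ontology_mds_classes = {
    "fibo-be-corp-corp:BoardAgreement": {
        "qualified_name": "fibo-be-corp-corp:BoardAgreement",
        "namespace": "https://spec.edmcouncil.org/fibo/ontology/BE/Corporations/Corporations/",
        "local_name": "BoardAgreement",
        "skos_definition": "a formal, legally binding agreement ...",
    }
}

def lookup(mds, key):
    """Fetch one keyed record from a metadata set."""
    return mds[key]

record = lookup(ontology_mds_classes, "fibo-be-corp-corp:BoardAgreement")
print(record["local_name"])  # BoardAgreement
```

In CODT, each such set lives as a worksheet with one row per keyed record; the dictionary above is just the same structure in code.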
The transformation is a two-step process: step one transforms the ontology metadata into generic entity-relationship metadata; step two transforms the generic E/R metadata into tool-specific metadata. The same generic metadata set is the source for both the PowerDesigner and Sparx EA metadata sets.
This is a look at the folder structure. CODT is installed in the home directory; in this case, I named it FIBO second-quarter production. It installs with the main workbook, CODT, which has the VBA macro code, and three subfolders for the Ontology, PowerDesigner, and Entity-Relationship metadata sets. The Ontology Source folder holds the metadata set workbook, all the SPARQL queries, and the output, the raw query results. I open the Ontology and the PowerDesigner MDS: on top is the ontology, below the PowerDesigner workbook. Basically, every metadata set is a sheet in the workbook. At the top, we look closely at classes. Likewise, for the PowerDesigner MDS, we have entities, inheritances, and inheritance links.
Basically, the transformation uses this ontology metadata to populate the PowerDesigner metadata that PowerDesigner can directly import as a data model. For the classes, we see the Class Code, Qualified Name, Namespace, and SKOS definition. In our target, the Class Code has become the Entity Code. The Name is actually a transformation rule: we take the Localname, in this case “BoardAgreement,” and “uncamel” the name to comply with the Logical Naming Standard. The Comment is populated from the SKOS definition, and the URI is simply the Namespace plus the Localname. Now, let’s look at the system specification. This is actually Figure 2 of the patent application; it shows the UML component diagram of CODT.
It shows the two external systems, the Ontology Platform and the Data Modeling Tool, and the internal components: Extraction with the Ontology metadata set, Transformation with the E/R metadata set, and Load with the PowerDesigner metadata set. We see the two interfaces: Extraction uses the Ontology Platform’s SPARQL interface.
Load uses the data modeling tool’s, in this case PowerDesigner’s, import interface, and there is a Configuration that we have already seen in the CODT metadata set. Microsoft Excel, in my opinion, is the tool of choice to view and analyze tabular data. Every data architect/data modeler has Excel and knows how to use it. Therefore, MS-Excel is a fast prototyping tool for the CODT metadata sets, but it also makes the transformation easy to deploy. Here is a table showing the components, Extraction, Transformation, and Load; the corresponding Ontology, generic E/R, and PowerDesigner metadata sets; and the Excel workbooks that you find in the installation. However, any platform and programming language can implement the system, the metadata sets, and the method.
The Extraction works with SPARQL queries. This is the SPARQL query to extract the Class metadata: it selects the class, the qualified name, the namespace, and the definition, and it filters out the unnamed classes. This is where it may vary depending on the ontology platform and dialect. If you don’t have the “smf:isBound” filter for the namespace or the “afn” function, look up the equivalent in your SPARQL dialect. You want to filter out, not extract, unnamed classes. At the bottom, we see the result. My ontology platform, TopBraid Composer, extracts it into a CSV file. We see our SELECT variables for the license document: a class, the QName, the namespace, and the SKOS definition.
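The extract lands as plain rows, so the unnamed-class filter is easy to picture outside SPARQL as well. This Python sketch mimics it on a raw CSV result: rows with an unbound namespace or a blank-node class id are dropped. The column names and sample rows are illustrative, not actual FIBO output:

```python
import csv
import io

# Illustrative raw CSV extract: one named class, one unnamed (blank-node) class.
raw_csv = """class,qname,namespace,skos_definition
fibo-fnd-agr-ctr:LicenseDocument,fibo-fnd-agr-ctr:LicenseDocument,https://example.org/fnd/,a written agreement ...
_:b0,,,
"""

def extract_classes(text):
    """Keep only named classes: bound namespace, no blank-node identifier."""
    rows = csv.DictReader(io.StringIO(text))
    return [r for r in rows if r["namespace"] and not r["class"].startswith("_:")]

classes = extract_classes(raw_csv)
print(len(classes))  # 1
```

In CODT itself, this filtering happens in the SPARQL query before the CSV is ever written; the sketch only shows what the filter accomplishes.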
So, here’s our Ontology metadata set again with the classes tab. Under >Data< in Excel, we have “Get and Transform Data.” That lets us source queries from various data sources. Under Queries and Connections, I look at the queries and connections defined in this workbook, Ontology MDS, and here we have 21 Ontology MDS queries. They basically populate the spreadsheets that constitute the interface; in other words, the data that the E/R Metadata Sets take. Here at classes, we can see the number of rows loaded, and we have a little preview of the Power Query.
If I edit this query, it invokes the Power Query Editor. It shows our data source, the classes CSV extract, and lets us do different transformation steps. Okay, the first step of the transformation happens in the Entity-Relationship Metadata Set. We have the entity tab, and the Code is sourced from the Ontology Metadata Set, the class QName.
Prefix and Localname simply break up the code, and then formulas transform the Localname into an Entity Name as per the naming convention. We use an “uncamel” function that breaks up the Localname, in this case by simply inserting a space.
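The naming rules described above can be sketched in a few lines of Python. The function names are illustrative; the actual CODT implementation uses Excel formulas and Power Query:

```python
import re

def split_qname(qname):
    """Break a QName like 'fibo-be:BoardAgreement' into prefix and Localname."""
    prefix, local = qname.split(":", 1)
    return prefix, local

def uncamel(local_name):
    """Insert a space before each interior capital: 'BoardAgreement' -> 'Board Agreement'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local_name)

def to_entity(qname, namespace, skos_definition):
    """Apply the transformation rules: Code, uncameled Name, Comment, URI."""
    prefix, local = split_qname(qname)
    return {
        "code": qname,                # Entity Code from the Class Code
        "name": uncamel(local),       # naming-standard Entity Name
        "comment": skos_definition,   # Comment from the SKOS definition
        "uri": namespace + local,     # URI = Namespace + Localname
    }

entity = to_entity("fibo-be:BoardAgreement", "https://example.org/be/", "a formal agreement ...")
print(entity["name"])  # Board Agreement
```

The same record then flows unchanged, or with simple renamings, into the tool-specific metadata set.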
In the second step, we convert the generic Entity-Relationship MDS into the data modeling tool-specific metadata set. In this case, PowerDesigner can directly import the Metadata Set, and for entities, the transformation is a simple copy of the E/R set. Finally, we can load the PowerDesigner Metadata Set, the Excel spreadsheet, directly into the modeling tool. In PowerDesigner, we can define Excel imports.
There are 24 Metadata Sets: for entities, inheritances, data items, attributes, associative entities, relationships, packages, and annotations. Each of these imports has a name and the imported file, in the CODT home directory that we looked at, here the entities. The mapping description shows what we are mapping: each Metadata Set column maps to a PowerDesigner meta-model object. Using the PowerDesigner meta-model object names as our column names facilitates easy import. Here, simply, we have a table, Entity, that maps to the Entity model object. The properties of an entity are the Code, the Name, the Comment, and then a number of extended attributes for the FIBO annotations.
Stacked queries and ETL master the complexity. The screenshot shows the Power Query dependencies: on the left-hand side, we have Ontology Metadata Sets; on the right-hand side, Entity-Relationship Metadata Sets. They are the Power Query data sources and tables from the two Excel workbooks. Interface MDS are queries and worksheets that subsequent Metadata Sets use as a data source. Here, the E/R Association Supertype/Subtype Metadata Set is one that the PowerDesigner MDS use as a source for their population query. There is, in this case, a hierarchy of Intermediate Metadata Sets: association subtypes going up to Properties, Active & Passive, and all the way to the E/R Metadata Set. Some statistics about the CODT Excel application: the MDS folders that we looked at hold the queries that provide the interface for the Metadata Sets in the next transformation step.
For the ontology, we have 21 Interface Metadata Sets and some 20 Intermediate ones; for Entity-Relationship, 24 Interface sets and quite a lot of Intermediate Metadata Sets, because that transformation is rather complex when it comes to Object Properties. Finally, there are the data model, in this case PowerDesigner, Metadata Sets. These are generally straightforward: all we have to do is some renaming from the generic E/R. For example, what the generic E/R calls Subtypes, PowerDesigner calls Inheritances, and Sparx EA calls Generalizations. All in all, there are 150 Excel sheets and Power Queries.
CODT is a White Box, an open book. The Excel version fully discloses all worksheets, queries, and the Visual Basic for Applications code. New users and operators can generate the data model import sheets with a single click, using default configuration settings. A Data Architect uses CODT as an ETL and development platform, diagnosing results and tweaking transformation rules to match your modeling and naming standards. VBA developers may secure, in other words hide and block, the data sets and fully automate extract and load, or port the application from MS-Excel to a Java program, or put it all server-side.
CODT embodiments: An embodiment is one of the ways to build the invention. This is Table 14 of the patent application, and I added blue for the Excel embodiment. It breaks the invention down by the different ways to build the connection to the ontology source, the transformation system, and the data model. For example, to create a connection, we can directly encode it to connect to an RDF-Store or run the queries in a batch. Instead of MS-Windows, we can move the application server-side.
For the application type, here it is MS-Excel; we can encode it in our ETL environment or simply as a Java or C program. The user interface in the Excel application is a White Box, everything visible, but it can be a guided user interface; for example, you can encode a configuration wizard that takes a user through the options and parameters. Besides a Conceptual Data Model, we can generate other model types: Logical, Physical, and Object. Also, instead of manually importing worksheets into PowerDesigner, we could load directly using the data modeling tool or repository API. For example, ERWin doesn’t really have a metadata import from CSV or Excel; if you want to load directly into ERWin, you would use the ERWin Application Programming Interface. The Reverse Mode is something special that doesn’t really fit in the previous table about the different ways to implement CODT.
The CODT Metadata Sets are bi-directional by design. In other words, they work both ways: CODT can reverse-engineer ontologies from data models. The first step is to extract. The data modeling tool generates List Reports matching the data modeling tool-specific Metadata Sets; then Power Query populates the Metadata Sets, performing simple data cleansing. In the Transform step, the Entity-Relationship Metadata Sets populate from the Tool-Specific Metadata Sets, and the Ontology Metadata Sets populate from the Entity-Relationship Metadata Sets.
Power Queries and Formulas break the data down into Triples, and then we load these triples onto the ontology platform using SPARQL CONSTRUCT or Bulk Inserts. In the Reverse example, we extract from PowerDesigner Entities.
Our example is a Logical Model created from the New York Stock Exchange OpenMAMA messaging API. We have things like the Auction, Order Book, Quote, Referential, Security Status, and Trade. PowerDesigner, like every data modeling tool, can create list reports. A list report is simply a columnar report; the PowerDesigner Entity List Report has Code, Name, and Comment. The Metadata Set just sources that list report, just as we have seen for the Ontology Metadata Sets importing raw ontology metadata.
Once we have it in a Metadata Set, the second step is to transform in the Entity-Relationship Metadata Set, so the E/R Entity MDS populates from the PowerDesigner MDS. There is minimal change and transformation here. Prefix and URI are Configuration Settings: they must match what we designate in the ontology as the prefix and namespace for these reverse-engineered classes. In this case, I just call it fib-omdm. The entity name transforms into the Localname.
It’s simply Camel-Coding: we eliminate spaces and capitalize the string components. The resource name is a concatenation of Prefix, delimiter, and Localname. Finally, to load into the ontology, we have our Ontology Metadata Set, here the class set, and we see that a query populates the Class Metadata Set from the Entity MDS. It is the same transformation in reverse: the class name comes from the Entity Name, the Namespace is a constant that we have defined, plus the Localname, and the SKOS definition sources from the Entity Comment.
In addition, we have several metadata sets prefixed with T_ for Triple. They break down the class record into triples of Subject, Predicate, and Object. The triples, if we look closely, match the SPARQL SELECT joins. Here are the triples to create classes: we see the Subject, Predicate, and Object. For example, the Auction class is an rdf:type of owl:Class. Likewise, the triples for the SKOS definitions: the Auction, “skos:definition,” and then the definition text.
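The reverse rules can be sketched in Python as well; the actual CODT implementation uses formulas and Power Queries, and the prefix and predicate QNames here are illustrative configuration values:

```python
def camel_code(entity_name):
    """'Order Book' -> 'OrderBook': drop spaces, capitalize each component."""
    return "".join(part[:1].upper() + part[1:] for part in entity_name.split())

def entity_to_triples(entity_name, comment, prefix="fib-omdm"):
    """Break one entity record into Subject-Predicate-Object triples,
    as in the T_ triple sets."""
    subject = f"{prefix}:{camel_code(entity_name)}"
    return [
        (subject, "rdf:type", "owl:Class"),     # class-creation triple
        (subject, "skos:definition", comment),  # definition triple
    ]

triples = entity_to_triples("Order Book", "a record of buy and sell orders ...")
print(triples[0])  # ('fib-omdm:OrderBook', 'rdf:type', 'owl:Class')
```

Note that this is exactly the forward "uncamel" rule run backwards, which is what makes the Metadata Sets bi-directional.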
If we look at the OWL classes query again, the one we use to extract Ontology Metadata about classes, we see our join “?class a owl:Class”; that is the same subject-predicate-object that we now use to create classes. Likewise, the join on the class’s SKOS definition and the variable “skos_definition” correspond to our SKOS-definition triple set. We can then take these triples and assert them on the ontology platform, either using a Triple-Store Bulk Insert or, in this case, SPARQL CONSTRUCT. This is simply a CONSTRUCT statement wrapped around the SKOS-definition triples. When we execute the query, we see the classes created in the ontology tree, and for an individual class, here the Auction class, we see that the definition has populated.
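As one way to assert such triples programmatically, here is an illustrative Python helper that wraps them in a SPARQL INSERT DATA update. The transcript uses CONSTRUCT in TopBraid; the literal-quoting rule here (quote anything containing a space) is deliberately simplistic, and a real implementation would escape literals properly:

```python
def to_insert_data(triples):
    """Serialize (subject, predicate, object) triples as a SPARQL
    INSERT DATA update string."""
    def term(t):
        # QNames pass through; anything with spaces is treated as a literal.
        return f'"{t}"' if " " in t else t
    lines = [f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"

query = to_insert_data([
    ("fib-omdm:Auction", "rdf:type", "owl:Class"),
    ("fib-omdm:Auction", "skos:definition", "a trading session that matches orders ..."),
])
print(query)
```

The generated update would still need the PREFIX declarations for fib-omdm, rdf, owl, and skos before execution on an endpoint.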
This is the bi-directional mode: Transformation enables Semantic Enterprise Information Architecture. We have our ontology, the FIBO, maybe other industry ontologies, and our in-house ontologies. On the right-hand side, we have FIB-DM, the data model, the Enterprise Data Model, and Project Data Models.
We generate data models from industry, domain, and our proprietary ontologies. We design conceptual models in RDF/OWL. We reverse-engineer our data models to extend the Enterprise and Project Ontologies.
*** NOTE: The USPTO has issued the CODT Patent, making this part of the video obsolete ***
Just briefly, the United States Patent and Trademark Office acknowledged the Utility Patent Application for CODT. It is quite comprehensive: 23 drawings, 19 tables, 35 pages of specification. This non-provisional patent application fully discloses the invention. In about half a year, the USPTO will publish the pending application. Twenty claims comprehensively cover the Method, System, and Non-Transitory Storage Medium, and all embodiments. Once granted, the patent protects CODT Licensees and generated models including the FIB-DM. That enables me to share the inner workings with you in POCs.
*** end of note ***
The license agreement for CODT is similar to the FIB-DM license agreement. FIB-DM licensees can purchase CODT as an add-on, and new users can license FIB-DM and CODT in a bundle; there is no standalone CODT license. The Software Deliverables are the MS-Excel CODT workbooks. It is a Site License; it doesn’t limit the number of users. You are free to modify the software and create new models for internal use. Just like with the FIB-DM license, you must keep derived models and changes to code confidential. The license includes Education Resources: you are free to modify, translate, edit, even lift images and diagrams, as long as they remain within your organization. Finally, the license covers the intellectual property. In other words, you are granted license rights to the whole space carved out by the CODT patent. You can leverage the Metadata Sets, queries, formulas, and algorithms disclosed in the source code and the specification for internal development, but you must not share or sell any of these embodiments.
Pricing: Licenses, just like for FIB-DM, are priced by institution size. I simply use your EDM Council Membership Tier as a segment. On the buy-side, for banks or investment companies, the measure is simply the assets under management. The EDMC defines three tiers: up to $50 billion in assets, $50 to $200 billion, and above $200 billion. The add-on price for existing FIB-DM licensees is two-thirds of your data model license, so it would be $10,000 for a Tier-C bank. The bundle price for new users is 1.5 times the standalone FIB-DM license price. Central Banks, Multilateral Lenders, and other qualifying Financial Institutions get the Tier-C price without further discounts, irrespective of asset size. Large Commercial Lenders and Investment Companies can get the early-adopter or the U.S. Stimulus Discount.
About the offer for a Proof of Concept: you can try, test, and evaluate CODT free of charge. The scope should be confined to CODT itself, not the whole SEIA; that is a colossal enterprise transformation, and CODT is a central piece of the puzzle. The POC is about CODT only. FIB-DM already proves that CODT creates a superior data model, and you have evaluated FIB-DM. The objective of this POC is to prove the concept that CODT also works for your FIBO extensions, for you to test the application and evaluate the intellectual property. The materials are the Excel Workbooks, the Education materials, and the Patent Application.
The Patent Application is for your Legal and Compliance Department to assess. Training and support: there are two days of training via online video conferences and three days of support, such as Q&A and sending Metadata Sets back and forth via email. For your POC team, I suggest a Management, Finance, or Business Sponsor, authorized to sign non-disclosure and license agreements; that would typically be director level at U.S. banks. An Ontologist with an in-depth understanding of the FIBO and your in-house ontologies, because you are the one to adapt the template queries to your SPARQL dialect; you will produce the raw ontology metadata.
A Data Architect with experience in Enterprise Reference Models: you configure CODT to match your naming standards, and you load the Metadata Sets into the data modeling tool. Finally, a Developer or MS-Excel Power User with experience in VBA, Power Query, and the M language: you can troubleshoot complex formulas and queries and explore the other technical embodiments. About the technical requirements and preparation: I recommend a powerful PC. Mine has 32 gigabytes of RAM, runs 64-bit Windows 10, and has MS-Excel with Power Query. In particular, the data modeling tool is a bit of a bottleneck.
For very large models, whether in PowerDesigner, Sparx, or ERWin, you need a very powerful PC. Likewise for the transformation in MS-Excel: the round trip through all three Metadata Sets takes around 10 minutes and can be considerably slower on a weak PC. The Ontology Platform should have a SPARQL Query user interface. I simply use TopBraid, but Protégé or any RDF-Store or SPARQL endpoint should work as well. For the data modeling tool, the reference tool is SAP PowerDesigner. If you have ERWin or another modeling tool, I recommend using a PowerDesigner trial first: download a PD trial, import the data model, and later you may customize CODT to import into your tool. The FIBO itself should be loaded onto your Ontology Platform before the POC. Try the entity queries and reproduce the raw metadata; this is to get around any differences between SPARQL dialects, so try out the queries beforehand.
Your proprietary ontology should be an extension of the FIBO; in other words, it should import the FIBO. So, make sure to include the FIBO modules and have a Prefix defined for your Namespaces. Here is an example for the Bank Ontology: there is a Prefix and the URI. The entity query must return the FIBO classes alongside your classes with the Prefix; that is how you can test it. Typically, the Proof of Concept has a six-week timeline: two weeks of preparation, then a kickoff, and four intense weeks of proving the concept. For that reason, POCs are rolling, with a maximum of two banks at a time; once one is finished, the next can engage in the POC. Two weeks are for the introduction to CODT and transforming the FIBO as a Proof of Concept, and then we repeat that transformation exercise with the addition of your proprietary ontologies. You can explore configuration changes and other embodiments.
Conclusion: The Semantic Center of Excellence and Ontologies must not become another silo. Our vision is the Semantic Enterprise Information Architecture, with the ontology at the apex and derived implementation models. The FIBO is the industry standard, and FIB-DM is the superior industry-standard data model. CODT leverages the ontology for data management, and Copyrights and Patents protect your investment. So, let’s discuss a CODT POC: just send an email to jziemer@jayzed.com, and we can have an overview and a discussion with your questions and answers. Now, I cannot just email you the Excel sheets; you do need a team and an executive sponsor to sign off on the non-disclosure agreements. You will find further resources on the FIB-DM website and the YouTube education channel, and follow the LinkedIn showcase for news, updates, and discussions. Well, thanks, and have a nice day.