Spider Schema Data Model
The Spider Schema Data Model:
- An alternative to OLTP and to OLAP.
- An alternative to the Star and Snowflake Schemas.
Can you imagine a CRM or ERP system that not only handles the day-to-day processing of Sales, Procurement, Marketing, Transportation, and Planning, but also provides robust, true Business Intelligence and Analytical Reporting with no further ETL, Data Warehouses, or Data Marts?
On the project I am working on, the CEO recently said: “Is it too much to ask, seriously here, to create a system where data is not only painstakingly entered in by competent staff, but then to use that same system to do normal analytical reporting? I mean, why do we need one system to put the data into, and then another system that does the reporting? Yeah I know, that’s a lot to ask.”
I could not have agreed with this person more. Are there good reasons why data is modeled into one format, OLTP, and then later remodeled into an OLAP format? When one compares the schemas of the two models, there is no way to make them “talk” to each other directly. Instead, data is ETLed (Extracted, Transformed, and Loaded) from one model into the other.
Several years ago I was tasked with finding a solution that would take data already modeled into an OLAP format via the Star Schema and provide cross-dimensional hierarchy support in Microsoft Analysis Services, without adding anything new each time a new hierarchy was needed.
The solution to that problem was the first version of what I have called the Spider Schema. I named it the Spider Schema because, in data modeling tools, the schema looked just like a spider.
What is the Spider Schema, you ask? The Spider Schema began as the process of removing the Relational Foreign Keys from the Fact Table and creating another Dimension from them. I called this dimension the Intermediate Dimension. This made the Intermediate Dimension, rather than the Fact Table, the center of the data set.
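To make that refactoring concrete, here is a minimal sketch in SQLite (via Python). All table and column names are my own illustrative assumptions, not taken from any real system: the Foreign Keys that a Star Schema keeps on every fact row move into a single Intermediate Dimension, and the Fact Table keeps only that one key plus its measures.

```python
# Hypothetical sketch of the Star-to-Spider refactoring described above.
# Table and column names (dim_customer, fact_sales_star, ...) are invented
# for illustration only.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Shared Dimensions.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day  TEXT);

-- Star Schema: the Fact Table carries one Foreign Key per Dimension.
CREATE TABLE fact_sales_star (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);

-- Spider Schema: the Foreign Keys move out of the Fact Table into an
-- Intermediate Dimension; the Fact Table keeps a single key plus measures.
CREATE TABLE dim_intermediate (
    intermediate_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    date_id     INTEGER REFERENCES dim_date(date_id)
);

CREATE TABLE fact_sales_spider (
    intermediate_id INTEGER REFERENCES dim_intermediate(intermediate_id),
    amount          REAL
);
""")
```

In a modeling tool, the Intermediate Dimension now sits at the center, with the Dimensions and the Fact Table radiating out from it, which is where the spider look comes from.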
However, things have evolved since then.
Over the years I learned that the Spider Schema had far more to offer when modeling data. I was able to create cascading Intermediate Dimensions that I could share across numerous Fact Tables, saving the storage space of Foreign Keys that would otherwise have been stored in all of those Fact Tables over and over again.
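The sharing aspect can be sketched as follows (cascading is not shown; again, every name here is a hypothetical of mine, including the `intermediate_key` get-or-create helper): several Fact Tables reference one Intermediate Dimension, so a given combination of Foreign Keys is stored once, rather than once per fact row in each table.

```python
# Hypothetical sketch: one Intermediate Dimension shared by several Fact
# Tables, so each Foreign Key combination is stored a single time.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_intermediate (
    intermediate_id INTEGER PRIMARY KEY,
    customer_id INTEGER, product_id INTEGER, date_id INTEGER,
    UNIQUE (customer_id, product_id, date_id)
);
-- Several Fact Tables share the same Intermediate Dimension:
CREATE TABLE fact_sales   (intermediate_id INTEGER, amount REAL);
CREATE TABLE fact_returns (intermediate_id INTEGER, amount REAL);
""")

def intermediate_key(cur, customer_id, product_id, date_id):
    """Return the Intermediate Dimension key for this FK combination,
    creating the row on first use (illustrative helper)."""
    row = cur.execute(
        "SELECT intermediate_id FROM dim_intermediate "
        "WHERE customer_id=? AND product_id=? AND date_id=?",
        (customer_id, product_id, date_id)).fetchone()
    if row:
        return row[0]
    cur.execute(
        "INSERT INTO dim_intermediate (customer_id, product_id, date_id) "
        "VALUES (?, ?, ?)", (customer_id, product_id, date_id))
    return cur.lastrowid

# Both fact rows below reuse the same Intermediate Dimension row, so the
# three Foreign Keys are stored once instead of once per fact row.
k = intermediate_key(cur, 1, 7, 20240101)
cur.execute("INSERT INTO fact_sales   VALUES (?, ?)", (k, 99.50))
cur.execute("INSERT INTO fact_returns VALUES (?, ?)",
            (intermediate_key(cur, 1, 7, 20240101), 99.50))
```

The wider the fact rows and the more Fact Tables share a key combination, the larger the saving.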
On one project I worked on, a Big Data proof of concept was completed to take a large amount of data and put it into a single table. The concept was to prove that small tables (tables with only a few columns) were faster to query than large tables (tables with numerous columns).
The project was a success: queries against the small table were significantly faster than against the large table storing the same data. But the result was not very usable from a Transactional or Analytical perspective. Still, the idea stuck in the back of my mind over the years, and one day I found a way to do a similar thing with data modeled into the Spider Schema.
Currently I am working as the Data Architect on a project building out a new ERP system using the Spider Schema as its data model. A proof of concept has already shown that a tool like MS Excel can be connected to it via Pivot Tables, with no changes, and provide analytical reporting, in addition to Tabular / Analytical Cubes.
Imagine: no ETL, no scheduled job that runs every night to pull and refresh the data in the Data Warehouse. Up-to-the-minute data can be analyzed by simply refreshing an analytical report, dashboard, or Excel Pivot-Table-based workbook.
For years I have created Analytical solutions for applications like PeopleSoft, JD Edwards, Great Plains, Oasis, DSI, CIMPRO, Siebel, and many others. None of these systems had built-in Analytical capabilities, and all of them are very expensive software applications.
When one looks at the budget to buy or build a custom ERP / CRM system, and then to build a custom Analytical Reporting system on top of it, the numbers are staggering. Imagine doing half the work, building one system, being done with it, and spending half the budget!