This week we begin our two-part talk with Star Director of Product Strategy and Marketing Tim Ward, covering what data warehouses do that other databases don't and what it all means for STAR end users
"A data warehouse is not a sellable product in its own right. It's an enabler. An underlying component. It allows you to do other things." So says Tim Ward, Star's Director of Product Strategy and Marketing, and someone intimately familiar with the compliance software provider's new data warehouse. This week and next we go in-depth with him and discover exactly what those other things are and how STAR users will benefit now and moving forward.
Tim, let's start with the basics. What exactly is a data warehouse?
It's a store of data that's optimized for the purposes of reporting and analytics, as well as importing, storing, and retrieving data. In a nutshell, that's what a data warehouse is for.
What makes a data warehouse different or better than other kinds of databases?
It's not a matter of being better or worse. A data warehouse just serves a different purpose. Our employee conflicts monitoring solution has its own database, but it's been designed and optimized to serve that particular application, which means it can quickly store and retrieve individual operations and transactions while supporting tens of thousands of concurrent users.
Application databases like these are what's called highly normalized or referential, which means a single record might be distributed across multiple relational tables. It's a very efficient way of storing information while conserving space, and its performance is well suited to many familiar kinds of day-to-day business operations. If you're using a personal banking app, it's accessing a relational database. If you're ordering something off of Amazon, it's accessing a relational database. It's also what our employee conflicts solution uses—reliably and securely recording logins, individual transactions for gifts, escalations and reviews, business rules settings, etcetera.
What relational databases aren't well suited for is allowing users to report on data. Here's why. Let's say a compliance officer is interested in all of the gifts that have been given and received within a particular date range. They might also want to cut and dice the data by who's given it, who's received it, what the companies are, what the different gift types are, and so on. In an application database, while the data is all in there, it's distributed across lots of different tables, and getting it into report format means there are lots of transformations that need to happen.
Those transformations—the individual data points retrieved from their various locations and the computations required to produce the report the user requested—takes a lot of system effort. But maybe more importantly, they take time. Let's say the whole process takes ten or even five seconds. For that period, all of the involved tables are locked up; that is, no other user can access them and the data inside. And when you're locking up large volumes of data—which is often the case when running a report—that creates a big problem for concurrent users.
So how does a data warehouse avoid this table-locking effect?
The philosophy behind a data warehouse is, rather than running reports against the primary-line application database and tying up those tables for everyone on the system, extract the specific information you need into another database—one designed and built for a specific subset of users. And while you're at it, collapse those data sets into single, flatter records that are easier to query and readymade for the reports those users are likely to call for. So with a data warehouse, one, you're not impacting day-to-day performance for all users. That's very important. And two, you're starting to arrange the data in ways in which report users—the people who are consuming that data—are going to be more interested in.
Then there's the data mart. Data marts further optimize the data for even more niche purposes. Let's say you've got a set of marketing users and they're interested in how many people are logging into the application, how long they're spending in the application, and whether they're using all the features. You take specific data from the data warehouse and create a data mart to specifically serve those marketing users, with transformations optimized for that purpose. And then you might have a second set of users, who are more security based, and they might be interested in the number of illegal logins, or checking to see if people are adopting the correct security policy. They'll take a different subset of the data warehouse as their own data mart.
You can see the analogy with the world of physical products. The data warehouse is the main distribution center. It has every single product in it. The data mart is the retail shop, with specialized data products served up to individual consumers from the main data warehouse.
What are the advantages of a data warehouse when it comes to analytics?
Data warehouses are optimized for reporting as well as analytics. Say, for example, you want to run a report of all of the gifts that have been given or received over a particular period of time; as we've already discussed, a data warehouse can deliver that in a highly efficient way. But let's say you want to do some additional analysis. What's the average gift value over a period of time? By office location? By job type? And maybe you also want to look at what the deviations are between certain people who are above average and below average in terms of gift giving. Or who's increased or decreased their gift giving or receiving over a particular time period.
Because that's more than just cutting and dicing and retrieving data, that's potentially a much more valuable set of insights. This analytics capability allows compliance users the ability to start rating situations from a risk perspective. Where do I need to look? Who do I need to look at? What exactly do I need to be aware of? That information is generated using calculations and mathematical formulas that, again, take time to compute, but because you're not doing it in the application database you can compute much more sophisticated values and have those ready to be served. If you were trying to calculate these kinds of dashboard-level analytics from the main database in real time, the queried tables would be locked up not just for seconds, but for minutes or hours.
Are data warehouses "trending," for lack of a better word?
Not exactly. Data warehouses have been around for a while. What is "trending" is analytical and predictive capabilities. That is, once you've got the data, can you use it to reduce your risk in meaningful ways? Can you use it catch things in the here and now? Can you use it to predict things that might happen in the future? Having a data warehouse is a prerequisite to be able to do any of these things. It's a stepping stone along the way to truly game-changing capabilities.
And that's why we took the time to build our data warehouse. We wanted to do more than just power our dashboard analytics. We wanted to create a robust framework that would allow us to deliver more sophisticated capabilities over time. We were looking ahead, trying to ensure we had the initial framework in place to meet the future needs of our clients. Like I said earlier, a data warehouse is an enabler. It allows you to do other things. Now we can begin to do them.
Tune into this space again in two weeks' time for part two of our conversation with Tim Ward, as we discuss Star's data warehouse design and development process in further detail.