WhereScape is a well-established name in the data warehousing space. Their software and solutions help organisations around the world centralise their data and access it faster. WhereScape's data warehousing software takes data from multiple sources and consolidates it into a single repository that can then be queried to generate reports for business intelligence.
Traditional data warehouses are updated daily (often overnight), but the emergence of real-time sensor data, IoT devices and cloud-based solutions requires almost instantaneous updates to provide up-to-the-minute business insights. WhereScape wanted to develop a modern solution capable of handling real-time data streaming while still offering the automation capabilities the company was renowned for, allowing data collection, manipulation and storage, across multiple sources, in a matter of seconds.
The traditional approach
Businesses use data warehousing to consolidate information about different systems into a single source. This allows businesses to get more insight into their operations, and to analyse data over a period of several years in order to detect significant trends that can then be translated into actionable business plans.
Consider a retailer with multiple outlets across the country. By analysing sales, stock and staff performance data, the retailer can see which stores are the best performers and which products sell best at specific locations.
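The consolidation idea above can be sketched in miniature. This is an illustrative example only (the store names, feed shapes and query are all hypothetical, not WhereScape's implementation): several per-store feeds land in one repository, and a single query then answers a business question that no individual source could.

```python
import sqlite3

# Hypothetical sketch: consolidating per-store sales feeds into one
# queryable repository, as a data warehouse does at a much larger scale.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (store TEXT, product TEXT, units INTEGER)"
)

# Each source system delivers its own extract; names are illustrative.
store_feeds = {
    "Auckland":   [("umbrella", 40), ("sunscreen", 5)],
    "Queenstown": [("umbrella", 12), ("sunscreen", 30)],
}
for store, rows in store_feeds.items():
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(store, product, units) for product, units in rows],
    )

# With everything in one place, a single query answers a BI question:
# which product sells best at each location?
best_sellers = warehouse.execute(
    "SELECT store, product, MAX(units) FROM sales "
    "GROUP BY store ORDER BY store"
).fetchall()
print(best_sellers)
```

In a real warehouse the sources would be separate operational systems and the repository a dedicated database, but the shape of the problem is the same.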
The future of data warehousing
With the emergence of real-time sensor data, IoT devices and cloud-based solutions, the traditional approach of updating a data warehouse overnight is no longer sufficient. These businesses require almost instantaneous updates to provide up-to-the-minute business insights. This is particularly critical for systems that rely on real-time data to identify anomalies.
Consider a company that performs real-time checks on credit card transactions to identify fraud, or a truck manufacturer that can identify a critical error and stop the driver before an accident happens.
This was a greenfield, very technical project being developed for an emerging market. Working on the project required learning more about databases, how they work and their structures, and the rules of specific data warehousing methodologies. It also involved being keenly aware of implementation details and constraints. Getting to know the field meant learning more about the company and the business logic surrounding the problems. I wasn't a domain expert, but this turned out to be useful, as it brought a new perspective.
I was heavily involved in the early stages of scoping and requirement gathering, and worked closely with the product owner and project manager during the phases of scoping and defining business requirements. The team (including developers) would often get together to sketch out our understanding of certain parts of the project and to make sure that we were all on the same page.
Whiteboarding and sketching sessions with the team were a staple during the whole process. They helped inform the quick prototypes I created in order to test assumptions with team members, to define the information architecture and to visualise how the product could look.
Understanding the users
The discovery stage was an intense process where the team and I discussed our ideas about the application and cemented the shared terminology we were using. I spent a fair bit of time getting familiar with the problem and domain knowledge and thinking about how we could transform what were, at that stage, abstract ideas, into something concrete.
Making sense of it all. An early user flow on the left. Although the application grew and became more complex with time, the fundamental flow stayed the same. On the right, an early attempt at defining the information architecture of the application, and a summary of different areas of the application and actions that could be performed.
As we progressed in the discovery stage, connections (the sources and destinations of data) and dataflows (containers of data) emerged as the main drivers of the application.
The sources of data and the destination endpoints of data transformed by users.
The containers of data inside the application. Data (tables, scripts) inside dataflows can be manipulated and transformed.
The first wireframes and low fidelity prototypes reflected this understanding. These were also my first attempt at visually representing what we were building, and were used to test the proposed flows with stakeholders and team members.
Although the earlier research identified connections and dataflows as the two main drivers of the application, reducing it to only those two components was not feasible. I separated the application into six main logical areas, based on the actions users had to perform in each of them. This approach allowed us to show or hide these logical areas based on a user's permissions, and proved solid when we had to add an extra area that wasn't part of the initial scope.
User and platform management area.
Configure sources and destinations of data.
Browse and import data contained in connections.
Design the structure and flow of imported data, modify it as needed and set its destination.
Execute the abstract models created in the Design area to create concrete data.
Monitor the status of deployed models and their data.
Self-generated metadata about all items inside the project for auditing purposes.
Building up the user interface
The new product was built on web technologies, which gave us the flexibility to move away from traditional desktop software interfaces and deliver a modern-looking product. This opportunity also had its challenges. I had to investigate how we could translate existing interaction paradigms of desktop apps onto the web, but often a direct translation wasn't the right choice (e.g. double-clicking to select items).
Fortunately, web-based technologies offered us the flexibility to come up with our own interaction paradigms. We were able to build different views of the application for different use cases (some form-heavy, others more visual) and blend them together, while at the same time keeping a consistent look and feel.
With the project moving fast and the development team one sprint behind, the interactive prototype developed in Axure quickly morphed into the UI design reference for the developers. The final polish was added directly by tweaking the styles and markup of the web application: I often committed code to version control to ensure the consistency of the look and feel, and to make changes to the application's text.
Documenting design decisions
The prototypes (developed in Axure) quickly evolved and became more complex. They were soon being used to identify technical constraints as early as possible and avoid costly changes at development time. The prototypes were also used to demo functionality to future users and stakeholders.
While the prototype was good for visualising interactions, the proposed functionality and interactions weren't always clear, so I started documenting them in Confluence, the project wiki. Although this was time consuming, it became invaluable for ensuring the team members were all on the same page.
User interface & experience challenges and solutions
Viewing the data and modelling its flow
Data warehousing involves manipulating data tables and records through the use of transformation scripts. Data may have to go through several transformations until it fits a format suitable for generating business intelligence reports.
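The chain of transformations described above can be sketched as a sequence of small steps applied to a record. Everything here is hypothetical (the field names, step names and record shape are illustrative, not WhereScape's API); the point is that each step makes one change, and the record only fits the reporting format at the end of the chain.

```python
# Hypothetical sketch: a raw record from a source feed passing through
# successive transformation steps until it fits a reporting format.

raw = {"ts": "2019-03-01T09:30:00", "amt": "19.99", "store": "auckland"}

def parse_amount(record):
    # Cast the textual amount from the source feed to a number.
    return {**record, "amt": float(record["amt"])}

def normalise_store(record):
    # Harmonise naming conventions across source systems.
    return {**record, "store": record["store"].title()}

def add_report_date(record):
    # Derive the reporting-grain date column from the timestamp.
    return {**record, "date": record["ts"][:10]}

transformed = raw
for step in (parse_amount, normalise_store, add_report_date):
    transformed = step(transformed)

print(transformed)
```

Being able to inspect the record at each intermediate step is exactly what the UX challenge below is about: users need to see not just the final shape of the data, but the path it took to get there.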
One of the biggest UX challenges was how we could allow users to inspect the data and the transformations it went through along the way, as well as the relationships between and within the data.
Traditional applications in this space present the data in a list view, and each time a user drills down one level, for example, to view the column definitions inside a table, context is lost. Additionally, the transformations the data has gone through are often hidden.
During whiteboard sessions the team always ended up defaulting to drawing the flows of data in a diagram fashion; it soon became obvious that this was the natural way of presenting and manipulating flows of data. Conversely, we always defaulted to a list view when we were discussing the properties of the data.
Our solution involved a hybrid approach, where we displayed the transformations the data had gone through in a diagram view, but provided extra information about the structure of the data in a side panel. The detailed view allows users to drill down several levels while still keeping context of where they are by having a breadcrumb trail at the top of the panel.
Large diagrams soon became unreadable and were, from a technical point of view, a performance concern. Solving this challenge required researching successful node-based interfaces and their features, and analysing how they overcame the hurdles we encountered.
The solution included incentivising users to break their data structures into smaller, more manageable chunks as soon as they started modelling the data. The diagram itself had features informed by the earlier research, such as search, filtering functionality and a mini-map.
Understanding the technical implications of design decisions
We researched several out-of-the-box software libraries for the diagram functionality, but none provided the functionality we wanted: the ability to show and collapse nodes and display their children in a table-like format. Fortunately, the development team was on board with my vision for the diagram and customised the rendering functionality of an existing library to look and behave the way we wanted.
Going with a custom renderer meant creating the templates for the nodes in Illustrator, carefully naming each of the component layers and exporting them to SVG. I worked closely with the developers to understand what technical constraints we had and documented the nodes' structure in Confluence, to ensure that further changes wouldn't break the diagram.
Automating the data warehousing process
Data warehousing methodologies can be hard to understand and are complex to implement from scratch. To make matters worse, there are several competing methodologies and each follows different implementation patterns.
One of the data warehousing methodologies, called Data Vault, is synthesised in a 700-page book.
We removed this high barrier to entry by allowing users to grab data from different sources and use it as a starting point for their model of the data stream. The model can easily be constructed by following guided steps which abstract away the complexity of data warehousing methodologies, allowing users to focus more on their model and less on the technical details of the implementation.
You can read the book or use WhereScape Automation with Streaming to create a Data Vault by following six clear and succinct steps. Below is a screenshot of one of these steps.
The result is a product that was released to the market less than a year after its inception and generated sales straight away. WhereScape Automation with Streaming is a modern solution that unleashes the power of data warehousing automation and reduces the time to create a data warehouse by 90%.
Screen recording of a demo of an earlier version of the product, showing how users can get a functional data warehouse built in five minutes.
Positioning WhereScape as a trendsetter
WhereScape Automation with Streaming has definitely put WhereScape on the map of real-time data streaming. Since the release of the product WhereScape has been:
Top 5 Vendors to Watch
Recognised as one of the "Top 5 Vendors to Watch" in the third annual Datanami Readers' and Editors' Choice Awards.
Trend-Setting Products in Data and Information Management for 2019
Chosen as one of the "Trend-Setting Products in Data and Information Management for 2019" by Database Trends and Applications.
Best Cloud Automation Solution
Shortlisted for the "2018-19 Cloud Awards" under "Best Cloud Automation Solution".