WhereScape Automation with Streaming | Portfolio | Sara Coutinho - Senior digital product & interaction designer, UX/UI, and front-end developer

Background

WhereScape is a well-established name in the data warehousing space. Their software and solutions help organisations around the world to centralise and access their data faster. WhereScape's data warehousing software takes data from multiple sources and consolidates it into a single repository that can then be queried to generate reports for business intelligence.

Challenge

Traditional data warehouses are updated daily (often overnight), but the emergence of real-time sensor data, IoT devices and cloud-based solutions demands almost instantaneous updates to provide up-to-the-minute business insights. WhereScape wanted to develop a modern solution capable of handling real-time data streaming while still offering the automation capabilities the company was renowned for, allowing data to be collected, manipulated and stored across multiple sources in a matter of seconds.

The traditional approach

Businesses use data warehousing to consolidate information about different systems into a single source. This allows businesses to get more insight into their operations, and to analyse data over a period of several years in order to detect significant trends that can then be translated into actionable business plans.

Real-life examples

Consider a retailer with multiple outlets across the country. By analysing sales, stock and staff performance data, the retailer can see which stores perform best and which products sell best at specific locations.
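The retailer scenario can be illustrated with a minimal sketch. The store names, products and figures below are invented for illustration, and plain Python lists stand in for the source systems and the warehouse; this is not how WhereScape's software is implemented.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from two separate outlet systems.
north_store = [("kettle", 12), ("toaster", 5)]
south_store = [("kettle", 3), ("toaster", 9)]


def consolidate(sources):
    """Merge per-store (product, units) rows into one list of records."""
    return [
        {"store": store, "product": product, "units": units}
        for store, rows in sources.items()
        for product, units in rows
    ]


def units_by_store(records):
    """Total units sold per store, to compare store performance."""
    totals = defaultdict(int)
    for record in records:
        totals[record["store"]] += record["units"]
    return dict(totals)


def best_seller_per_store(records):
    """Product with the highest unit count at each store."""
    best = {}
    for record in records:
        current = best.get(record["store"])
        if current is None or record["units"] > current[1]:
            best[record["store"]] = (record["product"], record["units"])
    return {store: product for store, (product, _) in best.items()}


# The consolidated list plays the role of the single repository.
warehouse = consolidate({"north": north_store, "south": south_store})
```

Once the data sits in one place, both questions (which store performs best, which product sells best where) become simple queries over the same records.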

The future of data warehousing

With the emergence of real-time sensor data, IoT devices and cloud-based solutions, the traditional approach of updating a data warehouse overnight is no longer sufficient. Businesses that depend on this data require almost instantaneous updates to provide up-to-the-minute business insights. This is particularly critical for systems that rely on real-time data to identify anomalies.

Real-life examples

Examples include a company that performs real-time checks on credit card transactions to identify fraud, or a truck manufacturer that can identify a critical error and stop the driver before an accident happens.

Process

This was a greenfield, very technical project being developed for an emerging market. Working on it required learning more about databases, how they work and how they are structured, and the rules of specific data warehousing methodologies. It also involved being keenly aware of implementation details and constraints. Getting to know the field meant learning more about the company and the business logic surrounding the problems. I wasn't a domain expert, but this turned out to be useful because it brought a new perspective.

Photo of a whiteboard session: one of the many whiteboard sessions that happened with the team. This one in particular was about database foreign keys and how they work.

Early days

I was heavily involved in the early stages of the project, working closely with the product owner and project manager to scope the work and define the business requirements. The team (including developers) would often get together to sketch out our understanding of certain parts of the project and to make sure that we were all on the same page.

Whiteboarding and sketching sessions with the team were a staple during the whole process. They helped inform the quick prototypes I created in order to test assumptions with team members, to define the information architecture and to visualise how the product could look.

Understanding the users

Working on a product ahead of its time and early to the market proved to be challenging in terms of user research. Fortunately WhereScape had an in-house consulting team and I was able to conduct guerrilla research (i.e. hijacked lunches) and to interview the current users of WhereScape products in order to understand their traditional workflow and what pain points existed in current products.

Application flow

The discovery stage was an intense process where the team and I discussed our ideas about the application and cemented the shared terminology we were using. I spent a fair bit of time getting familiar with the problem and domain knowledge and thinking about how we could transform what were, at that stage, abstract ideas, into something concrete.

Making sense of it all. An early user flow on the left. Although the application grew and became more complex with time, the fundamental flow stayed the same. On the right, an early attempt at defining the information architecture of the application, and a summary of different areas of the application and actions that could be performed.

Different areas of the application and the actions that can be performed in them

Information architecture

As we progressed in the discovery stage, connections (the sources and destinations of data) and dataflows (containers of data) emerged as the main drivers of the application.

Connections

The sources of data and the destination endpoints of data transformed by users.

Dataflows

The containers of data inside the application. Data (tables, scripts) inside dataflows can be manipulated and transformed.
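A minimal sketch of how these two concepts might relate, under stated assumptions: the class names, the `Role` enum, and the idea that a dataflow holds exactly one source and one destination are hypothetical, chosen only to illustrate the definitions above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SOURCE = "source"
    DESTINATION = "destination"


@dataclass(frozen=True)
class Connection:
    """An endpoint that data is read from or written to."""
    name: str
    role: Role


@dataclass
class Dataflow:
    """A container of data items (tables, scripts) that users can transform."""
    name: str
    source: Connection        # assumption: one source per dataflow
    destination: Connection   # assumption: one destination per dataflow
    items: list = field(default_factory=list)

    def add_item(self, item: str) -> None:
        """Register a table or script inside this dataflow."""
        self.items.append(item)
```

For example, `Dataflow("telemetry", Connection("sensor_feed", Role.SOURCE), Connection("warehouse", Role.DESTINATION))` creates an empty container that tables and scripts can then be added to.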

The first wireframes and low fidelity prototypes reflected this understanding. These were also my first attempt at visually representing what we were building, and were used to test the proposed flows with stakeholders and team members.

An early landing page of the product, where users were incentivised to create a stream of data (later called dataflow).

An early representation of the "Create Connection" area. This was later moved into its own area.

What would later become the "Design" area of the application, where users can see a visual representation of the flow of data through the application.

Side panel showing the details of an attribute (the equivalent of a table column definition).

Although the earlier research identified connections and dataflows as the two main drivers of the application, reducing it to only those two components was not feasible. I separated the application into six main logical areas, based on the actions users had to perform in each of them. This approach allowed us to show or hide these logical areas based on a user's permissions, and proved solid when we had to add an extra area that wasn't part of the initial scope.

Setup

User and platform management area.

Connect

Configure sources and destinations of data.

Discover

Browse and import data contained in connections.

Design

Design the structure and flow of imported data, modify it as needed and set its destination.

Deploy

Execute the abstract models created in the Design area to create concrete data.

Monitor

Monitor the status of deployed models and their data.

Documentation

Self-generated metadata about all items inside the project for auditing purposes.
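Showing or hiding the areas listed above based on permissions can be sketched as follows. The role names and the role-to-area mapping are hypothetical; the case study does not describe the real permission model.

```python
from enum import Enum


class Area(Enum):
    SETUP = "Setup"
    CONNECT = "Connect"
    DISCOVER = "Discover"
    DESIGN = "Design"
    DEPLOY = "Deploy"
    MONITOR = "Monitor"
    DOCUMENTATION = "Documentation"


# Hypothetical mapping from roles to the areas they may see.
ROLE_AREAS = {
    "admin": set(Area),
    "modeller": {Area.CONNECT, Area.DISCOVER, Area.DESIGN, Area.DEPLOY},
    "operator": {Area.MONITOR, Area.DOCUMENTATION},
}


def visible_areas(roles):
    """Return the areas to show in the main menu, in their defined order."""
    allowed = set()
    for role in roles:
        # Unknown roles grant nothing rather than raising an error.
        allowed |= ROLE_AREAS.get(role, set())
    return [area for area in Area if area in allowed]
```

Because each area is a separate entry, adding a new one means adding an enum member and updating the mapping, which is in the spirit of the extensibility described above.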

Building up the user interface

The new product was built on web technologies, which gave us the flexibility to move away from traditional desktop software interfaces and deliver a modern-looking product. This opportunity also had its challenges. I had to investigate how we could translate existing interaction paradigms of desktop apps onto the web, but often a direct translation wasn't the right choice, e.g. double-clicking to select items.

Fortunately, web-based technologies offered us the flexibility to come up with our own interaction paradigms. We were able to build different views of the application for different use cases (some form heavy, others more visual) and blend them together, while keeping a consistent look and feel.

The landing page of the new product. Each main section of the product is listed here and has a small textual intro. Shortcuts to the actions that can be performed in those areas are also provided. Note the main menu on the left, present in all views.

Screenshot of the Connect area — Each section of the product is identified by a unique color, displayed in the top application bar. The name of the current section is displayed as well as a breadcrumb trail for context. Below is the Connect area, a form heavy view of the application, where users can connect to sources of data.

Screenshot of the Design area — We were able to blend form heavy views with more visual areas. Below is the Design area, the most complex area of the application and where users spend most of their time. The interactive diagram view is expanded, allowing users to see and model the flow of the data and conform to data warehousing methodologies.

Screenshot of the Deploy area — We were able to hide complexity by progressively disclosing information. Below is the Deploy area, where users can turn the data models they built in the Design area into a real stream of data.

Screenshot of a log: an example of hidden complexity. We provided users with basic information about status, e.g. "Successfully executed" or "Executed with errors", but allowed them to drill down to see more information. Below is a Discovery log. These could be quite complex, so I proposed ways of filtering the information, such as date ranges, search and filters.
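The progressive disclosure described above (a one-line status first, filtered detail on demand) can be sketched like this. The `LogEntry` fields and the level names are assumptions made for illustration, not the product's actual log format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    level: str    # hypothetical levels: "info" or "error"
    message: str


def summarise(entries):
    """Top-level status shown before the user drills down."""
    if any(entry.level == "error" for entry in entries):
        return "Executed with errors"
    return "Successfully executed"


def filter_entries(entries, start=None, end=None, text=None, level=None):
    """Detailed view: keep entries that match every filter that is set."""
    result = []
    for entry in entries:
        if start is not None and entry.timestamp < start:
            continue
        if end is not None and entry.timestamp > end:
            continue
        if level is not None and entry.level != level:
            continue
        # Search is case-insensitive substring matching.
        if text is not None and text.lower() not in entry.message.lower():
            continue
        result.append(entry)
    return result
```

Leaving a filter as `None` means it is not applied, so the same function serves both the unfiltered log and any combination of date range, search text and level.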

Custom icons designed for the application — Some of the icons I designed for the application. We used FontAwesome for standard icons, but had to create custom icons due to the domain specific nature of the project.

Development underway

With the project moving fast and the development team one sprint behind, the interactive prototype developed in Axure quickly morphed into the UI design reference for the developers. The final polish was added directly by tweaking the styles and markup of the web application. I often committed code to version control to ensure the consistency of the look and feel, and also to make changes to the text of the application.

Screenshot of code styles — We were originally using Twitter Bootstrap for the application styles but as we progressed it became more of a hindrance. We ended up creating custom styles instead and although this meant more work upfront it provided us with much needed flexibility. Below, some of the styles I developed for form elements.

Documenting design decisions

The prototypes (developed in Axure) quickly evolved and became more complex. They were soon being used to identify technical constraints early and avoid costly changes at development time. The prototypes were also used to demo functionality to future users and stakeholders.

While the prototype was good for visualising interactions, the proposed functionality and interactions weren't always clear, so I started documenting them in Confluence, the project wiki. Although this was time consuming, it proved invaluable in keeping the team members on the same page.

User interface & experience challenges and solutions

Viewing the data and modelling its flow

Data warehousing involves manipulating data tables and records through the use of transformation scripts. Data may have to go through several transformations until it is in a format suitable for generating business intelligence reports.

One of the biggest UX challenges was how we could allow users to inspect the data and the transformations it went through along the way, as well as the relationships between and within the data.

Traditional applications in this space present the data in a list view, and each time a user drills down one level, for example, to view the column definitions inside a table, context is lost. Additionally, the transformations the data has gone through are often hidden.

Screenshot of RED — The UI of WhereScape RED, one of the established tools in the traditional data warehousing space. It follows the convention of panels at the bottom and sides, with a main area in the center. Actions are either on bottom bars, context menus or in the main application menu.

During whiteboard sessions the team always ended up defaulting to drawing the flows of data in a diagram fashion; it soon became obvious that this was the natural way of presenting and manipulating flows of data. Conversely, we always defaulted to a list view when we were discussing the properties of the data.

Our solution involved a hybrid approach, where we displayed the transformations the data had gone through in a diagram view, but provided extra information about the structure of the data in a side panel. The detailed view allows users to drill down several levels while still keeping context of where they are by having a breadcrumb trail at the top of the panel.

Screenshot of the Design area, showing the data and the transformations applied to it in a diagram fashion. Users can select items (nodes and edges) and view their properties.

Screenshot of a table definition — The detail view of one of the items in the diagram, in this case a table. This hybrid approach allowed us to show a detailed, form heavy view, while at the same time keeping context of the diagram underneath.

Screenshot of a column definition: displaying the relationships of items was a challenge. Tables contain columns, and users have to be able to navigate between both parent (the table) and children (the column definitions). We allowed users to drill down to a column of a table while maintaining context. Note the breadcrumbs at the top of the side panel on the right.

Screenshot of a transformation definition: another type of relationship was the one between items (tables and columns) and the transformations and scripts attached to them. These could be viewed in the diagram and were also represented in the detailed view in the side panel. Below, a transformation is selected in the diagram and its detailed view displayed.
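The drill-down with a breadcrumb trail described above can be modelled as a simple stack. This is a minimal sketch; the `SidePanel` class and its method names are hypothetical and are not taken from the product's code.

```python
class SidePanel:
    """Tracks the drill-down position as a stack of item names."""

    def __init__(self, root):
        # The trail always starts at the item selected in the diagram.
        self._trail = [root]

    def drill_down(self, child):
        """Open a child item, e.g. a column inside a table."""
        self._trail.append(child)

    def go_to(self, index):
        """Jump back to the breadcrumb at `index`, dropping deeper levels."""
        if not 0 <= index < len(self._trail):
            raise IndexError("no breadcrumb at that position")
        del self._trail[index + 1:]

    def breadcrumbs(self):
        """Render the trail shown at the top of the panel."""
        return " > ".join(self._trail)
```

Selecting a table named `customer` and then its column `customer_id` would render `customer > customer_id`; clicking the first breadcrumb corresponds to `go_to(0)` and returns to the table.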

Large diagrams soon became unreadable and were, from a technical point of view, a performance concern. Solving this challenge required researching successful node-based interfaces and their features, and analysing how they overcame the hurdles we encountered.

The solution included incentivising users to break their data structures into smaller, more manageable chunks as soon as they start modelling the data. The diagram itself has features informed by the earlier research, such as search, filtering functionality and a mini-map.

Screen recording of a user interacting with the diagram. The filtering functionality is used to remove scripts and transformations, leaving only the data containers.
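One way such a filter could work is sketched below. It assumes the diagram is a directed graph whose nodes carry a kind (the kinds `table`, `script` and `transformation` are hypothetical labels), and it assumes that hidden nodes are bridged so the remaining containers stay connected. Whether the product bridges edges this way is not stated in the case study.

```python
from collections import defaultdict


def filter_diagram(nodes, edges, hidden_kinds):
    """Hide nodes of the given kinds and reconnect what remains.

    nodes: dict mapping node name -> kind
    edges: list of (source, destination) pairs
    Returns (sorted visible node names, sorted bridged edges).
    """
    visible = {name for name, kind in nodes.items() if kind not in hidden_kinds}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    def visible_targets(start):
        """Visible nodes reachable from `start` through hidden nodes only."""
        found, seen = set(), set()
        stack = list(successors[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in visible:
                found.add(node)
            else:
                # Keep walking through the hidden node to what lies beyond.
                stack.extend(successors[node])
        return found

    bridged = sorted((src, dst) for src in visible for dst in visible_targets(src))
    return sorted(visible), bridged
```

With a flow of `load_orders` (table) to `clean` (transformation) to `stage_orders` (table), hiding transformations leaves the two tables joined by a single direct edge.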

Understanding the technical implications of design decisions

We researched several out-of-the-box software libraries for the diagram functionality, but none provided the functionality we wanted: the ability to show and collapse nodes and display their children in a table-like format. Fortunately, the development team was on board with the vision I had for the diagram and customised the rendering functionality of an existing library to look and behave the way we wanted.

Going with a custom renderer meant creating the templates for the nodes in Illustrator, carefully naming each of the component layers and exporting them to SVG. I worked closely with the developers to understand what technical constraints we had and documented the nodes' structure in Confluence, to ensure that further changes wouldn't break the diagram.

Screenshot of some of the nodes and their structure in Illustrator.
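A check of this kind could guard the layer naming contract between the SVG templates and the custom renderer. The sketch below assumes that layer names end up as `id` attributes in the exported SVG, and the layer names in `REQUIRED_LAYERS` are invented for illustration; the real names were documented in Confluence and are not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical layer names a renderer might look up by id.
REQUIRED_LAYERS = {"node-header", "node-title", "node-rows"}


def missing_layers(svg_text, required=REQUIRED_LAYERS):
    """Return the required layer ids that are absent from an SVG template."""
    root = ET.fromstring(svg_text)
    # Collect every id in the document, regardless of element type.
    found = {el.get("id") for el in root.iter() if el.get("id")}
    return sorted(set(required) - found)
```

Running such a check whenever a template is re-exported would flag a renamed or deleted layer before it reaches the diagram.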

Automating the data warehousing process

Data warehousing methodologies can be hard to understand and are complex to implement from scratch. To make matters worse, there are several competing methodologies and each follows different implementation patterns.

One of the data warehousing methodologies, called Data Vault, is synthesised in a 700-page book.

We removed this high barrier to entry by allowing users to grab data from different sources and use it as a starting point for their model of the data stream. The model can easily be constructed by following guided steps which abstract away the complexity of data warehousing methodologies, allowing users to focus more on their model and less on the technical details of the implementation.

You can read the book or use WhereScape Automation with Streaming to create a Data Vault by following six clear and succinct steps. Below is a screenshot of one of these steps.

Screenshot of a step in the Data Vault Wizard

Results

The result is a product that was released to the market less than a year after its inception and generated sales straight away. WhereScape Automation with Streaming is a modern solution that unleashes the power of data warehousing automation and reduces the time to create a data warehouse by 90%.

Screen recording of a demo of an earlier version of the product, showing how users can get a functional data warehouse built in five minutes.

Positioning WhereScape as a trendsetter

WhereScape Automation with Streaming has definitely put WhereScape on the map of real-time data streaming. Since the release of the product WhereScape has been:

Top 5 Vendors to Watch

Recognised as one of the "Top 5 Vendors to Watch" in the third annual Datanami Readers' and Editors' Choice Awards.

Trend-Setting Products in Data and Information Management for 2019

Chosen as one of the "Trend-Setting Products in Data and Information Management for 2019" by Database Trends and Applications.

Best Cloud Automation Solution

Shortlisted for the "2018-19 Cloud Awards" under "Best Cloud Automation Solution".
