Setting the scene
Data warehousing emerged in the late 1980s and 1990s as a new paradigm for supporting decision-making, specifically intended to provide the strategic information needed to achieve competitive advantage. Data warehouses were designed to ingest data from operational data sources and to serve as a foundation for Business Intelligence, better known today as self-service reporting and data analysis.
A typical data warehouse ecosystem goes beyond a set of databases, schemas, and querying and reporting tools used throughout the organisation. It is a complex, integrated set of tools, technologies, and software used to source data from disparate data sources, transform it, and make it commonly available to end users. A typical data warehouse ecosystem may include:
- Extract, Transform and Load (ETL) components that source data from source systems and load it into a data warehouse
- A data warehouse that provides a central repository of data (current and historical), transformed and aggregated, ready for analysis
- Online Analytical Processing (OLAP) tools that allow users to perform complex analysis
- Querying and reporting tools that enable users to create reports
- Data visualisation and dashboards that provide users with an easy-to-read, high-level visual interface
- Mobile BI that gives users business insight on the move
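To make the ETL component concrete, here is a minimal sketch of the extract, transform and load steps using Python's built-in sqlite3 module as a stand-in for both the operational source and the warehouse. The table names (orders, daily_sales), the columns and the sample rows are hypothetical and purely illustrative; a production pipeline would use a dedicated ETL or orchestration tool rather than hand-written code like this.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Extract raw orders, aggregate them per day and region, and load the result.

    Returns the number of aggregated rows written to the warehouse.
    """
    # Extract: read raw rows from the (hypothetical) operational table.
    rows = source.execute(
        "SELECT order_date, region, amount FROM orders"
    ).fetchall()

    # Transform: normalise region names and sum amounts per (date, region).
    totals = {}
    for order_date, region, amount in rows:
        key = (order_date, region.strip().upper())
        totals[key] = totals.get(key, 0.0) + amount

    # Load: write the aggregated rows into the (hypothetical) warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "order_date TEXT, region TEXT, total_amount REAL, "
        "PRIMARY KEY (order_date, region))"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)",
        [(d, r, t) for (d, r), t in sorted(totals.items())],
    )
    warehouse.commit()
    return len(totals)


def build_demo_source() -> sqlite3.Connection:
    """Create an in-memory stand-in for an operational system (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-01-01", "north", 100.0),
            ("2024-01-01", " North ", 50.0),
            ("2024-01-01", "south", 20.0),
            ("2024-01-02", "south", 30.0),
        ],
    )
    conn.commit()
    return conn
```

Calling run_etl(build_demo_source(), sqlite3.connect(":memory:")) loads one aggregated row per date and region. Querying and reporting tools would then read from daily_sales rather than from the operational table.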
In our many conversations with clients, the most common drivers for data warehouse modernisation are:
- Technical modernisation
- Evolving data
- New data assets
Technical modernisation is designed to enable greater scale and speed: more capacity for growing data volumes, better technical performance and data capture, wider user access, self-service reporting and deep data analysis. Most often, it sees the on-premises platform move to the cloud to meet these requirements.
Evolving data, in terms of volume, variety and velocity, together with the data platforms built for these data types, is another emerging driver for change. Diversification of data types and formats (non-relational, unstructured, social) and of data sources (IoT sensors, machines, GPS) challenges traditional data warehouse architectures.
Modern, collaborative organisations want to capture and generate new data assets to gain competitive advantage. Re-platforming has become a starting point for modernisation, particularly when older data platforms are unsuitable for, or simply unable to capture, the required data volumes or formats. This drives the trend towards cloud or SaaS adoption, with elasticity and scalability at reasonably low cost.
Modern Data Warehouse Architecture
Technology, architecture, design and the building blocks of the modern data warehouse have all advanced to meet the demands of a data-driven economy, so that the architecture and technical design can be modern, extendable, scalable, resilient and manageable.
What makes the Modern Data Warehouse Architecture modern? We focus on the following principles:
- Cloud First Strategy - Leverage modern Serverless and Platform-as-a-Service technologies to minimise use of virtual machines. This has the benefits of low IT maintenance overhead and true consumption-based pricing
- Flexible Data Schemas - Organisations have evolving use cases and requirements for different data types. Using Data Vault will provide data architecture flexibility to expand and on-board additional source systems.
- Modularity in design - A modular architecture uses specific components for specific sets of tasks and provides the foundation for a pluggable architecture that allows easy extension and scaling of the platform
- Out-of-the-Box (OOTB) components - Minimising custom coding by using OOTB platform and product features wherever possible, preferring a small number of powerful components and taking advantage of built-in marketplace packages and extensions to extend functionality
- Resilience and availability - Highly available, redundant components in an architecture that supports global business operations without failures impacting performance
- Design for evolution - Scalable architecture that encapsulates business logic in independent components, where services are turned on as required to deliver outcomes and incremental business value
- Data comes in different “shapes and sizes” - Dealing with different types of data, including fast streaming data and slower batch data, whether structured, semi-structured or unstructured
- Human errors - Automation tools and data extraction and transformation routines that build the handrails (boundaries) for improved data quality, applying rules that catch errors and convert them into clean data
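As an illustration of the "handrails" idea in the last principle, the sketch below applies a couple of simple rules to incoming records: it repairs what can safely be repaired and quarantines the rest with a reason, instead of letting bad rows reach the warehouse. The field names (email, signup_date), the rules and the function name clean_records are hypothetical; real platforms typically express such rules in their pipeline or data quality tooling.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CleanResult:
    """Records that passed the rules, plus rejected records with a reason."""
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def clean_records(records: list) -> CleanResult:
    """Apply illustrative data quality rules to a list of dict records."""
    result = CleanResult()
    for raw in records:
        record = dict(raw)  # work on a copy so the raw input is preserved

        # Rule 1 (repair, then validate): trim and lower-case the email,
        # and quarantine the record if it still has no "@".
        email = str(record.get("email", "")).strip().lower()
        if "@" not in email:
            result.quarantined.append((raw, "invalid email"))
            continue
        record["email"] = email

        # Rule 2 (validate and convert): signup_date must be an ISO date
        # such as 2024-03-01; anything else is quarantined.
        try:
            record["signup_date"] = date.fromisoformat(
                str(record.get("signup_date", "")).strip()
            )
        except ValueError:
            result.quarantined.append((raw, "invalid signup_date"))
            continue

        result.accepted.append(record)
    return result
```

Keeping the rejected rows together with a reason, rather than silently dropping them, lets the source system owner see and fix recurring errors at the point of entry.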
Microsoft is leading the charge with proprietary products that help modern enterprises architect Modern Data Warehouse platforms that support the design principles above in a cost-effective and efficient manner.
Introducing Microsoft’s Azure Synapse Analytics
A limitless cloud analytics service that brings together enterprise data warehousing and Big Data analytics.
With a focus on an end-to-end, holistic approach to data, Azure Synapse Analytics gives an enterprise the freedom to query data at scale using either serverless on-demand or provisioned resources. It bridges the gap between business and technology, bringing these two worlds together through a unified experience in Synapse Studio to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
- Supports multiple languages to suit different analytics workloads, including SQL, Python and R
- A PaaS offering with integrated SQL and Spark analytics runtimes that can be provisioned or consumed serverless, on demand
- Separates storage and compute, so compute can be paused and resumed as required to reduce unnecessary costs
- Integrates platform services for workload management, monitoring and security
- Uses Data Lake Storage for Common Data Model awareness
- Unified analytics experience - Azure Synapse Studio sits over the top and provides a unified workspace for data prep, data management, data warehousing, big data, and AI tasks. Data engineers can use a code-free visual environment for managing data pipelines. Database administrators can automate query optimisation. Data scientists can build proofs of concept in minutes. Business analysts can securely access datasets and use Power BI to build dashboards in minutes, all while using the same analytics service.
- Limitless scale - Insights across your entire data ecosystem at speed. Using familiar SQL syntax, data analysts can query both relational and non-relational data at petabyte scale and with limitless concurrency
- Powerful insights - Deep integration with Power BI and Azure Machine Learning makes it easy to enable BI and machine learning, expanding the discovery of insights from all your data and applying machine learning models to your intelligent apps
- New data assets - Supports the creation of new data assets through the Open Data Initiative and makes it possible to share data in just a few clicks
- Security and privacy - Advanced security and privacy features, including automated threat detection and always-on data encryption, as well as column-level security, native row-level security, and dynamic data masking to automatically protect sensitive data in real time
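To show what row-level security and dynamic data masking mean in practice, here is a small conceptual sketch in plain Python. It is not Azure Synapse syntax: in Synapse these controls are configured in the SQL engine itself, so every query is filtered and masked automatically. The function names, the region-based row rule and the email masking pattern are all hypothetical, chosen only to illustrate the behaviour.

```python
def mask_email(value: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def read_customers(rows: list, user_region: str, can_see_pii: bool) -> list:
    """Return only the rows and values a given user is allowed to see.

    rows        -- list of dicts with "name", "region" and "email" keys
    user_region -- illustrative row-level rule: users see their own region only
    can_see_pii -- illustrative masking rule: unprivileged users get masked emails
    """
    visible = []
    for row in rows:
        # Row-level security: drop rows outside the user's region.
        if row["region"] != user_region:
            continue
        out = dict(row)  # copy, so the stored data itself is never altered
        # Dynamic data masking: the value is masked at read time only.
        if not can_see_pii:
            out["email"] = mask_email(row["email"])
        visible.append(out)
    return visible
```

The stored data is never changed: the same table yields different results depending on who is asking. That is why these features protect sensitive data without requiring separate copies of the data for each audience.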
Supporting Modern Data Warehouse Architecture
Azure Synapse Analytics supports Modern Data Warehouse Architecture and excels in ease-of-use and platform integration, insulating users from complexity and providing a richer set of capabilities that are much easier and less expensive to build and run.
- Out-of-the-box components and content packs that deliver a curated experience, with pre-built dashboards to get up and running quickly
- Design for evolution, offering scale without the constraints typically associated with legacy architectures
- Dealing with different types of data in various formats, with Azure Synapse Analytics able to process the data into formats that can be leveraged in new ways
- One hub for all data, with the ability to connect to a variety of data sources and bring their data into a cloud-scale data warehouse solution for deriving insights
- Familiar tools and ecosystem to help leverage investment quickly. Data engineers can use a code-free visual environment for managing data pipelines, data scientists can build proofs of concept in minutes, and business analysts can securely access datasets and use Power BI to build dashboards in minutes
Don’t just talk about it
As a Data & AI leader, where are you on your Modern Data Warehouse Architecture journey? Does the Azure Synapse Analytics product offering resonate with you and your data ambitions, and would you like to explore Azure Synapse in a Day? I’d be keen to hear your thoughts and reflections, so please get in touch.