SigTech pioneers faster discovery process and efficient access to IHS Markit Data Lake

28 June 2021
By Jake Stacey, Head of Data Engineering

The recent shift towards nowcasting, rather than traditional forecasting, requires quants to analyse more data, faster and more efficiently than ever before. New alternative datasets become available every day. How do systematic traders and quantitative researchers keep their edge if they're burdened with an ever-increasing amount of data wrangling?

In traditional data warehousing, onboarding vast datasets is onerous. It often involves maintaining a separate data environment and duplicate data storage, which can take weeks to implement. For example, before they can analyse a new dataset from a data vendor, traders and researchers typically have to raise support tickets, wait for the right engineers to become available, and then wait for those engineers to build and kick off new data pipelines.

When SigTech and IHS Markit came together to make IHS Markit datasets available for SigTech platform users, data teams from both sides quickly agreed on two overarching objectives:

1. Fast data discovery

Newly added IHS Markit datasets should be instantly available within the SigTech platform and data catalog, with no additional setup required.

2. Efficient data access

Datasets should be served directly from IHS Markit Data Lake upon request, with no intermediate copying steps and no risk of stale data.


To achieve these two objectives, we pioneered a technical solution that integrated two existing data architectures hosted on AWS.

Fast data discovery is enabled via catalog synchronisation. The IHS Markit Data Lake Catalogue is kept in sync with the SigTech catalog in real time, allowing users to search IHS Markit datasets and their metadata alongside all existing SigTech datasets.
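To make the idea of catalog synchronisation concrete, here is a minimal Python sketch. It is not the SigTech or IHS Markit implementation: the `DatasetEntry` schema, the `Catalog` class, the event format and all dataset names are assumptions invented for illustration. It only shows the general pattern of applying upstream change events to a local, searchable catalog.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Catalog metadata for one dataset (hypothetical schema)."""
    dataset_id: str
    description: str
    source: str


@dataclass
class Catalog:
    """In-memory stand-in for a searchable data catalog (hypothetical)."""
    entries: dict = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        """Apply one change event published by the upstream catalog."""
        if event["type"] == "upsert":
            entry = event["entry"]
            self.entries[entry.dataset_id] = entry
        elif event["type"] == "delete":
            self.entries.pop(event["dataset_id"], None)
        else:
            raise ValueError(f"unknown event type: {event['type']}")

    def search(self, term: str) -> list:
        """Return ids of datasets whose id or description mentions the term."""
        term = term.lower()
        return sorted(
            e.dataset_id
            for e in self.entries.values()
            if term in e.dataset_id.lower() or term in e.description.lower()
        )


# A local catalog that already holds one native dataset.
catalog = Catalog()
catalog.apply({"type": "upsert",
               "entry": DatasetEntry("native_futures", "Futures prices", "native")})

# Change events as they might arrive from the upstream catalog.
events = [
    {"type": "upsert",
     "entry": DatasetEntry("vendor_prices_a", "Vendor bond prices", "vendor")},
    {"type": "upsert",
     "entry": DatasetEntry("vendor_tmp", "Temporary prices", "vendor")},
    {"type": "delete", "dataset_id": "vendor_tmp"},
]
for event in events:
    catalog.apply(event)

# One search covers native and synchronised vendor datasets alike.
print(catalog.search("prices"))
```

Because every upstream change is applied as soon as it arrives, a newly published dataset becomes searchable without any per-dataset setup, which is the property the first objective asks for.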

Efficient data access is enabled via query federation and caching. When an IHS Markit dataset is queried, data is fetched directly from the IHS Markit Data Lake. Filtering and aggregation are performed at the data lake level, and results are cached so that similar queries do not trigger a re-fetch of the same data. Query results are transferred to SigTech users in compressed Parquet format to maximise data throughput. The same approach can readily be applied to other integrations.
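The following Python sketch illustrates the general pattern of query federation with caching. It is an illustration under stated assumptions, not the production design: `FakeLake`, `FederatedClient`, the dictionary-based query format and the sample rows are all hypothetical, the cache only recognises logically identical queries, and the compressed Parquet transfer step is left out to keep the example dependency-free.

```python
import json
from collections import defaultdict


class FakeLake:
    """Stand-in for a remote data lake that filters and aggregates server-side."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0  # counts how often the lake is actually queried

    def run(self, query: dict) -> dict:
        self.calls += 1
        totals = defaultdict(float)
        for row in self.rows:
            # Filter at the lake: keep only rows matching every condition.
            if all(row[k] == v for k, v in query["where"].items()):
                # Aggregate at the lake: sum one column per group.
                totals[row[query["group_by"]]] += row[query["sum"]]
        return dict(totals)


class FederatedClient:
    """Forwards queries to the lake and caches results by normalised query."""

    def __init__(self, lake):
        self.lake = lake
        self.cache = {}

    def query(self, query: dict) -> dict:
        # sort_keys makes logically identical queries share one cache key.
        key = json.dumps(query, sort_keys=True)
        if key not in self.cache:
            self.cache[key] = self.lake.run(query)
        return self.cache[key]


lake = FakeLake([
    {"region": "EU", "sector": "tech", "value": 1.0},
    {"region": "EU", "sector": "energy", "value": 2.0},
    {"region": "US", "sector": "tech", "value": 4.0},
    {"region": "EU", "sector": "tech", "value": 0.5},
])
client = FederatedClient(lake)

first = client.query({"where": {"region": "EU"}, "group_by": "sector", "sum": "value"})
# Same query with keys in a different order: served from the cache.
second = client.query({"sum": "value", "group_by": "sector", "where": {"region": "EU"}})

print(first)       # aggregated totals per sector for EU rows
print(lake.calls)  # number of times the lake was actually hit
```

Only the small aggregated result crosses the boundary between the lake and the client, and the second request never reaches the lake at all. A real system would also need a cache invalidation policy so that cached results do not go stale when the underlying data changes.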

Data architecture

The IHS Markit Data Lake is now available to all SigTech users. Get in touch to find out how SigTech could help identify signals for your next strategy.