Streaming Data Modelling Research
Extracting value from the data firehose.
Data scientists are increasingly interested in modelling and analysing live data streams. Often, we want to analyse these streams in (near) real time.
Our work in this theme of Newcastle Data considers:
- streaming data engineering
- online algorithms for the analysis of streaming data
- impactful applications of streaming data modelling research
From lakes to streams
Traditional data science relies on the analysis of complete data sets, at rest in data lakes. But as we instrument and measure more, this model breaks down. No data set is ever complete, so attention shifts to making sense of data as it flows and accumulates.
The shift towards managing streamed data requires fundamental changes in how we approach data engineering and statistical modelling.
Recent years have seen significant developments in hardware and software architectures tailored to streaming data. Several software libraries also claim to simplify deploying streaming systems in production. In parallel, online statistical modelling and machine learning methods are being developed that can process, and make inferences from, streaming data in (near) real time.
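As one concrete illustration of an online statistical method (our choice of example, not a method attributed to this research theme), Welford's algorithm updates a running mean and variance one observation at a time. Memory use stays constant however long the stream runs, and no reading has to be stored after it has been processed. The class name `RunningStats` and the sample readings below are hypothetical and only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Online mean and sample variance using Welford's algorithm.

    Each update costs O(1) time and memory, so the stream itself is never stored.
    """

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold a single new observation into the running estimates."""
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # incremental mean update
        # Multiply by the deviation from the *updated* mean (Welford's step)
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values have arrived."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


if __name__ == "__main__":
    stats = RunningStats()
    # Hypothetical sensor readings arriving one at a time
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(reading)
        print(f"n={stats.n} mean={stats.mean:.3f} var={stats.variance:.3f}")
```

After the eight readings the running mean is 5.0 and the sample variance is 32/7 (about 4.571), matching what a batch calculation over the full data set would give, but without ever holding that data set in memory.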