Re-comp: sustained value extraction from analytics over time
Project Dates: From January 2016 to December 2018
Project Leader: Dr. Paolo Missier (PI)
Staff: Collaborators: Prof. Patrick Chinnery (Head of Neurology Department, Cambridge University), Mr. Philip James (Civil Engineering & Geosciences), Prof. Paul Watson (Computing Science)
Sponsors: EPSRC
Background.
As the cost of allocating computing resources to data-intensive tasks continues to decrease, large-scale data analytics becomes ever more affordable, continuously providing new insights from vast amounts of data. Increasingly, predictive models that encode knowledge from data are used to drive decisions in a broad range of areas, from science to public policy, to marketing and business strategy. The process of learning such actionable knowledge relies upon information assets, including the data itself, the know-how that is encoded in the analytical processes and algorithms, as well as any additional background and prior knowledge. Because these assets continuously change and evolve, models may become obsolete over time, leading to poor decisions in the future, unless they are periodically updated.
Focus of the project.
This project is concerned with the need, and the opportunities, for selective recomputation of resource-intensive analytical workloads. Deciding how to respond to changes in these information assets requires striking a balance between the estimated cost of recomputing the model and the expected benefits of doing so. In some cases, for instance when predictive models are used to diagnose a patient's genetic disease, new medical knowledge may invalidate a large number of past cases. On the other hand, such changes in knowledge may be marginal or even irrelevant for some of the cases. It is therefore important to be able, firstly, to determine which past results may benefit from recomputation; secondly, to determine whether it is technically possible to reproduce an old computation; and thirdly, when this is the case, to assess the costs and relative benefits associated with the recomputation.
The project investigates the hypothesis that, based on these determinations, and given a budget for allocating computing resources, it should be possible to accurately identify and prioritise analytical tasks that should be considered for recomputation.
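As a purely illustrative sketch of the kind of prioritisation the hypothesis refers to, the Python fragment below filters past results by the first two determinations (whether a result may be affected by the change, and whether it can still be reproduced) and then selects tasks greedily by estimated benefit per unit cost until the budget is spent. The PastResult fields, the prioritise function and the greedy ratio rule are all assumptions made for this example; they are not the project's actual method, and producing the cost and benefit estimates is the hard part that the sketch simply takes as given.

```python
from dataclasses import dataclass


@dataclass
class PastResult:
    """One previously computed outcome (all fields are hypothetical)."""
    name: str
    affected: bool       # could the change in data or knowledge alter it?
    reproducible: bool   # can the original computation still be re-run?
    est_cost: float      # estimated recomputation cost, e.g. CPU-hours
    est_benefit: float   # expected benefit of refreshing it, arbitrary units


def prioritise(results, budget):
    """Pick recomputation candidates greedily by benefit/cost ratio.

    This is a simple heuristic, not an optimal knapsack solution.
    """
    # Only results that may have changed AND can be reproduced are candidates.
    candidates = [
        r for r in results
        if r.affected and r.reproducible and r.est_cost > 0
    ]
    # Highest expected benefit per unit of cost first.
    candidates.sort(key=lambda r: r.est_benefit / r.est_cost, reverse=True)

    selected, remaining = [], budget
    for r in candidates:
        if r.est_cost <= remaining:
            selected.append(r)
            remaining -= r.est_cost
    return selected
```

The greedy rule is chosen here only for brevity: it is easy to read, but it can miss the best combination of tasks when costs are uneven, so an exact knapsack formulation may be preferable when the number of candidates is small.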