Data Study Group
Newcastle University and Leeds Institute for Data Analytics will host a Turing collaborative hackathon that brings together PhD researchers to focus on real-world challenges.
Date: 20 - 24 March 2023
Location: The Catalyst, Science Square, Newcastle upon Tyne NE4 5TG
What are Data Study Groups?
- Intensive 'collaborative hackathons', which bring together organisations from industry, government, and the third sector, with talented multi-disciplinary researchers from academia.
- This Data Study Group will be hosted at The Catalyst in Newcastle on behalf of the Turing.
- Organisations act as Data Study Group 'Challenge Owners', providing real-world problems and datasets to be tackled by small groups of highly talented, carefully selected researchers.
- Researchers brainstorm and engineer data science solutions, presenting their work at the end of the week.
Researchers are typically PhD students and early career academics from statistics, computer science, engineering, mathematics, and computational social science, as well as wider disciplines where data science and AI skills are increasingly becoming relevant.
You will need to be able to commit to an intensive five days at The Catalyst, based in Newcastle city centre.
Stage 1: The precursor stage (part-time)
- The precursor stage takes place online during the week before the event (13th – 17th March 2023).
- This will consist of online workshops, presentations and team-building to prepare for the 'event stage'.
Stage 2: The event stage (full-time)
This in-person event will take place at The Catalyst, 3 Science Square, Newcastle Helix, Newcastle, NE4 5TG.
- The 'event stage' will run in person over one week (20th March – 24th March).
- The event will finish by 3pm on Friday 24th March.
Why Apply?
The Data Study Groups are popular and productive collaborative events and a fantastic opportunity to:
- rapidly develop and test your data science skills with real-world data
- take your knowledge from research to industry
- collaborate with peers across a network of talent
- forge new networks for future research projects and build links with The Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.
It’s hard work, a crucible for innovation and a space to develop new ways of thinking.
Detecting and Locating Earthquakes with Machine Learning
Leeds Institute for Data Analytics
Detecting the occurrence of earthquakes and determining where they happened is needed in all sorts of settings, ranging from eruption monitoring at volcanoes to tracking induced earthquakes below geothermal power plants. In many cases where monitoring of seismicity (the occurrence of earthquakes) is mandated by regulation, such as where hydraulic fracturing for shale gas takes place, millions of small-magnitude earthquakes may occur over a few hours.
Existing techniques to automatically detect and locate earthquakes are still relatively computationally expensive, require manual intervention, or rely crucially on uncertain prior information about the structure of the ground through which the seismic waves pass. Recently, several algorithms have been developed to automatically classify seismic recordings of earthquakes; these are then combined with traditional methods that locate the events essentially by triangulating the wave arrival times and, sometimes, the earthquake–station distances. However, these approaches use just a single recording at a time and do not exploit the relative positions of the recordings at all. They also still rely on being able to accurately predict how long energy takes to travel from the earthquake to each station, which requires accurate knowledge of the structure of the ground.
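To make the traditional location step concrete, here is a minimal sketch of fitting an earthquake hypocentre and origin time to wave arrival times by least squares. The uniform seismic velocity, station coordinates and arrival times are all hypothetical; real workflows use layered or 3D velocity models, which is exactly why knowledge of ground structure matters.

```python
# Minimal illustration of arrival-time earthquake location (hypothetical values).
import numpy as np
from scipy.optimize import least_squares

V_P = 5.0  # assumed uniform P-wave speed in km/s (hypothetical)

# Station positions (x, y, z in km) and observed P arrival times (s), made up for illustration
stations = np.array([[0.0, 0.0, 0.0],
                     [10.0, 0.0, 0.0],
                     [0.0, 10.0, 0.0],
                     [10.0, 10.0, 0.0]])
t_obs = np.array([2.1, 1.6, 1.9, 1.2])

def residuals(params):
    """Misfit between observed and predicted arrival times for a trial hypocentre."""
    x, y, z, t0 = params                        # trial hypocentre and origin time
    d = np.linalg.norm(stations - np.array([x, y, z]), axis=1)
    return t_obs - (t0 + d / V_P)               # predicted arrival = origin time + travel time

# Solve for the hypocentre that best fits all arrival times simultaneously
solution = least_squares(residuals, x0=[5.0, 5.0, 5.0, 0.0])
print("Estimated hypocentre (km) and origin time (s):", solution.x)
```

The predicted travel times depend directly on the assumed velocity, which illustrates why the approaches above remain sensitive to uncertain ground structure.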
The challenge in this hackathon is to obtain the occurrence and location of earthquakes from the data, without manual steps or the use of separate earthquake location algorithms. Meeting this challenge would mark a huge improvement in our ability to monitor earthquakes and manage the risks they pose.
Exploring Multimorbidity and Patterns of Long-term Conditions in England using Open Prescription and Primary Care Data
Leeds Institute for Data Analytics
The prevalence of multimorbidity (the presence of ≥ 2 long-term health conditions) is increasing because of population ageing and improved medical care. Multimorbidity is associated with a range of adverse health states and outcomes, including polypharmacy, frailty, dependency, poor quality of life, and premature mortality. As such, developing a greater understanding of multimorbidity and how it may be treated is a major strategic priority. Although no concrete definition of polypharmacy exists, it is commonly operationalised as “the routine use of five or more medications”. Polypharmacy and multimorbidity are overlapping concepts; the greater the number of diagnoses a person receives, the more medications they are likely to be prescribed. As such, the prescribing patterns at the population level can tell us a great deal about the multimorbidity state of the underlying population.
Using open prescribing data available at the General Practice (GP) level, this Data Study Group (DSG) challenge aims to use the prescribing patterns of common medications used in the treatment of long-term conditions (LTCs) as a proxy to explore multimorbidity in England. The specific objectives of this challenge include:
- To use the English Prescribing Dataset (EPD) to identify the prescription rates of key medications used in the treatment of LTCs, and model these as a proxy for multimorbidity.
- To use these data to map multimorbidity in England at a suitable area level (e.g. wards, local authorities).
- To explore whether changes in prescribing patterns may be used to track trends in multimorbidity at the population level.
- To explore determinants of multimorbidity in England, based on linked GP-level and area-based data (e.g. Quality and Outcomes Framework [QOF], Public Health England [PHE], Office for National Statistics [ONS] data).
- To identify medication class combinations that co-occur more often than would be expected by chance, suggesting the enrichment of a specific multimorbidity phenotype in a region (a minimal sketch of this idea follows the list).
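As a purely illustrative example of that last objective, the sketch below flags GP practices with above-median prescribing of two medication classes and applies a Fisher's exact test to ask whether high prescribing of both co-occurs more often than chance would suggest; the class names, rates and threshold are hypothetical stand-ins for quantities that would be derived from the EPD.

```python
# Illustrative co-occurrence test across GP practices (all values hypothetical).
import pandas as pd
from scipy.stats import fisher_exact

# Hypothetical practice-level prescribing rates (items per 1,000 registered patients)
rates = pd.DataFrame({
    "antihypertensives": [120, 95, 140, 60, 130, 80],
    "antidiabetics":     [45, 20, 15, 60, 55, 25],
})

# Flag practices with above-median prescribing for each medication class
high = rates > rates.median()

# 2x2 contingency table: high/low antihypertensives vs high/low antidiabetics
table = pd.crosstab(high["antihypertensives"], high["antidiabetics"])

# Fisher's exact test for co-occurrence beyond chance
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```

In a real analysis this would be repeated over all class pairs (with correction for multiple testing) and stratified by area to look for regional enrichment.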
Applying Graph Neural Networks in the Freshwater Environment
The Rivers Trust
Freshwater environments are among the most diverse and most threatened systems on the planet. A major challenge in the management of freshwater environments is their highly interconnected and dynamic nature. Individual pressures can have wide-ranging impacts at multiple scales within the river environment, both at the point of impact and downstream. To ensure the sustainable management of these environments, a clear statistical understanding of the spatial extent of pressures is required.
This requires developing models that embrace the interconnected nature of rivers. Graph neural networks (GNNs) have become a popular avenue of research in machine learning: they can handle complex spatial structures and uneven interactions between nodes, and can be extended to investigate how these dynamics vary over time. The recent development of high-density sensor networks in freshwater ecosystems allows these techniques to be explored in the freshwater environment.
This challenge will investigate the applicability of GNNs in the freshwater environment, examining limiting factors such as the impact of dams and weirs on model accuracy, and the optimum number and distribution of sensors needed to generate an interpolated baseline for a catchment.
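As a rough sketch of what such a model might look like, the example below builds a tiny graph neural network over a toy river sensor network using PyTorch Geometric (one possible library choice, not something prescribed by the challenge); the node features, edges and target values are random placeholders for real sensor readings.

```python
# Toy GNN over a hypothetical river sensor network (all data randomly generated).
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Five sensor nodes connected along a channel, edges pointing downstream
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]], dtype=torch.long)
x = torch.randn(5, 3)   # three hypothetical features per sensor (e.g. flow, temperature, turbidity)
y = torch.randn(5, 1)   # hypothetical target at each sensor (e.g. dissolved oxygen)
data = Data(x=x, edge_index=edge_index, y=y)

class RiverGNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(3, 16)   # aggregate information from neighbouring sensors
        self.conv2 = GCNConv(16, 1)   # predict one value per node

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model = RiverGNN()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(200):
    optimiser.zero_grad()
    loss = F.mse_loss(model(data), data.y)
    loss.backward()
    optimiser.step()
print("final training loss:", loss.item())
```

Extensions along the lines of the challenge might mask out some nodes during training to test interpolation quality, or add edge attributes encoding barriers such as dams and weirs.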
Exploring Uncertainty in Optimal Operations of Electrical Systems
Northern Powergrid
As electricity networks evolve to respond to the needs of the UK’s net zero commitment by 2050, new methods of managing and expanding our electricity system at the required pace, cost and system efficiency are being deployed throughout the country. Novel electrical engineering systems such as smart grids, economic ‘nudges’ such as time-of-use charging, environmental impact tracking such as carbon monitoring, and governance systems such as distribution system operation are being created to achieve these net zero ambitions.
Generative AI for Biofilm Analysis
Biofouling is the unwanted colonisation of immersed substrates, including ships' hulls, by marine and aquatic organisms. Biofouling on ships' hulls increases hull surface roughness, which in turn increases frictional resistance and ultimately increases a ship's fuel consumption and total emissions. A biofilm is a type of biofouling: a slimy layer made of living microorganisms embedded in an extracellular polymeric matrix. Biofilms on ships are known to increase the drag penalty by up to 40%.
Surveys of ships around the world show that, even visually, not all marine biofilms are the same. Variance in biofilms, attributed to differences in both the composition and community, results in differences to the frictional drag.
Historically it was a real challenge to measure the surfaces of biofilms, but more recently biofilm imaging has opened up with the adoption of a technology called optical coherence tomography (OCT). OCT lets us image down through living biofilms over a surface area approximately the size of a penny. We have been using OCT imaging to get a closer look at the biofilms that grow on different coatings, and have developed methods to image biofilms in conjunction with measuring the drag penalty.
There are complexities in imaging biological samples, and thorough imaging can be quite time-consuming. We believe generative AI could help us build an image dataset covering a range of different biofilms that we would not otherwise be able to collect through imaging real-world biofilms alone.
As generative AI is a relatively new technology, we have not yet attempted to use it for this application, and so we want to explore its potential.
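Purely as a starting point for that exploration, the sketch below shows the shape of one common generative approach, a small generative adversarial network (GAN) trained to produce greyscale images; the image size, network architectures and random stand-in 'training data' are all placeholders, and other generative families (e.g. diffusion models) could equally be tried on real OCT scans.

```python
# Minimal GAN sketch for synthetic greyscale images (placeholder data and sizes).
import torch
import torch.nn as nn

IMG = 32       # hypothetical image side length
LATENT = 64    # size of the random noise vector fed to the generator

generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, IMG * IMG), nn.Tanh(),     # outputs a flattened synthetic image in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(IMG * IMG, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),          # probability that an image is real
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_images = torch.rand(256, IMG * IMG) * 2 - 1   # stand-in for real OCT biofilm images

for step in range(200):
    # Train the discriminator to separate real images from generated ones
    fake = generator(torch.randn(64, LATENT)).detach()
    real = real_images[torch.randint(0, 256, (64,))]
    loss_d = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Train the generator to produce images the discriminator accepts as real
    fake = generator(torch.randn(64, LATENT))
    loss_g = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```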
"The challenge was particularly interesting from a research point of view as there could be many different approaches to solving this problem. Therefore, the project drew a fantastic selection of candidates to our team from different fields."