

MAS3916 : Discrete Stochastic Modelling & Survival Analysis

  • Offered for Year: 2020/21
  • Module Leader(s): Dr Phil Ansell
  • Lecturer: Professor Robin Henderson
  • Owning School: Mathematics, Statistics and Physics
  • Teaching Location: Newcastle City Campus
  • Semester 1 Credit Value: 10
  • Semester 2 Credit Value: 10
  • ECTS Credits: 10.0


To gain an understanding of some of the areas of stochastic modelling in discrete time that underpin quantitative descriptions of population growth, epidemics and the analysis of DNA sequences.

To provide an appreciation of the need for, and an understanding of, the principal statistical methods required in the analysis of survival data.

Module summary
Many random processes can be thought of as evolving through a sequence of successive generations. For example, population growth depends on which individuals successfully produce offspring in the next generation; the transmission of a disease through a population depends on how individuals interact from day to day. Models which incorporate random variation allow prediction of important quantities such as the size of a population or the duration of an epidemic, as well as the variability in these predictions. This module presents techniques for modelling such processes with applications drawn in particular from the biological sciences. Branching processes will be introduced as a means of modelling population growth. Stochastic models which describe epidemic growth by considering the size of the infected and immune subpopulations will then be studied. These have been important in recent years for the analysis of influenza and other epidemics.
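As a rough illustration of the extinction-probability calculation for a branching process: the extinction probability is the smallest root of s = G(s) on [0, 1], where G is the probability generating function of the offspring distribution, and it can be found by iterating q_n = G(q_{n-1}) from q_0 = 0. The module develops its algorithms in R; this is a Python sketch, and the Poisson offspring distribution and the names extinction_probability and poisson_pgf are illustrative assumptions rather than module material.

```python
import math
from typing import Callable


def extinction_probability(pgf: Callable[[float], float],
                           tol: float = 1e-12,
                           max_iter: int = 100_000) -> float:
    """Smallest root of s = G(s) on [0, 1], found by iterating q_n = G(q_{n-1}).

    Starting from q_0 = 0, q_n is the probability that the process has died
    out by generation n, so the sequence increases to the extinction
    probability. Convergence is slow when the mean offspring number is exactly 1.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def poisson_pgf(mean: float) -> Callable[[float], float]:
    """PGF of a Poisson(mean) offspring distribution (an assumed example): G(s) = exp(mean * (s - 1))."""
    return lambda s: math.exp(mean * (s - 1.0))


if __name__ == "__main__":
    for m in (0.8, 1.5, 2.0):
        q = extinction_probability(poisson_pgf(m))
        print(f"mean offspring {m}: extinction probability ~ {q:.4f}")
```

With Poisson offspring the process dies out with probability 1 when the mean number of offspring is at most 1; for a mean of 2 the extinction probability is about 0.203.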
DNA sequences can be considered as strings of letters from the four-letter alphabet {A,C,G,T}. Markov chains provide a useful stochastic model to describe the probability of the letter at the next site in the sequence given the letter at the current site. However, the presence of genes and other functional elements within a sequence suggests that more sophisticated models are required, in particular models which allow these transition probabilities to vary along the length of the sequence. Originally developed for automatic speech recognition, hidden Markov models have proved to be remarkably flexible and powerful models for automatic segmentation and gene-finding in DNA sequences. As their name suggests, hidden Markov models are based on an unobserved Markov chain. Methods for estimating this "hidden" Markov chain will be considered. Computational algorithms will be developed in R.
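The forward algorithm computes the likelihood of an observed sequence under a hidden Markov model by summing over all hidden state paths recursively rather than enumerating them. Below is a minimal Python sketch for DNA, assuming a hypothetical two-state model ("AT-rich" and "GC-rich" regions) whose parameter values are invented for illustration; the names INITIAL, TRANSITION, EMISSION and forward_log_likelihood are likewise illustrative. Each step is rescaled so that long sequences do not underflow.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical two-state model: state 0 = "AT-rich", state 1 = "GC-rich".
# All parameter values are illustrative, not estimates from real data.
INITIAL = [0.5, 0.5]
TRANSITION = [[0.9, 0.1],
              [0.2, 0.8]]
EMISSION: List[Dict[str, float]] = [
    {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
]


def forward_log_likelihood(seq: Sequence[str],
                           initial=INITIAL,
                           transition=TRANSITION,
                           emission=EMISSION) -> float:
    """Log P(seq) under the HMM via the scaled forward recursion.

    alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * e_j(x_t). Each alpha_t is
    rescaled to sum to 1; the scaling factor equals P(x_t | x_1..x_{t-1}),
    so the sum of their logs is log P(x_1..x_T).
    """
    n_states = len(initial)
    alpha = [initial[j] * emission[j][seq[0]] for j in range(n_states)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in seq[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][symbol]
            for j in range(n_states)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


if __name__ == "__main__":
    print(forward_log_likelihood("ACGTTGCAGGCGCA"))
```

The same recursion, run backwards and combined with the forward probabilities, underlies the forward-backward algorithm and Baum-Welch estimation.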

There are many areas where interest focuses on data which measure the time to some event. In recent decades the principal application has been to the length of time patients survive before some event occurs. The event may be death, the recurrence of a disease which had been in remission, or some other event. Applications are not solely medical: how long it takes a battery to run down and how long a machine component lasts before it fails are just two industrial examples. Such data are known as survival data, or sometimes lifetime data, and their analysis is called survival analysis. The main complication with survival data is that many observations will be ‘censored’, i.e. they are only partially observed. For example, when a trial of a new cancer treatment is terminated, many of the patients will still be alive. The survival times of those who died are therefore known exactly, whereas for those still alive at the end of the trial the survival time is known only to exceed the time they have survived so far. Methods for dealing with this form of data will be considered.
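The Kaplan-Meier estimator, listed in the syllabus, is one standard way of using censored observations: a censored subject counts in the risk set up to its censoring time but is never counted as an event. A minimal Python sketch, where the function name kaplan_meier and the input encoding (1 = event observed, 0 = censored) are assumptions for illustration:

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate S(t) = prod over t_j <= t of (1 - d_j / n_j).

    times:  observed time for each subject (event time or censoring time)
    events: 1 if the event was observed, 0 if the observation is censored
    Returns (t_j, S(t_j)) at each distinct event time t_j.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations at time t; censored subjects at t are
        # treated as still at risk at t, the usual convention.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Seven subjects; those with event indicator 0 are censored.
    print(kaplan_meier([3, 4, 4, 5, 7, 8, 10], [1, 0, 1, 1, 0, 1, 0]))
```

Without censoring the estimate reduces to the empirical survival function; censored subjects only shrink later risk sets.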

Outline Of Syllabus

Review of Markov chains. Probability generating functions, random sums of discrete random variables. Branching processes and extinction probability. Stochastic models of epidemics: the SIS, Greenwood and Reed-Frost models. Comparison with deterministic models. Duration and size of epidemics.
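In the Reed-Frost chain-binomial model, each susceptible in generation t escapes infection from each of the I_t current infectives independently with probability q = 1 - p, so the number of new cases is Binomial(S_t, 1 - q^{I_t}), and infectives are removed after one generation. A minimal Python simulation sketch under those assumptions; the names reed_frost and final_size are hypothetical:

```python
import random
from typing import List, Tuple


def reed_frost(s0: int, i0: int, p: float,
               rng: random.Random) -> List[Tuple[int, int]]:
    """Simulate one Reed-Frost epidemic and return the (S_t, I_t) path.

    Each susceptible is infected with probability 1 - q**I_t, q = 1 - p,
    independently of the others; infectives are removed after one
    generation. Stops once no infectives remain.
    """
    q = 1.0 - p
    s, i = s0, i0
    path = [(s, i)]
    while i > 0:
        prob_infection = 1.0 - q ** i
        # Binomial(s, prob_infection) draw as a sum of Bernoulli trials.
        new_i = sum(rng.random() < prob_infection for _ in range(s))
        s, i = s - new_i, new_i
        path.append((s, i))
    return path


def final_size(path: List[Tuple[int, int]]) -> int:
    """Total number ever infected, including the initial infectives."""
    return sum(i for _, i in path)


if __name__ == "__main__":
    rng = random.Random(2020)
    path = reed_frost(s0=50, i0=1, p=0.04, rng=rng)
    print("generations:", len(path) - 1, "final size:", final_size(path))
```

Replacing 1 - q**i by p whenever I_t > 0 gives the Greenwood model, in which the chance of infection does not depend on how many infectives there are. Repeating the simulation many times gives empirical distributions of epidemic duration and size, which can be compared with the deterministic model.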
Markov chain models; model choice. Hidden Markov models; simulation; inference via maximum likelihood; forward-backward algorithm; local and global decoding; Baum-Welch algorithm. Application to DNA sequence analysis.
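Global decoding finds the single most probable hidden state path, usually via the Viterbi algorithm; local decoding instead picks the most probable state at each site separately, using forward-backward probabilities. A minimal Python sketch of Viterbi in log space for a hypothetical two-state DNA model ("AT-rich"/"GC-rich") with invented parameter values; the names INITIAL, TRANSITION, EMISSION and viterbi are illustrative:

```python
import math
from typing import Dict, List, Sequence

# Hypothetical two-state model (0 = "AT-rich", 1 = "GC-rich"); illustrative values only.
INITIAL = [0.5, 0.5]
TRANSITION = [[0.9, 0.1],
              [0.2, 0.8]]
EMISSION: List[Dict[str, float]] = [
    {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
]


def viterbi(seq: Sequence[str]) -> List[int]:
    """Most probable hidden state path given seq (global decoding)."""
    n = len(INITIAL)
    log = math.log
    # delta[j]: log probability of the best path ending in state j so far.
    delta = [log(INITIAL[j]) + log(EMISSION[j][seq[0]]) for j in range(n)]
    backptr: List[List[int]] = []
    for symbol in seq[1:]:
        best_prev = [max(range(n), key=lambda i, j=j: delta[i] + log(TRANSITION[i][j]))
                     for j in range(n)]
        delta = [delta[best_prev[j]] + log(TRANSITION[best_prev[j]][j]) + log(EMISSION[j][symbol])
                 for j in range(n)]
        backptr.append(best_prev)
    # Trace back from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for best_prev in reversed(backptr):
        state = best_prev[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi("ATATTAGCGCGGCCAT"))
```

Working with log probabilities avoids underflow on long sequences, in the same way that scaling does for the forward algorithm.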

Time-to-event data, censoring patterns. Non-parametric survival analysis: calculation of Kaplan-Meier estimates; use of log-rank statistics. Parametric survival analysis: exponential, Weibull and log-logistic distributions; likelihood analysis of effect of covariates. Proportional hazards model: partial likelihood; diagnostics; time-varying effects. Frailty. Prediction and explained variation.
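The log-rank test compares survival in two groups by summing, over the distinct event times, the observed minus expected number of events in one group, where the expected count is computed from the numbers at risk. A minimal Python sketch using the usual hypergeometric variance (which allows for tied event times); the function name log_rank and the 0/1 group coding are assumptions:

```python
from typing import Sequence, Tuple


def log_rank(times: Sequence[float], events: Sequence[int],
             groups: Sequence[int]) -> Tuple[float, float, float]:
    """Two-sample log-rank test.

    times:  observed event or censoring time for each subject
    events: 1 if the event was observed, 0 if censored
    groups: 1 for subjects in group 1, 0 for group 0 (hypothetical coding)
    Returns (O1 - E1, V, (O1 - E1)**2 / V).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0
    var = 0.0
    for tj in event_times:
        n = n1 = d = d1 = 0
        for t, e, g in zip(times, events, groups):
            if t >= tj:                  # at risk just before tj
                n += 1
                n1 += g
                if t == tj and e == 1:   # event at tj
                    d += 1
                    d1 += g
        o_minus_e += d1 - d * n1 / n     # observed minus expected in group 1
        if n > 1:                        # hypergeometric variance contribution
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_sq = o_minus_e ** 2 / var if var > 0 else float("nan")
    return o_minus_e, var, chi_sq


if __name__ == "__main__":
    print(log_rank([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0]))
```

Under the null hypothesis of equal survival the statistic is approximately chi-square on 1 degree of freedom, so values above about 3.84 indicate a difference at the 5% level.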

Teaching Methods

Please note that module leaders are reviewing the module teaching and assessment methods for Semester 2 modules, in light of the Covid-19 restrictions. There may also be a few further changes to Semester 1 modules. Final information will be available by the end of August 2020 for Semester 1 modules and by the end of October 2020 for Semester 2 modules.

Teaching Activities
Category | Activity | Number | Length (h:mm) | Student Hours | Comment
Structured Guided Learning | Lecture materials | 36 | 1:00 | 36:00 | Non-Synchronous Activities
Scheduled Learning And Teaching Activities | Lecture | 9 | 1:00 | 9:00 | Synchronous On-Line Material
Scheduled Learning And Teaching Activities | Lecture | 9 | 1:00 | 9:00 | Present in Person
Guided Independent Study | Assessment preparation and completion | 30 | 1:00 | 30:00 | Completion of in course assessments
Structured Guided Learning | Structured non-synchronous discussion | 18 | 1:00 | 18:00 | Non Synchronous Discussion of Lecture Material
Scheduled Learning And Teaching Activities | Drop-in/surgery | 4 | 1:00 | 4:00 | Office Hour or Discussion Board Activity
Guided Independent Study | Independent study | 94 | 1:00 | 94:00 | Lecture preparation, background reading, course review
Teaching Rationale And Relationship

Non-synchronous online materials are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on assessed work. Present-in-person and synchronous online sessions are used to help develop the students’ ability to apply the theory to solve problems, to identify and resolve specific queries raised by students, and to allow students to receive individual feedback on marked work. Students who cannot attend a present-in-person session will be provided with an alternative activity allowing them to access the learning outcomes of that session. In addition, office hours/discussion board activity will provide an opportunity for more direct contact between individual students and the lecturer: a typical student might spend a total of one or two hours over the course of the module, either individually or as part of a group.
Alternatives will be offered to students unable to be present in person due to the prevailing Covid-19 circumstances.
Students should consult their individual timetable for up-to-date delivery information.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Description | Length (minutes) | Semester | When Set | Percentage | Comment
Written Examination | 120 | 2 | A | 80 | Alternative assessment - in-class test
Other Assessment
Description | Semester | When Set | Percentage | Comment
Written exercise | 1 | M | 10 | Written exercises
Written exercise | 2 | M | 10 | Written exercises
Assessment Rationale And Relationship

A substantial formal examination is appropriate for the assessment of the material in this module. The course assessments will allow the students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.

Reading Lists