AI in Assessment
Assessments allow us to measure and evaluate student knowledge and competencies in a subject discipline, often by assessing a product or artefact – most commonly in written form. However, Generative AI technologies pose a significant risk to the integrity of those assessments by hindering our ability to accurately and confidently determine if students have met their intended learning outcomes.
This page provides guidance to help you consider your approaches to assessment in a world where students have ready access to AI tools. It brings together academic and professional recommendations from across the Higher Education sector.
The need for change
Awards at Newcastle University are made, and classified, based on evidence that students have met or exceeded the learning outcomes for their programme of study. However, as stated by the QAA, “the rapid rise and ubiquity of Generative Artificial Intelligence software means that some or all the assessments that currently contribute to the evidence base may no longer be confidently ascribed to an individual student.” This situation will only become more challenging as AI platforms evolve and become more seamlessly embedded in the systems we use every day.
In line with Newcastle University's new Education Strategy and 5 Principles for the Use of AI, we therefore need to review our assessments to ensure they measure student knowledge and competencies in a rigorous and AI-resilient manner. Colleagues should also review their assessment mix and avoid introducing new high-stakes exams, favouring instead more diverse, authentic, and lower-weighted assessments – ones that offer students better learning opportunities, encourage them to improve through feedback, and enhance their overall perception of assessment value.
Reviewing your assessment strategies
In responding to the threat of AI to academic integrity, many colleagues will choose to focus on redesigning assessments so that they are less likely to be influenced by AI technology, while others will transition to assessments that permit the use of, or actively incorporate, generative AI tools in their completion. Either approach is acceptable, but the latter will ensure greater relevance and sustainability in the long term.
When reviewing your assessments to remove tasks that are vulnerable to AI misuse, consider whether the assessment is necessary in the first place. If it doesn't align with the module's learning outcomes, or those outcomes are assessed in other modules, reduce the assessment burden and focus on other pedagogical activities instead.
Selecting new modes of assessment
When responding to the threat of AI, colleagues are encouraged to consider more authentic and sustainable student assessments. However tempting it may be, you are encouraged to avoid taking regressive steps and reverting to handwritten, invigilated exams, as they assess a limited range of competencies for which students are increasingly unprepared. Instead, consider the following examples of authentic and sustainable assessments that are supported by the literature and wider sector guidance:
Authentic coursework activities that include AI by design
Well-designed coursework activities can offer a variety of authentic assessment opportunities, but consideration needs to be given to their AI resilience. A simple way to achieve this is to design authentic use of AI tools into coursework using problem-based learning activities relevant to the discipline and aligned with graduate skills. In these circumstances, however, ensure your marking criteria assess the student's work and not the outputs of AI tools. Consider also that reflective work and the critiquing of AI-produced content are well within the capabilities of today's AI, and so do not “AI-proof” your assessments.
Hybrid submissions that combine the output from AI tools with a student's own work are also becoming increasingly commonplace, but these require more scaffolding to make it clear where the AI ends and the student's work begins. In these circumstances, the contribution of AI needs to be fully acknowledged in accordance with our institutional policies and guidelines. We provide detailed guidance to students on Acknowledging use of AI, which you are encouraged to reference and mandate.
Entering your own assignment briefs into generative AI tools, and working with them to produce an example student submission, can reveal how capable AI is of completing your assignments. However, treat this as a baseline check: better prompting, more powerful (and often paywalled) platforms, and the daisy-chaining of AI tools can usually produce far stronger responses. Consider also that, given the speed of AI development, what was “AI resilient” yesterday may not be today or tomorrow. Stay vigilant.
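If you want to repeat this baseline check across a bank of briefs, it can be scripted. The sketch below is illustrative only, assuming the OpenAI Python SDK; the model name, prompts, and brief filename are hypothetical placeholders, and any institutionally approved platform could be substituted. Remember that only your own brief – never student work – should be sent to a third-party tool.

```python
# Baseline check: ask a generative AI tool to attempt your own assignment brief.
# Minimal sketch assuming the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable; model name and prompts are illustrative.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# Hypothetical file containing your assignment brief.
brief = Path("assignment_brief.txt").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: substitute whichever model your institution approves
    messages=[
        {
            "role": "system",
            "content": (
                "You are an undergraduate student. Complete the assignment "
                "below as a typical submission, within any stated word limit."
            ),
        },
        {"role": "user", "content": brief},
    ],
)

# Review this output against your marking criteria to gauge AI resilience.
print(response.choices[0].message.content)
```

Treat the output as a floor, not a ceiling: skilled prompting and more capable platforms will usually do better.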
Coursework for academic writing
Academic writing is crucial in higher education. It not only encourages students to explore and learn about a subject, but also develops critical thinking and the ability to synthesise ideas and knowledge – essential metacognitive skills for future education and the modern workplace. However, AI presents a significant risk to the integrity of written work (with research indicating that misuse can give students an unfair advantage), presenting a dilemma for colleagues who rely on this form of assessment.
Assessments featuring written components will need to be reviewed and adjusted. Generally, it is best to assess the process rather than the product of a writing assignment, unless the originality and authorship of the submission can be guaranteed. One approach is to incorporate short-form writing exercises into the teaching process – nested tasks that feed forward and build toward a larger submission, providing ample formative opportunities for students to practise their writing and receive timely, constructive feedback on drafts.
Invigilated exams
Assessing students using traditional in-person invigilated exams seems an obvious solution to the threat of AI. However, for essay-style exams, the need to handwrite large amounts of memorised information represents a regressive solution in terms of authenticity, with well-documented drawbacks to student learning and engagement. If we prioritise assessment security at the expense of authenticity, equity and alignment to learning outcomes, we risk compromising our assessments in ways that disadvantage the majority of students. Colleagues are therefore encouraged to choose other forms of assessment instead.
However, there will always be scenarios in which in-person examinations are necessary and appropriate – especially for large cohorts in lower stages. In these circumstances, you are encouraged to adopt a digital exam approach for accessibility reasons and ease of implementation. If you would like to learn more about creating a digital exam, please refer to our online guidance or speak to a member of the LTDS Digital Exams Team.
Oral exams and vivas
In-person oral exams, vivas and structured interviews offer a means to assess a student’s knowledge and understanding. They can act as a powerful deterrent to the use of AI, but can also be very stressful for students, and so appropriate safeguards need to be put in place (e.g. clearly designed rubrics, consideration for different native languages and accents, and alternative arrangements for vulnerable and disabled students). They can also be very resource intensive for colleagues to conduct, especially for large cohorts.
Oral examinations can also help you confirm that a student was responsible for a written submission. This can be done on an individual basis or, more effectively, as part of a group assessment. In both cases, the assessment criteria are key – be sure to measure student knowledge and competencies (e.g. presentation skills and the ability to answer questions) and not the product of AI (e.g. the text content of their presentation).
Observed exams
Observed exams require students to complete one or more authentic tasks related to their discipline or future employment, which are assessed against a well-defined and inclusively designed rubric (usually by multiple examiners). Students can also be interviewed after completing their tasks, with the assessment taking the form of an oral exam. This approach explores students' understanding of related principles and their application of knowledge.
For example, observed exams are very common in medical disciplines (e.g. Objective Structured Clinical Examinations), where students progress through a series of stations completing work-related tasks which they must orally defend. The approach has also been applied successfully in other subjects, especially education, the sciences and languages. However, these types of exams are usually very resource-intensive, and careful consideration must be given to scheduling to prevent students from sharing interview questions.
Clear assessment guidance
When setting assignment and coursework tasks, be clear to students about which AI technologies can and cannot be used – and why! Talk to students in clear, unambiguous terms about your expectations for AI usage and acknowledgement, and remind them of our comprehensive AI guidance.
Take a look at our Writing an Effective Assessment Brief guidance for examples of how to include AI guidance in your assignment and coursework briefs, which you are encouraged to copy and customise for your context.
Reinforcing the importance of the learning process
In most learning design methodologies, assessment should be used for learning, not of learning. Help students to understand why they are being assessed, how it will build their skills and competencies, and how it will help them achieve success in their future education and careers. Continually stress the importance and value of the assessment process, and share links to any marking criteria and rubrics. Where possible, give examples of good practice with AI, and make explicit your wider expectations and recommendations for collaboration and the use of digital tools.
Students must also learn the fundamentals of their discipline to be able to understand and critically evaluate AI outputs. Continually remind them of this, and use careful scaffolding of formative and low-stakes assessments to ensure learning outcomes are met.
Programme level design
Reviewing and aligning assessment methods and criteria with teaching strategies and learning outcomes can have a wider impact when considered at the programme level. Mapping assessments to programme-level outcomes, for example, can help identify assessment gaps, as well as assessment bunching and redundancy. It can also serve to introduce more diverse assessment formats (e.g. videos, blogs, vlogs, podcasts and animations) that are not only less susceptible to AI misuse, but also encourage creativity and the development of oral communication skills. Potential AI susceptibility in some assessments may also be tolerated if the competencies are synoptically assessed elsewhere.
Programme leaders who would like to consider programme-level assessment design in response to the impact of AI can contact LTDS for more information and support.
Identifying and detecting AI
Newcastle University does not recommend or support the use of automatic AI checkers. Text and digital media generated by AI cannot be reliably detected (Weber-Wulff et al.; Scarfe et al.), and detection tools provide insufficient detail about how their “scores” are generated and what they mean. Low detection rates and a high incidence of false positives in recent research have also confirmed our concerns, and so the AI detection features of our current platforms have been disabled. Due to GDPR and data security concerns, you should also avoid entering student submissions into any third-party AI tools yourself.
Expecting markers to identify AI-generated text is also difficult in most scenarios, and this is not an expectation we should place on colleagues. However, we have a shared responsibility to ensure the academic integrity of our assessments, and so we all need to stay vigilant, be aware of obvious red flags, and respond to any suspicions of AI misuse by following the University’s misconduct procedures.
AI for marking
Many tools are emerging on the market that claim to assist with the marking of essays and other written content. Newcastle University's DiscoverAI working group is currently exploring these platforms and how – if recommended – they could integrate with current systems. In the meantime, for reasons of GDPR and data privacy and security, colleagues are encouraged not to upload student submissions into third-party AI tools for any purpose.
Using AI as a feedback assistant
One area where colleagues are using AI to improve marking productivity is in the “fleshing out” of student feedback. In this workflow, short personalised notes from manually marked submissions are entered into a customised chatbot, which expands on the notes with further detail. The chatbot is primed with the assignment brief, rubric, and an extensive range of model feedback responses, and so can “speak” with authority in the voice of the academic.
This is an acceptable use of AI, but care must be taken to ensure that the generated feedback is relevant, consistent, and retains its original personalisation. At no point should AI-assisted feedback be released without reviewing the output for accuracy.
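For illustration, a feedback-expansion assistant of the kind described above might be sketched as follows. This is a minimal, hypothetical example assuming the OpenAI Python SDK; the brief, rubric, exemplar feedback, and model name are placeholder assumptions, and only the marker's own notes – never the student's submission – are sent to the tool, in line with the GDPR guidance above.

```python
# Feedback expansion: turn a marker's shorthand notes into fuller feedback.
# Minimal sketch assuming the OpenAI Python SDK; brief, rubric and exemplar
# feedback are hypothetical placeholders. Only the marker's own notes are sent
# to the tool -- never the student's submission (see the GDPR guidance above).
from openai import OpenAI

client = OpenAI()

BRIEF = "Essay: critically evaluate ..."            # hypothetical assignment brief
RUBRIC = "Criterion 1: argument (40%) ..."          # hypothetical rubric
MODEL_FEEDBACK = "Example: 'Your argument is ...'"  # hypothetical exemplar feedback


def expand_feedback(marker_notes: str) -> str:
    """Expand shorthand marking notes into full feedback in the marker's voice."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: use whichever model your institution approves
        messages=[
            {
                "role": "system",
                "content": (
                    "You expand a marker's shorthand notes into clear, constructive "
                    "feedback, matching the tone of the example feedback. Do not add "
                    "judgements that the notes do not contain.\n\n"
                    f"Assignment brief:\n{BRIEF}\n\n"
                    f"Rubric:\n{RUBRIC}\n\n"
                    f"Example feedback:\n{MODEL_FEEDBACK}"
                ),
            },
            {"role": "user", "content": marker_notes},
        ],
    )
    return response.choices[0].message.content


draft = expand_feedback("Good intro; weak counterargument; refs need Harvard style.")
print(draft)  # always review the draft before releasing it to the student
```

Note the instruction telling the model not to add judgements beyond the marker's notes: the aim is to preserve the academic's assessment while improving its presentation, which is why a human review of every draft remains essential.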