The student will bring the agent
Chat-based AI feels a lot like playing Leisure Suit Larry. Here is why the future of AI in higher education isn’t a chatbot.
Chat is the parser era of AI in education. The interface that replaces it will not belong to the university, and the sooner universities work out what that leaves them to build, the better the next decade goes for everyone.
Chat is the parser era
In the early Sierra adventure games (King’s Quest, Space Quest, for anyone who missed the 80s) you typed commands into a text parser and hoped you had guessed the vocabulary the designers had in mind. Type “get key” and the game responded. Type “pick up the key” and it stared back at you (or, if you were playing Space Quest, took the opportunity to insult you personally). The parser died for a specific reason. It made the player responsible for knowing what the system could understand, and that knowledge was exactly what the player did not have. The fix was direct manipulation, pointing and clicking at the world, and the fix after that was systems that act on your behalf without being asked at all. (To be honest though, I still LOVE my old Sierra games and never thought point-and-click Larry had the same feel).
The prompt box has the same defect. It asks the learner to know what to ask, and knowing what to ask is a metacognitive skill that novices, by definition, lack. The students who prompt well are the students who already have the schemas, the vocabulary of the discipline, and a working model of their own knowledge gaps. The students who need the most help face a blank text field and the burden of guessing the magic words. Basically, we have rebuilt the Sierra parser and called it a tutor.
I have spent a career on both sides of this. I started deploying LMSs in 2003/4, I spent five years running product engineering for Microsoft’s education business across Asia-Pacific, and these days I build memory and orchestration systems for AI agents for giggles, and so other people can demo the art of the possible. From where I sit, the LMS was the application era of educational technology, chat is its command-line interlude, and what comes next is a different shape entirely. This piece is my attempt to describe that shape for higher education. Schools need a different answer (the developmental stakes change everything, and I have written about that elsewhere), and I am not pretending to give it here.
The inversion, one level up
The argument I have been making in my research is that the language model is the kernel of a larger cognitive architecture, the reasoning engine inside a system of memory, orchestration and tooling, and that most of the industry’s problems come from treating the kernel as the whole computer. The context window is an interface. The expert arrives at the meeting already briefed, and a well-built agent arrives at each turn the same way, with a curated briefing assembled by the memory system before the model is asked to do anything.
Apply the same demotion one level up and you get the post-LMS university. Today the institution owns the application and the student visits it. The LMS is a place, with a login and a navigation bar, and everything the institution knows about teaching is trapped inside it. In the model I am describing, the institution stops shipping an application and starts shipping a substrate, a set of governed, grounded, permissioned capability surfaces that the student’s own agent connects to. The student brings the harness. The university provides what the harness plugs into. I’ve been writing about the intellectual substrate for two years now, and I think folks are finally starting to see it emerge organically. I am midway through updating my original paper. Subscribe if you want to read it when it’s ready.
The plumbing for all of this already exists. The Model Context Protocol, introduced by Anthropic in late 2024 and since adopted across the major AI ecosystems, standardises how agents connect to external data and tools, and its whole reason for existing is the N-by-M integration problem, where every application otherwise needs a bespoke integration with every service (Anthropic, 2024). A university that exposes its curriculum map, unit outlines, assessment states, timetable, library and learning analytics as permissioned surfaces has, in effect, published an operating system for study. The student’s agent is the userland. The protocol moment for agent-to-institution connection has already arrived. Universities are not in the room yet, and the vendors who sold them the last era’s applications are in no hurry to invite them.
What the university is for
A pretty obvious objection will come up here. If the student brings the agent, and the agent is built on a frontier model that has read more of the discipline than any single academic, what is left for the institution to do? Disintermediation is the quiet fear underneath most university AI strategy documents, even the ones that never say the word (business faculties know what I mean here).
Well, the institution supplies exactly what the frontier model cannot. It supplies warranted knowledge. Which concepts are threshold concepts in this discipline, what counts as evidence here, what the profession will require of a graduate in year one of practice, which readings represent the field and which represent one loud corner of it. A model trained on the open internet holds all positions at once (and a nice sprinkle of Reddit facts). A curriculum is a set of commitments, argued over by people who are accountable for them. It supplies warranted state. Which assessment is open, what assistance is permitted for it, what this student has demonstrated and what they have not. And it supplies a warranted model of the learner.
In the world I am describing there are two models of every student. The student’s own harness holds a private one, rich, longitudinal, intimate, built from every draft and every late-night question across their whole degree (and before it, and after it). The institution holds a second one, narrower but warranted, built from assessed evidence and carrying the institution’s authority. These two models will disagree, and the seam between them, what each is allowed to read, write and contest in the other, is the new design space in educational technology, the first one in more than a decade. Nobody owns it yet, as almost every EdTech vendor is trying to add chatbots and other trinkets.
There is also an entire research literature waiting for this moment. The open learner model work that Susan Bull and Judy Kay led argued for making the system’s model of the learner inspectable, and in the stronger variants negotiable, so the learner could challenge the evidence and argue for a change (Bull & Kay, 2007, 2016). That work was ahead of its infrastructure. It assumed the institution’s model was the only one in play and the learner was a visitor to it. A student who owns an agent holding a better-informed private model is no longer a visitor. Adult learners can be given real standing to inspect and contest the institution’s model of them, with their own agent doing the advocacy, and the institution’s model gets better for having to defend itself. That is a dignity upgrade the LMS era never offered, and never can.
The mentor problem is a permission problem
The word doing the heavy lifting in every agentic education pitch is mentor, and by gosh it deserves way more scrutiny than it gets. A mentor’s defining act is withholding (yup you read that right). Anyone who has taught knows the discipline of watching a student struggle with something you could resolve in one sentence, because the struggle is where the learning is. The research behind that instinct is deep. The assistance dilemma names the genuine tension between giving and withholding information in tutoring systems, and the finding that more help is frequently worse help (Koedinger & Aleven, 2007). The expertise reversal effect shows that guidance which helps a novice actively harms a more advanced learner (Kalyuga, Ayres, Chandler, & Sweller, 2003). In Bjork’s desirable difficulties work, the conditions that make performance feel fluent in the moment are often the ones that prevent durable learning (Bjork & Bjork, 2011).
Now if you put that discipline inside an agent that the student owns, configured to serve the student and optimised on the student’s satisfaction, it will never withhold. The moment struggle appears, the owner’s agent will dissolve it, because dissolving struggle is what a good assistant does and the agent has no way to distinguish the struggle that teaches from the struggle that merely hurts. I red-teamed Microsoft’s Study and Learn agent for the university I work with, and the lesson from that exercise generalises. Pedagogy written into a prompt is a suggestion, and it holds only as long as the student cooperates with it. A student who wants the answer gets the answer.
So the conclusion is that the taper cannot live in the agent. It has to live in the substrate, on the far side of the trust boundary, where the student’s harness cannot negotiate with it. Leisure Suit Larry players will remember that the back room at Lefty’s opens to “Ken sent me” and to nothing else. The check lives with the doorman, and poor Larry does not get a vote. When the assessment-state surface says which task is open, it also publishes the assistance envelope for that task, for that learner, at that stage. Full Socratic support on this one. Hints only on that one. Nothing but the rubric on the third (Police Quest failed you for skipping procedure, wrong holster drill, no siren, forgot to radio in the plates before approaching the vehicle, and it remains the most honest simulation of institutional life ever sold in a box). The student’s agent operates inside the envelope because the institution’s surfaces are the only place the warranted material comes from, and the permission travels with the data. Pedagogy stops being a suggestion in a prompt and becomes a property of the architecture, enforced at the one boundary the owner’s agent simply cannot jailbreak, because it sits on the other side of it.
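To make the doorman concrete, here is a minimal sketch of an assessment-state surface that clamps whatever the student's agent asks for to the envelope the institution has set. Everything named here (Assistance, Envelope, AssessmentStateSurface, the three levels, the task and learner IDs) is a hypothetical illustration of the idea, not an existing standard or product API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Assistance(IntEnum):
    """Hypothetical assistance levels, ordered from least to most help."""
    RUBRIC_ONLY = 0
    HINTS_ONLY = 1
    FULL_SOCRATIC = 2


@dataclass(frozen=True)
class Envelope:
    task_id: str
    learner_id: str
    ceiling: Assistance  # set by the institution, never by the agent


# Illustrative material, keyed by the minimum level needed to read it.
MATERIAL = {
    Assistance.RUBRIC_ONLY: "rubric",
    Assistance.HINTS_ONLY: "hints",
    Assistance.FULL_SOCRATIC: "socratic_dialogue",
}


class AssessmentStateSurface:
    """Institution-side surface: the check lives with the doorman."""

    def __init__(self, envelopes):
        self._envelopes = {(e.task_id, e.learner_id): e for e in envelopes}

    def fetch(self, task_id, learner_id, requested):
        envelope = self._envelopes.get((task_id, learner_id))
        if envelope is None:
            raise PermissionError("no open task for this learner")
        # The agent may ask for anything; the surface grants at most the ceiling.
        granted = min(requested, envelope.ceiling)
        released = [name for level, name in MATERIAL.items() if level <= granted]
        # The envelope is returned alongside the material it governs.
        return {"envelope": envelope, "granted": granted, "material": released}


surface = AssessmentStateSurface([
    Envelope("essay-1", "s123", Assistance.HINTS_ONLY),
])
reply = surface.fetch("essay-1", "s123", Assistance.FULL_SOCRATIC)
print(reply["granted"].name, reply["material"])
```

The agent's request is only an input. The ceiling is stored and applied on the institution's side of the boundary, so no prompt the student writes can change what comes back.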
The envelope is the curriculum
Higher education gives the envelope its shape. A first-year student is an adult, and is also a novice in every sense that matters for learning, still building the foundational schemas of the discipline, the biologically secondary knowledge that arrives only through effortful, structured work (Geary, 2008). A graduating student should be something else altogether. The envelope should widen across the degree as capacity is demonstrated, and the widening is in fact the curriculum itself, the structural spine of the degree.
First year, the envelope is narrow by design. The substrate withholds, the struggle is protected, and the student’s agent, however capable, works within tight assistance limits for the tasks that build foundations. Sierra players will see where this is going. The cruellest thing those games did was the walking-dead save, where you missed an item in act one and discovered in act three that the game had been unwinnable for hours. A degree where the machine dissolves every foundational struggle in first year plays out the same way. The student walks into the capstone missing a schema nobody can hand them at that point, and no amount of restoring gets it back. By the middle years the envelope opens as the warranted learner model accumulates evidence, and here the seam I describe throughout my Frontier Operations essays earns its keep, because assistance permissions become a function of demonstrated capacity, negotiated between the student’s model and the institution’s. By the capstone the envelope is fully open, the student operates with every tool available, and the assessment measures how they operate. Graduation is the envelope disappearing.
In a meeting last week, Professor Simon Buckingham Shum put the destination better than I can, and I have been paraphrasing him ever since. Higher education exists to produce graduates who can self-regulate and make sound judgements about how and where they use AI. The staged envelope turns that from an aspiration in a graduate attributes policy into the literal structure of the degree. The self-regulation research has always described this arc, the movement from external regulation to self-regulation (Zimmerman, 2002). The LMS could never enforce it. A permission architecture can, and can hand control over on a published schedule the student can see, which makes the scaffolding legible instead of paternalistic. You can look up why your envelope is set where it is, and you can see exactly what demonstrating mastery does to it.
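A published schedule of this kind could be as simple as a lookup the student can read. The sketch below assumes, purely for illustration, that the envelope widens with the number of assured demonstrations on record. The thresholds, the level names and the envelope_for function are all hypothetical placeholders, not a recommendation.

```python
# Hypothetical published schedule: (assured demonstrations required, envelope level).
# The thresholds and level names are placeholders, not a recommendation.
SCHEDULE = [
    (0, "rubric_only"),
    (2, "hints_only"),
    (5, "full_socratic"),
    (8, "unrestricted"),
]


def envelope_for(demonstrations):
    """Return the current level and a plain-language reason the student can read."""
    if demonstrations < 0:
        raise ValueError("demonstrations cannot be negative")
    level = SCHEDULE[0][1]
    next_step = None
    for required, name in SCHEDULE:
        if demonstrations >= required:
            level = name  # highest threshold met so far
        else:
            next_step = (required, name)  # first threshold not yet met
            break
    if next_step is None:
        reason = f"{demonstrations} assured demonstrations: schedule complete"
    else:
        needed = next_step[0] - demonstrations
        reason = f"{needed} more assured demonstration(s) opens '{next_step[1]}'"
    return level, reason


print(envelope_for(3))
```

Returning the reason with the level is the design choice that matters here: the student sees both where the envelope sits and what moves it.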
This is what I mean by frontier operators. A frontier operator is a graduate whose capability holds when the problem changes shape underneath them, who can work at the edge of what the machine does well, and who judges soundly when to accept its output, when to override it, and when to put it down. The frontier is not a fixed place (it moves constantly, sometimes mid-semester, which plays havoc with unit outlines). Operating there is a trainable capacity, and a degree structured as a staged transfer of control is the training students need today to be frontier operators.
Demonstrating mastery
Assessment decides whether any of this survives contact with a faculty, so it probably deserves the most careful treatment, and way more discussion than can be afforded here in a Substack essay. My work keeps returning to one distinction though, the difference between a high score under cooperative conditions and a capacity that survives perturbation. This same distinction applies to graduates. A student can produce excellent artefacts with a capable harness while building nothing durable of their own. The artefact tells you almost nothing anymore (and it probably never really did). What you need to observe is the operating.
The Australian assessment reform work has been trying to reach this conclusion for three years. The TEQSA-commissioned framework argued for designing AI into assessment where it reflects real-world practice and designing it out only where purely human capacities are being tested (Lodge, Howard, Bearman, Dawson, & Associates, 2023). The follow-up resource maps the structural pathways, including programme-wide reform where assurance moves to the level of the degree (Lodge et al., 2025), and Corbin, Dawson and Liu (2025) put the blunt version in their title, talk is cheap, and structural change is what the moment requires. Programmatic assessment, with the surface deliberately shifted at the moments that count, gives the model its shape.
The assured half comes first. A small number of institution-owned moments where the surface is deliberately changed and the student has to operate anyway. Interactive orals where the scenario shifts mid-conversation. Live problem variation, where the case the student prepared changes on the day. The editing cascade I used in my red-teaming, generalised into an assessment genre, where the student must carry a piece of work through successive constrained revisions and the examiner watches what survives. Sierra understood this kind of assessment in its bones. The world refused to sit still, death arrived without warning, and the famous dialogue box (restore, restart, or quit) taught a generation that the demonstration and the deployment are different things. Everyone who played those games learnt to operate anyway, because the alternative was walking Graham off the same cliff again (or down the stairs. It was usually the stairs). These moments are expensive, which is why there are few of them, and they are the only claims the institution signs its name to.
The evidence half is quite a bit stranger. The student’s harness already holds the richest process record that has ever existed in education. Every draft, every question asked, every suggestion accepted, every suggestion overridden and why. Employers do not want to know whether a graduate can produce a report. The machine produces the report. They want to know how the graduate operates, and the process record is the first artefact in the history of education that shows it. Packaged as evidence, it becomes the connective tissue of a credential.
None of this is a new idea, and I can date the prior art precisely (because I played it last week). Quest for Glory shipped in 1989 with a character import system. You finished the game, it wrote your hero to disk, stats, skills and inventory, and the sequel read the file in and built its world around what you had already demonstrated. Your capability travelled with you, and the next institution honoured it (Quest for Glory did this much better than Mass Effect). The learner wallet is that same character file with cryptography, and the rails for carrying it exist today (and have done for a while). The Comprehensive Learner Record 2.0 standard is built as W3C Verifiable Credentials, cryptographically signed and tamper-evident, with the earner controlling their credentials and the agency to store and share them where they choose (1EdTech, 2025). Open Badges 3.0 lets an issuer embed the assessment criteria, the evidence, and a verifiable reference to the recipient inside the credential itself (1EdTech, 2024). The standards bodies spent a decade building a wallet for exactly this cargo, and the cargo has finally arrived in a way we can meaningfully use.
A student-owned witness is a corruptible witness. I can hear that objection arriving from the back of the room (yeah, I see you rolling your eyes), and you are correct. Leisure Suit Larry shipped with an age-verification quiz, trivia questions calibrated for adults of 1987, and every teenager of 1987 sailed through it with an older cousin and a bit of guessing (I dare you to try and pass that quiz now, nearly 40 years on). Gates the user administers to themselves get gamed, always, and that is why the architecture needs both halves. The wallet carries the longitudinal story. The institution’s assured moments anchor that story with warranted claims the student cannot self-issue. A credential in this world is a claim about operating capacity, with the process record as supporting evidence and the assured moments as the load-bearing proof. Anyone reading it can see which is which, and that transparency is worth more than either half alone.
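The two halves can be told apart mechanically. The sketch below is a toy: it uses an HMAC from the Python standard library as a stand-in for the public-key proofs that real Verifiable Credentials carry, and the key, the entry shapes and the function names are all hypothetical. It only illustrates the rule that assured claims must verify against the institution while process-record entries are read as supporting evidence.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. A real issuer would sign with a private key and
# readers would verify with the matching public key, not a shared secret.
INSTITUTION_KEY = b"demo-key-not-for-real-use"


def sign(claim, key):
    """Sign a canonical JSON form of the claim (HMAC stand-in for a real proof)."""
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def assured_claim(claim):
    """Issued by the institution after an assured moment."""
    return {"kind": "assured", "claim": claim, "proof": sign(claim, INSTITUTION_KEY)}


def process_entry(claim):
    """Written by the student's own harness; carries no institutional proof."""
    return {"kind": "process_record", "claim": claim, "proof": None}


def classify(entry, key):
    """Tell a reader which half of the credential an entry belongs to."""
    if entry["kind"] == "process_record":
        return "supporting"
    expected = sign(entry["claim"], key)
    proof = entry.get("proof") or ""
    return "load-bearing" if hmac.compare_digest(proof, expected) else "rejected"


oral = assured_claim({"task": "interactive-oral", "result": "met"})
draft = process_entry({"event": "suggestion overridden", "draft": 4})
forged = {"kind": "assured", "claim": {"task": "capstone", "result": "met"}, "proof": "00"}

for entry in (oral, draft, forged):
    print(entry["kind"], "->", classify(entry, INSTITUTION_KEY))
```

The forged entry is rejected because the student cannot produce a proof that verifies. Process-record entries are never mistaken for institutional claims, which is the transparency the paragraph above relies on.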
Where this can go wrong
Applying my own tests to my own idea seems only fair, so here are the three places this vision fails if built carelessly (entire genres have died of a single bad design decision, ask anyone who hit the cat-hair moustache puzzle in Gabriel Knight 3).
Problem 1: Personalisation has a worse track record than its marketing claims. Aptitude-treatment interaction research spent decades chasing the promise that instruction matched to the individual learner would transform outcomes, and Cronbach and Snow’s (1977) own conclusion was that the interactions were common, complex, and that not one of them was understood well enough to base instructional practice on. That finding should be added as a caveat to every deeply-personalised-learning pitch deck ever written, mine included. The substrate makes better personalisation possible. The pedagogy that makes personalisation beneficial remains an open research problem, and a proactive mentor optimised on engagement signals drifts, quietly and by degrees, into being a sophisticated notification system. The score-versus-capacity trap catches personalisation vendors too. Engagement is the score (because sticky customers are paying customers). Learning is the capacity.
Problem 2: BYO stratifies unless the institution prevents it. The student running a frontier-model harness and the student running a free tier are having different educations, and pretending otherwise repeats the worst equity failures of the laptop era (Space Quest’s whole premise was a janitor saving the universe with whatever was in the supply cupboard, and I love Roger Wilco dearly, but a mop is not an equity strategy). The answer is unglamorous. The institution provides a floor harness, funded the way libraries are funded and for the same reason, so that bring-your-own means bring-your-own-if-you-prefer. The substrate model makes the floor cheaper to provide than the current per-seat LMS licensing regime, which is one of the few places the economics and the ethics point in exactly the same direction.
Problem 3: The trust boundary is also a working attack surface from day one. The moment institutional data flows into student-controlled agents, prompt injection across the seam, scope creep in what the harness reads, and liability for what a third-party agent does with institutional data all become live governance problems (universities have run out-of-band verification before. Anyone who spun a codewheel, or went hunting for the second word of the fourth paragraph on page 22 of the manual, was doing multi-factor authentication in 1989, before it had a name). Universities have enterprise security language for every one of these, which is more than can be said for most of what the LMS era asked them to govern (and that went well). The permission layer that enforces the pedagogy is the same layer that enforces the security, and that coupling at least means it will be built by people who take it seriously.
What is missing
Every component in this piece exists today. The protocol layer is shipping and adopted, the credential standards are published and certified against, and the assessment research has been written, commissioned by the regulator, and left sitting on the sector’s collective dusty, coffee-stained desk. The permission architectures are ordinary enterprise engineering, and the learner modelling research has spent 20 years waiting for its infrastructure to turn up.
What is missing is an institution willing to assemble it, and a willingness to accept what assembly implies. The university that builds the substrate stops being the place students go and becomes the thing their agents connect to, and it trades the comfortable monopoly of the portal for the harder job of being worth connecting to. Warranted knowledge, staged permissions, and credentials anchored in moments the student could not script are that job. The daily experience of a degree becomes three or four years of operating a harness against a world designed, on purpose and on a published schedule, to sometimes not cooperate. I have a name for people trained that way. Frontier operators, and producing them is the one thing in this whole stack no model can do alone.
Sierra put the same advice on every loading screen, and it holds for institutions staring down this transition. Save early. Save often.
References
1EdTech. (2024). Open Badges specification v3.0. 1EdTech Consortium. https://www.imsglobal.org/spec/ob/v3p0
1EdTech. (2025). Comprehensive Learner Record standard v2.0. 1EdTech Consortium. https://www.1edtech.org/standards/clr
Anthropic. (2024). Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). Worth Publishers.
Bull, S., & Kay, J. (2007). Student models that invite the learner in: The SMILI open learner modelling framework. International Journal of Artificial Intelligence in Education, 17(2), 89–120.
Bull, S., & Kay, J. (2016). SMILI: A framework for interfaces to learning data in open learner models, learning analytics and related fields. International Journal of Artificial Intelligence in Education, 26(1), 293–331.
Corbin, T., Dawson, P., & Liu, D. (2025). Talk is cheap: Why structural assessment changes are needed for a time of GenAI. Assessment & Evaluation in Higher Education. Advance online publication.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. Irvington.
Geary, D. C. (2008). An evolutionarily informed education science. Educational Psychologist, 43(4), 179–195.
Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The expertise reversal effect. Educational Psychologist, 38(1), 23–31.
Koedinger, K. R., & Aleven, V. (2007). Exploring the assistance dilemma in experiments with cognitive tutors. Educational Psychology Review, 19(3), 239–264.
Lodge, J. M., Bearman, M., Dawson, P., Gniel, H., Harper, R., Liu, D., McLean, J., Ucnik, L., & Associates. (2025). Enacting assessment reform in a time of artificial intelligence. Tertiary Education Quality and Standards Agency.
Lodge, J. M., Howard, S., Bearman, M., Dawson, P., & Associates. (2023). Assessment reform for the age of artificial intelligence. Tertiary Education Quality and Standards Agency.
Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory Into Practice, 41(2), 64–70.

