Aysa X. Fan

PhD Candidate · University of Illinois Urbana-Champaign

My research examines how AI systems affect human learning, cognition, and behavior in educational settings. I build evaluation frameworks (LLM benchmarks, human evaluation protocols, and behavioral annotation pipelines) that measure whether AI produces good outcomes for the people who use it, not just correct outputs. I'm particularly interested in the gap between what AI systems optimize for and what learners actually need, and in translating empirical observations into improvements to AI systems and policy. My work has been published at venues including EMNLP, EACL, SIGCSE, AIED, and ICQE, with two best paper nominations. Before my PhD, I spent five years in industry working in NLP data annotation and product analytics. Outside of research, I co-founded a learning center where I taught children's art, never touching a student's work but instead guiding observation and self-expression through language and demonstration.

LLM Evaluation · Human-AI Annotation · AI & Education · Educational Data Mining · Learning Analytics
Aysa X. Fan at the Alma Mater, UIUC

Education

2019 – 2026 (Expected)
Doctor of Philosophy, Curriculum & Instruction, DELTA
University of Illinois Urbana-Champaign
Advisor: Dr. Luc Paquette
2016 – 2018
Master’s Degree, Information and Data Science
University of California, Berkeley
Honours Bachelor of Science, Specialist in Mathematics and Its Applications in Finance and Economics, Major in Statistics
University of Toronto

Selected Publications

Building evaluation frameworks to measure whether AI produces good outcomes for learners

NLP & LLM Evaluation

JEDM (Under Review)
Advancing Debugging Strategy Analysis: LLM-Based Classification and Benchmarking
Aysa X. Fan, Qianhui Liu, Luc Paquette, Juan D. Pinto
2026
CoLM (Under Review)
Resolving Cross-Dataset Annotation Conflicts through Guideline-Data Co-Reconciliation
Ranran Haoran Zhang, Aysa X. Fan, Xuanming Lu, Rui Zhang
2026
ICQE 2024 · Best Paper Nominee
Using LLM-based Filtering to Develop Reliable Coding Schemes for Rare Debugging Strategies
Aysa X. Fan, Qianhui Liu, Luc Paquette, Juan D. Pinto
2024
EACL 2023
CONENTAIL: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining
Ranran Haoran Zhang, Aysa X. Fan, Rui Zhang
2023

LLMs for Programming Education

SIGCSE 2024
Enhancing Code Tracing Question Generation with Refined Prompts in Large Language Models
Aysa X. Fan, Rully A. Hendrawan, Yang Shi, Qianou Ma
2024
LLM4Ed Workshop 2024
Evaluating the Quality of Code Comments Generated by Large Language Models for Novice Programmers
Aysa X. Fan, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Jiaze Ke
2024
EMNLP 2023 Findings
Exploring the Potential of Large Language Models in Generating Code-Tracing Questions for Introductory Programming Courses
Aysa X. Fan, Ranran Haoran Zhang, Luc Paquette, Rui Zhang
2023

Experience

From industry NLP to education research

Sep 2019 – Present
Graduate Research Assistant
University of Illinois Urbana-Champaign, HEDS Lab
Advisor: Dr. Luc Paquette
  • Benchmarked 7 LLMs (including Claude) on classifying student debugging strategies; built an end-to-end inference pipeline on A100 GPUs with vLLM and structured JSON output; developed iterative prompting strategies through systematic error analysis
  • Fine-tuned Qwen3 models (0.6B–14B) using Self-Taught Reasoner (STaR) with LoRA; fine-tuned BERT, RoBERTa, and ModernBERT as encoder baselines
  • Co-designed a human-AI annotation framework for resolving cross-dataset label conflicts in hallucination detection, jailbreak classification, preference modeling, and medical NER; improved annotation agreement (κ) by up to 0.21
  • Designed and deployed an LLM-powered Socratic tutoring system for CS1 students, with explicit behavioral guardrails refined through literature-grounded design iterations
  • Designed human evaluation frameworks for LLM-generated educational content across 3 studies; led expert annotation of 2,350 student debugging episodes (κ > 0.75); built LLM-based filtering pipeline to surface rare events
Jun 2017 – Jul 2025
Director of Education & Co-founder
Novel Panda Learning Centre
  • Co-founded a non-profit learning center; designed curricula centered on developing students' observation and self-expression rather than replicating technique; taught classes and trained instructors
May 2016 – May 2019
Product Quality Analyst
Kik Interactive
  • Analyzed large-scale user interaction logs and clickstream data (300M+ registered users) to support A/B testing and user segmentation using SQL/Python
  • Built product quality dashboards in Redash/Kibana; produced user experience reports to surface behavioral patterns and inform data-driven decisions
Apr 2014 – May 2016
Data Analyst → Data & Quality Analysis Team Lead
Maluuba (acquired by Microsoft, 2017)
  • Led the data team responsible for all data collection and annotation for NLP models across 10 supported languages; managed internal annotators and organized external crowdsourcing
  • Designed annotation guidelines, crowdsourcing quality control frameworks, and contributor selection tests; iterated guidelines as NLP methods evolved
  • Coordinated end-to-end model testing: executed test plans, analyzed results, identified failure patterns, and drove improvements across key OEM projects
  • Led the M-Fit fitness app project at the Maluuba Hackathon (2nd place); received the Sherlock Award for best QA (2014)