Aysa X. Fan

PhD Candidate · University of Illinois Urbana-Champaign

My research examines how AI systems affect human learning, cognition, and behavior in educational settings. I build evaluation frameworks (LLM benchmarks, human evaluation protocols, and behavioral annotation pipelines) that measure whether AI produces good outcomes for the people who use it, not just correct outputs. I'm particularly interested in the gap between what AI systems optimize for and what learners actually need, and in translating empirical observations into improvements to AI systems and policy. My work has been published at venues including EMNLP, EACL, SIGCSE, AIED, and ICQE, with two best paper nominations. Before my PhD, I spent five years in industry working in NLP data annotation and product analytics. Outside of research, I co-founded a learning center where I taught children's art, never touching a student's work but instead guiding observation and self-expression through language and demonstration.

LLM Evaluation · Human-AI Annotation · AI & Education · Educational Data Mining · Learning Analytics
Aysa X. Fan at the Alma Mater, UIUC

Education

2019 – 2026 (Expected)
Doctor of Philosophy, Curriculum & Instruction, DELTA
University of Illinois Urbana-Champaign
Advisor: Dr. Luc Paquette
2016 – 2018
Master’s Degree, Information and Data Science
University of California, Berkeley
Honours Bachelor of Science, Specialist in Mathematics and Its Applications in Finance and Economics, Major in Statistics
University of Toronto

Selected Publications

Building evaluation frameworks to measure whether AI produces good outcomes for learners

NLP & LLM Evaluation

JEDM (Under Review)
Advancing Debugging Strategy Analysis: LLM-Based Classification and Benchmarking
Aysa X. Fan, Qianhui Liu, Luc Paquette, Juan D. Pinto
2026
CoLM (Under Review)
Resolving Cross-Dataset Annotation Conflicts through Guideline-Data Co-Reconciliation
Ranran Haoran Zhang, Aysa X. Fan, Xuanming Lu, Rui Zhang
2026
ICQE 2024 · Best Paper Nominee
Using LLM-based Filtering to Develop Reliable Coding Schemes for Rare Debugging Strategies
Aysa X. Fan, Qianhui Liu, Luc Paquette, Juan D. Pinto
2024
EACL 2023
CONENTAIL: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining
Ranran Haoran Zhang, Aysa X. Fan, Rui Zhang
2023

LLMs for Programming Education

SIGCSE 2024
Enhancing Code Tracing Question Generation with Refined Prompts in Large Language Models
Aysa X. Fan, Rully A. Hendrawan, Yang Shi, Qianou Ma
2024
LLM4Ed Workshop 2024
Evaluating the Quality of Code Comments Generated by Large Language Models for Novice Programmers
Aysa X. Fan, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Jiaze Ke
2024
EMNLP 2023 Findings
Exploring the Potential of Large Language Models in Generating Code-Tracing Questions for Introductory Programming Courses
Aysa X. Fan, Ranran Haoran Zhang, Luc Paquette, Rui Zhang
2023

Experience

From industry NLP to education research

Sep 2019 – Present
Graduate Research Assistant
University of Illinois Urbana-Champaign, HEDS Lab
Advisor: Dr. Luc Paquette
  • Benchmarked 7 LLMs (including Claude) on classifying student debugging strategies; built an end-to-end inference pipeline on A100 GPUs with vLLM and structured JSON output; developed iterative prompting strategies through systematic error analysis
  • Fine-tuned Qwen3 models (0.6B–14B) using Self-Taught Reasoner (STaR) with LoRA; fine-tuned BERT, RoBERTa, and ModernBERT as encoder baselines
  • Co-designed a human-AI annotation framework for resolving cross-dataset label conflicts in hallucination detection, jailbreak classification, preference modeling, and medical NER; improved annotation agreement (κ) by up to 0.21
  • Designed and deployed an LLM-powered Socratic tutoring system for CS1 students, with explicit behavioral guardrails refined through literature-grounded design iterations
  • Designed human evaluation frameworks for LLM-generated educational content across 3 studies; led expert annotation of 2,350 student debugging episodes (κ > 0.75); built LLM-based filtering pipeline to surface rare events
Jun 2017 – Jul 2025
Director of Education & Co-founder
Novel Panda Learning Centre
  • Co-founded a non-profit learning center; designed curricula centered on developing students' observation and self-expression rather than replicating technique; taught classes and trained instructors
May 2016 – May 2019
Product Quality Analyst
Kik Interactive
  • Analyzed large-scale user interaction logs and clickstream data (300M+ registered users) to support A/B testing and user segmentation using SQL/Python
  • Built product quality dashboards in Redash/Kibana; produced user experience reports to surface behavioral patterns and inform data-driven decisions
Apr 2014 – May 2016
Data Analyst → Data & Quality Analysis Team Lead
Maluuba (acquired by Microsoft, 2017)
  • Led the data team responsible for all data collection and annotation for NLP models across 10 supported languages; managed internal annotators and organized external crowdsourcing
  • Designed annotation guidelines, crowdsourcing quality control frameworks, and contributor selection tests; iterated guidelines as NLP methods evolved
  • Coordinated end-to-end model testing: executed test plans, analyzed results, identified failure patterns, and drove improvements across key OEM projects
  • Led the M-Fit fitness app project at the Maluuba Hackathon (2nd place); received the Sherlock Award for best QA (2014)