CANSSI Prairies Workshop: From Classical NLP to Large Language Models: Concepts, Architectures, and Practical Demonstrations

Date: Saturday, February 7, 2026
Time: 9:00–16:00 Central Time
Place: University of Manitoba, Fort Garry Campus, Armes Building, Room 201
Workshop Description
This one-day workshop, titled “From Classical NLP to Large Language Models: Concepts, Architectures, and Practical Demonstrations,” is the fifth in the CANSSI Prairies Workshop Series in Data Science. It will be led by Lei Ding, Assistant Professor of Statistics at the University of Manitoba.
The workshop is intended for students, researchers, and professionals in statistics, computer science, and data science, as well as for individuals interested in understanding or applying Natural Language Processing (NLP) and Large Language Models (LLMs) in research or practice. No deep background in machine learning is required, although basic programming familiarity would be helpful.
Participants will gain a unified understanding of classical and modern NLP; insight into how LLMs learn, reason, and behave; practical code examples for embeddings and Retrieval-Augmented Generation (RAG); and a strong foundation for research or applied work involving LLMs.
By the end of the workshop, participants will:
- Understand classical NLP representations and why they fail to capture semantics
- Grasp the key innovations behind word embeddings
- Learn the Transformer architecture and why it became the dominant model
- Understand how LLMs are pretrained, instruction-tuned, and aligned with human feedback
- See how models perform reasoning and why Chain-of-Thought (CoT) prompting can improve performance
- Learn how retrieval and grounding improve model accuracy
- Gain hands-on experience building small NLP and LLM workflows
We invite you to join us!
Cost and Registration
- Students: $25
- Non-students: $50
Program Schedule
Morning Sessions
9:00–10:30 | Session 1—Foundations of NLP and Embeddings
This session introduces traditional NLP techniques and motivates the shift toward dense vector representations. Topics include:
- Bag-of-Words (BoW) and TF-IDF
- Limitations of sparse representations: no word order, no notion of meaning (illustrated in the code sketch below)
- Transition to continuous embeddings
- Word2Vec, GloVe, fastText
- Semantic geometry: similarity and analogy reasoning
Outcome: Participants will understand how text becomes vectors and why embeddings transformed NLP.
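As a concrete preview of the "no notion of meaning" limitation, the minimal sketch below (assuming scikit-learn is installed; the sentences are toy examples, not workshop data) shows how TF-IDF cosine similarity rewards word overlap rather than shared meaning:

```python
# Minimal sketch: sparse TF-IDF vectors miss synonymy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the movie was wonderful",
    "the film was fantastic",   # same meaning, different words
    "the movie was terrible",   # opposite meaning, shared words
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse document-term matrix

# Pairwise cosine similarities between the three documents.
print(cosine_similarity(X).round(2))
# Docs 0 and 2 score highest despite opposite sentiment, while the
# paraphrase pair 0 and 1 scores low: exactly the gap that dense
# embeddings such as Word2Vec and GloVe were designed to close.
```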
10:30–10:45 | Break
10:45–12:00 | Session 2—Transformer Architecture and Pretraining
A focused introduction to the architecture underlying all modern LLMs. Topics include:
- Self-attention mechanism (sketched in code after this session's outcome)
- Multi-head attention
- Positional encoding
- Encoder vs. decoder structure
- Pretraining objectives: next-token prediction, masked language modelling
- Why scaling Transformers leads to emergent capabilities
Outcome: Participants will gain intuition for how Transformers operate and why they scale effectively.
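To make the self-attention bullet concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention; the dimensions and random weights are illustrative assumptions, not those of any real model:

```python
# Minimal sketch of scaled dot-product self-attention.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (T, T) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
T, d = 4, 8                                  # toy sizes: 4 tokens, 8 dims
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8): one vector per token
```

Multi-head attention repeats this operation with several independent weight sets and concatenates the results, which is the next bullet in the session.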
12:00–13:00 | Lunch
Afternoon Sessions
13:00–14:30 | Session 3—Large Language Models: Reasoning, Alignment, and Applications
This is the main conceptual session of the afternoon. Topics include:
- What makes a model “large”?
- Instruction tuning
- Supervised fine-tuning (SFT)
- Reinforcement Learning from Human Feedback (RLHF)
- CoT prompting and why it improves reasoning performance
- Hallucinations, grounding, and a brief introduction to RAG
- Example: vanilla prompt vs. CoT prompt (live reasoning demo; see the prompt templates below)
Outcome: Participants will understand how modern LLMs reason, how alignment works, and how prompting strategies affect output quality.
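The vanilla-vs-CoT comparison can be previewed with two prompt templates. The sketch below shows the prompt strings only; the arithmetic question is a standard illustrative example, and the model call itself is left to whichever chat API the live demo uses:

```python
# Illustrative prompt templates for the vanilla-vs-CoT demo.
question = (
    "A cafeteria had 23 apples. It used 20 for lunch and bought 6 more. "
    "How many apples does it have now?"
)

# Vanilla prompt: ask for the answer directly.
vanilla_prompt = f"{question}\nAnswer:"

# CoT prompt: elicit intermediate reasoning steps before the answer.
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

print(vanilla_prompt)
print(cot_prompt)
```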
14:30–14:45 | Break
14:45–16:00 | Session 4—Live Coding Demonstration: Embeddings, Reasoning, and RAG
This hands-on session connects all concepts from the day with practical examples.
Live examples will include:
- Generating text embeddings
- Performing semantic similarity search
- A minimal RAG pipeline (sketched after this session's outcome)
- Demonstrating reasoning with and without Chain-of-Thought
- A small end-to-end example: upload text → embed → retrieve → prompt → answer
Outcome: Participants will see how NLP and LLM systems are constructed in practice and leave with reproducible Python code.
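For reference before the session, a minimal RAG pipeline along the lines of the list above might look like the following sketch. It assumes the sentence-transformers package with the all-MiniLM-L6-v2 model (our choice for illustration, not necessarily the demo's), and leaves the final generation call as a stub:

```python
# Minimal RAG sketch: embed -> retrieve -> ground the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Embed a tiny document store (toy sentences for illustration).
docs = [
    "The workshop takes place at the University of Manitoba.",
    "Registration costs $25 for students and $50 for non-students.",
    "Session 4 is a live coding demonstration.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

# 2. Embed the query and retrieve the most similar document.
#    With normalized vectors, the dot product is cosine similarity.
query = "How much does registration cost?"
q_vec = model.encode([query], normalize_embeddings=True)[0]
best = int(np.argmax(doc_vecs @ q_vec))

# 3. Ground the prompt in the retrieved context.
prompt = f"Context: {docs[best]}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # pass this to whichever LLM the demo uses
```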
About the Speaker
Lei Ding is an Assistant Professor in the Department of Statistics at the University of Manitoba. He previously held a postdoctoral position at the University of Alberta, where he also completed his PhD in Statistical Machine Learning in 2024. His research lies at the intersection of Large Language Models, Natural Language Processing, and Statistical Learning. Dr. Ding has authored over 20 publications in leading international conferences and journals, including the Conference on Neural Information Processing Systems (NeurIPS), the International Conference on Machine Learning (ICML), the AAAI Conference on Artificial Intelligence, the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and PNAS Nexus.
About the Series
The CANSSI Prairies Workshop Series in Data Science offers an excellent opportunity to enhance knowledge and skills in various areas of data science. Through engaging, interactive hybrid (online and in-person) sessions, participants explore new topics, learn cutting-edge techniques, and connect with experts in the field.