CANSSI Prairies Workshop: From Classical NLP to Large Language Models: Concepts, Architectures, and Practical Demonstrations

Date: Saturday, February 7, 2026
Time: 9:00–16:00 Central Time
Place: University of Manitoba, Fort Garry Campus, Armes Building, Room 201

Workshop Description

This one-day workshop, titled “From Classical NLP to Large Language Models: Concepts, Architectures, and Practical Demonstrations,” is the fifth in the CANSSI Prairies Workshop Series in Data Science. It will be led by Lei Ding, Assistant Professor of Statistics at the University of Manitoba.

The workshop is intended for students, researchers, and professionals in statistics, computer science, and data science, as well as for individuals interested in understanding or applying Natural Language Processing (NLP) and Large Language Models (LLMs) in research or practice. No deep background in machine learning is required, although basic programming familiarity would be helpful.

Participants will gain a unified understanding of classical and modern NLP; insight into how LLMs learn, reason, and behave; practical code examples for embeddings and Retrieval-Augmented Generation (RAG); and a strong foundation for research or applied work involving LLMs.

By the end of the workshop, participants will:

  • Understand classical NLP representations and why they fail to capture semantics
  • Grasp the key innovations behind word embeddings
  • Learn the Transformer architecture and why it became the dominant model
  • Understand how LLMs are pretrained, instruction-tuned, and aligned with human feedback
  • See how models perform reasoning and why Chain-of-Thought (CoT) prompting can improve performance
  • Learn how retrieval and grounding improve model accuracy
  • Gain hands-on experience building small NLP and LLM workflows

We invite you to join us!

Cost and Registration

  • Students: $25
  • Non-students: $50

Register on Eventbrite

Program Schedule

Morning Sessions

9:00–10:30 | Session 1—Foundations of NLP and Embeddings

This session introduces traditional NLP techniques and motivates the shift toward dense vector representations. Topics include:

  • Bag-of-Words (BoW) and TF-IDF
  • Limitations of sparse representations: no word order, no sense of meaning
  • Transition to continuous embeddings
  • Word2Vec, GloVe, fastText
  • Semantic geometry: similarity and analogy reasoning

Outcome: Participants will understand how text becomes vectors and why embeddings transformed NLP.
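
To make that limitation concrete, here is a minimal sketch (illustrative only, assuming scikit-learn is installed; the workshop's own code may differ) showing that sparse TF-IDF vectors assign zero similarity to two near-paraphrases that share no words, which is exactly the gap dense embeddings close:

    # A minimal sketch, assuming scikit-learn is installed (not the
    # workshop's materials): two sentences with the same meaning but no
    # shared words get zero TF-IDF similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "car drove down road",
        "automobile travelled along street",  # same meaning, no shared tokens
    ]

    # Sparse TF-IDF vectors: one dimension per vocabulary word.
    tfidf = TfidfVectorizer().fit_transform(docs)

    # No overlapping tokens, so cosine similarity is exactly 0.0,
    # even though the sentences are near-paraphrases.
    print(cosine_similarity(tfidf[0], tfidf[1]))  # [[0.]]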

10:30–10:45 | Break

10:45–12:00 | Session 2—Transformer Architecture and Pretraining

A focused introduction to the architecture underlying virtually all modern LLMs. Topics include:

  • Self-attention mechanism
  • Multi-head attention
  • Positional encoding
  • Encoder vs. decoder structure
  • Pretraining objectives: next-token prediction, masked language modelling
  • Why scaling Transformers leads to emergent capabilities

Outcome: Participants will gain intuition for how Transformers operate and why they scale effectively.
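
For intuition ahead of the session, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the architecture (a simplification, not the workshop's code: real layers add learned query/key/value projections, masking, and multiple heads):

    # A minimal NumPy sketch of scaled dot-product self-attention, the
    # core Transformer operation (simplified: Q = K = V = X, with no
    # learned projections, masking, or multiple heads).
    import numpy as np

    def self_attention(X):
        """X: (seq_len, d) array of token embeddings."""
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)  # pairwise similarities, scaled by sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ X  # each token becomes a weighted mix of all tokens

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional embeddings
    print(self_attention(tokens).shape)  # (4, 8)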

12:00–13:00 | Lunch

Afternoon Sessions

13:00–14:30 | Session 3—Large Language Models: Reasoning, Alignment, and Applications

This is the main conceptual session of the afternoon. Topics include:

  • What makes a model “large”?
  • Instruction tuning
  • Supervised fine-tuning (SFT)
  • Reinforcement Learning from Human Feedback (RLHF)
  • CoT prompting and why it improves reasoning performance
  • Hallucinations, grounding, and a brief introduction to RAG
  • Example: vanilla prompt vs. CoT prompt (live reasoning demo)

Outcome: Participants will understand how modern LLMs reason, how alignment works, and how prompting strategies affect output quality.
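
As a flavour of the live reasoning demo, the sketch below contrasts a vanilla prompt with a Chain-of-Thought prompt on a classic trick question (illustrative only; the session's actual prompts and model may differ):

    # A schematic vanilla-vs-CoT contrast (illustrative only; the live
    # demo's prompts and model will differ).
    question = (
        "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
        "more than the ball. How much does the ball cost?"
    )

    vanilla_prompt = question
    # Models answering directly often give the intuitive but wrong "$0.10".

    cot_prompt = question + "\nLet's think step by step."
    # Eliciting intermediate steps typically leads to the correct answer:
    # ball = $0.05, bat = $1.05.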

14:30–14:45 | Break

14:45–16:00 | Session 4—Live Coding Demonstration: Embeddings, Reasoning, and RAG

This hands-on session connects all concepts from the day with practical examples.
Live examples will include:

  • Generating text embeddings
  • Performing semantic similarity search
  • A minimal RAG pipeline
  • Demonstrating reasoning with and without Chain-of-Thought
  • A small end-to-end example: upload text → embed → retrieve → prompt → answer

Outcome: Participants will see how NLP and LLM systems are constructed in practice and leave with reproducible Python code.
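
To preview the shape of such a pipeline, here is a minimal sketch of the embed → retrieve → prompt pattern (assumptions: the sentence-transformers package and its all-MiniLM-L6-v2 model; the generation step is left to whichever LLM is at hand, and the workshop's own code may differ):

    # A minimal RAG sketch, assuming sentence-transformers is installed;
    # the final generation step is a placeholder, since any LLM API can
    # be substituted. Not the workshop's actual code.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

    # 1. Embed a tiny document store.
    docs = [
        "CANSSI Prairies runs a workshop series in data science.",
        "The Transformer architecture relies on self-attention.",
        "RAG retrieves relevant text and adds it to the prompt.",
    ]
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    # 2. Embed the query and retrieve the most similar document
    #    (dot product of unit vectors = cosine similarity).
    query = "What does retrieval-augmented generation do?"
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    best = int(np.argmax(doc_vecs @ q_vec))

    # 3. Ground the prompt in the retrieved context before generation.
    prompt = f"Context: {docs[best]}\n\nQuestion: {query}\nAnswer:"
    print(prompt)  # pass this to the LLM of your choice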

About the Speaker

Lei Ding is an Assistant Professor in the Department of Statistics at the University of Manitoba. He previously held a postdoctoral position at the University of Alberta, where he also completed his PhD in Statistical Machine Learning in 2024. His research lies at the intersection of Large Language Models (LLMs), Natural Language Processing (NLP), and Statistical Learning. Dr. Ding has authored over 20 publications in leading international conferences and journals, including the Conference on Neural Information Processing Systems (NeurIPS), the International Conference on Machine Learning (ICML), the AAAI Conference on Artificial Intelligence, the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and the Proceedings of the National Academy of Sciences Nexus (PNAS Nexus).

About the Series

The CANSSI Prairies Workshop Series in Data Science offers an excellent opportunity to enhance knowledge and skills in various areas of data science. Through engaging, interactive hybrid (online and in-person) sessions, participants can explore new topics, learn cutting-edge techniques, and connect with experts in the field.

Details

Organizer

  • CANSSI Prairies

Venue

  • University of Manitoba (Fort Garry Campus)
  • 66 Chancellors Circle
    Winnipeg, Manitoba R3T 2N2 Canada