MDS Computational Linguistics

UBC’s Master of Data Science in Computational Linguistics is the credential to set you apart. Offered at the Vancouver campus, this unique degree is tailored to those with a passion for language and data. Over 10 months, the program combines foundational data science courses with advanced computational linguistics courses—equipping graduates with the skills to turn language-related data into knowledge and to build AI that can interpret human language.

Program Benefits

Highlights Across All MDS Programs:

  • 10-month, full-time, accelerated program offers a short-term commitment for long-term gain
  • Condensed one-credit courses allow for in-depth focus on a limited set of topics at one time
  • Capstone project gives students an opportunity to apply their skills
  • Real-world data sets are integrated in all courses to provide practical experience across a range of domains

Highlights Specific To Computational Linguistics:

  • Courses are taught by a combination of arts (linguistics), computer science, and statistics faculty members, giving students access to key experts within each field of study
  • Students learn fundamental data science skills, techniques, and tools with the core Master of Data Science cohort, then branch off into more specialized courses, experiencing the benefits of a large program and small program in one
  • UBC’s Vancouver campus offers students the unrivaled experience of a top 40 university, surrounded by remarkable natural beauty, at the edge of a cosmopolitan city
  • Strong connections with industry partners in public and private sectors, start-ups, and leading tech companies offer a wide range of networking/career opportunities


The program structure includes 24 one-credit courses offered in four-week segments. Courses are lab-oriented and delivered in-person with some blended online content.

After the six segments, students complete an eight-week capstone project, applying their newly acquired knowledge while working alongside other students on real-life data sets. Please note that instructors are subject to change.

*subject to change at the discretion of the MDS Computational Linguistics program

Fall: September - December

Block 1 (4 weeks)

Programming for Data Science

DSCI 511
Basic programming in R and Python. Overview of data structures, iteration, flow control, and program design relevant to data exploration and analysis. When and how to exploit pre-existing libraries.

Computing Platforms for Data Science

DSCI 521
How to install, maintain, and use the data science software “stack”. The Unix operating system, integrated development environments, and problem-solving strategies.

Rodolfo Lourenzutti

Descriptive Statistics and Probability for Data Science

DSCI 551
Fundamental concepts in probability. Statistical view of data coming from a probability distribution.

Vincenzo Coia, Tomas Beuzen

Corpus Linguistics

COLX 521
Basic processing of text corpora using Python. Includes string manipulation, corpus readers, linguistic comparison of corpora, structured text formats, and text preprocessing tools.

Julian Brooke

Block 2 (4 weeks)

Data Wrangling

DSCI 523
Converting data from the form in which it is collected to the form needed for analysis. How to clean, filter, arrange, aggregate, and transform diverse data types, e.g. strings, numbers, and date-times.

Tiffany Timbers, Tomas Beuzen

Data Visualization I

DSCI 531
Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python.

Firas Moosvi

Algorithms & Data Structures

DSCI 512
How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as recursion and algorithmic complexity (e.g., efficiency, scalability).

Julian Brooke

Statistical Inference and Computation I

DSCI 552
The statistical and probabilistic foundations of inference, developed jointly through mathematical derivations and simulation techniques. Important distributions and large sample results. Methods for dealing with the multiple testing problem. The frequentist paradigm.

Block 3 (4 weeks)

Regression I

DSCI 561
Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction.

Parsing for Computational Linguistics

COLX 535
The identification of syntactic structure in natural language. Parsing algorithms for popular grammar formalisms, application of statistical information to parsing, parser evaluation, and extraction of parse features.

Julian Brooke

Supervised Learning I

DSCI 571
Introduction to supervised machine learning, with a focus on classification. k-NN, decision trees, SVM, and how to combine models via ensembling (boosting, bagging, random forests). Basic machine learning concepts such as generalization error and overfitting.

Databases & Data Retrieval

DSCI 513
How to work with data stored in relational database systems. Storage structures and schemas, data relationships, and ways to query and aggregate such data.

Rodolfo Lourenzutti

Winter: January - April

Block 4 (4 weeks)

Computational Semantics

COLX 561
How meaning is represented by computers. An overview of popular semantic resources, and techniques for building new resources from unstructured text data.

Julian Brooke

Feature and Model Selection

DSCI 573
How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization.

Varada Kolhatkar

Unsupervised Learning

DSCI 563
How to find groups and other structure in unlabeled, possibly high-dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, and model fitting via the EM algorithm.

Julian Brooke

Supervised Learning II

DSCI 572
Introduction to optimization. Gradient descent and stochastic gradient descent. Roundoff error and finite differences. Neural networks and deep learning.

Muhammad Abdul-Mageed

Block 5 (4 weeks + 1 week break)

Privacy, Ethics & Security

DSCI 541
The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.

Julian Brooke

Computational Morphology

COLX 525
Approaches to sub-word phenomena in language processing. Automatic morphological analysis of diverse languages, part-of-speech tagging, word segmentation, and character-level neural network models.

Miikka Silfverberg

Machine Translation

COLX 531
Key methodologies for automatic translation between languages, with a focus on statistical and neural machine translation approaches. Applying Machine Translation (MT) architectures to analogous monolingual tasks. MT evaluation.

Muhammad Abdul-Mageed

Advanced Corpus Linguistics

COLX 523
Text corpora collection and curation. How to pull representative datasets from internet sources. Techniques for efficient and reliable annotation.

Julian Brooke

Block 6 (4 weeks)

Advanced Computational Semantics

COLX 563
Application of machine learning to various semantic tasks. Likely topics include: information extraction, semantic role labelling, semantic parsing, discourse parsing, question answering, summarization, and natural language inference.

Julian Brooke

Trends in Computational Linguistics

COLX 585
Cutting-edge techniques in natural language processing. For this iteration, the latest innovations in neural network architectures.

Muhammad Abdul-Mageed

Sentiment Analysis

COLX 565
Identification and analysis of opinion, especially in social media. Text polarity and emotion classification, fine-grained (e.g. aspectual) opinion mining, argumentation mining, sentiment in social networks.

Julian Brooke

Natural Language Processing for Low-Resource Languages

COLX 581
Building automatic language tools when data is scarce. Rule-based and hybrid systems, semi-supervised learning, active learning. Knowledge transfer from other (related) languages.

Miikka Silfverberg

Spring: May - June

Capstone Project (8-10 Weeks)

Capstone Project

COLX 595
A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a project report, presentation, and possibly other products, such as a web application.

MDS Staff



Data in Action: Delivering Better Care Through Education


In partnership with QxMD, a Vancouver-based digital learning technology company, students from UBC’s Master of Data Science program created a tool that identifies trending health topics in news articles and matches them with relevant medical journal articles, helping medical professionals better serve patients who have questions about specific news articles they’ve read.
