University of Washington
Professional Master's in Computational Linguistics


Elective Courses

Computational Linguistics Courses

LING 575: Topics in Computational Linguistics is offered four times a year. Below is a list of recent topics that have been covered in this class. In addition, students may take LING 567: Knowledge Engineering for Deep Processing or courses in related fields, such as EE 516: Computer Speech Processing, as electives.

Multilingual NLP (LING 575) (3 credits)
Katrin Kirchhoff (UW EE)
Spring 2006

This course examines the problems that different types of languages pose for statistical natural language processing algorithms. Topics will include multilingual and cross-lingual language modeling, text-based language identification, word alignment, cross-lingual induction of annotation and analysis tools, multilingual question answering, summarization and named entity detection.

Lexical Ambiguity (LING 575) (3 credits)
William Lewis (UW Linguistics)
Spring 2006

The ease with which humans process ambiguity in natural language is remarkable, especially considering the lengths taken to reduce ambiguity in automated natural language systems. Humans are quite adept at dealing with ambiguity at all levels (sound, morpheme, word, sentence) and rarely make mistakes. The mistakes they do make, however, are telling and give us clues to the processes that underlie our language machinery. In turn, an understanding of these processes can inform the development of automated systems, moving us toward more linguistically and cognitively informed natural language systems. In this seminar, we will examine ambiguity mostly at the level of the word, focusing on the large body of psycholinguistic, neurolinguistic, and computational literature on the topic. Our investigation into ambiguity will consider the following questions, among others: What models have been proposed for the lexicon, and what experimental evidence supports or refutes them? To what extent does ambiguity exist cross-linguistically, and how does the degree of ambiguity differ between languages? What problems does (or should) ambiguity pose for language acquisition? If ambiguity is dysfunctional, why is it tolerated, and why might it even be selected for? What strategies might we use to improve automated disambiguation beyond “frequency trumps all” strategies?

Machine Translation (LING 575) (3 credits)
Fei Xia (UW Linguistics)
Winter 2006 and 2007

In this seminar, we will discuss important papers on machine translation, and focus on statistical MT and transfer-based MT. Students will gain hands-on experience by experimenting with various methods to improve a phrase-based SMT system.

Corpus Development, Management and Use (LING 575) (3 credits)
Candace McKenna (Microsoft)
Winter 2006

Techniques for, and issues relating to, the development of corpora (sampling, representativeness, genres) and treebanks; preparing generic corpora for particular uses (e.g., text normalization); corpus search methodology; and relevant statistical methods for generalizing results from corpora. Term projects will provide hands-on experience with very large corpora.

Tools and Resources for Low Density Languages (LING 575) (3 credits)
William Lewis (UW Linguistics)
Spring 2007

Until fairly recently, most computational linguistic work has focused on “majority” or “high density” languages such as English, German, Chinese, and Japanese, all of which have a fairly large speaker base and a substantial digital presence. The availability of large enriched and/or aligned corpora for these languages has facilitated the development of a number of automated tools, such as parsers, taggers and MT systems, and a dizzying array of machine learning algorithms have been used to train these tools. Since most of the world's languages are “low density”, the standard methods for training and evaluating tools do not apply to them. However, motivated in part by the realization that so many of the world's languages are on the verge of extinction, and in part by an increased interest in developing resources for so-called “surge” languages (low density languages that have suddenly caught the world's attention), building tools and resources for low density languages has become a topic of growing interest within the computational linguistics community. In this seminar, we will explore the small but growing literature on the topic and examine methods for using substantially smaller resources to build tools and, in turn, additional resources.

Speech Synthesis (LING 575) (3 credits)
Gabe Webster (Toshiba)
Spring 2007

This course will focus on the design and implementation of modern concatenative speech synthesis systems (also known as text-to-speech systems). All parts of modern speech synthesis systems will be covered, including text processing (e.g., text normalization, unknown word pronunciation generation), prosodic modeling (duration and pitch modeling), and unit selection and modification. However, the emphasis will be on the more “computational linguistic” aspects of these systems, that is, on those components making use of symbolic processing and statistical modeling, rather than those using digital signal processing. Time will also be spent on historical context, alternate architectures, and current trends including synthesis of multilingual and emotional speech.

Unsupervised Learning: A Case Study on Unsupervised POS Tagging (LING 575) (3 credits)
Fei Xia (UW Linguistics)
Winter 2008

Existing work on unsupervised POS tagging can be divided into three categories: the first starts with the forward-backward algorithm and improves performance by using a “filtered” lexicon; the second clusters words and tags them with the cluster labels; the third takes advantage of bilingual data by projecting POS information from one language to the other. In this course, we will discuss each approach and build systems aimed at improving the state of the art.

Lexical Acquisition (LING 575) (3 credits)
Emily M. Bender (UW Linguistics)
Spring 2008

Hand-built precision grammars can produce high-quality semantic representations from input strings and generate well-formed strings from input semantic representations. Resources such as the Grammar Matrix can greatly speed up the creation of such precision grammars. However, the expansion of the lexicon remains an important hurdle in scaling up such grammars to practical coverage. Recent work by members of the DELPH-IN consortium and other research groups has demonstrated the possibility of automatic lexical acquisition from corpora for broad-coverage precision grammars. This seminar will review this work with an eye to how it could be applied in the case of relatively small precision grammars of low-density languages.

Information Extraction from Heterogeneous Resources (LING 575) (3 credits)
Scott Farrar (UW Linguistics)
Spring 2008

This seminar focuses on computational methods for the automatic extraction and processing of linguistic data. The amount of linguistic data being migrated to the Web is increasing. At the same time, methods for text processing (data mining, ontology extraction, data summarization, etc.) are being honed by information sciences including NLP. The two sub-disciplines have largely been practiced as separate enterprises. In particular, ordinary working linguists have little access to the fruits of computational linguistics, while computational linguistics has steadily moved away from applying expert linguistic knowledge. The seminar is organized by linguistic data type. Each week will focus on a different data type, and the most applicable information extraction method will be discussed along with ways to evaluate its success.

Design for Learning & Education (LING 575) (3 credits)
Sharon Oviatt (Incaa Designs)
Spring 2008

This is a project-oriented studio course that includes responsive lecturing, guest lectures, group discussion and analysis, and hands-on group project creation. Students will design novel concepts for interfaces relating to Microsoft Research’s theme of “Learning and Education,” with a focus on language and communication features. For each project, students will research their design problem, define a scenario, ideate multiple design solutions, select one idea to prototype, and study the impact on real users. Design groups might work on novel interfaces that stimulate student curiosity and creativity, rethink requirements and possibilities for educational systems and tools, envision new interfaces to instruct remotely-located student groups or indigenous cultures without a written language, or generate new concepts for educational interfaces that could eliminate the achievement gap between student groups. They will be challenged to think about learning and education in different cultures and language groups during this Year of Languages, as well as learners with different abilities and learning styles.

Bridging NLP and Linguistics: A Case Study (LING 575) (3 credits)
Fei Xia (UW Linguistics)
Winter 2009

Natural Language Processing (NLP) focuses on problems of automated generation and understanding of human languages. Early NLP systems were rule-based: linguistic and world knowledge was represented by hand-crafted rules. Since the early 1990s, with the advancement of machine learning methods and the explosion of large corpora and even larger amounts of Web data, the field has shifted to strongly data-driven approaches. The success of these approaches has cast doubt on the relevance of linguistics to NLP. However, given that large numbers of languages are under-resourced, and that some learning approaches are reaching their performance ceiling, the pendulum may be swinging back towards hybrid approaches that take advantage of both machine learning techniques and linguistic knowledge. In this research-oriented seminar, we will explore various ways of bridging linguistics and NLP, using interlinear glossed text as a case study.

Computational Linguistics and Linguistic Typology (LING 575) (3 credits)
Emily M. Bender (UW Linguistics)
Spring 2009

Linguistic typology is the study of the range of variation in structure among human languages and the constraints on that variation. This seminar will look at the relationship between linguistic typology and computational linguistics from two directions, considering on the one hand computational approaches to linguistic typology and on the other how linguistic typology can inform work on computational linguistics/natural language processing.

Introduction to Speech Technology (LING 575) (3 credits)
Michael Tjalve (VoiceBox)
Spring 2009

The course covers how speech technology components work and how real-life speech applications are developed. Topics include the mechanics of current speech technologies, i.e., automatic speech recognition (ASR), text-to-speech (TTS), and speaker verification and identification (SVI), with particular focus on ASR; techniques for successful implementation of speech technology components; and pertinent differences and similarities between human-to-human speech communication and speech technology. Students will gain hands-on experience with speech technology through a project in which they build their own speech recognizer. The course is intended to prepare students for a professional role in speech technology.

Automated Reasoning and Information Extraction (LING 575) (3 credits)
Scott Farrar (UW Linguistics)
Spring 2009

Automated reasoning, especially symbolic reasoning, has long been accepted as one of the core problems of AI and has often found its way into NLP in various forms. However, this paradigm has largely fallen out of favor as the field has relied more and more on data-driven, statistically based algorithms. With recent advances in reasoning technologies, including probabilistic hybrids, description logics and tools, automated reasoning has the potential to once again influence NLP work. Information extraction, in particular, stands to benefit from the addition of a reasoning component, since the target information often lies several logical steps away from the query string. Sub-problems include the derivation of semantic representations, logical entailment, and the resolution of syntactic ambiguity. A major part of this course will be to explore hybrid approaches that combine symbolic and statistically based reasoning techniques and to apply them to the problem of information extraction.

Knowledge Engineering for Deep Processing (LING 567) (3 credits)
Emily M. Bender (UW Linguistics)
Annually since 2004

Techniques and theoretical issues relating to the development of knowledge engineering resources required for deep processing (symbolic or hybrid), focusing on grammar engineering and semantic representations.

Computer Speech Processing (EE 516) (4 credits)

Introduction to automatic speech processing. Overview of human speech production and perception. Fundamental theory in speech coding, synthesis and reproduction, as well as system design methodologies. Advanced topics include speaker and language identification and adaptation.

Other Electives

Students who need to complete either the CSE (CSE 373) or statistics (STAT 391) requirement may do so with one of the two computational linguistics electives. Students who already have a background in statistics equivalent to STAT 391 may opt to use one of their electives to take an advanced statistics course.

Students may choose from many other elective courses available in related fields, such as CSE, EE, and the iSchool.

Data Structures and Algorithms (CSE 373) (3 credits)

Data types, abstract data types, and data structures. Efficiency of algorithms. Sequential and linked implementation of lists. Binary tree representations and traversals. Searching: dictionaries, priority queues, hashing. Directed graphs, depth-first algorithms. Garbage collection. Dynamic storage allocation. Internal and external sorting.

Probability and Statistics for Computer Science (STAT 391) (4 credits)
Concepts of probability and statistics. Conditional probability, independence, random variables, distribution functions. Descriptive statistics, transformations, sampling errors, confidence intervals, least squares and maximum likelihood. Exploratory data analysis and interactive computing.

© 2009 University of Washington. All rights reserved.