Course Information

Syllabus PDF

Description: This class covers both traditional and recent deep learning-based NLP concepts and techniques. Topics include language models, word senses and embeddings, part-of-speech tagging, hidden Markov models, sequence-to-sequence models (RNNs and attention mechanisms), dependency parsing, and relation extraction. A GPU computing facility will be provided for exploring state-of-the-art deep NLP models. Credit will not be given for both CSE 325 and CSE 425.

Lectures: Monday/Wednesday 12:40 PM - 01:55 PM, Room 115, Building C

Lab: SandBox, Room 112, Packard Lab

Office Hours: Monday/Wednesday 4:00 PM - 5:00 PM, or by appointment, Room 326, Building C

Prerequisites: For CSE 325: (MATH 231 or ECO 045) and CSE 017. For CSE 425, instructor permission is required.
We will use Python and PyTorch for projects. The course is self-contained but is easier if you have a background in data mining or machine learning (multivariate derivatives, optimization, and neural networks). Relevant programming and math concepts will be discussed briefly only when necessary.

Format: In-class lectures and student presentations; programming assignments; research projects (optional for undergraduates).


Textbooks

All textbooks are optional. Most of the materials, including the math and the implementations, are freely available online.

SLP3 = Speech and Language Processing, 3rd Edition, by Daniel Jurafsky and James H. Martin. 2019 Draft. Most chapters freely available at Link.

FSNLP = Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze. Cambridge, Mass.: MIT Press, 2000. Paper book available at Linderman Reserve and Ebook available to Lehigh users.

DLNLP = Deep Learning for Natural Language Processing, by Stephan Raaijmakers. Manning Publications. 2019. Purchase Link.

DL = Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. MIT Press. 2016. Link.

NNMNLP = Neural Network Methods for Natural Language Processing, by Yoav Goldberg. Morgan & Claypool Publishers. 2017. Link.


Online Resources

CourseSite: for posting grades only. Link.

Piazza: for posting questions that the instructor and other students can answer. Link.

This website: for general information and resources (code, data, projects).


Schedule

The following topics will be covered (tentatively): language models and word embedding; machine learning and deep learning basics; PyTorch programming; syntax (part-of-speech tagging, HMM and CRF); sequence-to-sequence models (RNN, CNN, attention models); relation extraction.


Programming assignments

The programming assignments use deep neural networks for three common NLP tasks: word embedding, POS tagging, and relation extraction.

Grading: Each student has 6 late days in total without penalty and decides how to use them. Beyond the late quota, late submissions will be penalized 20% of the total grade per late day (24 hours or part thereof), and no assignment or project will be accepted more than four days after its due date. The assignments and project will be graded partly on your programs' performance in terms of metrics defined in the individual assignments.

Submission: Assignments are submitted via CourseSite.

Collaboration: Make sure you write your own code. If you refer to any sources, you need to list them at the top of your submitted readme file. Sharing and copying solutions are considered a violation of the honor code. This includes, but is not limited to, copying solutions from the web, textbook solution manuals, and previous years' submissions.
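As an illustration only, the late-day arithmetic above can be sketched in Python. This is one possible reading of the policy, not the official grading procedure. It assumes that the 20% penalty is a fraction of the assignment's maximum score, that late days are applied in whole days, and that the four-day cutoff holds even when late days are used. The function name late_penalty and its parameters are hypothetical.

```python
import math


def late_penalty(hours_late, late_days_applied=0, rate=0.20, cutoff_days=4):
    """Return the penalty as a fraction of the maximum score, or None if
    the submission is too late to be accepted.

    Assumptions (not stated in the policy): the penalty is taken from the
    assignment's maximum score, and the cutoff applies regardless of how
    many late days the student chooses to apply.
    """
    if hours_late <= 0:
        return 0.0  # submitted on time

    # "24 hours or part thereof": any started day counts as a full late day.
    days_late = math.ceil(hours_late / 24)

    if days_late > cutoff_days:
        return None  # more than four days after the due date: not accepted

    # The student chooses how many of the remaining free late days to use.
    covered = min(late_days_applied, days_late)
    penalized_days = days_late - covered

    return round(min(1.0, rate * penalized_days), 2)


print(late_penalty(30))                       # two late days, none applied
print(late_penalty(30, late_days_applied=2))  # both late days covered
print(late_penalty(100))                      # past the four-day cutoff
```

Under these assumptions, a submission 30 hours late counts as two late days, so it loses 40% unless the student spends two days of the quota on it.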


Research Project

Each graduate student needs to reproduce the results of an NLP research paper using PyTorch to get 40% of the total credit.

Grading: The project is evaluated on the depth of your understanding of the research paper and on how much of its experiments you reproduce. The quota of 6 late days can be used for submission of the final report.

Submission: The proposal, presentation slides, and final report are submitted via CourseSite.

Collaboration: This is an individual project. You can use any online resources but need to indicate the sources in your final report.


NLP Resources: