Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"
(Note about registration: registration is currently restricted to students pursuing CS degrees for which this course is an essential requirement. Unfortunately, the enrollment is already at the limit of the classroom space, so this restriction is unlikely to be lifted.)
This course gives an overview of modern data-driven techniques for natural language processing. The course moves from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way we will cover machine learning techniques that are especially relevant to natural language processing.
Readings will be drawn mainly from my notes. Additional readings may be assigned from published papers, blog posts, and tutorials.
These are completely optional, but might deepen your understanding of the material.
The graded material for the course will consist of:
Barring a personal emergency or an institute-approved absence, you must take each exam on the day indicated in the schedule. Job interviews and travel plans are generally not a reason for an institute-approved absence. See here for more information on GT policy about absences.
Problem sets will be accepted up to 72 hours late, at a penalty of 2 points per 24 hours. (Maximum score after missing the deadline: 10/12; maximum score 24 hours after the deadline: 8/12, etc.) It is usually best just to turn in what you have at the due date. Late homeworks will not be accepted. This late policy is intended to ensure fair and timely evaluation.
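To make the late-penalty arithmetic concrete, here is a minimal sketch in Python. It is not an official grading script. The function name `max_score` and its parameters are hypothetical, and the 12-point total is taken from the example above. It assumes the first 2-point penalty applies as soon as the deadline is missed, and that a submission at exactly 72 hours still counts in the last accepted window; that boundary is an assumption, so ask on Piazza if it matters for you.

```python
def max_score(hours_late, full_marks=12, penalty_per_day=2, cutoff_hours=72):
    """Return the maximum achievable score, or None if the work is not accepted.

    Hypothetical helper illustrating the stated policy: 2 points are lost
    per started 24-hour period after the deadline, up to 72 hours late.
    """
    if hours_late <= 0:
        # Submitted on time: no penalty.
        return full_marks
    if hours_late > cutoff_hours:
        # More than 72 hours late: not accepted.
        return None
    # Number of started 24-hour periods; the first begins right at the deadline.
    # Capped so that exactly 72 hours falls in the last accepted window (assumption).
    periods = min(int(hours_late // 24) + 1, cutoff_hours // 24)
    return full_marks - penalty_per_day * periods


if __name__ == "__main__":
    for hours in (0, 1, 24, 50, 73):
        print(hours, max_score(hours))
```

Under these assumptions, a problem set turned in one hour late can earn at most 10/12, and one turned in 24 hours late can earn at most 8/12, matching the examples in the policy.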
My office hours follow Wednesday classes (4:15-5:15PM) and take place in the classroom when it is available.
TA office hours are in CCB commons (1st floor) unless otherwise announced on Piazza.
- Murali: Friday 10AM-11AM
- James: Thursday 11AM-12PM
- Yuval: Tuesday 3PM-4PM
- Zhewei: Monday 1PM-2PM
Please use Piazza rather than personal email to ask questions. This helps other students, who may have the same question. Personal emails may not be answered. If you cannot make it to office hours, please use Piazza to make an appointment. It is unlikely that I will be able to chat if you make an unscheduled visit to my office. The same is true for the TAs.
Attendance will not be taken, but you are responsible for knowing what happens in every class. If you cannot attend class, make sure you check in with someone who was there.
Respect your classmates and your instructor by preventing distractions. This means being on time, turning off your cellphone, and saving side conversations for after class. If you can't read something I wrote on the board, or if you think I made a mistake in a derivation, please raise your hand and tell me!
Using a laptop in class is likely to reduce your educational attainment. This has been documented by multiple studies, which are nicely summarized in the following article:
I am not going to ban laptops, as long as they are not a distraction to anyone but the user. But I suggest you try pen and paper for a few weeks, and see if it helps.
The official prerequisite for CS 4650 is CS 3510/3511, "Design and Analysis of Algorithms." This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.
Furthermore, this course assumes:
People sometimes want to take the course without having all of these prerequisites. Frequent cases are:
Students in the first group suffer in the exam and don't understand the lectures, and students in the second group suffer in the problem sets. My advice is to get the background material first, and then take this course.
One of the goals of the assigned work is to assess your individual progress in meeting the learning objectives of the course. You may discuss the homework and projects with other students, but your work must be your own -- particularly all coding and writing. For example:
Some assignments will involve written responses. Using other people’s text or figures without attribution is plagiarism, and is never acceptable.
Suspected cases of academic misconduct will be (and have been!) referred to the Honor Advisory Council. For any questions involving these or any other Academic Honor Code issues, please consult me, my teaching assistants, or http://www.honor.gatech.edu.