Syllabus

CS|QTM|LING-329: Computational Linguistics (Spring 2025)

General

  • Book: http://emory.gitbook.io/nlp-essentials/

  • GitHub: https://github.com/emory-courses/nlp-essentials/

  • Time: MW 4 - 5:15 PM

  • Location: MSC W201

Instructors

  • Jinho Choi

    • Associate Professor of Computer Science, Quantitative Theory and Methods, Linguistics

    • Office Hours and Location: MW 5:30 - 6:30 PM, White Hall 218

    • GitHub: jdchoi77

  • Grace Byun

    • Ph.D. Student in Computer Science and Informatics

    • Office Hours and Location:

      • Hours: Wed 1:30 - 3:30 PM and Fri 12:30 - 1:30 PM

      • Location: Math & Science Center E308 (Computer Lab)

    • GitHub: byunsj

  • Swati Rajwal

    • Ph.D. Student in Computer Science and Informatics

    • Office Hours and Location:

      • Hours: Mon 10:30 AM - 1:30 PM

      • Location: Math & Science Center E308 (Computer Lab)

    • GitHub: swati-rajwal

Grading

  • Homework: 65%

  • Team Project: 35%

  • Your work is governed by the Emory Honor Code. Honor code violations (e.g., copies from any source, including colleagues and internet sites) will be referred to the Emory Honor Council.

  • Requests for absence/rescheduling due to severe personal events (such as health, family, or personal reasons) impacting course performance must be supported by a letter from the Office for Undergraduate Education.

Homework

  • Each topic will include homework that combines concept quizzes and programming assignments to assess your understanding of the subject matter.

  • Assignments must be submitted individually. While discussions are allowed, your work must be original.

  • Late submissions within a week will be accepted with a grading penalty of 15%. They will not be accepted after the solutions are discussed in class.

Concept Quizzes

  • Each section includes questions that explore the content in more depth; their answers will be discussed in class.

  • While certain questions may have multiple valid answers, the grading will be based on the responses discussed in class, and alternative answers will be disregarded. This approach allows us to distinguish between answers discussed in class and those generated by AI tools like ChatGPT.

Programming Assignments

  • You are encouraged to use any code examples and invoke APIs provided in this book.

  • Feel free to create additional functions and variables in the assigned Python file. For each homework, ensure that all your implementations are included in the respective Python file located under the corresponding directory.

  • Usage of packages not covered in the assigned chapter is prohibited. Ensure that your code does not rely on the installation of additional packages, as we will not be able to execute your program for evaluation if external dependencies are needed.

Team Project

  • You are expected to:

    • Form a team of 3-4 members.

    • Present a project pitch to share your proposed idea, and write a proposal report.

    • Deliver a live demonstration showcasing your working project, create a demonstration video, and write a final report documenting details about your project.

    • Provide individual feedback on other teams' presentations and demonstrations.

  • Participation in pitch presentations and live demonstrations is compulsory. Failure to attend any of these events will result in a zero grade for the respective activity. In the event of unavoidable absence due to severe personal circumstances, a formal letter from the Office for Undergraduate Education must accompany any excuses.

Project Grading

  • Team members receive the same grade for the pitch presentation, live demonstration, and demonstration video.

  • Peer evaluations from other teams factor into your team grade.

  • Your feedback to other teams is graded individually.

For the project and final reports, you are required to indicate the contribution percentage of each team member, which impacts the individual grades for the assignment.

Contribution

If your team of two members received 4 out of 5 points, for example, and you indicate that your contribution was 60% while your teammate's was 40%, the points are distributed as follows:

  • You receive: 4 (team points) ⨉ 60 (your contributions) / 60 (max contributions) = 4 points.

  • Your teammate receives: 4 (team points) ⨉ 40 (your teammate's contributions) / 60 (max contributions) = 2.67 points.

This approach ensures that the grading reflects the effort and input of each team member, promoting fairness and accountability.
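The distribution rule in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic shown: the function name `distribute_points`, the dictionary layout, and the rounding to two decimals are assumptions made for this sketch, not part of the official grading procedure.

```python
def distribute_points(team_points: float, contributions: dict) -> dict:
    """Scale each member's points by their contribution relative to the largest contribution.

    Hypothetical helper: mirrors the worked example (team points x contribution / max contribution).
    """
    max_contribution = max(contributions.values())
    return {
        member: round(team_points * share / max_contribution, 2)
        for member, share in contributions.items()
    }


# The two-member example from the text: 4 team points, 60% vs. 40% contribution.
print(distribute_points(4, {'you': 60, 'teammate': 40}))
# {'you': 4.0, 'teammate': 2.67}
```

Under this rule, the member with the largest contribution always receives the full team points, and everyone else is scaled down proportionally.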

Consensus

  • Each team is required to submit a single, agreed-upon chart detailing the contribution percentages of all members for each team assignment. This means that you and your teammates must reach a consensus on the contribution rates before submitting your work.

  • Open communication and transparency are essential in this process. Disagreements should be resolved within the team, ensuring that the final submission reflects the true division of labor and contributions.

By adhering to these guidelines, you not only produce a strong research paper but also develop key skills in teamwork and fair assessment of contributions.

Schedule

CS|QTM|LING-329: Computational Linguistics (Spring 2025)

Date   Topic                     Assignment
01/15
01/20  Martin Luther King Day
01/22
01/27  (continue)
01/29  (continue)
02/03  (continue)
02/05
02/10
02/12  (continue)
02/17  (continue)
02/19  (continue)
02/24
02/26  (continue)
03/03  (continue)
03/05  (continue)
03/10  Spring Break
03/12  Spring Break
03/17
03/19
03/24
03/26  (continue)
03/31  (continue)
04/02  (continue)
04/07  (continue)
04/09
04/14  (continue)
04/16  (continue)
04/21
04/23
04/28  NLP Tasks & Applications

  • Attendance is mandatory for all Project Pitches and Live Demonstrations sessions.

Overview
Homework 0
Text Processing
Homework 1
Speed Dating
Team Formation
Language Models
Homework 2
Vector Space Models
Proposal Pitch
Homework 3
Proposal Pitches
Proposal Report
Proposal Pitches
Distributional Semantics
Homework 4
Contextual Encoding
Live Demonstration
Live Demonstrations
Live Demonstrations
Final Report
Homework 5

Overview

By Jinho D. Choi (2025 Edition)

Natural Language Processing (NLP) is a dynamic field within Artificial Intelligence focused on developing computational models to understand, interpret, and generate human language. As NLP technologies become increasingly embedded in our daily lives, understanding their fundamentals is crucial for both leveraging their potential and enhancing our interaction with language-based systems.

This course is designed to build a robust foundation in the core principles of modern NLP. We begin with text processing techniques that show how to manipulate textual data for model development. We then progress to language models, enabling computational models to comprehend and generate human language, and vector space models that convert textual data into machine-readable formats. Advanced topics include distributional semantics for creating context-aware word embeddings, and contextual encoding for analyzing word relationships within their surrounding text.

The latter part of the course focuses on practical application through team projects. Students will have the opportunity to work with cutting-edge NLP technologies, such as large language models, to develop real-world applications. This hands-on approach encourages creativity and innovation, with students proposing their own ideas and presenting demonstrations to their peers.

Learning assessment combines concept quizzes to reinforce theoretical understanding with hands-on programming assignments that develop practical implementation skills. By the conclusion of the course, students will have gained the knowledge and skills to navigate and contribute to the rapidly evolving field of NLP.

Prerequisites

  • Introduction to Python Programming

  • Introduction to Machine Learning

Sections

  • Syllabus

  • Schedule

  • Development Environment

  • Homework

Development Environment

This guide will help you set up your development environment by installing the required tools: the Python programming language, GitHub for version control, and the PyCharm IDE.

Python

  • Install Python version 3.13.x or higher. Earlier versions may not be compatible with this course.

  • Take some time to familiarize yourself with Python's new features.

GitHub

  1. Create a GitHub account (if you do not already have one). As a student, you can get GitHub Pro features for free through the GitHub Student Developer Pack.

  2. Log in to GitHub.

  3. Create a new repository named "nlp-essentials" and set its visibility to Private.

    Create a GitHub repository.
  4. Go to [Settings] in your repository, and select [Collaborators and teams].

  5. Click [Add people], and add each instructor using their GitHub usernames:

    1. Find their GitHub IDs in the "Instructors" section of the Syllabus.

    2. Enter each username and send the collaboration invitation.

  6. Verify that all instructors have been added as collaborators.

Add collaborators to your GitHub repository.

PyCharm

  1. Install PyCharm on your local machine:

    1. As a student, you can get PyCharm Professional for free by applying for a JetBrains Educational License.

    2. The following instructions are based on PyCharm 2024.3.x Professional Edition.

  2. Configure your GitHub account:

    1. Go to [Settings] > [Version Control] > [GitHub].

    2. Press [+], select [Log in via GitHub], and follow the browser prompts to authorize PyCharm with your GitHub account.

    3. Once connected, you will be able to access GitHub directly from PyCharm for version control operations.

Add your GitHub account to PyCharm.
  3. Create a new PyCharm project from GitHub:

    1. On the PyCharm welcome screen, click [Clone Repository].

    2. In the new window, select [GitHub] from the left menu, choose your nlp-essentials repository, and click [Clone] (verify the directory name is "nlp-essentials").

  4. Set up a Python virtual environment:

    1. Go to [Settings] > [Project: nlp-essentials] > [Python Interpreter].

    2. Click [Add Interpreter] and choose [Add Local Interpreter].

    3. In the prompted menu, choose [Add Local Environment], configure as follows, then click [OK]:

      • Environment: Generate new

      • Type: Virtualenv

      • Base python: the Python version you installed above

      • Location: YOUR_LOCAL_PATH/nlp-essentials/.venv

    Add a virtual environment.

References

  • Git: a version control system for tracking changes in files.

  • Virtualenv: a tool to create isolated Python environments.

Homework

HW0: Getting Started

Task 1: Getting Started

In this assignment, you will:

  1. Set up your development environment,

  2. Install your first Python package using pip within a virtual environment,

  3. Run a test program to verify the installation and environment configuration, and

  4. Commit your changes and push them to your GitHub repository.

This will ensure your development workspace is properly configured for this course.

Package Installation

Once you set up the development environment:

  1. Open Terminal in PyCharm:

    1. Click the Terminal icon at the bottom left, or

    2. Select the [View] > [Tool Windows] > [Terminal] menu.

  2. Update pip to the latest version (if necessary):

    python -m pip install --upgrade pip
  3. Install setuptools (if necessary):

    pip install setuptools
  4. Install the ELIT Tokenizer:

    pip install elit_tokenizer
  5. You will know the installation is successful when you see "Successfully installed ..." messages for each package in the terminal output.
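Besides reading the terminal output, the installation can be double-checked from Python itself. The sketch below uses the standard-library `importlib.metadata` module to look up installed package versions; the helper name `installed_version` is hypothetical, and the package names are assumed to match the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names assumed from the pip commands in the steps above.
for name in ('pip', 'setuptools', 'elit_tokenizer'):
    found = installed_version(name)
    print(f'{name}: {found if found else "NOT INSTALLED"}')
```

Run it with the interpreter of your virtual environment (for example, from the PyCharm terminal); a package reported as "NOT INSTALLED" usually means the command was run outside the virtual environment.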

Test Program

  1. Create the project structure:

    1. Create a new Python package called src in your nlp-essentials directory.

    2. Inside src, create a homework package.

    3. PyCharm will automatically create __init__.py files in both directories to mark them as Python packages.

  2. Create your first program:

    1. Create a Python file called getting_started.py inside homework.

    2. Copy the following code into the file:

    from elit_tokenizer import EnglishTokenizer
    
    if __name__ == '__main__':
        text = 'Emory NLP is a research lab in Atlanta, GA. It was founded by Jinho D. Choi in 2014. Dr. Choi is a professor at Emory University.'
        # Create the tokenizer and split the raw text into tokens.
        tokenizer = EnglishTokenizer()
        sentence = tokenizer.decode(text)
        # Print the tokens, then the character offsets of each token in the text.
        print(sentence.tokens)
        print(sentence.offsets)
  3. Run the program:

    1. Choose the [Run] > [Run 'getting_started'] menu, or

    2. Use the green run button next to the main block.

    Run the program.
  4. Verify the output; your program is working correctly if you see this output:

    ['Emory', 'NLP', 'is', 'a', 'research', 'lab', 'in', 'Atlanta', ',', 'GA', '.', 'It', 'was', 'founded', 'by', 'Jinho', 'D.', 'Choi', 'in', '2014', '.', 'Dr.', 'Choi', 'is', 'a', 'professor', 'at', 'Emory', 'University', '.']
    [(0, 5), (6, 9), (10, 12), (13, 14), (15, 23), (24, 27), (28, 30), (31, 38), (38, 39), (40, 42), (42, 43), (44, 46), (47, 50), (51, 58), (59, 61), (62, 67), (68, 70), (71, 75), (76, 78), (79, 83), (83, 84), (85, 88), (89, 93), (94, 96), (97, 98), (99, 108), (109, 111), (112, 117), (118, 128), (128, 129)]
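To see how the two printed lists relate, the sketch below checks that slicing the text with each offset pair recovers the corresponding token. It does not call the tokenizer: the tokens and offsets are copied from the first sentence of the expected output above. Treating each pair as (begin, end) with an exclusive end is an assumption inferred from that output, and `offsets_match` is a hypothetical helper name.

```python
# First sentence of the input text, with tokens and offsets copied from the expected output.
text = 'Emory NLP is a research lab in Atlanta, GA.'
tokens = ['Emory', 'NLP', 'is', 'a', 'research', 'lab', 'in', 'Atlanta', ',', 'GA', '.']
offsets = [(0, 5), (6, 9), (10, 12), (13, 14), (15, 23), (24, 27),
           (28, 30), (31, 38), (38, 39), (40, 42), (42, 43)]


def offsets_match(text, tokens, offsets):
    """Return True if slicing text by every (begin, end) pair yields the matching token."""
    if len(tokens) != len(offsets):
        return False
    return all(text[begin:end] == token for token, (begin, end) in zip(tokens, offsets))


print(offsets_match(text, tokens, offsets))  # True
print(text[31:38])                           # Atlanta
```

Note that punctuation such as the comma after "Atlanta" gets its own token and offset pair, starting exactly where the previous token ends.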

Commit & Push

  1. Create a .gitignore file:

    1. Create the file in your nlp-essentials root directory.

    2. Add the following lines to exclude unnecessary files:

    .idea/
    .venv/
  2. Stage your files for commit:

    1. Add the following files to Git by right-clicking them and selecting [Git] > [Add]:

      .gitignore
      src/__init__.py
      src/homework/__init__.py
      src/homework/getting_started.py
    2. Files should turn green when successfully added. If files do not change color, restart PyCharm and try again.

  3. Commit and push your changes:

    1. Right-click the nlp-essentials directory.

    2. Select [Git] > [Commit Directory].

    3. Write a descriptive commit message (e.g., "Initial setup and tokenizer test").

    4. Click [Commit and Push] (not just Commit).

    5. Click [Push] in the next dialog to upload to your GitHub repository.

  4. Verify your submission:

    1. Visit your GitHub repository in a web browser.

    2. Confirm that all files are properly present and contain the correct content.
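Before checking on GitHub, a quick local check can confirm that the four files listed in the staging step exist in your working copy. This is a minimal sketch: the helper name `missing_files` is hypothetical, and it only inspects the local directory, not what was actually pushed.

```python
from pathlib import Path

# The four files listed in the staging step, relative to the nlp-essentials root.
REQUIRED_FILES = [
    '.gitignore',
    'src/__init__.py',
    'src/homework/__init__.py',
    'src/homework/getting_started.py',
]


def missing_files(root: Path) -> list:
    """Return the required files that do not exist under the given project root."""
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Run from the nlp-essentials root directory; an empty list means nothing is missing.
print(missing_files(Path('.')))
```

An empty list only shows the files exist locally; the repository page on GitHub remains the place to confirm they were pushed.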

Submission

Submit the URL of your GitHub repository to Canvas.

Task 2: Project Ideas

Share your team project concept by filling out the form in Canvas (about 100-150 words). Your description will be posted on the Project Ideas page to help classmates discover shared interests and form teams.

Rubric

  • GitHub Setup (0.2 points):

    • Private repository created.

    • All instructors added as collaborators.

  • Project Organization (0.2 points):

    • Correct directory structure.

    • No unnecessary files committed.

  • Version Control (0.3 points):

    • All required files committed and successfully pushed to GitHub.

    • The content of the files is correct.

  • Code Implementation (0.3 points):

    • The program executes without errors.

    • Produces correct tokenizer output.

  • Project Ideas (1 point):

    • Is the team project idea well described?