Overview

By Jinho D. Choi (2025 Edition)

Natural Language Processing (NLP) is a dynamic field within Artificial Intelligence focused on developing computational models to understand, interpret, and generate human language. As NLP technologies become increasingly embedded in our daily lives, understanding the fundamentals of the field is crucial both for leveraging its potential and for improving our interaction with language-based systems.

This course is designed to build a robust foundation in the core principles of modern NLP. We begin with text processing techniques that show how to manipulate textual data for model development. We then progress to language models, enabling computational models to comprehend and generate human language, and vector space models that convert textual data into machine-readable formats. Advanced topics include distributional semantics for creating context-aware word embeddings, and contextual encoding for analyzing word relationships within their surrounding text.

The latter part of the course focuses on practical application through team projects. Students will have the opportunity to work with cutting-edge NLP technologies, such as large language models, to develop real-world applications. This hands-on approach encourages creativity and innovation, with students proposing their own ideas and presenting demonstrations to their peers.

Learning assessment combines concept quizzes to reinforce theoretical understanding with hands-on programming assignments that develop practical implementation skills. By the conclusion of the course, students will have gained the knowledge and skills to navigate and contribute to the rapidly evolving field of NLP.

Prerequisites

  • Introduction to Python Programming

  • Introduction to Machine Learning

Sections

Homework
Syllabus
Schedule
Development Environment

Development Environment

This guide will help you set up your development environment by installing the required tools: the Python programming language, GitHub for version control, and the PyCharm IDE.

Python

  • Install Python version 3.14.x or higher. Earlier versions may not be compatible with this course.

  • Take some time to familiarize yourself with Python's new features.

GitHub

  1. Create a GitHub account (if you do not already have one). As a student, you can get GitHub Pro features for free through the GitHub Student Developer Pack.

  2. Log in to GitHub.

  3. Create a new repository named "nlp-essentials" and set its visibility to Private.

  4. Go to [Settings] in your repository, and select [Collaborators and teams].

  5. Click [Add people], and add each instructor using their GitHub usernames:

    1. Find their GitHub IDs in the "Instructors" section of the Syllabus.

    2. Enter each username and send the collaboration invitation.

  6. Verify that all instructors have been added as collaborators.

PyCharm

  1. Install PyCharm on your local machine:

    1. As a student, you can get PyCharm Professional for free by applying for a JetBrains Educational License.

    2. The following instructions are based on PyCharm 2024.3.x Professional Edition.

  2. Create a new PyCharm project from GitHub:

    1. On the PyCharm welcome screen, click [Clone Repository].

    2. In the new window, select [GitHub] from the left menu, choose your nlp-essentials repository, and click [Clone] (verify the directory name is "nlp-essentials").

  3. Configure your GitHub account:

    1. Go to [Settings] > [Version Control] > [GitHub].

    2. Press [+], select [Log in via GitHub], and follow the browser prompts to authorize PyCharm with your GitHub account.

    3. Once connected, you will be able to access GitHub directly from PyCharm for version control operations.

  4. Set up a Python virtual environment:

    1. Go to [Settings] > [Project: nlp-essentials] > [Project Interpreter].

    2. Click [Add Interpreter] and choose [Add Local Interpreter].

    3. In the prompted menu, choose [Add Local Environment], configure as follows, then click [OK]:

      • Environment: Generate new

      • Type: Virtualenv

      • Base python: the Python version you installed above

      • Location: YOUR_LOCAL_PATH/nlp-essentials/.venv

References

  • Git: a version control system for tracking changes in files.

  • Virtualenv: a tool to create isolated Python environments.
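Once the interpreter is configured, a quick sanity check can confirm that the project runs on a recent enough Python and inside the virtual environment. The following is a minimal sketch rather than part of the course materials: the helper names `meets_minimum` and `in_virtualenv` are hypothetical, and the minimum version is taken from the Python requirement stated in this guide.

```python
import sys

# Minimum version taken from the Python requirement in this guide (3.14.x or higher).
MINIMUM_VERSION = (3, 14)


def meets_minimum(version=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor, ...) version tuple is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def in_virtualenv():
    """Return True if the interpreter runs inside a virtual environment.

    In an environment created by venv or a recent virtualenv, sys.prefix points at
    the environment while sys.base_prefix points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == '__main__':
    print('Python version OK:', meets_minimum())
    print('Virtual environment active:', in_virtualenv())
```

Running this file with the project interpreter selected in PyCharm should report both checks as True if the setup steps were followed; a False on the second line usually means the base interpreter is still selected.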

Syllabus

CS|QTM|LING-329: Computational Linguistics (Spring 2025)

General

  • GitHub: https://github.com/emory-courses/nlp-essentials/

  • Book: http://emory.gitbook.io/nlp-essentials/

  • Time: MW 2 - 3:45 PM

  • Location: MSC W201

Instructors

  • Jinho Choi

    • Associate Professor of Computer Science, Data and Decision Sciences, Linguistics

    • Office Hours and Location: MW 4:00 - 5:00 PM, Emerson Hall 500

    • GitHub: jdchoi77

  • TBD

    • Ph.D. Student in Computer Science and Informatics

    • Office Hours and Location:

      • Hours: TBD

      • Location: TBD

    • GitHub: TBD

  • TBD

    • Ph.D. Student in Computer Science and Informatics

    • Office Hours and Location:

      • Hours: TBD

      • Location: TBD

    • GitHub: TBD

Grading

  • Homework: 65%

  • Team Project: 35%

  • Your work is governed by the Emory Honor Code. Honor code violations (e.g., copies from any source, including colleagues and internet sites) will be referred to the Emory Honor Council.

  • Requests for absence/rescheduling due to severe personal events (such as health, family, or personal reasons) impacting course performance must be supported by a letter from the Office for Undergraduate Education.

Homework

  • Each topic will include homework that combines concept quizzes and programming assignments to assess your understanding of the subject matter.

  • Assignments must be submitted individually. While discussions are allowed, your work must be original.

  • Late submissions within a week will be accepted with a grading penalty of 15%. They will not be accepted after the solutions are discussed in class.

Concept Quizzes

  • Each section incorporates questions to explore the content more comprehensively, with their corresponding answers slated for discussion in class.

  • While certain questions may have multiple valid answers, the grading will be based on the responses discussed in class, and alternative answers will be disregarded. This approach allows us to distinguish between answers discussed in class and those generated by AI tools like ChatGPT.

Programming Assignments

  • You are encouraged to use any code examples and invoke APIs provided in this book.

  • Feel free to create additional functions and variables in the assigned Python file. For each homework, ensure that all your implementations are included in the respective Python file located under the corresponding directory.

  • Usage of packages not covered in the assigned chapter is prohibited. Ensure that your code does not rely on the installation of additional packages, as we will not be able to execute your program for evaluation if external dependencies are needed.

Team Project

  • You are expected to:

    • Form a team of 3-4 members.

    • Present a project pitch to share your proposed idea, and write a proposal report.

    • Deliver a live demonstration showcasing your working project, create a demonstration video, and write a final report documenting details about your project.

    • Provide individual feedback on other teams' presentations and demonstrations.

  • Participation in pitch presentations and live demonstrations is compulsory. Failure to attend any of these events will result in a zero grade for the respective activity. In the event of unavoidable absence due to severe personal circumstances, a formal letter from the Office for Undergraduate Education must accompany any excuses.

Project Grading

  • Team members receive the same grade for the pitch presentation, live demonstration, and demonstration video.

  • Peer evaluations from other teams factor into your team grade.

  • Your feedback to other teams is graded individually.

For the project and final reports, you are required to indicate the contribution percentage of each team member, which impacts the individual grades for the assignment.

Contribution

If your team of two members received 4 out of 5 points, for example, and you indicate that your contribution was 60% while your teammate's was 40%, the points are distributed as follows:

  • You receive: 4 (team points) × 60 (your contribution) / 60 (max contribution) = 4 points.

  • Your teammate receives: 4 (team points) × 40 (your teammate's contribution) / 60 (max contribution) = 2.67 points.

This approach ensures that the grading reflects the effort and input of each team member, promoting fairness and accountability.

Consensus

  • Each team is required to submit a single, agreed-upon chart detailing the contribution percentages of all members for each team assignment. This means that you and your teammates must reach a consensus on the contribution rates before submitting your work.

  • Open communication and transparency are essential in this process. Disagreements should be resolved within the team, ensuring that the final submission reflects the true division of labor and contributions.

By adhering to these guidelines, you not only produce a strong research paper but also develop key skills in teamwork and fair assessment of contributions.
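The contribution-based distribution described in the syllabus (team points scaled by each member's contribution relative to the largest contribution) can be sketched in a few lines. This is an illustrative sketch only: the function name `distribute_points` is hypothetical, and rounding to two decimals is an assumption made to match the worked figures, not a stated grading rule.

```python
def distribute_points(team_points, contributions):
    """Scale team points by each member's contribution relative to the largest one.

    `contributions` maps a member name to a contribution percentage.
    """
    max_contribution = max(contributions.values())
    return {
        member: round(team_points * share / max_contribution, 2)
        for member, share in contributions.items()
    }


if __name__ == '__main__':
    # Figures from the syllabus example: 4 team points, contributions of 60% and 40%.
    print(distribute_points(4, {'you': 60, 'teammate': 40}))
```

With the syllabus figures, the member with the largest contribution keeps the full 4 points and the teammate receives 2.67, which matches the worked example.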

Schedule

CS|QTM|LING-329: Computational Linguistics (Spring 2025)

| Date  | Topic                    | Assignment         |
|-------|--------------------------|--------------------|
| 01/14 | Overview                 | Homework 0         |
| 01/19 | Martin Luther King Day   |                    |
| 01/21 | Text Processing          |                    |
| 01/26 | (continued)              |                    |
| 01/28 | (continued)              |                    |
| 02/02 | (continued)              | Homework 1         |
| 02/04 | Speed Dating             | Team Formation     |
| 02/09 | Language Models          |                    |
| 02/11 | (continued)              |                    |
| 02/16 | (continued)              | Proposal Pitch     |
| 02/18 | (continued)              | Homework 2         |
| 02/23 | LLM Interaction          |                    |
| 02/25 | (continued)              |                    |
| 03/02 | (continued)              |                    |
| 03/04 | (continued)              | Homework 3         |
| 03/09 | Spring Break             |                    |
| 03/11 | Spring Break             |                    |
| 03/16 | Proposal Pitches         |                    |
| 03/18 | Proposal Pitches         | Proposal Report    |
| 03/23 | Vector Space Models      |                    |
| 03/25 | (continued)              |                    |
| 03/30 | (continued)              |                    |
| 04/01 | (continued)              | Homework 4         |
| 04/06 | Distributional Semantics | Live Demonstration |
| 04/08 | (continued)              |                    |
| 04/13 | (continued)              |                    |
| 04/15 | (continued)              | Homework 5         |
| 04/20 | Live Demonstrations      |                    |
| 04/22 | Live Demonstrations      | Final Report       |
| 04/27 | Contextual Encoding      | Homework 6         |

  • Attendance is mandatory for all Project Pitch and Live Demonstration sessions.

Homework

HW0: Getting Started

Task 1: Getting Started

In this assignment, you will:

  1. Set up your development environment,

  2. Install your first Python package using pip within a virtual environment,

  3. Run a test program to verify the installation and environment configuration, and

  4. Commit your changes and push them to your GitHub repository.

This will ensure your development workspace is properly configured for this course.

Package Installation

Once you set up the development environment:

  1. Open Terminal in PyCharm:

    1. Click the Terminal icon at the bottom left, or

    2. Select the [View] > [Tool Windows] > [Terminal] menu.

  2. Update pip to the latest version (if necessary):

        python -m pip install --upgrade pip

  3. Install setuptools (if necessary):

        pip install setuptools

  4. Install the ELIT Tokenizer:

        pip install elit_tokenizer

  5. You will know the installation is successful when you see "Successfully installed ..." messages for each package in the terminal output.

Test Program

  1. Create the project structure:

    1. Create a new Python package called src in your nlp-essentials directory.

    2. Inside src, create a homework package.

    3. PyCharm will automatically create __init__.py files in both directories to mark them as Python packages.

  2. Create your first program:

    1. Create a Python file called getting_started.py inside homework.

    2. Copy the following code into the file:

        from elit_tokenizer import EnglishTokenizer

        if __name__ == '__main__':
            # Sample text containing abbreviations and sentence-final periods.
            text = 'Emory NLP is a research lab in Atlanta, GA. It was founded by Jinho D. Choi in 2014. Dr. Choi is a professor at Emory University.'
            tokenizer = EnglishTokenizer()
            # Tokenize the text and keep the character offsets of each token.
            sentence = tokenizer.decode(text)
            print(sentence.tokens)
            print(sentence.offsets)

  3. Run the program:

    1. Choose the [Run] > [Run 'getting_started'] menu, or

    2. Use the green run button next to the main block.

  4. Verify the output; your program is working correctly if you see this output:

        ['Emory', 'NLP', 'is', 'a', 'research', 'lab', 'in', 'Atlanta', ',', 'GA', '.', 'It', 'was', 'founded', 'by', 'Jinho', 'D.', 'Choi', 'in', '2014', '.', 'Dr.', 'Choi', 'is', 'a', 'professor', 'at', 'Emory', 'University', '.']
        [(0, 5), (6, 9), (10, 12), (13, 14), (15, 23), (24, 27), (28, 30), (31, 38), (38, 39), (40, 42), (42, 43), (44, 46), (47, 50), (51, 58), (59, 61), (62, 67), (68, 70), (71, 75), (76, 78), (79, 83), (83, 84), (85, 88), (89, 93), (94, 96), (97, 98), (99, 108), (109, 111), (112, 117), (118, 128), (128, 129)]

Commit & Push

  1. Create a .gitignore file:

    1. Create the .gitignore file in your nlp-essentials root directory.

    2. Add the following lines to exclude unnecessary files:

        .idea/
        .venv/

  2. Stage your files for commit:

    1. Add the following files to Git by right-clicking them and selecting [Git] > [Add]:

        .gitignore
        src/__init__.py
        src/homework/__init__.py
        src/homework/getting_started.py

    2. Files should turn green when successfully added. If files do not change color, restart PyCharm and try again.

  3. Commit and push your changes:

    1. Right-click the nlp-essentials directory.

    2. Select [Git] > [Commit Directory].

    3. Write a descriptive commit message (e.g., "Initial setup and tokenizer test").

    4. Click [Commit and Push] (not just Commit).

    5. Click [Push] in the next dialog to upload to your GitHub repository.

  4. Verify your submission:

    1. Visit your GitHub repository in a web browser.

    2. Confirm that all files are present and contain the correct content.

Submission

Submit the URL of your GitHub repository to Canvas.

Task 2: Project Ideas

Share your team project concept by filling out the form in Canvas (about 100-150 words). Your description will be posted on the Project Ideas page to help classmates discover shared interests and form teams.

Rubric

  • GitHub Setup (0.2 points):

    • Private repository created.

    • All instructors added as collaborators.

  • Project Organization (0.2 points):

    • Correct directory structure.

    • No unnecessary files committed.

  • Version Control (0.3 points):

    • All required files committed and successfully pushed to GitHub.

    • Contents of the files are correct.

  • Code Implementation (0.3 points):

    • The program executes without errors.

    • Produces correct tokenizer output.

  • Project Ideas (1 point):

    • Is the team project idea well described?
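The offsets printed by the HW0 test program appear to be half-open character spans (begin, end) into the input text, so slicing the text with each pair should give back the corresponding token. The sketch below checks this on the first sentence only, using values copied from the expected output, so it runs without installing the tokenizer. The helper name `spans_match` is hypothetical, and the half-open interpretation is inferred from the expected output rather than from the tokenizer's documentation.

```python
# Text, tokens, and offsets copied from the first sentence of the HW0 expected output.
text = 'Emory NLP is a research lab in Atlanta, GA.'
tokens = ['Emory', 'NLP', 'is', 'a', 'research', 'lab', 'in', 'Atlanta', ',', 'GA', '.']
offsets = [(0, 5), (6, 9), (10, 12), (13, 14), (15, 23), (24, 27),
           (28, 30), (31, 38), (38, 39), (40, 42), (42, 43)]


def spans_match(text, tokens, offsets):
    """Return True if every (begin, end) offset slices its token out of the text."""
    return len(tokens) == len(offsets) and all(
        text[begin:end] == token for token, (begin, end) in zip(tokens, offsets)
    )


if __name__ == '__main__':
    print(spans_match(text, tokens, offsets))
```

The same check can be applied to the `tokens` and `offsets` returned by your own run of getting_started.py as a quick way to confirm that the output lines up with the input text.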