HW0: Getting Started
Task 1: Getting Started
In this assignment, you will:
Set up your development environment,
Install your first Python package using pip within a virtual environment,
Run a test program to verify the installation and environment configuration, and
Commit your changes and push them to your GitHub repository.
This will ensure your development workspace is properly configured for this course.
Package Installation
Once you have set up the development environment:
Open the Terminal in PyCharm:
Click the Terminal icon at the bottom left, or
Select [View] > [Tool Windows] > [Terminal] from the menu.
Update pip to the latest version (if necessary):
python -m pip install --upgrade pip
Install setuptools (if necessary):
pip install setuptools
Install the ELIT Tokenizer:
pip install elit_tokenizer
You will know the installation is successful when you see "Successfully installed ..." messages for each package in the terminal output.
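If you prefer to confirm the setup from Python itself, the following sketch checks that the interpreter is running inside a virtual environment and reports the installed versions of the packages above. It uses only the standard library (`sys` and `importlib.metadata`, available in Python 3.8+). The helper names `in_virtualenv` and `installed_version` are hypothetical, and the distribution names are assumed to match the names used in the `pip install` commands.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def in_virtualenv() -> bool:
    """Return True if the current interpreter runs inside a virtual environment.

    Inside a venv, sys.prefix points at the environment directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == '__main__':
    print('virtual environment:', in_virtualenv())
    # Assumption: these distribution names match the `pip install` commands above.
    for name in ('pip', 'setuptools', 'elit_tokenizer'):
        print(f'{name}: {installed_version(name) or "NOT INSTALLED"}')
```

Run it with the same interpreter PyCharm uses for the project. If `virtual environment` prints `False`, PyCharm is probably using a system interpreter rather than the project's `.venv`.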
Test Program
Create your first program:
Create a Python file called getting_started.py inside the homework directory (src/homework/).
Copy the following code into the file:
from elit_tokenizer import EnglishTokenizer

if __name__ == '__main__':
    text = 'Emory NLP is a research lab in Atlanta, GA. It was founded by Jinho D. Choi in 2014. Dr. Choi is a professor at Emory University.'
    tokenizer = EnglishTokenizer()
    sentence = tokenizer.decode(text)
    print(sentence.tokens)
    print(sentence.offsets)
Run the program:
Choose [Run] > [Run 'getting_started'] from the menu, or
Use the green run button next to the main block.
Verify the output. Your program is working correctly if it prints the following:
['Emory', 'NLP', 'is', 'a', 'research', 'lab', 'in', 'Atlanta', ',', 'GA', '.', 'It', 'was', 'founded', 'by', 'Jinho', 'D.', 'Choi', 'in', '2014', '.', 'Dr.', 'Choi', 'is', 'a', 'professor', 'at', 'Emory', 'University', '.']
[(0, 5), (6, 9), (10, 12), (13, 14), (15, 23), (24, 27), (28, 30), (31, 38), (38, 39), (40, 42), (42, 43), (44, 46), (47, 50), (51, 58), (59, 61), (62, 67), (68, 70), (71, 75), (76, 78), (79, 83), (83, 84), (85, 88), (89, 93), (94, 96), (97, 98), (99, 108), (109, 111), (112, 117), (118, 128), (128, 129)]
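Judging from the expected output, each pair in `sentence.offsets` gives the start (inclusive) and end (exclusive) character positions of the corresponding token in `text`. For example, (0, 5) covers 'Emory'. Slicing the original string with an offset pair should therefore recover the token. The sketch below checks this property against the first sentence of the expected output, with data copied from that output. The helper `offsets_match` is hypothetical and is not part of the ELIT API.

```python
def offsets_match(text, tokens, offsets):
    """Return True if every (start, end) pair slices its token out of text."""
    return len(tokens) == len(offsets) and all(
        text[start:end] == token for token, (start, end) in zip(tokens, offsets)
    )


text = ('Emory NLP is a research lab in Atlanta, GA. It was founded by '
        'Jinho D. Choi in 2014. Dr. Choi is a professor at Emory University.')

# First sentence of the expected output shown above.
tokens = ['Emory', 'NLP', 'is', 'a', 'research', 'lab', 'in', 'Atlanta', ',', 'GA', '.']
offsets = [(0, 5), (6, 9), (10, 12), (13, 14), (15, 23), (24, 27), (28, 30),
           (31, 38), (38, 39), (40, 42), (42, 43)]

print(offsets_match(text, tokens, offsets))  # True
```

After running getting_started.py, you can apply the same check to the actual result with `offsets_match(text, sentence.tokens, sentence.offsets)`.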
Commit & Push
Create a .gitignore file:
Create the file in your nlp-essentials root directory.
Add the following lines to exclude unnecessary files:
.idea/
.venv/
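Both entries end with a slash, which tells Git to match only directories. Because they contain no other slash, they match a directory of that name at any depth below the .gitignore file. The function below is a deliberately simplified, hypothetical model of just that rule. It handles plain names only, with no wildcards, negation, or anchoring, and is meant to show which file paths the two entries exclude. To ask Git itself, run `git check-ignore -v <path>` inside the repository.

```python
from pathlib import PurePosixPath

# Entries from the .gitignore above; the trailing slash means "directories only".
IGNORED_DIRS = ('.idea/', '.venv/')


def is_ignored(file_path: str, patterns=IGNORED_DIRS) -> bool:
    """Simplified model: a file path is ignored if any of its parent
    directories matches one of the plain directory patterns."""
    names = {p.rstrip('/') for p in patterns}
    # Every component except the last is a directory, so only those can
    # match a directory-only pattern; the match may occur at any depth.
    parents = PurePosixPath(file_path).parts[:-1]
    return any(part in names for part in parents)


# Prints True, True, False, False for the four paths below.
for p in ('.idea/workspace.xml', '.venv/bin/python',
          'src/homework/getting_started.py', '.gitignore'):
    print(p, is_ignored(p))
```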
Stage your files for commit:
Add the following files to Git by right-clicking each one and selecting [Git] > [Add]:
.gitignore
src/__init__.py
src/homework/__init__.py
src/homework/getting_started.py
Files should turn green when successfully added. If files do not change color, restart PyCharm and try again.
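Before staging, you may want to confirm that all four files exist at the expected paths relative to the nlp-essentials root. The sketch below does this with `pathlib`. The function name `missing_files` is hypothetical, and running the script from the project root is an assumption about your setup.

```python
from pathlib import Path

# Files the assignment asks you to commit, relative to the nlp-essentials root.
REQUIRED_FILES = (
    '.gitignore',
    'src/__init__.py',
    'src/homework/__init__.py',
    'src/homework/getting_started.py',
)


def missing_files(root) -> list:
    """Return the required files that do not exist as regular files under root."""
    root = Path(root)
    return [f for f in REQUIRED_FILES if not (root / f).is_file()]


if __name__ == '__main__':
    # Assumption: the script is run from the nlp-essentials root directory.
    missing = missing_files('.')
    print('All required files present.' if not missing else f'Missing: {missing}')
```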
Commit and push your changes:
Right-click the nlp-essentials directory.
Select [Git] > [Commit Directory].
Write a descriptive commit message (e.g., "Initial setup and tokenizer test").
Click [Commit and Push] (not just [Commit]).
Click [Push] in the next dialog to upload your changes to your GitHub repository.
Verify your submission:
Visit your GitHub repository in a web browser.
Confirm that all files are present and contain the correct content.
Submission
Submit the URL of your GitHub repository to Canvas.
Task 2: Project Ideas
Share your team project concept by filling out the form in Canvas (about 100-150 words). Your description will be posted on the Project Ideas page to help classmates discover shared interests and form teams.
Rubric
GitHub Setup (0.2 points):
Private repository created.
All instructors added as collaborators.
Project Organization (0.2 points):
Correct directory structure.
No unnecessary files committed.
Version Control (0.3 points):
All required files committed and successfully pushed to GitHub.
The contents of the files are correct.
Code Implementation (0.3 points):
The program executes without errors.
Produces correct tokenizer output.
Project Ideas (1 point):
Is the team project idea well described?