Model Design

Encoder Challenge

Given the input text $W = \{w_1, \ldots, w_n\}$, where $w_i$ is the $i$'th token in $W$, a contextualized encoder (e.g., BERT) takes $W$ and generates an embedding $e_i \in \mathbb{R}^{1 \times d}$ for every token $w_i \in W$ using $w_i$ as well as its context. The challenge is that this encoder can take at most $m$ tokens, so it cannot handle any input where $n > m$.


What are the ways to handle arbitrarily large input using a contextualized encoder?

Baseline

One popular method is called the "Sliding Window", which splits the input into multiple blocks of text, generates embeddings for each block separately, and merges them at the end.

Let $W = W_1 \cup \cdots \cup W_k$, where $W_h = \{w_{(h-1)m+1}, \ldots, w_{hm}\}$ if $hm < n$; otherwise, $W_h = \{w_{(h-1)m+1}, \ldots, w_n\}$, such that $(k-1)m < n \leq km$. Then, the encoder takes each $W_h$ and generates $E_h = \{e_{(h-1)m+1}, \ldots, e_{hm}\}$ for every token in $W_h$. Finally, the embedding matrix $E \in \mathbb{R}^{n \times d}$ is created by sequentially stacking all embeddings in $E_{\forall h}$.
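A minimal sketch of this baseline in Python, assuming a generic `encode` function that maps one block of at most $m$ tokens to a $|W_h| \times d$ NumPy array (the function name is hypothetical, not a specific library API):

```python
import numpy as np

def sliding_window_encode(tokens, encode, m):
    """Encode an arbitrarily long token sequence with a bounded encoder.

    tokens: list of n tokens (n may exceed the encoder limit m).
    encode: any function mapping a block of <= m tokens to a
            (len(block) x d) embedding array.
    Returns the n x d embedding matrix E built by stacking E_1..E_k.
    """
    # Split W into W_1..W_k, each holding at most m consecutive tokens.
    blocks = [tokens[h:h + m] for h in range(0, len(tokens), m)]
    # Encode each block independently, then stack sequentially.
    return np.vstack([encode(block) for block in blocks])
```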

What are the potential issues with this baseline method?

The baseline method does not have enough context to generate high-quality embeddings for tokens on the edge of each block.

Advanced (Exercise)

Modify the baseline method such that each block shares overlapping tokens with its surrounding blocks (both front and back). Once all blocks are encoded, each overlapped token has two embeddings. Create an average of those two embeddings and make it the final embedding for the overlapped token.
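One possible solution sketch for this exercise, assuming an overlap of $o \leq m/2$ tokens between adjacent blocks so that every overlapped token is encoded exactly twice; the stride and the averaging-by-count trick are design choices, not the only valid ones:

```python
import numpy as np

def overlapping_encode(tokens, encode, m, o):
    """Encode blocks of size m that share o tokens with each neighbor,
    then average the two embeddings of every overlapped token."""
    assert 0 <= o <= m // 2              # each overlapped token appears in exactly 2 blocks
    n = len(tokens)
    step = m - o                         # adjacent blocks share o tokens
    sums, counts = None, np.zeros(n)
    for s in range(0, max(n - o, 1), step):
        block = tokens[s:s + m]
        E = np.asarray(encode(block))    # |block| x d
        if sums is None:
            sums = np.zeros((n, E.shape[1]))
        sums[s:s + len(block)] += E      # accumulate per-token embeddings
        counts[s:s + len(block)] += 1    # 1 for interior, 2 for overlapped tokens
    return sums / counts[:, None]        # overlapped tokens get the average
```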

Decoder Challenge

In a sequence-to-sequence model (aka, an encoder-decoder model), a decoder takes an embedding matrix $E \in \mathbb{R}^{m \times d}$ and predicts what token should come next. It is often the case that this embedding matrix is also bounded by a certain size, which becomes an issue when the size of the matrix becomes larger than $m$ (for the case above, $E \in \mathbb{R}^{n \times d}$ where $n > m$). One common method to handle this issue is to use an attention matrix for dimensionality reduction as follows:

The embedding matrix $E \in \mathbb{R}^{n \times d}$ is first transposed to $E^T \in \mathbb{R}^{d \times n}$, then multiplied by an attention matrix $A \in \mathbb{R}^{n \times m}$ such that $E^T \cdot A \rightarrow D \in \mathbb{R}^{d \times m}$. Finally, the transpose of $D$, that is $D^T \in \mathbb{R}^{m \times d}$, gets fed into the decoder.
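The shapes can be verified with a small NumPy sketch, where random matrices stand in for the learned embeddings and attention weights:

```python
import numpy as np

n, m, d = 1024, 512, 768      # n > m: input exceeds the decoder bound
E = np.random.randn(n, d)     # embedding matrix from the encoder
A = np.random.randn(n, m)     # attention matrix (learned in practice)
D = E.T @ A                   # (d x n) @ (n x m) -> d x m
assert D.T.shape == (m, d)    # D^T is what gets fed into the decoder
```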


Would the following method be equivalent to the above method?

An attention matrix $A \in \mathbb{R}^{m \times n}$ is multiplied by the embedding matrix $E \in \mathbb{R}^{n \times d}$ such that $A \cdot E \rightarrow D \in \mathbb{R}^{m \times d}$. Finally, $D$ gets fed into the decoder.

Algorithm Development

This chapter discusses how to develop new algorithms and write them in pseudocode.

Task

Your task is to design an algorithm that takes a post with its title and its comments with their replies from a discussion forum (e.g., Reddit) and converts them into a multi-turn, one-to-one dialogue.


Approach

This chapter guides you through writing the approach section.

Content

Model Design

Typically, it is better to write the approach section as abstractly as possible so that your methods become generalizable to many tasks. For example, even if you use BERT as an encoder, if your approach can take any transformer as an encoder, it is better to state that your method uses a transformer as an encoder instead of BERT.

Algorithm Development

Input

A post with the title (https://www.reddit.com/r/college/comments/v7h9rs):

How do you focus when you’re depressed?

I have so many assignments due, with exams coming up too. Life's just keeps hitting me recently and I'm finding it really hard to sit down and take information in. Writing is hard, listening and paying attention is hard. Even if I manage to listen or read none of the information stays in my head. Any help is very appreciated!!

Comments and replies:

Get up early everyday and use the library to study if you have one, idk if you're like me but as soon as I get home I'm kinda done for the day so it helps to stay somewhere where you can't really relax.

• Thank you, I’ll give this a try tomorrow

What helps me is embracing when I'm feeling down and allowing myself to take a deserved break. Sometimes I confuse my depressive episodes with burnout and it's important to know your limits. The biggest pro tip to not be overwhelmed with so much to do all at once is doing something every day. Dedicating simply 30 minutes to an hour a day of intense studying goes a long way over time vs cramming at the end. If you're able to do more than 1 hour then great! But know that you don't have to do 6-7 intense studying hours a day to be successful. Be intentional with your time and work smarter vs harder. Your future self will thank you.

• This is so nice to hear, and very helpful, thank you!

Hardest part is starting to study, once I have like 15 minutes into my study session that’s my only focus and just forget everything else.

Overview

Give a comparison overview of your algorithms with key features:

We introduce two algorithms for the reddit-to-dialogue generation: the baseline algorithm considers every sentence in the post an utterance of Speaker 1 and each comment an utterance of Speaker 2 (Section 3.1), whereas the advanced algorithm finds an appropriate span of sentences from the post to form an utterance for Speaker 1 and an appropriate span of any comment to form an utterance for Speaker 2 (Section 3.2).

Indicate the objective of your algorithm(s):

The main objective is to generate, from a post and its comments and replies, a multi-turn dialogue that flows naturally in context.

Describe what the input and output data should be (possibly with a figure), as commonly applied to all algorithms:

All algorithms assume that the number of sentences in the input post is less than or equal to the number of comments. The generated dialogues involve two speakers, where the utterances of Speakers 1 and 2 are extracted from the post and the comments, respectively.

Baseline

Objective

• The title or each sentence in the post is considered an utterance of Speaker 1 (S1).

• For each utterance $u_1$ of S1, find a comment that is the most relevant and make it the response to $u_1$ from Speaker 2 (S2).

Output

S1 : How do you focus when you’re depressed?

S2 : What helps me is embracing when I'm feeling down and allowing myself to take ...

S1 : I have so many assignments due, with exams coming up too.

S2 : Get up early everyday and use the library to study if you have one, ...

S1 : Life's just keeps hitting me recently and I'm finding it really hard to sit down and take information in.

S2 : Hardest part is starting to study, once I have like 15 minutes into my study session that’s my only focus and just forget everything else.

Algorithm

1. Illustrate the baseline algorithm in pseudocode. Create helper methods if they help the readability and/or generalizability of your algorithm.

2. Give a brief overview of the algorithm by explaining what each line of the code does.

3. Describe helper methods (if any) in detail.

Overview

Define the input:

Let $P = [p_1, \ldots, p_n]$ be an input post where $p_i$ is the $i$'th sentence in $P$, and $\mathbb{C} = \{C_1, \ldots, C_m\}$ be a set of $P$'s comments such that $C_j = [c_{j1}, \ldots, c_{j\ell}]$, where $C_j$ is the $j$'th comment in $\mathbb{C}$ and $c_{jk}$ is the $k$'th sentence in $C_j$.
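For concreteness, the input can be written down with plain Python types (a hypothetical encoding that simply mirrors the definition above):

```python
from typing import List

Post = List[str]          # P = [p_1, ..., p_n]: the post's sentences
Comment = List[str]       # C_j = [c_j1, ..., c_jl]: one comment's sentences
Comments = List[Comment]  # C = {C_1, ..., C_m}: all comments of the post
```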

Is the input correctly described according to the objective?

Initialize the output and auxiliary data structures:

Let $D$ be the list of utterances representing the output dialogue (L1) and $T$ be a set of segments created from $\mathbb{C}$ (L2).

Describe the loop:

The algorithm visits every sentence $p_0 \in P$ (L3) and appends it to $D$ (L4). It then finds the most-relevant segment $\hat{t} \in T$ (L5) and adds $\hat{t}$ to $D$ (L6). $T$ gets trimmed with $\hat{t}$ (L7).

Return the output:

Finally, it returns $D$ as the output (L8).

Helper Methods

Describe the $\textit{first}$ method:

The $\textit{first}$ method removes and returns the first sentence in $P$.

Describe the $\textit{segment}$ method:

The $\textit{segment}$ method makes each comment a segment s.t. $\textit{segment}(\mathbb{C}) = \{C'_1, \ldots, C'_m\}$, where $C'_j = c_{j1} {}^\frown \cdots {}^\frown c_{j\ell}$ ($^\frown$: text concatenation).

Describe the $\textit{ranker}$ method:

The $\textit{ranker}$ method takes $D$, comprising all previous utterances and $p_i$, then estimates the likelihood of $t$ being the next utterance.

How do you estimate such likelihoods?
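One possible instantiation, offered as an assumption rather than the prescribed model: score each candidate by the cosine similarity between sentence embeddings of the dialogue's last utterance and the candidate segment, where `embed` stands for any sentence encoder:

```python
import numpy as np

def make_cosine_ranker(embed):
    """Build a ranker(D, t) that scores candidate segment t by the cosine
    similarity between embeddings of D's last utterance and t.
    embed: any function mapping a string to a 1-D vector."""
    def ranker(D, t):
        u, v = embed(D[-1]), embed(t)
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    return ranker
```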

Describe the $\textit{trim}$ method:

The $\textit{trim}$ method removes $\hat{t} = C'_j$ from $T$ such that $\textit{trim}(T, \hat{t}) = T \setminus \{C'_j\}$.
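Putting the pieces together, a minimal Python sketch of the baseline algorithm; the names are illustrative, the (L1)-(L8) comments mirror the pseudocode references above, and `ranker` is assumed to be supplied (e.g., the cosine ranker sketched earlier):

```python
def reddit_to_dialogue(P, C, ranker):
    """Baseline reddit-to-dialogue conversion.

    P: list of post sentences (S1 side); C: list of comments, each a list
    of sentences; ranker(D, t): likelihood of t being the next utterance
    given the dialogue so far. Assumes len(P) <= len(C).
    """
    D = []                                          # L1: output dialogue
    T = {" ".join(c) for c in C}                    # L2: segment(C), one segment per comment
    while P:                                        # L3: visit every sentence in P
        D.append(P.pop(0))                          # L4: first(P) -> utterance of S1
        t_hat = max(T, key=lambda t: ranker(D, t))  # L5: most-relevant segment
        D.append(t_hat)                             # L6: response of S2
        T.remove(t_hat)                             # L7: trim(T, t_hat)
    return D                                        # L8: return the dialogue
```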

Advanced (Exercise)

Objective

• Any span of consecutive sentences in the post is considered an utterance of S1.

• For each utterance $u_1$ of S1, find the most relevant span of consecutive sentences in any comment and make it the response to $u_1$ from S2.

Output

S1 : How do you focus when you’re depressed? I have so many assignments due, with exams coming up too.

S2 : What helps me is embracing when I'm feeling down and allowing myself to take a deserved break.

S1 : Life's just keeps hitting me recently and I'm finding it really hard to sit down and take information in.

S2 : Get up early everyday and use the library to study if you have one, idk if you're like me but as soon as I get home I'm kinda done for the day so it helps to stay somewhere where you can't really relax.

S1 : Writing is hard, listening and paying attention is hard.

S2 : Hardest part is starting to study, once I have like 15 minutes into my study session that’s my only focus and just forget everything else.

S1 : Even if I manage to listen or read none of the information stays in my head. Any help is very appreciated!!

S2 : Sometimes I confuse my depressive episodes with burnout and it's important to know your limits. The biggest pro tip to not be overwhelmed with so much to do all at once is doing something every day.

[Attachment: algo-baseline.pdf (PDF, 198KB) — Algorithm - Baseline]

[Attachment: algo-exercise.pdf (PDF, 222KB) — Algorithm - Advanced]

Data Creation

When you create a dataset, the following needs to be clearly described:

• Data collection (e.g., sources of the data).

• Preprocessing, if performed (e.g., scripts that you write, existing tools used).

• Annotation scheme and guidelines, if annotation is conducted, with justification.

• People involved in this process (e.g., annotators, survey subjects).

• Quality of the created data (e.g., inter-annotator agreement).

• Statistics and analysis of the original, preprocessed, and annotated data.

Here are a few papers presenting new datasets:

• Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models, Li et al., EMNLP 2020 (see Section 3).

• FriendsQA: Open-Domain Question Answering on TV Show Transcripts, Yang and Choi, SIGDIAL 2019 (see Section 3).