5.3. Information Extraction

Consider that you want to extract someone's call name(s) during a dialogue in real time:

Design a prompt that extracts all call names provided by the user.

How does the speaker want to be called? Respond in the one-line JSON format such as {"call_names": ["Mike", "Michael"]}: My friends call me Pete, my students call me Dr. Parker, and my parents call me Peter. 

In "My friends call me Pete, my students call me Dr. Parker, and my parents call me Peter.", how does the speaker want to be called? Respond in the following JSON format: {"call_names": ["Mike", "Michael"]}
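
Both prompt layouts above can be assembled programmatically. The following is a minimal sketch; the function names build_prompt_prefix and build_prompt_quoted are hypothetical and only illustrate how the request, the example JSON, and the user utterance fit together.

```python
import json
from typing import Any, Dict


def build_prompt_prefix(request: str, example: Dict[str, Any], utterance: str) -> str:
    """First layout: the request and the JSON example, followed by the utterance."""
    return (f'{request} Respond in the one-line JSON format such as '
            f'{json.dumps(example)}: {utterance.strip()}')


def build_prompt_quoted(request: str, example: Dict[str, Any], utterance: str) -> str:
    """Second layout: the utterance is quoted first, followed by the request."""
    # Lowercase the first letter so the request reads naturally mid-sentence.
    question = request[0].lower() + request[1:]
    return (f'In "{utterance.strip()}", {question} '
            f'Respond in the following JSON format: {json.dumps(example)}')


request = 'How does the speaker want to be called?'
example = {'call_names': ['Mike', 'Michael']}
utterance = ('My friends call me Pete, my students call me Dr. Parker, '
             'and my parents call me Peter.')

print(build_prompt_prefix(request, example, utterance))
print(build_prompt_quoted(request, example, utterance))
```

Serializing the example with json.dumps keeps the example in the prompt consistent with what json.loads will later accept.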

Let us write a function that takes the user input and returns the GPT output in the JSON format:

  • #2-6: uses the ChatCompletion model to retrieve the GPT output.

  • #8-10: uses the regular expression (if provided) to extract the output in the specified format.
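
The regular-expression step can be tried without calling the API. The sketch below assumes a hypothetical helper, extract_with_regex, and an illustrative pattern for the call-name example. It is not the pattern used by the original code.

```python
import re
from typing import Optional


def extract_with_regex(output: str, regex: Optional[re.Pattern] = None) -> Optional[str]:
    """Keep only the span of the model output that matches the regex, if one is given."""
    output = output.strip()
    if regex is not None:
        m = regex.search(output)
        output = m.group().strip() if m else None
    return output


# Illustrative pattern (an assumption): a one-line JSON object with a "call_names" list.
CALL_NAMES = re.compile(r'\{\s*"call_names"\s*:\s*\[.*?\]\s*\}')

raw = 'Sure! Here is the answer: {"call_names": ["Pete", "Dr. Parker", "Peter"]} Let me know.'
print(extract_with_regex(raw, CALL_NAMES))               # only the JSON object is kept
print(extract_with_regex('I am not sure.', CALL_NAMES))  # no match, so None
```

Returning None when nothing matches lets the caller treat a malformed reply the same way as an empty one.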

Let us create a macro called MacroGPTJSON:

  • #3: request, the task to be requested regarding the user input (e.g., How does the speaker want to be called?).

  • #4: full_ex, the example output where all values are filled (e.g., {"call_names": ["Mike", "Michael"]}).

Override the run method in MacroGPTJSON:

  • #2-3: creates an input prompt for the GPT API.

  • #4-5: retrieves the GPT output using the prompt.


Let us create another macro called MacroNLG:

  • #3: generate is a function that takes a variable table and returns a string output.
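
To see the idea without the dialogue framework, here is a minimal sketch. SimpleNLG and pick_call_name are hypothetical stand-ins (not part of the original code) showing how a generate function maps the variable table to a string.

```python
import random
from typing import Any, Callable, Dict


class SimpleNLG:
    """Hypothetical stand-in for MacroNLG that does not depend on the dialogue framework."""

    def __init__(self, generate: Callable[[Dict[str, Any]], str]):
        self.generate = generate

    def run(self, variables: Dict[str, Any]) -> str:
        # Delegate the wording entirely to the supplied function.
        return self.generate(variables)


def pick_call_name(variables: Dict[str, Any]) -> str:
    # Choose one of the stored call names at random.
    return random.choice(variables['call_names'])


nlg = SimpleNLG(pick_call_name)
print(nlg.run({'call_names': ['Jin', 'Jinho', 'Dr. Choi']}))
```

Passing the generator as a function keeps the macro generic: a new response only needs a new function, not a new macro class.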

Finally, we use the macros in a dialogue flow:

The helper methods can be as follows:

S: Hi, how should I call you?
U: My friends call me Jin, but you can call me Jinho. Some students call me Dr. Choi as well.
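def gpt_completion(input: str, regex: Optional[Pattern] = None) -> Optional[str]:
    response = openai.ChatCompletion.create(  # legacy (pre-1.0) openai SDK call
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': input}]
    )
    output = response['choices'][0]['message']['content'].strip()

    if regex is not None:
        m = regex.search(output)  # keep only the part that matches the expected format
        output = m.group().strip() if m else None

    return output  # None if a regex was given but did not match
# Requires: import openai; from typing import Optional, Pattern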

  • #5: empty_ex, the example output where all collections are empty (e.g., {"call_names": []}).

  • #6: check, the regular expression to check the information.

  • #7: set_variables, a function that takes the STDM variable dictionary and the JSON output dictionary and sets the necessary variables.

  • #7-11: checks if the output is in a proper JSON format (json.loads); if it is not, run returns False.

  • #13-14: updates the variable table using the custom function (set_variables), if provided.

  • #15-16: otherwise, updates the variable table using the same keys as in the JSON output.
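
The validation and update steps of run can be exercised in isolation. The sketch below uses a hypothetical function, apply_json_output, that mirrors those steps on a plain dictionary. It makes no API call and does not use the dialogue framework.

```python
import json
from typing import Any, Callable, Dict, Optional


def apply_json_output(
    output: Optional[str],
    variables: Dict[str, Any],
    set_variables: Optional[Callable[[Dict[str, Any], Dict[str, Any]], None]] = None,
) -> bool:
    """Parse the model output and store it in the variable table; return False on failure."""
    if not output:
        return False
    try:
        d = json.loads(output)
    except json.JSONDecodeError:
        print(f'Invalid: {output}')
        return False

    if set_variables:
        set_variables(variables, d)  # custom mapping from the JSON output to variables
    else:
        variables.update(d)          # default: reuse the JSON keys as variable names
    return True


variables: Dict[str, Any] = {}
print(apply_json_output('{"call_names": ["Jin", "Jinho", "Dr. Choi"]}', variables), variables)
print(apply_json_output('call me Jinho', variables))  # not JSON, so the transition fails
```

Returning False lets the dialogue flow fall through to its error transition instead of continuing with missing variables.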

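    class MacroGPTJSON(Macro):
        def __init__(self, request: str, full_ex: Dict[str, Any], empty_ex: Optional[Dict[str, Any]] = None, set_variables: Optional[Callable[[Dict[str, Any], Dict[str, Any]], None]] = None):
            self.request = request  # task description placed at the start of the prompt
            self.full_ex = json.dumps(full_ex)  # example output with all values filled
            self.empty_ex = '' if empty_ex is None else json.dumps(empty_ex)  # example output with empty collections
            self.check = re.compile(regexutils.generate(full_ex))  # regex derived from the full example
            self.set_variables = set_variables  # optional custom setter: (vars, json_output) -> None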

        def run(self, ngrams: Ngrams, vars: Dict[str, Any], args: List[Any]):
            examples = f'{self.full_ex} or {self.empty_ex} if unavailable' if self.empty_ex else self.full_ex
            prompt = f'{self.request} Respond in the JSON schema such as {examples}: {ngrams.raw_text().strip()}'
            output = gpt_completion(prompt)
            if not output: return False

            try:
                d = json.loads(output)
            except json.JSONDecodeError:  # the model did not return valid JSON
                print(f'Invalid: {output}')
                return False

            if self.set_variables:
                self.set_variables(vars, d)  # custom mapping into the variable table
            else:
                vars.update(d)  # default: JSON keys become variable names

            return True
    class MacroNLG(Macro):
        def __init__(self, generate: Callable[[Dict[str, Any]], str]):
            self.generate = generate
    
        def run(self, ngrams: Ngrams, vars: Dict[str, Any], args: List[Any]):
            return self.generate(vars)
    transitions = {
        'state': 'start',
        '`Hi, how should I call you?`': {
            '#SET_CALL_NAMES': {
                '`Nice to meet you,` #GET_CALL_NAME `. Can you tell me where your office is and when your general office hours are?`': {
                    '#SET_OFFICE_LOCATION_HOURS': {
                        '`Can you confirm if the following office infos are correct?` #GET_OFFICE_LOCATION_HOURS': {
                        }
                    }
                }
            },
            'error': {
                '`Sorry, I didn\'t understand you.`': 'end'
            }
        }
    }
    
    macros = {
        'GET_CALL_NAME': MacroNLG(get_call_name),
        'GET_OFFICE_LOCATION_HOURS': MacroNLG(get_office_location_hours),
        'SET_CALL_NAMES': MacroGPTJSON(
            'How does the speaker want to be called?',
            {V.call_names.name: ["Mike", "Michael"]}),
        'SET_OFFICE_LOCATION_HOURS': MacroGPTJSON(
            'Where is the speaker\'s office and when are the office hours?',
            {V.office_location.name: "White Hall E305", V.office_hours.name: [{"day": "Monday", "begin": "14:00", "end": "15:00"}, {"day": "Friday", "begin": "11:00", "end": "12:30"}]},
            {V.office_location.name: "N/A", V.office_hours.name: []},
            set_office_location_hours
        ),
    }
    def get_call_name(vars: Dict[str, Any]) -> str:
        ls = vars[V.call_names.name]
        return random.choice(ls)  # pick one of the extracted call names at random
    
    def get_office_location_hours(vars: Dict[str, Any]) -> str:
        return '\n- Location: {}\n- Hours: {}'.format(vars[V.office_location.name], vars[V.office_hours.name])
    
    def set_office_location_hours(vars: Dict[str, Any], user: Dict[str, Any]) -> None:
        vars[V.office_location.name] = user[V.office_location.name]
        # re-key the list of {"day", "begin", "end"} records by day
        vars[V.office_hours.name] = {d['day']: [d['begin'], d['end']] for d in user[V.office_hours.name]}
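
To check what the office-hour helpers produce, they can be run on the example JSON from the macro definition. This is a sketch under one assumption: plain string keys stand in for V.office_location.name and V.office_hours.name, since the V enum is not shown here.

```python
from typing import Any, Dict

# Assumption: these strings stand in for V.office_location.name and V.office_hours.name.
LOCATION, HOURS = 'office_location', 'office_hours'


def set_office_location_hours(variables: Dict[str, Any], user: Dict[str, Any]) -> None:
    variables[LOCATION] = user[LOCATION]
    # Re-key the list of {"day", "begin", "end"} records by day.
    variables[HOURS] = {d['day']: [d['begin'], d['end']] for d in user[HOURS]}


def get_office_location_hours(variables: Dict[str, Any]) -> str:
    return '\n- Location: {}\n- Hours: {}'.format(variables[LOCATION], variables[HOURS])


variables: Dict[str, Any] = {}
set_office_location_hours(variables, {
    LOCATION: 'White Hall E305',
    HOURS: [
        {'day': 'Monday', 'begin': '14:00', 'end': '15:00'},
        {'day': 'Friday', 'begin': '11:00', 'end': '12:30'},
    ],
})
print(get_office_location_hours(variables))
```

Because the hours are keyed by day, a second entry for the same day would overwrite the first. A list per day would be needed to keep both.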