Your task is to evaluate your team chatbot given the following instructions:
This is an individual assignment. Each member must evaluate the team chatbot separately.
You need to evaluate at least 5 categories (e.g., team evaluation). Give a clear description of each category. You are welcome to discuss them with your team; however, your description must be original.
Create a static form to evaluate your chatbot. Your form should include a metric (e.g., Likert scale of 1 - 3) to evaluate each category and ask for reasons for the assessment.
Find a group of 10 people suitable to interact with your chatbot. Each interactor will evaluate your chatbot after one or more conversations using the static form.
Submit quiz6.pdf summarizing your evaluation results, including:
Your evaluation categories and their descriptions.
Quantitative analysis of your chatbot using the metric.
Qualitative analysis of your chatbot using assessment reasoning.
Potential improvements that can (or should) be made, based on the analyses.
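For the quantitative analysis, the ratings collected with the form can be summarized per category. The following is a minimal sketch, assuming each completed form is stored as a dictionary mapping a category name to a rating on a 1-3 Likert scale; the category names and ratings shown are hypothetical placeholders, not real results.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical responses: one dict per interactor, category -> rating (1-3).
responses = [
    {"effectiveness": 3, "satisfaction": 2, "accuracy": 3},
    {"effectiveness": 2, "satisfaction": 2, "accuracy": 1},
    {"effectiveness": 3, "satisfaction": 1, "accuracy": 2},
]

def summarize(responses):
    """Return count, mean, median, and rating distribution for every category."""
    summary = {}
    categories = sorted({c for r in responses for c in r})
    for category in categories:
        # Skip forms where the interactor left this category blank.
        ratings = [r[category] for r in responses if category in r]
        summary[category] = {
            "n": len(ratings),
            "mean": round(mean(ratings), 2),
            "median": median(ratings),
            "counts": dict(Counter(ratings)),
        }
    return summary

for category, stats in summarize(responses).items():
    print(category, stats)
```

With a scale this short, reporting the distribution of ratings alongside the mean is usually more informative than the mean alone.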
The degree to which the chatbot is able to achieve its intended goals and produce desirable outcomes for users.
Whether the advice provided helps the user feel more confident in their conversational abilities.
How well it provides mental health support to individuals with varying levels of stress.
Whether the user found the previous interaction helpful.
Whether the user was able to successfully complete their goal with the help of the chatbot.
The user's rating of how much the chatbot helped them in their career path.
How users feel about its ability to prepare them for real interviews.
The degree to which the chatbot meets users' expectations and provides a satisfactory experience.
Whether the insights, questions, and communication are well-balanced and whether the users feel comfortable with the ideal proportion.
How comfortable the user felt during the conversation.
Whether it understood the user's needs and provided personalization based on their style and preferences.
Whether the user felt a bit better after the interaction.
The pleasantness of the system persona and whether it meets users' expectations.
The degree to which the chatbot provides accurate and correct information or recommendations to users.
How accurately it identifies the presence of stress and changes response style accordingly.
Whether the bot's suggestions were relevant to the user's personal taste, style, and preferences, and whether the user would wear everything suggested by the bot.
The accuracy of its performance metrics, such as the accuracy of style recommendations.
Whether it is successful in making accurate recommendations for the user.
The quality and specificity of the feedback users received from the system, and whether the feedback provided accurate and correct information.
The degree to which the chatbot's responses and actions can be understood and interpreted by users.
Whether users can understand what the chatbot is saying.
Whether users experience instances of missed messages, misunderstandings, or abrupt exits, and whether the chatbot's responses can be interpreted clearly by users.
Its ability to understand the context of users' inquiries and provide relevant responses that can be understood and interpreted by users.
Whether the chatbot can assess the user's level of knowledge and adjust its responses accordingly, whether it uses language that the user can understand, and whether the user feels engaged with the chatbot's insights.
The degree to which the chatbot's responses and actions are logically consistent and connected with each other, and with the context of the conversation.
Whether it can recognize the topics mentioned by the user and provide relevant information that is logically connected with those topics.
Whether its responses are self-explanatory and related to each other in a logical and coherent manner.
Its level of consistency with actual interviews, i.e., whether its responses are logically consistent with the expectations and norms of real-life interviews.
Whether its suggestions are relevant and helpful for the given topic, and whether they are logically consistent with the context of the conversation.
The degree to which the chatbot's responses and actions resemble those of a human being, and are perceived as natural, fluent, and realistic by users.
How closely its responses resemble those of a human being, i.e., whether they are naturalistic in terms of language, syntax, grammar, tone, and other linguistic aspects.
The degree to which the chatbot's responses and actions adhere to ethical principles and standards, and avoid causing harm or offense to users or other stakeholders.
Whether it contains any content or responses that are offensive or inappropriate, such as hate speech, discrimination, or harassment.
The degree to which the chatbot understands and responds to the emotions and feelings of users in a compassionate and sensitive manner, and provides emotional support or encouragement when appropriate.
Whether it demonstrates an understanding of the user's emotions and concerns, and responds in a way that is supportive and validating.
Using appropriate language and tone, acknowledging the user's perspective, and offering words of encouragement or empathy.
The degree to which the chatbot provides accurate and useful information to users in response to their queries or requests for assistance.
Its ability to provide factual information about fitness, such as exercise techniques, workout routines, or nutrition advice.
Assessing the accuracy and relevance of the information provided, as well as the chatbot's ability to understand and respond appropriately to the user's specific needs and goals.
The degree to which the chatbot captures and holds the user's attention and creates a positive and enjoyable user experience.
The user's level of interest and involvement during the conversation.
Assessing the chatbot's ability to generate interesting and relevant topics of conversation, to respond in a timely and personalized manner, and to use engaging language and visual elements to create a more immersive and interactive experience.
The degree to which the chatbot tailors its responses and recommendations to the individual user's preferences, needs, and past interactions.
Evaluate the chatbot's ability to personalize advice based on the user's preferences and past interactions.
Whether the chatbot can effectively use data about the user, such as their history, feedback, and stated preferences, to provide personalized responses and recommendations.
What is conversational analysis, and why is it important?
Conversational Analysis is the study of how people communicate in everyday interactions.
Communication is a fundamental part of human interaction, and studying it can help us better understand social dynamics and cultural norms.
It involves analyzing the structure of conversations, including turn-taking, topic initiation and maintenance, and repair strategies.
By examining these elements, researchers can gain insight into the social dynamics of a particular interaction and the underlying cultural norms that guide communication.
Can the study of Human-to-Human Conversational Analysis be applied to analyze Human-to-Machine conversations?
How to conduct a conversational analysis?
-> It allows you to analyze the conversation in detail and identify patterns in the interaction.
-> It helps you understand the social context of the conversation.
-> It helps you understand the patterns of communication between speakers.
-> It helps you understand how speakers introduce and develop discussion topics.
-> It helps you understand how speakers handle communication breakdowns and errors.
-> It helps you understand how cultural norms, values, and beliefs influence communication.
If your conversations are based on an audio or video interface, recording and transcribing them enables you to go back and analyze the conversation to gain a deeper understanding.
There are several automatic transcribers available:
When analyzing a conversation, it is important to consider the gender, age, social status, and other relevant characteristics of the speakers. These factors can influence communication and provide important context for understanding the conversation.
Diversity: Including a diverse range of participants can provide insights into how different groups communicate and interact. It can include age, gender, race, ethnicity, socioeconomic status, and cultural background.
Context: Conversations in different settings, such as workplaces or social gatherings, may involve different communication styles and norms.
Purpose: Conversations focused on a specific topic or goal may involve different communication strategies than casual conversations.
Power Dynamics: Conversations that involve power imbalances, such as a boss and an employee, may involve different communication patterns than conversations between peers.
Relationship: Conversations between strangers may involve different communication patterns than conversations between friends or family members.
Identify ideal participants for testing your chatbot and discuss your strategy to gather such participants for the final project.
Turn-taking refers to the way that speakers take turns participating in the interaction. It includes both the decision to speak and the transition between speakers. Turn-taking can reveal:
Power imbalances between speakers, with some individuals dominating the conversation and others having less opportunity to speak.
How speakers collaborate and negotiate with each other to move the conversation forward.
Analysis aspects:
Message Length: The length of messages can indicate a speaker's intention to take a turn or signal that they have finished speaking.
Response Time: The time it takes for a speaker to respond to a message can indicate their intention to take a turn or signal that they are finished speaking.
Use of Markers: Speakers may use markers such as ellipses, dashes, or quotation marks to signal that they are taking a turn or indicate that they are listening to another speaker.
Emojis / Emoticons: The use of emojis or emoticons can indicate a speaker's attitude, emotion, or intention to take a turn.
Repetition: Speakers may repeat words or phrases to indicate their intention to take a turn or emphasize a point.
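The message-length and response-time cues above can be measured directly from a chat log. The following is a rough sketch, assuming each turn is logged with a speaker label, its text, and a timestamp in seconds; the `Turn` class, the `turn_stats` function, and the sample conversation are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Turn:
    speaker: str      # e.g. "user" or "bot"
    text: str
    timestamp: float  # seconds since the conversation started

def turn_stats(turns):
    """Per speaker: average words per message and average response time.

    Response time is the gap before a turn whose speaker differs from
    the speaker of the previous turn.
    """
    lengths, gaps = {}, {}
    for prev, cur in zip([None] + turns[:-1], turns):
        lengths.setdefault(cur.speaker, []).append(len(cur.text.split()))
        if prev is not None and prev.speaker != cur.speaker:
            gaps.setdefault(cur.speaker, []).append(cur.timestamp - prev.timestamp)
    return {
        s: {
            "avg_words": mean(lengths[s]),
            "avg_response_s": mean(gaps[s]) if s in gaps else None,
        }
        for s in lengths
    }

# Hypothetical sample conversation.
sample = [
    Turn("bot", "Hi, how are you today?", 0.0),
    Turn("user", "Good", 4.0),
    Turn("bot", "Great to hear. What would you like to talk about?", 5.0),
    Turn("user", "Movies please", 11.0),
]
print(turn_stats(sample))
```

A large gap between the chatbot's and the users' average message lengths is one sign that the chatbot is dominating the conversation.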
What are your strategies to balance the lengths between the chatbot and the users? Can you use any of the above cues to improve the overall quality of the conversations?
Topic initiation refers to how speakers introduce a new topic of conversation, while topic maintenance refers to how they keep the conversation focused on that topic. Through this analysis, we:
Gain insights into the participants' interests, goals, and social context.
Understand how speakers collaborate to build shared understanding and mutual goals.
Learn effective communication techniques for keeping conversations focused and productive.
Analysis aspects:
Topic Introduction: Speakers may introduce a new topic of discussion explicitly by using phrases such as "By the way," "Speaking of," or "Have you heard about". Such phrases indicate the speaker intends to change the topic or introduce something new.
Topic Development: Speakers may develop a topic by providing additional information or asking related questions. They may use open-ended, follow-up, or clarifying questions to maintain the topic.
Topic Shift: Speakers may shift the topic of discussion by changing the subject or introducing a new topic. Such shifts may be explicit or implicit and can be signaled by phrases such as "Anyway," "Moving on," or "So, as I was saying".
Topic Re-introduction: Speakers may re-introduce a topic discussed earlier by referring to it or bringing it up again. Such references can indicate that the speaker wants to continue discussing or bring attention to the topic.
Nonverbal Cues: Speakers may use punctuation or capitalization for emphasis or tone to indicate their intention to initiate or maintain a topic.
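Explicit topic introductions and shifts can be flagged by looking for the marker phrases listed above. The following is a simple sketch using plain phrase matching; the `TOPIC_MARKERS` table and `find_topic_markers` function are hypothetical names, and implicit shifts that carry no marker phrase would not be caught this way.

```python
import re

# Marker phrases taken from the examples above; extend them for your own domain.
TOPIC_MARKERS = {
    "introduction": ["by the way", "speaking of", "have you heard about"],
    "shift": ["anyway", "moving on", "so, as i was saying"],
}

def find_topic_markers(message):
    """Return the (kind, phrase) pairs whose phrase occurs in the message."""
    text = message.lower()
    hits = []
    for kind, phrases in TOPIC_MARKERS.items():
        for phrase in phrases:
            # Word boundaries keep "anyway" from matching inside "anyways".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                hits.append((kind, phrase))
    return hits

print(find_topic_markers("Moving on, what about dinner?"))
```

Running this over every user message in a transcript gives a first estimate of how often users try to steer the conversation, which can then be compared with how the chatbot actually responded.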
Does your chatbot allow users to introduce or switch topics? How does your chatbot proceed when users intend to do so?
Repair strategies refer to techniques used by speakers to correct misunderstandings, clarify meaning, or resolve problems in communication. Analyzing them allows us to:
Identify areas where communication breakdowns are more likely to occur, such as when speakers have different cultural backgrounds, use different languages, or have varying knowledge about the discussed topic.
Identify communication patterns that may be hindering effective communication. For example, a speaker who consistently interrupts others or fails to listen actively can cause more frequent communication breakdowns.
Several aspects need to be analyzed to understand repair strategies:
Self-repair: Speakers may self-correct errors or repair communication breakdowns in real time by repeating or rephrasing their previous statement.
Other-repair: Speakers may ask their conversational partner to repeat or clarify what they said to repair communication breakdowns. They may also offer suggestions or provide information to help resolve the problem.
Repair Initiation: Speakers may initiate repair by indicating a problem or error in communication, such as saying, "I didn't understand what you said," or "Can you repeat that?".
Repair Resolution: Speakers may resolve the problem by clarifying, repeating, or rephrasing their previous statement or using other strategies to ensure successful communication.
Lexical Choice: The words speakers choose to use can impact the success of the repair. They may use more straightforward language to clarify their message or help their conversational partner understand.
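Repair initiation and resolution form a sequence, so one way to study them in a transcript is to pair each initiating turn with the reply that follows it. The following is a minimal sketch, assuming turns are stored as `(speaker, text)` tuples; the `REPAIR_INITIATORS` phrases come from the examples above, while `repair_sequences` and the sample dialogue are hypothetical. The paired reply is only a candidate resolution and still needs to be read to judge whether the repair succeeded.

```python
# Phrases that signal a repair initiation, from the examples above.
REPAIR_INITIATORS = ("i didn't understand", "can you repeat that")

def repair_sequences(turns):
    """Pair each repair-initiating turn with the next turn by the other speaker.

    Returns (initiation_index, resolution_index) pairs; resolution_index is
    None when the other speaker never replies.
    """
    pairs = []
    for i, (speaker, text) in enumerate(turns):
        if any(phrase in text.lower() for phrase in REPAIR_INITIATORS):
            resolution = next(
                (j for j in range(i + 1, len(turns)) if turns[j][0] != speaker),
                None,
            )
            pairs.append((i, resolution))
    return pairs

# Hypothetical sample dialogue.
dialogue = [
    ("user", "I want to book it for tmrw"),
    ("bot", "I didn't understand what you said."),
    ("user", "I want to book it for tomorrow."),
    ("bot", "Got it."),
]
print(repair_sequences(dialogue))
```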
Does your chatbot have any repair strategies? What are effective ways of catching them in STDM?
Cultural context refers to the social norms, values, and beliefs that influence communication. They can vary between different cultures and can impact how people interact with each other:
In some cultures, interrupting someone may be seen as assertive and confident, while in others, it may be seen as rude or disrespectful.
Some cultures may place a high value on politeness and indirect communication, while others may value directness and assertiveness.
Culture shapes how individuals perceive and interpret the world around them. By analyzing the cultural context, we can:
Gain insights into how communication is influenced by cultural factors such as social status, gender roles, power dynamics, and language proficiency.
Recognize and address potential biases or assumptions we may bring to the interaction.
Avoid misunderstandings and communicate more respectfully and effectively with people from different cultural backgrounds.
Analysis aspects:
Social Norms: Different cultures may have different social norms that dictate appropriate behavior in conversation, such as turn-taking, interruptions, and politeness strategies.
Language Use: Language use may vary based on cultural context, including vocabulary, grammar, and pronunciation. For example, some languages may have different words or expressions for the same concept, or different cultures may have different levels of formality in their language use.
Worldviews: Cultural differences can impact how people perceive and interpret events and actions. These differences can manifest in conversation through differences in humor, storytelling, and nonverbal communication.
Values and Beliefs: Different cultures may have different values and beliefs, influencing how people communicate and what topics they discuss. For example, some cultures may prioritize directness and honesty, while others prioritize harmony and social relationships.
Contextual Factors: Cultural context is also influenced by contextual factors such as the purpose of the conversation, the setting, and the relationship between speakers.
Website
Pricing: Free | $1 / hour | $0.02 / minute
Interface: Local Installation | Azure Platform | Web API
Diarization: No | Yes | Yes
Pros: Handles noisy environments well | High accuracy | Easy to use
Cons: Requires a local GPU machine | Can be difficult to configure | Pricier than Azure for less accuracy
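The listed prices use different units, so converting them to a common rate makes the comparison concrete: $0.02 per minute is $1.20 per hour, against $1 per hour on the Azure platform. The following is a small sketch of that arithmetic; the dictionary keys are descriptive labels based on the interface of each option, the 90 minutes of audio is a hypothetical workload, and the free option's hardware cost is not included.

```python
# Listed prices converted to dollars per hour of audio.
PRICE_PER_HOUR = {
    "local installation": 0.0,       # free (GPU hardware cost not included)
    "Azure platform": 1.00,          # $1 / hour
    "web API": round(0.02 * 60, 2),  # $0.02 / minute
}

def transcription_cost(minutes, price_per_hour):
    """Cost in dollars of transcribing the given number of audio minutes."""
    return round(minutes / 60 * price_per_hour, 2)

# Hypothetical workload: 90 minutes of recorded conversations.
for option, rate in PRICE_PER_HOUR.items():
    print(option, transcription_cost(90, rate))
```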