Concerning Interactions Detection and Notifications
Overview
The Concerning Interactions feature helps address the challenge of delayed intervention when users display toxic behavior, self-harm tendencies, or make threats during interactions with the virtual assistant. This feature improves user safety by automatically detecting harmful interactions and alerting designated users to address them promptly, ultimately fostering a safer environment.
How It Works
In the Client Admin, there is a new Behavior Setting called Concerning Interactions. Admins can enable or disable this setting globally.
To enable the Concerning Interactions feature select On.
When enabled, admins can select which users should receive notifications by typing the user's name in the box below then selecting from the dropdown. Admins have the flexibility to choose as many users as needed to receive notifications. Note that users will not receive notifications for interactions with offices they do not have permissions to. Therefore, it is important to consider user permissions when selecting recipients so that at least one person will receive a notification for each office.
When harmful, abusive, or offensive language is detected in interactions with the virtual assistant, the notifications can be sent via email, an in-app snackbar notification, or a car alarm sound. Notifications are configurable only at the global level and cannot be individually personalized.
Additionally, the feature includes a sensitivity slider that can be adjusted to be less or more sensitive. Less sensitive will not detect as many interactions as concerning while more sensitive will detect more as concerning and may have more false positives.
While the AI system strives for accuracy in detecting concerning content, it is important to note that some instances may be missed. Ocelot is not responsible for any failures in detection. Regular review of interactions is essential.
Notification Types
When concerning interactions are detected, selected users will receive notifications through their chosen method(s):
Email Notification
The email notification includes a link to the detected interaction.Snackbar Notification
When enabled, the snackbar notification appears in the client admin and includes a brief alert about the concerning interaction including a link to the interaction.Example of a Snackbar Notification:
Transcripts and Interactions
When concerning interactions are detected, they will be highlighted with a badge on the interaction/transcript list page and in the review panel.
Badge in Interaction List:
Interaction Review Panel:
Admins can also toggle to view only concerning interactions in the list.
Virtual Assistant Behavior
This feature does not affect the virtual assistant’s response. The “guardrails” control how the virtual assistant responds to certain content. If harmful content is detected, the virtual assistant may return a response indicating it cannot engage due to the nature of the content, such as a message stating, “I’ve detected possible content that I cannot respond to…”
What Content Is Detected?
The system detects concerning content in bot-backed interactions based on the following criteria:
- Threats of Harm: Any mention of self-harm or harm to others.
- Hate Speech: Discriminatory, prejudiced, or racist language.
- Harassment: Repeated or severe actions creating a hostile environment.
Only individual interactions are analyzed, not the entire conversation context.
Important Notes:
- This detection process applies to bot-backed, two-way messaging (text messages) but does not apply to live conversations, as those are monitored by live agents.
- Admins should continue reviewing interactions regularly to ensure timely action and maintain a safe environment for users.
How is content detected
We don’t rely on specific keywords to detect violations. Instead, we leverage advanced AI models to identify potential issues. Currently, we use OpenAI's GPT-3.5-Turbo, enhanced with prompt engineering and few-shot learning techniques to improve detection accuracy.
Was this article helpful?
That’s Great!
Thank you for your feedback
Sorry! We couldn't be helpful
Thank you for your feedback
Feedback sent
We appreciate your effort and will try to fix the article