Backend · June 20, 2026

Reducing Unnecessary Costs by Automatically Ending Voice Call Sessions

Have you ever had voice call sessions stay active long after the conversation was over, running up unexpected costs? It happens often, especially in Live session environments. In this post, I'll share how I implemented automatic session termination based on silence and explicit end-of-call signals to cut those costs.

Attempts and Pitfalls

My first idea was simple: if there is no speech for a set period, terminate the session. It turned out to be trickier than I expected, because it was hard to tell a user who was briefly pausing from one who actually wanted to end the call.

# Initial attempt: Simple silence timeout
from datetime import datetime

def check_silence_and_terminate(session_id, last_activity_time, timeout_duration):
    # Compare the time since the last detected activity against the timeout (in seconds)
    current_time = datetime.now()
    if (current_time - last_activity_time).total_seconds() > timeout_duration:
        print(f"Session {session_id} terminated due to silence.")
        # Session termination logic...

With this code, there was a risk of the session being terminated even when the user was just catching their breath or taking a moment to think. Ultimately, I spent about 3 hours fiddling with this approach without achieving satisfactory results.
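To make the false positive concrete, here is a minimal sketch that passes the current time in as an argument so the check can be run deterministically. The helper name `is_silence_timeout` and the 5-second, 8-second, and 60-second values are assumptions chosen for illustration, not settings from the actual service.

```python
from datetime import datetime, timedelta


def is_silence_timeout(last_activity_time, timeout_seconds, now):
    """Return True when the gap since the last activity exceeds the timeout."""
    return (now - last_activity_time).total_seconds() > timeout_seconds


# Hypothetical scenario: the user stops to think for 8 seconds mid-call.
now = datetime(2026, 6, 20, 12, 0, 0)
thinking_pause = now - timedelta(seconds=8)

# With a short 5-second timeout, the pause is treated as the end of the call.
short_timeout_result = is_silence_timeout(thinking_pause, timeout_seconds=5, now=now)
print(short_timeout_result)

# A longer timeout avoids this case, but idle sessions then stay open longer.
long_timeout_result = is_silence_timeout(thinking_pause, timeout_seconds=60, now=now)
print(long_timeout_result)
```

The trade-off is visible in the two calls: no single timeout value both tolerates thinking pauses and closes finished calls quickly.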

The Root Cause

The core of the problem was understanding 'user intent.' Relying solely on the absence of speech led to false positives. We needed to differentiate between cases where the user clearly expressed their intention to end the call (e.g., "I'll hang up," "End") and cases where they simply remained silent.
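Treating explicit intent as its own signal, separate from silence, could look roughly like the sketch below. The function `has_end_intent`, the keyword list, and the `ignore_phrases` parameter are hypothetical. The ignore list is there because plain substring matching has its own false positives: for example, the greeting "안녕하세요" (hello) contains "안녕" (bye).

```python
def has_end_intent(utterance, end_keywords, ignore_phrases=()):
    """Return True if the utterance contains an end-of-call keyword."""
    text = utterance.lower()
    # Blank out phrases that contain a keyword but do not signal an ending.
    for phrase in ignore_phrases:
        text = text.replace(phrase.lower(), " ")
    return any(keyword.lower() in text for keyword in end_keywords)


# Illustrative values only; a real deployment would tune both lists.
END_KEYWORDS = ["종료", "끊어", "안녕", "hang up"]
IGNORE_PHRASES = ["안녕하세요"]

closing = has_end_intent("I'll hang up now.", END_KEYWORDS, IGNORE_PHRASES)
greeting = has_end_intent("안녕하세요, 질문이 있어요.", END_KEYWORDS, IGNORE_PHRASES)
silence = has_end_intent("", END_KEYWORDS, IGNORE_PHRASES)
print(closing, greeting, silence)
```

An empty utterance never counts as intent here, which keeps "the user said nothing" and "the user said goodbye" as two distinct cases.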

The Solution

So, in the end, I implemented the session termination logic by combining two conditions:

  1. Silence Detection: When there's no voice input for a specific duration.
  2. Explicit End-of-Call Intent Detection: Detecting speech containing specific keywords (e.g., "종료" (end), "끊어" (hang up), "안녕" (bye)).

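I modified the logic to terminate the session when either condition is met: an explicit end-of-call phrase ends the session right away, while silence alone ends it only after a longer timeout (60 seconds in the code below).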

# Improved session termination logic
from datetime import datetime, timedelta

def should_terminate_session(session_data):
    current_time = datetime.now()
    last_activity_time = session_data['last_activity_time']
    silence_timeout = 60  # 60 seconds of silence
    last_utterance = session_data.get('last_utterance', '')
    # 'hang up' is included so the English example utterance below is matched
    keywords_to_terminate = ['종료', '끊어', '안녕', '마칠게', 'hang up']

    # 1. After a certain period of silence
    if (current_time - last_activity_time).total_seconds() > silence_timeout:
        print(f"Session {session_data['id']} is a candidate for termination due to silence.")
        return True

    # 2. When the user explicitly expresses intent to end
    for keyword in keywords_to_terminate:
        if keyword in last_utterance.lower():
            print(f"Session {session_data['id']} detected explicit intent to terminate: '{last_utterance}'")
            return True

    return False

# Example data: 70 seconds of silence, no closing keyword
session_info = {
    'id': 'session_abc',
    'last_activity_time': datetime.now() - timedelta(seconds=70),
    'last_utterance': 'Yes, I understand. See you next time.'
}

if should_terminate_session(session_info):
    print("Proceeding with session termination...")
else:
    print("Session ongoing...")

# Example data: recent activity, but an explicit closing phrase
session_info_2 = {
    'id': 'session_xyz',
    'last_activity_time': datetime.now() - timedelta(seconds=10),
    'last_utterance': "Yes, I'll hang up now."
}

if should_terminate_session(session_info_2):
    print("Proceeding with session termination...")
else:
    print("Session ongoing...")

With this logic, sessions are no longer cut off when the user pauses briefly, yet the call still ends promptly when the user says they want to stop. The result is smoother, more cost-effective operation.
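In practice a check like this has to run repeatedly over every open session. The sketch below shows one way that could be wired up; `EndReason`, `termination_reason`, and `sweep` are hypothetical names, and the session dictionaries simply mirror the example data used in this post. Returning a reason instead of a plain boolean is an assumption on my part, made so that logs can show which signal closed each session.

```python
from datetime import datetime, timedelta
from enum import Enum


class EndReason(Enum):
    EXPLICIT_INTENT = "explicit_intent"
    SILENCE = "silence"


def termination_reason(session, now, silence_timeout=60,
                       keywords=("종료", "끊어", "마칠게", "hang up")):
    """Return why a session should end, or None if it should continue."""
    utterance = session.get("last_utterance", "").lower()
    if any(keyword in utterance for keyword in keywords):
        return EndReason.EXPLICIT_INTENT
    idle_seconds = (now - session["last_activity_time"]).total_seconds()
    if idle_seconds > silence_timeout:
        return EndReason.SILENCE
    return None


def sweep(sessions, now):
    """Map session id to EndReason for every session that should be closed."""
    reasons = {}
    for session in sessions:
        reason = termination_reason(session, now)
        if reason is not None:
            reasons[session["id"]] = reason
    return reasons


now = datetime(2026, 6, 20, 12, 0, 0)
open_sessions = [
    {"id": "session_abc", "last_activity_time": now - timedelta(seconds=70),
     "last_utterance": "Yes, I understand. See you next time."},
    {"id": "session_xyz", "last_activity_time": now - timedelta(seconds=10),
     "last_utterance": "Yes, I'll hang up now."},
    {"id": "session_def", "last_activity_time": now - timedelta(seconds=10),
     "last_utterance": "Hold on, let me check."},
]

to_close = sweep(open_sessions, now)
for session_id, reason in to_close.items():
    print(session_id, reason.value)
```

A scheduler or background task would call `sweep` at a fixed interval and hand the returned ids to the actual termination routine, which is outside the scope of this sketch.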

Results

  • Reduced unnecessary extension of Live session durations.
  • Confirmed cost savings for voice call-related services.
  • Increased cost-efficiency without compromising user experience.

Summary — To Avoid the Same Pitfalls

  • [ ] When implementing voice session termination logic, do not rely solely on silence duration.
  • [ ] Consider explicit end-of-call intent signals from the user (e.g., keyword detection) as well.
  • [ ] During testing, thoroughly validate with various speech patterns (brief pauses, intentional closing remarks, etc.).
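The last checklist item can be turned into a small table-driven check. This is a sketch under assumptions: `should_terminate` is a simplified stand-in for the termination logic in this post, and the case names, idle times, and utterances are made up to cover the patterns listed above.

```python
from datetime import datetime, timedelta

NOW = datetime(2026, 6, 20, 12, 0, 0)
KEYWORDS = ("종료", "끊어", "마칠게", "hang up")


def should_terminate(last_activity_time, last_utterance, now, silence_timeout=60):
    """Simplified stand-in: end on a closing keyword or on long silence."""
    idle = (now - last_activity_time).total_seconds()
    said_goodbye = any(k in last_utterance.lower() for k in KEYWORDS)
    return idle > silence_timeout or said_goodbye


# (case name, idle seconds, last utterance, expected result)
CASES = [
    ("brief pause", 8, "Let me think...", False),
    ("closing remark", 2, "OK, I'll hang up now.", True),
    ("long silence", 90, "", True),
    ("exactly at timeout", 60, "", False),
]


def run_cases():
    """Return the names of all cases whose result differs from the expectation."""
    failures = []
    for name, idle, utterance, expected in CASES:
        actual = should_terminate(NOW - timedelta(seconds=idle), utterance, NOW)
        if actual != expected:
            failures.append(name)
    return failures


failures = run_cases()
print("failed cases:", failures)
```

New speech patterns found in production can be appended to `CASES` as one-line entries, which keeps regressions visible whenever the timeout or keyword list changes.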