Skip to content
Go back

Recovering Mangled YouTube URLs: A Brute-Force Adventure

Table of contents

Open Table of contents

Introduction

Have you ever dealt with a sneaky bug that corrupts data in ways that seem impossible to fix? That’s exactly what happened to our team when a frontend glitch started lowercasing YouTube video URLs before they hit our API and database. What started as a valid link like https://www.youtube.com/watch?v=urEaHW-emY4 ended up stored as https://www.youtube.com/watch?v=ureahw-emy4—completely broken. No logs, no traces, just invalid URLs pointing to non-existent videos.

But despair not! With a bit of clever brute-forcing and the right tools, we recovered the originals. In this post, I’ll walk you through the problem, our “unless…” moment, and a practical script to fix it. Spoiler: It involves generating up to 2,048 permutations and validating them like a boss a thousand times.

The Problem: Lowercased YouTube URLs

YouTube video URLs typically look like one of these:

The key part is the video ID—an 11-character string (yes, exactly 11 characters) made up of uppercase letters, lowercase letters, numbers, and special characters like hyphens and underscores (e.g., urEaHW-emY4).

Due to a frontend bug, these URLs were being lowercased (e.g., ureahw-emy4) before storage. Lowercasing breaks them because YouTube IDs are case-sensitive. A simple search or log dive? Nope— the transformation happened client-side, leaving no server-side evidence of the original.

Brute Force to the Rescue

Here’s where it gets fun (or masochistic, depending on your love for combinatorics). Since video IDs are fixed at 11 characters, and only alphabetic characters are case-sensitive, we can generate all possible case variations of the lowercased ID.

This is feasible because 2048 is a tiny number for a computer. But how do we validate without false positives?

Why Simple HTTP Requests Fail

You might think: “Just send a GET request and check the status code!” Nope. YouTube is a Single-Page Application (SPA), so:

SPAs route everything client-side, so the server always serves the app (200 OK) and lets JavaScript handle errors. Status codes are useless here.

Validating with yt-dlp

Enter yt-dlp, a powerful command-line tool for downloading YouTube videos (a fork of youtube-dl). We use it with --skip-download to check metadata without fetching the video.

Perfect! It reliably distinguishes valid from invalid without downloading anything. Install it via pip (pip install yt-dlp) or your package manager.

Implementing the Brute-Force Script

Now, let’s automate this. Below is an AI-generated Bash script (easy to run on most systems) that:

Sample Bash Script

Save this as recover_youtube.sh and make it executable (chmod +x recover_youtube.sh). It requires yt-dlp and parallel (install via sudo apt install parallel on Ubuntu or similar).

#!/bin/bash

# Usage: ./recover_youtube.sh <lowercased_video_id>
# Example: ./recover_youtube.sh ureahw-emy4

if [ $# -ne 1 ]; then
  echo "Usage: $0 <lowercased_video_id>"
  exit 1
fi

LOWER_ID="$1"
ID_LENGTH=${#LOWER_ID}

if [ "$ID_LENGTH" -ne 11 ]; then
  echo "Error: Video ID must be exactly 11 characters."
  exit 1
fi

# Function to generate all case permutations
generate_permutations() {
  local str="$1"
  local prefix="$2"
  local index="$3"

  if [ "$index" -eq "${#str}" ]; then
    echo "$prefix"
    return
  fi

  local char="${str:$index:1}"
  local lower=$(echo "$char" | tr '[:upper:]' '[:lower:]')
  local upper=$(echo "$char" | tr '[:lower:]' '[:upper:]')

  if [[ "$char" =~ [a-zA-Z] ]]; then
    # Case-sensitive: recurse for both cases
    generate_permutations "$str" "$prefix$lower" "$((index+1))"
    generate_permutations "$str" "$prefix$upper" "$((index+1))"
  else
    # Not a letter: only one option
    generate_permutations "$str" "$prefix$char" "$((index+1))"
  fi
}

# Generate permutations
PERMS=$(generate_permutations "$LOWER_ID" "" 0)

# Function to validate a single ID
validate_id() {
  local vid_id="$1"
  local output
  output=$(yt-dlp --skip-download --get-id "https://www.youtube.com/watch?v=$vid_id" 2>&1)
  if [[ "$output" == "$vid_id" ]]; then
    echo "VALID: https://www.youtube.com/watch?v=$vid_id"
  fi
}

export -f validate_id

# Validate in parallel (adjust --jobs for your CPU)
echo "$PERMS" | parallel --jobs 4 validate_id {}

echo "Done! If multiple valids appear, manually verify thumbnails or metadata."

How It Works

Tips and Caveats:

Conclusion

This brute-force hack turned a data disaster into a quick win. We recovered dozens of URLs without manual guesswork. It’s a reminder that sometimes, the “dumb” solution is the smartest—especially when logs fail you.


Share this post on:

Previous Post
Deploying a Simple Web Server with HTTPS on K3s
Next Post
Automated postgres backups using pgBackRest