Algorithms

OpaqueId builds on Ruby’s built-in SecureRandom methods to generate cryptographically secure, collision-free opaque IDs. This guide explains the technical details behind the generation process, optimization strategies, and mathematical foundations.

  • TOC

Overview

OpaqueId implements two primary algorithms optimized for different scenarios:

  1. Fast Path Algorithm - For 64-character alphabets (optimized performance)
  2. Unbiased Path Algorithm - For all other alphabets (rejection sampling)

Fast Path Algorithm (64-character alphabets)

When using a 64-character alphabet, OpaqueId employs an optimized algorithm that avoids modulo bias and provides maximum performance.

How It Works

def generate_fast(size, alphabet)
  result = ""
  size.times do
    # Generate random byte
    byte = SecureRandom.random_bytes(1).unpack1("C")

    # Use bitwise AND for fast modulo 64
    index = byte & 63  # Equivalent to byte % 64

    # Append character from alphabet
    result += alphabet[index]
  end
  result
end

Key Features

  • Bitwise Operations: Uses byte & 63 instead of byte % 64 for efficient computation
  • No Modulo Bias: 64 is a power of 2, so bitwise AND provides uniform distribution
  • Single Random Call: One SecureRandom.random_bytes(1) call per character
  • Performance Optimized: Designed for speed with 64-character alphabets

Mathematical Foundation

For a 64-character alphabet:

  • Each byte (0-255) maps to exactly 4 characters in the alphabet
  • byte & 63 extracts the lower 6 bits (0-63)
  • This provides uniform distribution without bias

Performance Characteristics

  • Optimized for speed: Uses bitwise operations for efficient performance
  • No rejection sampling: All generated bytes are used efficiently
  • Linear time complexity: O(n) where n is the ID length

Unbiased Path Algorithm (Other alphabets)

For alphabets that aren’t 64 characters, OpaqueId uses rejection sampling to ensure unbiased distribution.

How It Works

def generate_unbiased(size, alphabet, alphabet_size)
  result = ""
  size.times do
    loop do
      # Generate random byte
      byte = SecureRandom.random_bytes(1).unpack1("C")

      # Calculate threshold to avoid modulo bias
      threshold = 256 - (256 % alphabet_size)

      # Reject if byte is too large (avoids bias)
      next if byte >= threshold

      # Use modulo to get index
      index = byte % alphabet_size

      # Append character and break loop
      result += alphabet[index]
      break
    end
  end
  result
end

Key Features

  • Rejection Sampling: Discards biased random values
  • Unbiased Distribution: Ensures each character has equal probability
  • Cryptographically Secure: Uses SecureRandom for all random generation
  • Collision Resistant: High entropy prevents predictable patterns

Mathematical Foundation

The rejection sampling algorithm ensures uniform distribution:

  1. Calculate Threshold: threshold = 256 - (256 % alphabet_size)
  2. Reject Biased Values: Only accept bytes < threshold
  3. Uniform Mapping: Each accepted byte maps to exactly one character

Example: 62-character alphabet

# For 62-character alphabet
alphabet_size = 62
threshold = 256 - (256 % 62)  # = 256 - 8 = 248

# Accept bytes 0-247 (248 values)
# Each byte maps to: byte % 62
# This gives uniform distribution across all 62 characters

Performance Characteristics

  • Unbiased distribution: Uses rejection sampling to ensure uniform character distribution
  • Slight overhead: Some bytes are rejected to maintain uniformity
  • Linear time complexity: O(n × rejection_rate) where n is the ID length

Algorithm Selection

OpaqueId automatically selects the appropriate algorithm based on the alphabet size:

def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
  alphabet_size = alphabet.size

  # Handle edge case: single character alphabet
  return alphabet * size if alphabet_size == 1

  # Use fast path for 64-character alphabets
  return generate_fast(size, alphabet) if alphabet_size == 64

  # Use unbiased path for all other alphabets
  generate_unbiased(size, alphabet, alphabet_size)
end

Selection Criteria

Alphabet Size Algorithm Reason
1 Direct repetition No randomness needed
64 Fast Path Optimized bitwise operations
Other Unbiased Path Rejection sampling for uniformity

Entropy Analysis

Entropy Calculation

The entropy of an opaque ID depends on the alphabet size and length:

Entropy = length × log₂(alphabet_size)

Examples

# 21-character ID with 62-character alphabet
entropy = 21 × log(62) = 21 × 5.954 = 125.0 bits

# 21-character ID with 64-character alphabet
entropy = 21 × log(64) = 21 × 6.000 = 126.0 bits

# 15-character ID with 16-character alphabet
entropy = 15 × log(16) = 15 × 4.000 = 60.0 bits

Collision Probability

The probability of collision for N generated IDs:

P(collision) ≈ 1 - e^(-N²/(2 × 2^entropy))

Practical Examples

# 21-character alphanumeric ID (125 bits entropy)
# 1 billion IDs: P(collision) ≈ 2.3 × 10⁻¹⁵
# 1 trillion IDs: P(collision) ≈ 2.3 × 10⁻⁹

# 15-character hexadecimal ID (60 bits entropy)
# 1 million IDs: P(collision) ≈ 2.3 × 10⁻⁶
# 1 billion IDs: P(collision) ≈ 2.3 × 10⁻³

Security Analysis

Cryptographic Security

OpaqueId uses Ruby’s SecureRandom which provides:

  • Cryptographically Secure PRNG: Uses OS entropy sources
  • Unpredictable Output: Cannot be predicted from previous outputs
  • High Entropy: Sufficient randomness for security applications

Attack Resistance

Brute Force Attacks

# 21-character alphanumeric ID
# Search space: 62²¹ ≈ 2.3 × 10³⁷
# Time to brute force: ~10²⁰ years (at 1 billion attempts/second)

Timing Attacks

The algorithms are designed to be timing-attack resistant:

  • Constant Time Operations: Bitwise operations have predictable timing
  • Rejection Sampling: Variable loop iterations don’t leak information
  • SecureRandom: OS-level entropy prevents timing-based prediction

Statistical Attacks

  • Uniform Distribution: Rejection sampling ensures statistical uniformity
  • No Patterns: Cryptographically secure randomness prevents pattern detection
  • High Entropy: Sufficient entropy prevents statistical analysis

Performance Optimization

Memory Usage

# Memory-efficient generation
def generate_efficient(size, alphabet)
  # Pre-allocate result string
  result = String.new(capacity: size)

  # Generate characters directly into result
  size.times do
    result << generate_character(alphabet)
  end

  result
end

Batch Generation

# Optimized batch generation
def generate_batch(count, size, alphabet)
  # Pre-allocate array
  results = Array.new(count)

  # Generate all IDs
  count.times do |i|
    results[i] = generate(size, alphabet)
  end

  results
end

Caching Strategies

# Cache alphabet size for performance
class OpaqueId
  def self.generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
    alphabet_size = alphabet.size

    # Use cached algorithm selection
    if alphabet_size == 64
      generate_fast(size, alphabet)
    else
      generate_unbiased(size, alphabet, alphabet_size)
    end
  end
end

Performance Characteristics

Algorithm Comparison

  • Fast Path (64-char): Maximum performance with bitwise operations
  • Unbiased Path (other): Slightly slower but ensures uniform distribution
  • Memory usage: Scales linearly with ID length and batch size
  • Time complexity: Linear scaling with ID length

Implementation Details

Error Handling

def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
  # Validate parameters
  raise ConfigurationError, "Size must be positive" unless size.positive?
  raise ConfigurationError, "Alphabet cannot be empty" if alphabet.nil? || alphabet.empty?

  # Handle edge cases
  return alphabet * size if alphabet.size == 1

  # Generate ID
  if alphabet.size == 64
    generate_fast(size, alphabet)
  else
    generate_unbiased(size, alphabet, alphabet.size)
  end
rescue => e
  raise GenerationError, "Failed to generate opaque ID: #{e.message}"
end

Thread Safety

# OpaqueId is thread-safe
# SecureRandom is thread-safe
# No shared state between generations

# Safe for concurrent use
threads = 10.times.map do
  Thread.new do
    1000.times { OpaqueId.generate }
  end
end

threads.each(&:join)

Comparison with Other Algorithms

vs. UUID

Aspect OpaqueId UUID
Length Configurable Fixed (36 chars)
Entropy Configurable Fixed (122 bits)
Readability Human-friendly Not readable
Performance Optimized Standard

vs. NanoID

Aspect OpaqueId NanoID
Language Native Ruby JavaScript port
Performance Optimized Good
Dependencies None External gem
Customization Extensive Limited

vs. SecureRandom.hex

Aspect OpaqueId SecureRandom.hex
Alphabet Configurable Fixed (hex)
Bias None None
Performance Optimized Good
Features Rich Basic

Best Practices

1. Choose Appropriate Algorithm

# Use 64-character alphabets for maximum performance
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET

# Use other alphabets for specific requirements
self.opaque_id_alphabet = "0123456789"  # Numeric only

2. Optimize for Your Use Case

# High-performance applications
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
self.opaque_id_length = 21

# Human-readable applications
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
self.opaque_id_length = 15

3. Consider Entropy Requirements

# High security (125+ bits entropy)
self.opaque_id_length = 21
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET

# Medium security (60+ bits entropy)
self.opaque_id_length = 15
self.opaque_id_alphabet = "0123456789abcdef"

Next Steps

Now that you understand the algorithms:

  1. Explore Performance for detailed benchmarks and optimization tips
  2. Check out Security for security considerations and best practices
  3. Review Configuration for algorithm selection guidance
  4. Read API Reference for complete algorithm documentation