Algorithms

OpaqueId builds on Ruby’s built-in SecureRandom methods to generate cryptographically secure, collision-free opaque IDs. This guide explains the technical details behind the generation process, optimization strategies, and mathematical foundations.

Overview

OpaqueId implements two primary algorithms optimized for different scenarios:

Fast Path Algorithm - For 64-character alphabets (optimized performance)
Unbiased Path Algorithm - For all other alphabets (rejection sampling)

Fast Path Algorithm (64-character alphabets)

When using a 64-character alphabet, OpaqueId employs an optimized algorithm that avoids modulo bias and provides maximum performance.

How It Works

def generate_fast(size, alphabet)
  result = ""
  size.times do
    # Generate random byte
    byte = SecureRandom.random_bytes(1).unpack1("C")

    # Use bitwise AND for fast modulo 64
    index = byte & 63  # Equivalent to byte % 64

    # Append character from alphabet
    result += alphabet[index]
  end
  result
end

Key Features

Bitwise Operations: Uses byte & 63 instead of byte % 64 for efficient computation
No Modulo Bias: 64 is a power of 2, so bitwise AND provides uniform distribution
Single Random Call: One SecureRandom.random_bytes(1) call per character
Performance Optimized: Designed for speed with 64-character alphabets

Mathematical Foundation

For a 64-character alphabet:

Each byte (0-255) maps to exactly 4 characters in the alphabet
byte & 63 extracts the lower 6 bits (0-63)
This provides uniform distribution without bias

Performance Characteristics

Optimized for speed: Uses bitwise operations for efficient performance
No rejection sampling: All generated bytes are used efficiently
Linear time complexity: O(n) where n is the ID length

Unbiased Path Algorithm (Other alphabets)

For alphabets that aren’t 64 characters, OpaqueId uses rejection sampling to ensure unbiased distribution.

How It Works

def generate_unbiased(size, alphabet, alphabet_size)
  result = ""
  size.times do
    loop do
      # Generate random byte
      byte = SecureRandom.random_bytes(1).unpack1("C")

      # Calculate threshold to avoid modulo bias
      threshold = 256 - (256 % alphabet_size)

      # Reject if byte is too large (avoids bias)
      next if byte >= threshold

      # Use modulo to get index
      index = byte % alphabet_size

      # Append character and break loop
      result += alphabet[index]
      break
    end
  end
  result
end

Key Features

Rejection Sampling: Discards biased random values
Unbiased Distribution: Ensures each character has equal probability
Cryptographically Secure: Uses SecureRandom for all random generation
Collision Resistant: High entropy prevents predictable patterns

Mathematical Foundation

The rejection sampling algorithm ensures uniform distribution:

Calculate Threshold: threshold = 256 - (256 % alphabet_size)
Reject Biased Values: Only accept bytes < threshold
Uniform Mapping: Each accepted byte maps to exactly one character

Example: 62-character alphabet

# For 62-character alphabet
alphabet_size = 62
threshold = 256 - (256 % 62)  # = 256 - 8 = 248

# Accept bytes 0-247 (248 values)
# Each byte maps to: byte % 62
# This gives uniform distribution across all 62 characters

Performance Characteristics

Unbiased distribution: Uses rejection sampling to ensure uniform character distribution
Slight overhead: Some bytes are rejected to maintain uniformity
Linear time complexity: O(n × rejection_rate) where n is the ID length

Algorithm Selection

OpaqueId automatically selects the appropriate algorithm based on the alphabet size:

def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
  alphabet_size = alphabet.size

  # Handle edge case: single character alphabet
  return alphabet * size if alphabet_size == 1

  # Use fast path for 64-character alphabets
  return generate_fast(size, alphabet) if alphabet_size == 64

  # Use unbiased path for all other alphabets
  generate_unbiased(size, alphabet, alphabet_size)
end

Selection Criteria

Alphabet Size	Algorithm	Reason
1	Direct repetition	No randomness needed
64	Fast Path	Optimized bitwise operations
Other	Unbiased Path	Rejection sampling for uniformity

Entropy Analysis

Entropy Calculation

The entropy of an opaque ID depends on the alphabet size and length:

Entropy = length × log₂(alphabet_size)

Examples

# 21-character ID with 62-character alphabet
entropy = 21 × log₂(62) = 21 × 5.954 = 125.0 bits

# 21-character ID with 64-character alphabet
entropy = 21 × log₂(64) = 21 × 6.000 = 126.0 bits

# 15-character ID with 16-character alphabet
entropy = 15 × log₂(16) = 15 × 4.000 = 60.0 bits

Collision Probability

The probability of collision for N generated IDs:

P(collision) ≈ 1 - e^(-N²/(2 × 2^entropy))

Practical Examples

# 21-character alphanumeric ID (125 bits entropy)
# 1 billion IDs: P(collision) ≈ 2.3 × 10⁻¹⁵
# 1 trillion IDs: P(collision) ≈ 2.3 × 10⁻⁹

# 15-character hexadecimal ID (60 bits entropy)
# 1 million IDs: P(collision) ≈ 2.3 × 10⁻⁶
# 1 billion IDs: P(collision) ≈ 2.3 × 10⁻³

Security Analysis

Cryptographic Security

OpaqueId uses Ruby’s SecureRandom which provides:

Cryptographically Secure PRNG: Uses OS entropy sources
Unpredictable Output: Cannot be predicted from previous outputs
High Entropy: Sufficient randomness for security applications

Attack Resistance

Brute Force Attacks

# 21-character alphanumeric ID
# Search space: 62²¹ ≈ 2.3 × 10³⁷
# Time to brute force: ~10²⁰ years (at 1 billion attempts/second)

Timing Attacks

The algorithms are designed to be timing-attack resistant:

Constant Time Operations: Bitwise operations have predictable timing
Rejection Sampling: Variable loop iterations don’t leak information
SecureRandom: OS-level entropy prevents timing-based prediction

Statistical Attacks

Uniform Distribution: Rejection sampling ensures statistical uniformity
No Patterns: Cryptographically secure randomness prevents pattern detection
High Entropy: Sufficient entropy prevents statistical analysis

Performance Optimization

Memory Usage

# Memory-efficient generation
def generate_efficient(size, alphabet)
  # Pre-allocate result string
  result = String.new(capacity: size)

  # Generate characters directly into result
  size.times do
    result << generate_character(alphabet)
  end

  result
end

Batch Generation

# Optimized batch generation
def generate_batch(count, size, alphabet)
  # Pre-allocate array
  results = Array.new(count)

  # Generate all IDs
  count.times do |i|
    results[i] = generate(size, alphabet)
  end

  results
end

Caching Strategies

# Cache alphabet size for performance
class OpaqueId
  def self.generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
    alphabet_size = alphabet.size

    # Use cached algorithm selection
    if alphabet_size == 64
      generate_fast(size, alphabet)
    else
      generate_unbiased(size, alphabet, alphabet_size)
    end
  end
end

Performance Characteristics

Algorithm Comparison

Fast Path (64-char): Maximum performance with bitwise operations
Unbiased Path (other): Slightly slower but ensures uniform distribution
Memory usage: Scales linearly with ID length and batch size
Time complexity: Linear scaling with ID length

Implementation Details

Error Handling

def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
  # Validate parameters
  raise ConfigurationError, "Size must be positive" unless size.positive?
  raise ConfigurationError, "Alphabet cannot be empty" if alphabet.nil? || alphabet.empty?

  # Handle edge cases
  return alphabet * size if alphabet.size == 1

  # Generate ID
  if alphabet.size == 64
    generate_fast(size, alphabet)
  else
    generate_unbiased(size, alphabet, alphabet.size)
  end
rescue => e
  raise GenerationError, "Failed to generate opaque ID: #{e.message}"
end

Thread Safety

# OpaqueId is thread-safe
# SecureRandom is thread-safe
# No shared state between generations

# Safe for concurrent use
threads = 10.times.map do
  Thread.new do
    1000.times { OpaqueId.generate }
  end
end

threads.each(&:join)

Comparison with Other Algorithms

vs. UUID

Aspect	OpaqueId	UUID
Length	Configurable	Fixed (36 chars)
Entropy	Configurable	Fixed (122 bits)
Readability	Human-friendly	Not readable
Performance	Optimized	Standard

vs. NanoID

Aspect	OpaqueId	NanoID
Language	Native Ruby	JavaScript port
Performance	Optimized	Good
Dependencies	None	External gem
Customization	Extensive	Limited

vs. SecureRandom.hex

Aspect	OpaqueId	SecureRandom.hex
Alphabet	Configurable	Fixed (hex)
Bias	None	None
Performance	Optimized	Good
Features	Rich	Basic

Best Practices

1. Choose Appropriate Algorithm

# Use 64-character alphabets for maximum performance
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET

# Use other alphabets for specific requirements
self.opaque_id_alphabet = "0123456789"  # Numeric only

2. Optimize for Your Use Case

# High-performance applications
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
self.opaque_id_length = 21

# Human-readable applications
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
self.opaque_id_length = 15

3. Consider Entropy Requirements

# High security (125+ bits entropy)
self.opaque_id_length = 21
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET

# Medium security (60+ bits entropy)
self.opaque_id_length = 15
self.opaque_id_alphabet = "0123456789abcdef"

Next Steps

Now that you understand the algorithms:

Explore Performance for detailed benchmarks and optimization tips
Check out Security for security considerations and best practices
Review Configuration for algorithm selection guidance
Read API Reference for complete algorithm documentation