Algorithms
OpaqueId builds on Ruby’s built-in SecureRandom methods to generate cryptographically secure, collision-free opaque IDs. This guide explains the technical details behind the generation process, optimization strategies, and mathematical foundations.
- TOC
Overview
OpaqueId implements two primary algorithms optimized for different scenarios:
- Fast Path Algorithm - For 64-character alphabets (optimized performance)
- Unbiased Path Algorithm - For all other alphabets (rejection sampling)
Fast Path Algorithm (64-character alphabets)
When using a 64-character alphabet, OpaqueId employs an optimized algorithm that avoids modulo bias and provides maximum performance.
How It Works
def generate_fast(size, alphabet)
result = ""
size.times do
# Generate random byte
byte = SecureRandom.random_bytes(1).unpack1("C")
# Use bitwise AND for fast modulo 64
index = byte & 63 # Equivalent to byte % 64
# Append character from alphabet
result += alphabet[index]
end
result
end
Key Features
- Bitwise Operations: Uses
byte & 63instead ofbyte % 64for efficient computation - No Modulo Bias: 64 is a power of 2, so bitwise AND provides uniform distribution
- Single Random Call: One
SecureRandom.random_bytes(1)call per character - Performance Optimized: Designed for speed with 64-character alphabets
Mathematical Foundation
For a 64-character alphabet:
- Each byte (0-255) maps to exactly 4 characters in the alphabet
byte & 63extracts the lower 6 bits (0-63)- This provides uniform distribution without bias
Performance Characteristics
- Optimized for speed: Uses bitwise operations for efficient performance
- No rejection sampling: All generated bytes are used efficiently
- Linear time complexity: O(n) where n is the ID length
Unbiased Path Algorithm (Other alphabets)
For alphabets that aren’t 64 characters, OpaqueId uses rejection sampling to ensure unbiased distribution.
How It Works
def generate_unbiased(size, alphabet, alphabet_size)
result = ""
size.times do
loop do
# Generate random byte
byte = SecureRandom.random_bytes(1).unpack1("C")
# Calculate threshold to avoid modulo bias
threshold = 256 - (256 % alphabet_size)
# Reject if byte is too large (avoids bias)
next if byte >= threshold
# Use modulo to get index
index = byte % alphabet_size
# Append character and break loop
result += alphabet[index]
break
end
end
result
end
Key Features
- Rejection Sampling: Discards biased random values
- Unbiased Distribution: Ensures each character has equal probability
- Cryptographically Secure: Uses
SecureRandomfor all random generation - Collision Resistant: High entropy prevents predictable patterns
Mathematical Foundation
The rejection sampling algorithm ensures uniform distribution:
- Calculate Threshold:
threshold = 256 - (256 % alphabet_size) - Reject Biased Values: Only accept bytes < threshold
- Uniform Mapping: Each accepted byte maps to exactly one character
Example: 62-character alphabet
# For 62-character alphabet
alphabet_size = 62
threshold = 256 - (256 % 62) # = 256 - 8 = 248
# Accept bytes 0-247 (248 values)
# Each byte maps to: byte % 62
# This gives uniform distribution across all 62 characters
Performance Characteristics
- Unbiased distribution: Uses rejection sampling to ensure uniform character distribution
- Slight overhead: Some bytes are rejected to maintain uniformity
- Linear time complexity: O(n × rejection_rate) where n is the ID length
Algorithm Selection
OpaqueId automatically selects the appropriate algorithm based on the alphabet size:
def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
alphabet_size = alphabet.size
# Handle edge case: single character alphabet
return alphabet * size if alphabet_size == 1
# Use fast path for 64-character alphabets
return generate_fast(size, alphabet) if alphabet_size == 64
# Use unbiased path for all other alphabets
generate_unbiased(size, alphabet, alphabet_size)
end
Selection Criteria
| Alphabet Size | Algorithm | Reason |
|---|---|---|
| 1 | Direct repetition | No randomness needed |
| 64 | Fast Path | Optimized bitwise operations |
| Other | Unbiased Path | Rejection sampling for uniformity |
Entropy Analysis
Entropy Calculation
The entropy of an opaque ID depends on the alphabet size and length:
Entropy = length × log₂(alphabet_size)
Examples
# 21-character ID with 62-character alphabet
entropy = 21 × log₂(62) = 21 × 5.954 = 125.0 bits
# 21-character ID with 64-character alphabet
entropy = 21 × log₂(64) = 21 × 6.000 = 126.0 bits
# 15-character ID with 16-character alphabet
entropy = 15 × log₂(16) = 15 × 4.000 = 60.0 bits
Collision Probability
The probability of collision for N generated IDs:
P(collision) ≈ 1 - e^(-N²/(2 × 2^entropy))
Practical Examples
# 21-character alphanumeric ID (125 bits entropy)
# 1 billion IDs: P(collision) ≈ 2.3 × 10⁻¹⁵
# 1 trillion IDs: P(collision) ≈ 2.3 × 10⁻⁹
# 15-character hexadecimal ID (60 bits entropy)
# 1 million IDs: P(collision) ≈ 2.3 × 10⁻⁶
# 1 billion IDs: P(collision) ≈ 2.3 × 10⁻³
Security Analysis
Cryptographic Security
OpaqueId uses Ruby’s SecureRandom which provides:
- Cryptographically Secure PRNG: Uses OS entropy sources
- Unpredictable Output: Cannot be predicted from previous outputs
- High Entropy: Sufficient randomness for security applications
Attack Resistance
Brute Force Attacks
# 21-character alphanumeric ID
# Search space: 62²¹ ≈ 2.3 × 10³⁷
# Time to brute force: ~10²⁰ years (at 1 billion attempts/second)
Timing Attacks
The algorithms are designed to be timing-attack resistant:
- Constant Time Operations: Bitwise operations have predictable timing
- Rejection Sampling: Variable loop iterations don’t leak information
- SecureRandom: OS-level entropy prevents timing-based prediction
Statistical Attacks
- Uniform Distribution: Rejection sampling ensures statistical uniformity
- No Patterns: Cryptographically secure randomness prevents pattern detection
- High Entropy: Sufficient entropy prevents statistical analysis
Performance Optimization
Memory Usage
# Memory-efficient generation
def generate_efficient(size, alphabet)
# Pre-allocate result string
result = String.new(capacity: size)
# Generate characters directly into result
size.times do
result << generate_character(alphabet)
end
result
end
Batch Generation
# Optimized batch generation
def generate_batch(count, size, alphabet)
# Pre-allocate array
results = Array.new(count)
# Generate all IDs
count.times do |i|
results[i] = generate(size, alphabet)
end
results
end
Caching Strategies
# Cache alphabet size for performance
class OpaqueId
def self.generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
alphabet_size = alphabet.size
# Use cached algorithm selection
if alphabet_size == 64
generate_fast(size, alphabet)
else
generate_unbiased(size, alphabet, alphabet_size)
end
end
end
Performance Characteristics
Algorithm Comparison
- Fast Path (64-char): Maximum performance with bitwise operations
- Unbiased Path (other): Slightly slower but ensures uniform distribution
- Memory usage: Scales linearly with ID length and batch size
- Time complexity: Linear scaling with ID length
Implementation Details
Error Handling
def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
# Validate parameters
raise ConfigurationError, "Size must be positive" unless size.positive?
raise ConfigurationError, "Alphabet cannot be empty" if alphabet.nil? || alphabet.empty?
# Handle edge cases
return alphabet * size if alphabet.size == 1
# Generate ID
if alphabet.size == 64
generate_fast(size, alphabet)
else
generate_unbiased(size, alphabet, alphabet.size)
end
rescue => e
raise GenerationError, "Failed to generate opaque ID: #{e.message}"
end
Thread Safety
# OpaqueId is thread-safe
# SecureRandom is thread-safe
# No shared state between generations
# Safe for concurrent use
threads = 10.times.map do
Thread.new do
1000.times { OpaqueId.generate }
end
end
threads.each(&:join)
Comparison with Other Algorithms
vs. UUID
| Aspect | OpaqueId | UUID |
|---|---|---|
| Length | Configurable | Fixed (36 chars) |
| Entropy | Configurable | Fixed (122 bits) |
| Readability | Human-friendly | Not readable |
| Performance | Optimized | Standard |
vs. NanoID
| Aspect | OpaqueId | NanoID |
|---|---|---|
| Language | Native Ruby | JavaScript port |
| Performance | Optimized | Good |
| Dependencies | None | External gem |
| Customization | Extensive | Limited |
vs. SecureRandom.hex
| Aspect | OpaqueId | SecureRandom.hex |
|---|---|---|
| Alphabet | Configurable | Fixed (hex) |
| Bias | None | None |
| Performance | Optimized | Good |
| Features | Rich | Basic |
Best Practices
1. Choose Appropriate Algorithm
# Use 64-character alphabets for maximum performance
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
# Use other alphabets for specific requirements
self.opaque_id_alphabet = "0123456789" # Numeric only
2. Optimize for Your Use Case
# High-performance applications
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
self.opaque_id_length = 21
# Human-readable applications
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
self.opaque_id_length = 15
3. Consider Entropy Requirements
# High security (125+ bits entropy)
self.opaque_id_length = 21
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
# Medium security (60+ bits entropy)
self.opaque_id_length = 15
self.opaque_id_alphabet = "0123456789abcdef"
Next Steps
Now that you understand the algorithms:
- Explore Performance for detailed benchmarks and optimization tips
- Check out Security for security considerations and best practices
- Review Configuration for algorithm selection guidance
- Read API Reference for complete algorithm documentation