Soundex Algorithm Implementation

Problem

Soundex is an algorithm used to categorize phonetically, such that two names that sound alike but are spelled differently have the same representation.

Soundex maps every name to a string consisting of one letter and three numbers, like M460.

One version of the algorithm is as follows:

Remove consecutive consonants with the same sound (for example, change ck -> c).
Keep the first letter. The remaining steps only apply to the rest of the string.
Remove all vowels, including y, w, and h.
Replace all consonants with the following digits:
- b, f, p, v → 1
- c, g, j, k, q, s, x, z → 2
- d, t → 3
- l → 4
- m, n → 5
- r → 6
If you don't have three numbers yet, append zeros until you do. Keep the first three numbers.

Using this scheme, Jackson and Jaxen both map to J250.

Implement Soundex.

Examples

Example 1

Input: "Jackson"
Output: "J250"
Explanation:
1. Remove consecutive consonants: "Jackson" (no consecutive same sounds)
2. Keep first letter: J, work with "ackson"
3. Remove vowels (a,e,i,o,u,y,w,h): "ckson" 
4. Replace consonants: c→2, k→2, s→2, o→(removed), n→5 = "2225"
5. Keep first 3 digits: "225", pad if needed: "250" (after removing consecutive same digits)
Final: "J250"

Example 2

Input: "Jaxen"
Output: "J250"
Explanation:
1. Remove consecutive consonants: "Jaxen" (no consecutive same sounds)
2. Keep first letter: J, work with "axen"
3. Remove vowels: "xn"
4. Replace consonants: x→2, n→5 = "25"
5. Pad to 3 digits: "250"
Final: "J250"

Example 3

Input: "Robert"
Output: "R163"
Explanation:
1. Remove consecutive consonants: "Robert" (no consecutive same sounds)
2. Keep first letter: R, work with "obert"
3. Remove vowels: "brt"
4. Replace consonants: b→1, r→6, t→3 = "163"
5. Already 3 digits: "163"
Final: "R163"

Example 4

Input: "Rupert"
Output: "R163"
Explanation:
1. Remove consecutive consonants: "Rupert" (no consecutive same sounds)
2. Keep first letter: R, work with "upert"
3. Remove vowels: "prt"
4. Replace consonants: p→1, r→6, t→3 = "163"
5. Already 3 digits: "163"
Final: "R163" (same as Robert, showing phonetic similarity)

Example 5

Input: "Ashcraft"
Output: "A261"
Explanation:
1. Remove consecutive consonants: "Ashcraft" (no consecutive same sounds)
2. Keep first letter: A, work with "shcraft"
3. Remove vowels: "shcrft"
4. Replace consonants: s→2, h→(removed), c→2, r→6, f→1, t→3 = "22613"
5. Remove consecutive same digits: "2613", take first 3: "261"
Final: "A261"

Solution

Method 1 - Step-by-Step String Processing

Intuition

The Soundex algorithm transforms names into a phonetic representation by systematically removing and replacing characters based on their sound. We process the string step by step, following each rule in sequence to build the final code.

Approach

Handle edge cases: empty string, non-alphabetic characters
Convert input to uppercase for consistent processing
Remove consecutive consonants with same sound (like ck → c)
Keep the first letter as the prefix
Process remaining string: remove vowels (a,e,i,o,u,y,w,h)
Replace consonants with their corresponding digits using a mapping
Remove consecutive duplicate digits
Pad with zeros or truncate to exactly 3 digits
Combine first letter with 3-digit code

Code

C++

class Solution {
private:
    unordered_map<char, char> digitMap = {
        {'B', '1'}, {'F', '1'}, {'P', '1'}, {'V', '1'},
        {'C', '2'}, {'G', '2'}, {'J', '2'}, {'K', '2'}, {'Q', '2'}, {'S', '2'}, {'X', '2'}, {'Z', '2'},
        {'D', '3'}, {'T', '3'},
        {'L', '4'},
        {'M', '5'}, {'N', '5'},
        {'R', '6'}
    };
    
    unordered_set<char> vowels = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'};
    
    string removeConsecutiveConsonants(string s) {
        string result;
        for (int i = 0; i < s.length(); i++) {
            if (i == 0 || s[i] != s[i-1] || vowels.count(s[i])) {
                result += s[i];
            }
        }
        return result;
    }
    
    string removeVowels(string s) {
        string result;
        for (char c : s) {
            if (vowels.find(c) == vowels.end()) {
                result += c;
            }
        }
        return result;
    }
    
    string replaceWithDigits(string s) {
        string result;
        for (char c : s) {
            if (digitMap.count(c)) {
                result += digitMap[c];
            }
        }
        return result;
    }
    
    string removeDuplicateDigits(string s) {
        if (s.empty()) return s;
        string result;
        result += s[0];
        for (int i = 1; i < s.length(); i++) {
            if (s[i] != s[i-1]) {
                result += s[i];
            }
        }
        return result;
    }
    
public:
    string soundex(string name) {
        if (name.empty()) return "";
        
        // Convert to uppercase
        transform(name.begin(), name.end(), name.begin(), ::toupper);
        
        // Step 1: Remove consecutive consonants with same sound
        name = removeConsecutiveConsonants(name);
        
        // Step 2: Keep first letter
        char firstLetter = name[0];
        string remaining = name.substr(1);
        
        // Step 3: Remove vowels
        remaining = removeVowels(remaining);
        
        // Step 4: Replace consonants with digits
        string digits = replaceWithDigits(remaining);
        
        // Remove consecutive duplicate digits
        digits = removeDuplicateDigits(digits);
        
        // Step 5: Pad or truncate to 3 digits
        while (digits.length() < 3) {
            digits += '0';
        }
        if (digits.length() > 3) {
            digits = digits.substr(0, 3);
        }
        
        return string(1, firstLetter) + digits;
    }
};

Go

func soundex(name string) string {
    if len(name) == 0 {
        return ""
    }
    
    digitMap := map[rune]rune{
        'B': '1', 'F': '1', 'P': '1', 'V': '1',
        'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
        'D': '3', 'T': '3',
        'L': '4',
        'M': '5', 'N': '5',
        'R': '6',
    }
    
    vowels := map[rune]bool{
        'A': true, 'E': true, 'I': true, 'O': true, 'U': true, 'Y': true, 'W': true, 'H': true,
    }
    
    // Convert to uppercase
    name = strings.ToUpper(name)
    runes := []rune(name)
    
    // Step 1: Remove consecutive consonants with same sound
    var filtered []rune
    for i, r := range runes {
        if i == 0 || r != runes[i-1] || vowels[r] {
            filtered = append(filtered, r)
        }
    }
    
    if len(filtered) == 0 {
        return ""
    }
    
    // Step 2: Keep first letter
    firstLetter := filtered[0]
    remaining := filtered[1:]
    
    // Step 3: Remove vowels
    var noVowels []rune
    for _, r := range remaining {
        if !vowels[r] {
            noVowels = append(noVowels, r)
        }
    }
    
    // Step 4: Replace with digits
    var digits []rune
    for _, r := range noVowels {
        if digit, exists := digitMap[r]; exists {
            digits = append(digits, digit)
        }
    }
    
    // Remove consecutive duplicate digits
    var noDuplicates []rune
    for i, digit := range digits {
        if i == 0 || digit != digits[i-1] {
            noDuplicates = append(noDuplicates, digit)
        }
    }
    
    // Step 5: Pad or truncate to 3 digits
    for len(noDuplicates) < 3 {
        noDuplicates = append(noDuplicates, '0')
    }
    if len(noDuplicates) > 3 {
        noDuplicates = noDuplicates[:3]
    }
    
    return string(firstLetter) + string(noDuplicates)
}

Java

class Solution {
    private Map<Character, Character> digitMap;
    private Set<Character> vowels;
    
    public Solution() {
        digitMap = new HashMap<>();
        digitMap.put('B', '1'); digitMap.put('F', '1'); digitMap.put('P', '1'); digitMap.put('V', '1');
        digitMap.put('C', '2'); digitMap.put('G', '2'); digitMap.put('J', '2'); digitMap.put('K', '2');
        digitMap.put('Q', '2'); digitMap.put('S', '2'); digitMap.put('X', '2'); digitMap.put('Z', '2');
        digitMap.put('D', '3'); digitMap.put('T', '3');
        digitMap.put('L', '4');
        digitMap.put('M', '5'); digitMap.put('N', '5');
        digitMap.put('R', '6');
        
        vowels = new HashSet<>(Arrays.asList('A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'));
    }
    
    private String removeConsecutiveConsonants(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i == 0 || c != s.charAt(i-1) || vowels.contains(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
    
    private String removeVowels(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!vowels.contains(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
    
    private String replaceWithDigits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (digitMap.containsKey(c)) {
                sb.append(digitMap.get(c));
            }
        }
        return sb.toString();
    }
    
    private String removeDuplicateDigits(String s) {
        if (s.isEmpty()) return s;
        StringBuilder sb = new StringBuilder();
        sb.append(s.charAt(0));
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) != s.charAt(i-1)) {
                sb.append(s.charAt(i));
            }
        }
        return sb.toString();
    }
    
    public String soundex(String name) {
        if (name == null || name.isEmpty()) return "";
        
        // Convert to uppercase
        name = name.toUpperCase();
        
        // Step 1: Remove consecutive consonants with same sound
        name = removeConsecutiveConsonants(name);
        
        // Step 2: Keep first letter
        char firstLetter = name.charAt(0);
        String remaining = name.substring(1);
        
        // Step 3: Remove vowels
        remaining = removeVowels(remaining);
        
        // Step 4: Replace consonants with digits
        String digits = replaceWithDigits(remaining);
        
        // Remove consecutive duplicate digits
        digits = removeDuplicateDigits(digits);
        
        // Step 5: Pad or truncate to 3 digits
        while (digits.length() < 3) {
            digits += '0';
        }
        if (digits.length() > 3) {
            digits = digits.substring(0, 3);
        }
        
        return firstLetter + digits;
    }
}

Python

class Solution:
    def __init__(self):
        self.digit_map = {
            'B': '1', 'F': '1', 'P': '1', 'V': '1',
            'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
            'D': '3', 'T': '3',
            'L': '4',
            'M': '5', 'N': '5',
            'R': '6'
        }
        self.vowels = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'}
    
    def _remove_consecutive_consonants(self, s: str) -> str:
        result = []
        for i, c in enumerate(s):
            if i == 0 or c != s[i-1] or c in self.vowels:
                result.append(c)
        return ''.join(result)
    
    def _remove_vowels(self, s: str) -> str:
        return ''.join(c for c in s if c not in self.vowels)
    
    def _replace_with_digits(self, s: str) -> str:
        return ''.join(self.digit_map.get(c, '') for c in s if c in self.digit_map)
    
    def _remove_duplicate_digits(self, s: str) -> str:
        if not s:
            return s
        result = [s[0]]
        for i in range(1, len(s)):
            if s[i] != s[i-1]:
                result.append(s[i])
        return ''.join(result)
    
    def soundex(self, name: str) -> str:
        if not name:
            return ""
        
        # Convert to uppercase
        name = name.upper()
        
        # Step 1: Remove consecutive consonants with same sound
        name = self._remove_consecutive_consonants(name)
        
        # Step 2: Keep first letter
        first_letter = name[0]
        remaining = name[1:]
        
        # Step 3: Remove vowels
        remaining = self._remove_vowels(remaining)
        
        # Step 4: Replace consonants with digits
        digits = self._replace_with_digits(remaining)
        
        # Remove consecutive duplicate digits
        digits = self._remove_duplicate_digits(digits)
        
        # Step 5: Pad or truncate to 3 digits
        digits = digits.ljust(3, '0')[:3]
        
        return first_letter + digits

Complexity

⏰ Time complexity: O(n), where n is the length of the input string. Each step processes the string once
🧺 Space complexity: O(n) for intermediate string transformations and hash maps storing character mappings

Method 2 - Optimized Single-Pass Processing

Intuition

Instead of multiple passes through the string, we can optimize by combining several steps and processing the string in fewer iterations. We can also avoid creating multiple intermediate strings by using more efficient data structures.

Approach

Process input validation and case conversion upfront
Combine vowel removal and digit mapping in a single pass
Handle consecutive duplicate removal during the mapping process
Use StringBuilder/list for efficient string building
Apply padding/truncation logic at the end

Code

C++

class Solution {
private:
    unordered_map<char, char> digitMap = {
        {'B', '1'}, {'F', '1'}, {'P', '1'}, {'V', '1'},
        {'C', '2'}, {'G', '2'}, {'J', '2'}, {'K', '2'}, {'Q', '2'}, {'S', '2'}, {'X', '2'}, {'Z', '2'},
        {'D', '3'}, {'T', '3'},
        {'L', '4'},
        {'M', '5'}, {'N', '5'},
        {'R', '6'}
    };
    
    unordered_set<char> vowels = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'};
    
public:
    string soundexOptimized(string name) {
        if (name.empty()) return "";
        
        // Convert to uppercase
        transform(name.begin(), name.end(), name.begin(), ::toupper);
        
        // Remove consecutive duplicates first
        string processed;
        for (int i = 0; i < name.length(); i++) {
            if (i == 0 || name[i] != name[i-1] || vowels.count(name[i])) {
                processed += name[i];
            }
        }
        
        char firstLetter = processed[0];
        string digits;
        
        // Process remaining characters in single pass
        for (int i = 1; i < processed.length(); i++) {
            char c = processed[i];
            
            // Skip vowels
            if (vowels.count(c)) continue;
            
            // Map to digit if applicable
            if (digitMap.count(c)) {
                char digit = digitMap[c];
                // Add only if different from last digit
                if (digits.empty() || digits.back() != digit) {
                    digits += digit;
                }
            }
        }
        
        // Adjust to exactly 3 digits
        digits.resize(3, '0');
        
        return string(1, firstLetter) + digits;
    }
};

Go

func soundexOptimized(name string) string {
    if len(name) == 0 {
        return ""
    }
    
    digitMap := map[rune]rune{
        'B': '1', 'F': '1', 'P': '1', 'V': '1',
        'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
        'D': '3', 'T': '3',
        'L': '4',
        'M': '5', 'N': '5',
        'R': '6',
    }
    
    vowels := map[rune]bool{
        'A': true, 'E': true, 'I': true, 'O': true, 'U': true, 'Y': true, 'W': true, 'H': true,
    }
    
    name = strings.ToUpper(name)
    runes := []rune(name)
    
    // Remove consecutive duplicates
    var processed []rune
    for i, r := range runes {
        if i == 0 || r != runes[i-1] || vowels[r] {
            processed = append(processed, r)
        }
    }
    
    if len(processed) == 0 {
        return ""
    }
    
    firstLetter := processed[0]
    var digits []rune
    
    // Process remaining characters
    for i := 1; i < len(processed); i++ {
        r := processed[i]
        
        // Skip vowels
        if vowels[r] {
            continue
        }
        
        // Map to digit
        if digit, exists := digitMap[r]; exists {
            if len(digits) == 0 || digits[len(digits)-1] != digit {
                digits = append(digits, digit)
            }
        }
    }
    
    // Pad to 3 digits
    for len(digits) < 3 {
        digits = append(digits, '0')
    }
    if len(digits) > 3 {
        digits = digits[:3]
    }
    
    return string(firstLetter) + string(digits)
}

Java

class Solution {
    private static final Map<Character, Character> DIGIT_MAP = new HashMap<>();
    private static final Set<Character> VOWELS = new HashSet<>();
    
    static {
        DIGIT_MAP.put('B', '1'); DIGIT_MAP.put('F', '1'); DIGIT_MAP.put('P', '1'); DIGIT_MAP.put('V', '1');
        DIGIT_MAP.put('C', '2'); DIGIT_MAP.put('G', '2'); DIGIT_MAP.put('J', '2'); DIGIT_MAP.put('K', '2');
        DIGIT_MAP.put('Q', '2'); DIGIT_MAP.put('S', '2'); DIGIT_MAP.put('X', '2'); DIGIT_MAP.put('Z', '2');
        DIGIT_MAP.put('D', '3'); DIGIT_MAP.put('T', '3');
        DIGIT_MAP.put('L', '4');
        DIGIT_MAP.put('M', '5'); DIGIT_MAP.put('N', '5');
        DIGIT_MAP.put('R', '6');
        
        VOWELS.addAll(Arrays.asList('A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'));
    }
    
    public String soundexOptimized(String name) {
        if (name == null || name.isEmpty()) return "";
        
        name = name.toUpperCase();
        
        // Remove consecutive duplicates
        StringBuilder processed = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (i == 0 || c != name.charAt(i-1) || VOWELS.contains(c)) {
                processed.append(c);
            }
        }
        
        char firstLetter = processed.charAt(0);
        StringBuilder digits = new StringBuilder();
        
        // Process remaining characters
        for (int i = 1; i < processed.length(); i++) {
            char c = processed.charAt(i);
            
            if (VOWELS.contains(c)) continue;
            
            if (DIGIT_MAP.containsKey(c)) {
                char digit = DIGIT_MAP.get(c);
                if (digits.length() == 0 || digits.charAt(digits.length()-1) != digit) {
                    digits.append(digit);
                }
            }
        }
        
        // Adjust to exactly 3 digits
        while (digits.length() < 3) {
            digits.append('0');
        }
        if (digits.length() > 3) {
            digits.setLength(3);
        }
        
        return firstLetter + digits.toString();
    }
}

Python

class Solution:
    DIGIT_MAP = {
        'B': '1', 'F': '1', 'P': '1', 'V': '1',
        'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
        'D': '3', 'T': '3',
        'L': '4',
        'M': '5', 'N': '5',
        'R': '6'
    }
    VOWELS = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'}
    
    def soundex_optimized(self, name: str) -> str:
        if not name:
            return ""
        
        name = name.upper()
        
        # Remove consecutive duplicates
        processed = []
        for i, c in enumerate(name):
            if i == 0 or c != name[i-1] or c in self.VOWELS:
                processed.append(c)
        
        if not processed:
            return ""
        
        first_letter = processed[0]
        digits = []
        
        # Process remaining characters
        for i in range(1, len(processed)):
            c = processed[i]
            
            if c in self.VOWELS:
                continue
            
            if c in self.DIGIT_MAP:
                digit = self.DIGIT_MAP[c]
                if not digits or digits[-1] != digit:
                    digits.append(digit)
        
        # Adjust to exactly 3 digits
        digits = (digits + ['0'] * 3)[:3]
        
        return first_letter + ''.join(digits)

Complexity

⏰ Time complexity: O(n), single pass through the string with constant-time operations
🧺 Space complexity: O(1) additional space beyond input/output (constant-size maps and processed string)

Notes

Method 1 provides a clear step-by-step implementation following the algorithm exactly as described
Method 2 optimizes by combining steps and reducing intermediate string creation
The Soundex algorithm was originally designed for English surnames and may not work well for other languages
Edge cases include empty strings, non-alphabetic characters, and very short names
Both implementations handle the consecutive duplicate removal rule properly
The algorithm produces the same code for phonetically similar names like "Jackson" and "Jaxen"
Modern applications often use more sophisticated phonetic algorithms like Metaphone or Double Metaphone
The digit mapping reflects phonetic similarity: similar sounds map to the same digit
Vowels and similar-sounding consonants (y, w, h) are treated as separators and removed
The fixed 4-character format (1 letter + 3 digits) makes it suitable for database indexing and searching

Problem

Examples

Example 1

Example 2

Example 3

Example 4

Example 5

Solution

Method 1 - Step-by-Step String Processing

Intuition

Approach

Code

C++

Go

Java

Python

Complexity

Method 2 - Optimized Single-Pass Processing

Intuition

Approach

Code

C++

Go

Java

Python

Complexity

Notes

Comments