Soundex Algorithm Implementation
Problem
Soundex is an algorithm used to categorize phonetically, such that two names that sound alike but are spelled differently have the same representation.
Soundex maps every name to a string consisting of one letter and three numbers, like M460.
One version of the algorithm is as follows:
- Remove consecutive consonants with the same sound (for example, change
ck -> c). - Keep the first letter. The remaining steps only apply to the rest of the string.
- Remove all vowels, including
y,w, andh. - Replace all consonants with the following digits:
- b, f, p, v → 1
- c, g, j, k, q, s, x, z → 2
- d, t → 3
- l → 4
- m, n → 5
- r → 6
- If you don't have three numbers yet, append zeros until you do. Keep the first three numbers.
Using this scheme, Jackson and Jaxen both map to J250.
Implement Soundex.
Examples
Example 1
Input: "Jackson"
Output: "J250"
Explanation:
1. Remove consecutive consonants: "Jackson" (no consecutive same sounds)
2. Keep first letter: J, work with "ackson"
3. Remove vowels (a,e,i,o,u,y,w,h): "ckson"
4. Replace consonants: c→2, k→2, s→2, o→(removed), n→5 = "2225"
5. Keep first 3 digits: "225", pad if needed: "250" (after removing consecutive same digits)
Final: "J250"
Example 2
Input: "Jaxen"
Output: "J250"
Explanation:
1. Remove consecutive consonants: "Jaxen" (no consecutive same sounds)
2. Keep first letter: J, work with "axen"
3. Remove vowels: "xn"
4. Replace consonants: x→2, n→5 = "25"
5. Pad to 3 digits: "250"
Final: "J250"
Example 3
Input: "Robert"
Output: "R163"
Explanation:
1. Remove consecutive consonants: "Robert" (no consecutive same sounds)
2. Keep first letter: R, work with "obert"
3. Remove vowels: "brt"
4. Replace consonants: b→1, r→6, t→3 = "163"
5. Already 3 digits: "163"
Final: "R163"
Example 4
Input: "Rupert"
Output: "R163"
Explanation:
1. Remove consecutive consonants: "Rupert" (no consecutive same sounds)
2. Keep first letter: R, work with "upert"
3. Remove vowels: "prt"
4. Replace consonants: p→1, r→6, t→3 = "163"
5. Already 3 digits: "163"
Final: "R163" (same as Robert, showing phonetic similarity)
Example 5
Input: "Ashcraft"
Output: "A261"
Explanation:
1. Remove consecutive consonants: "Ashcraft" (no consecutive same sounds)
2. Keep first letter: A, work with "shcraft"
3. Remove vowels: "shcrft"
4. Replace consonants: s→2, h→(removed), c→2, r→6, f→1, t→3 = "22613"
5. Remove consecutive same digits: "2613", take first 3: "261"
Final: "A261"
Solution
Method 1 - Step-by-Step String Processing
Intuition
The Soundex algorithm transforms names into a phonetic representation by systematically removing and replacing characters based on their sound. We process the string step by step, following each rule in sequence to build the final code.
Approach
- Handle edge cases: empty string, non-alphabetic characters
- Convert input to uppercase for consistent processing
- Remove consecutive consonants with same sound (like ck → c)
- Keep the first letter as the prefix
- Process remaining string: remove vowels (a,e,i,o,u,y,w,h)
- Replace consonants with their corresponding digits using a mapping
- Remove consecutive duplicate digits
- Pad with zeros or truncate to exactly 3 digits
- Combine first letter with 3-digit code
Code
C++
class Solution {
private:
unordered_map<char, char> digitMap = {
{'B', '1'}, {'F', '1'}, {'P', '1'}, {'V', '1'},
{'C', '2'}, {'G', '2'}, {'J', '2'}, {'K', '2'}, {'Q', '2'}, {'S', '2'}, {'X', '2'}, {'Z', '2'},
{'D', '3'}, {'T', '3'},
{'L', '4'},
{'M', '5'}, {'N', '5'},
{'R', '6'}
};
unordered_set<char> vowels = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'};
string removeConsecutiveConsonants(string s) {
string result;
for (int i = 0; i < s.length(); i++) {
if (i == 0 || s[i] != s[i-1] || vowels.count(s[i])) {
result += s[i];
}
}
return result;
}
string removeVowels(string s) {
string result;
for (char c : s) {
if (vowels.find(c) == vowels.end()) {
result += c;
}
}
return result;
}
string replaceWithDigits(string s) {
string result;
for (char c : s) {
if (digitMap.count(c)) {
result += digitMap[c];
}
}
return result;
}
string removeDuplicateDigits(string s) {
if (s.empty()) return s;
string result;
result += s[0];
for (int i = 1; i < s.length(); i++) {
if (s[i] != s[i-1]) {
result += s[i];
}
}
return result;
}
public:
string soundex(string name) {
if (name.empty()) return "";
// Convert to uppercase
transform(name.begin(), name.end(), name.begin(), ::toupper);
// Step 1: Remove consecutive consonants with same sound
name = removeConsecutiveConsonants(name);
// Step 2: Keep first letter
char firstLetter = name[0];
string remaining = name.substr(1);
// Step 3: Remove vowels
remaining = removeVowels(remaining);
// Step 4: Replace consonants with digits
string digits = replaceWithDigits(remaining);
// Remove consecutive duplicate digits
digits = removeDuplicateDigits(digits);
// Step 5: Pad or truncate to 3 digits
while (digits.length() < 3) {
digits += '0';
}
if (digits.length() > 3) {
digits = digits.substr(0, 3);
}
return string(1, firstLetter) + digits;
}
};
Go
func soundex(name string) string {
if len(name) == 0 {
return ""
}
digitMap := map[rune]rune{
'B': '1', 'F': '1', 'P': '1', 'V': '1',
'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
'D': '3', 'T': '3',
'L': '4',
'M': '5', 'N': '5',
'R': '6',
}
vowels := map[rune]bool{
'A': true, 'E': true, 'I': true, 'O': true, 'U': true, 'Y': true, 'W': true, 'H': true,
}
// Convert to uppercase
name = strings.ToUpper(name)
runes := []rune(name)
// Step 1: Remove consecutive consonants with same sound
var filtered []rune
for i, r := range runes {
if i == 0 || r != runes[i-1] || vowels[r] {
filtered = append(filtered, r)
}
}
if len(filtered) == 0 {
return ""
}
// Step 2: Keep first letter
firstLetter := filtered[0]
remaining := filtered[1:]
// Step 3: Remove vowels
var noVowels []rune
for _, r := range remaining {
if !vowels[r] {
noVowels = append(noVowels, r)
}
}
// Step 4: Replace with digits
var digits []rune
for _, r := range noVowels {
if digit, exists := digitMap[r]; exists {
digits = append(digits, digit)
}
}
// Remove consecutive duplicate digits
var noDuplicates []rune
for i, digit := range digits {
if i == 0 || digit != digits[i-1] {
noDuplicates = append(noDuplicates, digit)
}
}
// Step 5: Pad or truncate to 3 digits
for len(noDuplicates) < 3 {
noDuplicates = append(noDuplicates, '0')
}
if len(noDuplicates) > 3 {
noDuplicates = noDuplicates[:3]
}
return string(firstLetter) + string(noDuplicates)
}
Java
class Solution {
private Map<Character, Character> digitMap;
private Set<Character> vowels;
public Solution() {
digitMap = new HashMap<>();
digitMap.put('B', '1'); digitMap.put('F', '1'); digitMap.put('P', '1'); digitMap.put('V', '1');
digitMap.put('C', '2'); digitMap.put('G', '2'); digitMap.put('J', '2'); digitMap.put('K', '2');
digitMap.put('Q', '2'); digitMap.put('S', '2'); digitMap.put('X', '2'); digitMap.put('Z', '2');
digitMap.put('D', '3'); digitMap.put('T', '3');
digitMap.put('L', '4');
digitMap.put('M', '5'); digitMap.put('N', '5');
digitMap.put('R', '6');
vowels = new HashSet<>(Arrays.asList('A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'));
}
private String removeConsecutiveConsonants(String s) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (i == 0 || c != s.charAt(i-1) || vowels.contains(c)) {
sb.append(c);
}
}
return sb.toString();
}
private String removeVowels(String s) {
StringBuilder sb = new StringBuilder();
for (char c : s.toCharArray()) {
if (!vowels.contains(c)) {
sb.append(c);
}
}
return sb.toString();
}
private String replaceWithDigits(String s) {
StringBuilder sb = new StringBuilder();
for (char c : s.toCharArray()) {
if (digitMap.containsKey(c)) {
sb.append(digitMap.get(c));
}
}
return sb.toString();
}
private String removeDuplicateDigits(String s) {
if (s.isEmpty()) return s;
StringBuilder sb = new StringBuilder();
sb.append(s.charAt(0));
for (int i = 1; i < s.length(); i++) {
if (s.charAt(i) != s.charAt(i-1)) {
sb.append(s.charAt(i));
}
}
return sb.toString();
}
public String soundex(String name) {
if (name == null || name.isEmpty()) return "";
// Convert to uppercase
name = name.toUpperCase();
// Step 1: Remove consecutive consonants with same sound
name = removeConsecutiveConsonants(name);
// Step 2: Keep first letter
char firstLetter = name.charAt(0);
String remaining = name.substring(1);
// Step 3: Remove vowels
remaining = removeVowels(remaining);
// Step 4: Replace consonants with digits
String digits = replaceWithDigits(remaining);
// Remove consecutive duplicate digits
digits = removeDuplicateDigits(digits);
// Step 5: Pad or truncate to 3 digits
while (digits.length() < 3) {
digits += '0';
}
if (digits.length() > 3) {
digits = digits.substring(0, 3);
}
return firstLetter + digits;
}
}
Python
class Solution:
def __init__(self):
self.digit_map = {
'B': '1', 'F': '1', 'P': '1', 'V': '1',
'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
'D': '3', 'T': '3',
'L': '4',
'M': '5', 'N': '5',
'R': '6'
}
self.vowels = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'}
def _remove_consecutive_consonants(self, s: str) -> str:
result = []
for i, c in enumerate(s):
if i == 0 or c != s[i-1] or c in self.vowels:
result.append(c)
return ''.join(result)
def _remove_vowels(self, s: str) -> str:
return ''.join(c for c in s if c not in self.vowels)
def _replace_with_digits(self, s: str) -> str:
return ''.join(self.digit_map.get(c, '') for c in s if c in self.digit_map)
def _remove_duplicate_digits(self, s: str) -> str:
if not s:
return s
result = [s[0]]
for i in range(1, len(s)):
if s[i] != s[i-1]:
result.append(s[i])
return ''.join(result)
def soundex(self, name: str) -> str:
if not name:
return ""
# Convert to uppercase
name = name.upper()
# Step 1: Remove consecutive consonants with same sound
name = self._remove_consecutive_consonants(name)
# Step 2: Keep first letter
first_letter = name[0]
remaining = name[1:]
# Step 3: Remove vowels
remaining = self._remove_vowels(remaining)
# Step 4: Replace consonants with digits
digits = self._replace_with_digits(remaining)
# Remove consecutive duplicate digits
digits = self._remove_duplicate_digits(digits)
# Step 5: Pad or truncate to 3 digits
digits = digits.ljust(3, '0')[:3]
return first_letter + digits
Complexity
- ⏰ Time complexity:
O(n), where n is the length of the input string. Each step processes the string once - 🧺 Space complexity:
O(n)for intermediate string transformations and hash maps storing character mappings
Method 2 - Optimized Single-Pass Processing
Intuition
Instead of multiple passes through the string, we can optimize by combining several steps and processing the string in fewer iterations. We can also avoid creating multiple intermediate strings by using more efficient data structures.
Approach
- Process input validation and case conversion upfront
- Combine vowel removal and digit mapping in a single pass
- Handle consecutive duplicate removal during the mapping process
- Use StringBuilder/list for efficient string building
- Apply padding/truncation logic at the end
Code
C++
class Solution {
private:
unordered_map<char, char> digitMap = {
{'B', '1'}, {'F', '1'}, {'P', '1'}, {'V', '1'},
{'C', '2'}, {'G', '2'}, {'J', '2'}, {'K', '2'}, {'Q', '2'}, {'S', '2'}, {'X', '2'}, {'Z', '2'},
{'D', '3'}, {'T', '3'},
{'L', '4'},
{'M', '5'}, {'N', '5'},
{'R', '6'}
};
unordered_set<char> vowels = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'};
public:
string soundexOptimized(string name) {
if (name.empty()) return "";
// Convert to uppercase
transform(name.begin(), name.end(), name.begin(), ::toupper);
// Remove consecutive duplicates first
string processed;
for (int i = 0; i < name.length(); i++) {
if (i == 0 || name[i] != name[i-1] || vowels.count(name[i])) {
processed += name[i];
}
}
char firstLetter = processed[0];
string digits;
// Process remaining characters in single pass
for (int i = 1; i < processed.length(); i++) {
char c = processed[i];
// Skip vowels
if (vowels.count(c)) continue;
// Map to digit if applicable
if (digitMap.count(c)) {
char digit = digitMap[c];
// Add only if different from last digit
if (digits.empty() || digits.back() != digit) {
digits += digit;
}
}
}
// Adjust to exactly 3 digits
digits.resize(3, '0');
return string(1, firstLetter) + digits;
}
};
Go
func soundexOptimized(name string) string {
if len(name) == 0 {
return ""
}
digitMap := map[rune]rune{
'B': '1', 'F': '1', 'P': '1', 'V': '1',
'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
'D': '3', 'T': '3',
'L': '4',
'M': '5', 'N': '5',
'R': '6',
}
vowels := map[rune]bool{
'A': true, 'E': true, 'I': true, 'O': true, 'U': true, 'Y': true, 'W': true, 'H': true,
}
name = strings.ToUpper(name)
runes := []rune(name)
// Remove consecutive duplicates
var processed []rune
for i, r := range runes {
if i == 0 || r != runes[i-1] || vowels[r] {
processed = append(processed, r)
}
}
if len(processed) == 0 {
return ""
}
firstLetter := processed[0]
var digits []rune
// Process remaining characters
for i := 1; i < len(processed); i++ {
r := processed[i]
// Skip vowels
if vowels[r] {
continue
}
// Map to digit
if digit, exists := digitMap[r]; exists {
if len(digits) == 0 || digits[len(digits)-1] != digit {
digits = append(digits, digit)
}
}
}
// Pad to 3 digits
for len(digits) < 3 {
digits = append(digits, '0')
}
if len(digits) > 3 {
digits = digits[:3]
}
return string(firstLetter) + string(digits)
}
Java
class Solution {
private static final Map<Character, Character> DIGIT_MAP = new HashMap<>();
private static final Set<Character> VOWELS = new HashSet<>();
static {
DIGIT_MAP.put('B', '1'); DIGIT_MAP.put('F', '1'); DIGIT_MAP.put('P', '1'); DIGIT_MAP.put('V', '1');
DIGIT_MAP.put('C', '2'); DIGIT_MAP.put('G', '2'); DIGIT_MAP.put('J', '2'); DIGIT_MAP.put('K', '2');
DIGIT_MAP.put('Q', '2'); DIGIT_MAP.put('S', '2'); DIGIT_MAP.put('X', '2'); DIGIT_MAP.put('Z', '2');
DIGIT_MAP.put('D', '3'); DIGIT_MAP.put('T', '3');
DIGIT_MAP.put('L', '4');
DIGIT_MAP.put('M', '5'); DIGIT_MAP.put('N', '5');
DIGIT_MAP.put('R', '6');
VOWELS.addAll(Arrays.asList('A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'));
}
public String soundexOptimized(String name) {
if (name == null || name.isEmpty()) return "";
name = name.toUpperCase();
// Remove consecutive duplicates
StringBuilder processed = new StringBuilder();
for (int i = 0; i < name.length(); i++) {
char c = name.charAt(i);
if (i == 0 || c != name.charAt(i-1) || VOWELS.contains(c)) {
processed.append(c);
}
}
char firstLetter = processed.charAt(0);
StringBuilder digits = new StringBuilder();
// Process remaining characters
for (int i = 1; i < processed.length(); i++) {
char c = processed.charAt(i);
if (VOWELS.contains(c)) continue;
if (DIGIT_MAP.containsKey(c)) {
char digit = DIGIT_MAP.get(c);
if (digits.length() == 0 || digits.charAt(digits.length()-1) != digit) {
digits.append(digit);
}
}
}
// Adjust to exactly 3 digits
while (digits.length() < 3) {
digits.append('0');
}
if (digits.length() > 3) {
digits.setLength(3);
}
return firstLetter + digits.toString();
}
}
Python
class Solution:
DIGIT_MAP = {
'B': '1', 'F': '1', 'P': '1', 'V': '1',
'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
'D': '3', 'T': '3',
'L': '4',
'M': '5', 'N': '5',
'R': '6'
}
VOWELS = {'A', 'E', 'I', 'O', 'U', 'Y', 'W', 'H'}
def soundex_optimized(self, name: str) -> str:
if not name:
return ""
name = name.upper()
# Remove consecutive duplicates
processed = []
for i, c in enumerate(name):
if i == 0 or c != name[i-1] or c in self.VOWELS:
processed.append(c)
if not processed:
return ""
first_letter = processed[0]
digits = []
# Process remaining characters
for i in range(1, len(processed)):
c = processed[i]
if c in self.VOWELS:
continue
if c in self.DIGIT_MAP:
digit = self.DIGIT_MAP[c]
if not digits or digits[-1] != digit:
digits.append(digit)
# Adjust to exactly 3 digits
digits = (digits + ['0'] * 3)[:3]
return first_letter + ''.join(digits)
Complexity
- ⏰ Time complexity:
O(n), single pass through the string with constant-time operations - 🧺 Space complexity:
O(1)additional space beyond input/output (constant-size maps and processed string)
Notes
- Method 1 provides a clear step-by-step implementation following the algorithm exactly as described
- Method 2 optimizes by combining steps and reducing intermediate string creation
- The Soundex algorithm was originally designed for English surnames and may not work well for other languages
- Edge cases include empty strings, non-alphabetic characters, and very short names
- Both implementations handle the consecutive duplicate removal rule properly
- The algorithm produces the same code for phonetically similar names like "Jackson" and "Jaxen"
- Modern applications often use more sophisticated phonetic algorithms like Metaphone or Double Metaphone
- The digit mapping reflects phonetic similarity: similar sounds map to the same digit
- Vowels and similar-sounding consonants (y, w, h) are treated as separators and removed
- The fixed 4-character format (1 letter + 3 digits) makes it suitable for database indexing and searching