Remove duplicate characters in a string

Problem

Given a string, Write a program to remove duplicate characters from the string.

Examples

Example 1:

Input: s = kodeknight
Output: kodenight

Example 2:

Input: s = k5kc
Output: k5c

Solution

There are multiple approach to solve it.

Method 1 : Brute Force

Code

C

void removeDuplicates(char *str)
{ 
	 if (!str) {
		 return; 
	 }
	 int len = strlen(str);
	 if (len < 2){
		 return;
	 }

	 int tail = 1; 
	 for (int i = 1; i < len; ++i) { 
	 	int j; 
		 for (j = 0; j < tail; ++j) 
		 if (str[i] == str[j]) {
	 		break;
		 }
	 	if (j == tail) { 
			 str[tail] = str[i];
			 ++tail;
	 	} 
	 } 
	 str[tail] = '\0';
}

Complexity

⏰ Time complexity: O(n^2)
🧺 Space complexity: O(1)

Method 2 : Using Sorting

These are the steps we can follow:

Convert string to char array (if not already)
Sort the elements.k5kc becomes 5ckk
Now in a loop, remove duplicates by comparing the current character with previous character. 5ck

Code

Java

public static String removeDuplicates(String s) {
	char[] chars = s.toCharArray();
	Arrays.sort(chars); // O(nlogn)
	String sbString = new String();
	for (int i = 1; i < chars.length; i++) {
		if (chars[i] != chars[i - 1])
			sbString += chars[i];
	}
	return sbString;
}

Complexity

⏰ Time complexity: O(nlogn)
🧺 Space complexity: O(1)

Characters are not in order.

Method 3 - Using Hashtable

Convert the string to character array.
Store all the characters in the Set.
Set will store only one instance of each character
Iterate through Set and print the characters.
This approach will change the order of characters.

Code

Java

Two Pass

 public static String removeDuplicates(String s) {
     HashSet <Character> set = new HashSet<>();
     char[] chars = s.toCharArray();
     for (int i = 0; i < chars.length; i++) {
         set.add(chars[i]);
     }
     Iterator <Character> iterator = set.iterator();
     String sbString = new String();
     while (iterator.hasNext()){
         sbString += iterator.next() + "";
     }

     return sbString;
 }

One Pass

Single pass algorithm without changing order:

 public static String removeDuplicates(String s) {
     HashSet < Character > set = new HashSet < > ();
     char[] chars = s.toCharArray();
     StringBuilder sb = new StringBuilder();
     for (int i = 0; i < chars.length; i++) {
         char c = chars[i];
         if (set.contains(c)) {
             continue;
         }
         set.add(chars[i]);
         sb.append(c);
     }
     return sb.toString();
 }

Complexity

⏰ Time complexity: O(n)
🧺 Space complexity: O(n)

Method 4 - LinkedHashset

Here are the algorithm:

Convert the string to character array.
Store all the characters in the Linked Hash Set.
Set will store only one instance of each character
Iterate through Set and print the characters.
This approach will retain the order of characters.

Code

Java

public static String removeDuplicates(String s) {
    LinkedHashSet <Character> set = new LinkedHashSet<>();
    char[] chars = s.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        set.add(chars[i]);
    }
    Iterator <Character> iterator = set.iterator();
    String sbString = new String();
    while (iterator.hasNext())
        sbString += iterator.next() + "";

    return sbString;
}

Complexity

⏰ Time complexity: O(n^2)
🧺 Space complexity: O(1)

Method 5 - Using array as Hashtable

Similar to approach with hashtable, but if chars are between a-z, or ascii, just use array Here is the procedure

Convert the string to character array.
Create the boolean array of 256,( for all 0 – 255 ASCII characters)
Iterate through character array from step one and set the Boolean array index ( index = ascii value of the character). Ignore if that index is already true.
Create a separate array based on the index set as true in Boolean array.
This approach will retain the order of characters.

Code

C

void removeDuplicates(char * str) {

    if (!str)
        return;
    int len = strlen(str);
    if (len < 2)
        return;
    bool hit[256];
    for (int i = 0; i < 256; ++i)
        hit[i] = false;

    hit[str[0]] = true;
    int tail = 1;
    for (int i = 1; i < len; ++i) {
        if (!hit[str[i]]) {
            str[tail] = str[i];
            ++tail;
            hit[str[i]] = true;
        }
    }
    str[tail] = '\0';
}

Java

public static String removeDuplicates(String s) {
    char[] chrArr = s.toCharArray();
    boolean[] asciiChrSet = new boolean[256];
    StringBuilder stb = new StringBuilder();
    for (int i = 0; i < chrArr.length; i++) {
        if (asciiChrSet[chrArr[i]]) {
            continue;
        }
        asciiChrSet[chrArr[i]] = true;
        stb.append(chrArr[i]);
    }
    return stb.toString();
}

Complexity

⏰ Time complexity: O(n)
🧺 Space complexity: O(n)

Method 6 - Special Case of Chars Being a…z

Putting the constraint of space and ordering Suppose the problem becomes:

Write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.

Then what solution do we have? Method 2 cannot be used as we want to keep the ordering same. Method 3 cannot be used, as it uses hashtable, which uses extra spaces. Well, nothing is stopping us from using method 1, but it is brute force, and can we do better?

The answer is yes, if the interviewer confirms, that your string is a constrained by say it can have only small case integers. Since one or two additional variables are fine but no buffer is allowed, you can simulate the behaviour of a hashmap by using an integer to store bits instead. This does not work for normal cases. It will only work from a..z case sensitive. No spaces supported. This would work miracles in a 256bit system to process the entire ASCII range. But it does run in O(N). I still +1 this one.

Code

C

This simple solution runs at O(n), which is faster than yours. Also, it isn’t conceptually complicated and in-place :

public static void removeDuplicates(char[] str) {
    int map = 0;
    for (int i = 0; i < str.length; i++) {
        if ((map & (1 << (str[i] - 'a'))) > 0) // duplicate detected
            str[i] = 0;
        else // add unique char as a bit '1' to the map
            map |= 1 << (str[i] - 'a');
    }
}

The drawback is that the duplicates (which are replaced with 0’s) will not be placed at the end of the str[] array. However, this can easily be fixed by looping through the array one last time. Also, an integer has the capacity for only regular letters.

Complexity

⏰ Time complexity: O(n)
🧺 Space complexity: O(1)

Problem#

Examples#

Solution#

Method 1 : Brute Force#

Code#

C#

Complexity#

Method 2 : Using Sorting#

Code#

Java#

Complexity#

Method 3 - Using Hashtable#

Code#

Java#

Two Pass#

One Pass#

Complexity#

Method 4 - LinkedHashset#

Code#

Java#

Complexity#

Method 5 - Using array as Hashtable#

Code#

C#

Java#

Complexity#

Method 6 - Special Case of Chars Being a…z#

Code#

C#

Complexity#

Problem

Examples

Solution

Method 1 : Brute Force

Code

C

Complexity

Method 2 : Using Sorting

Code

Java

Complexity

Method 3 - Using Hashtable

Code

Java

Two Pass

One Pass

Complexity

Method 4 - LinkedHashset

Code

Java

Complexity

Method 5 - Using array as Hashtable

Code

C

Java

Complexity

Method 6 - Special Case of Chars Being a…z

Code

C

Complexity