Repeat Code with Leetcode: String Compression
Here we venture beyond the sweet, innocent world of Leetcode easy, with its calm Fizzbuzz problems and anagrams, into the desolate and God-forsaken hole that is Leetcode medium. For a time, this compression problem was considered an easy. Leetcode then decided to change nothing about the problem itself and re-label it as a medium. That is why we now find ourselves in such a crisis.
In all seriousness, though, this problem is very similar to the Cracking the Coding Interview question, with a twist. That description, again:
The old question: (1.6 from Cracking the Coding Interview): Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2b1c5a3. If the compressed string would not become smaller than the original string, your method should return the original string. You can assume the string has only uppercase and lowercase letters (a-z).
The new question (link embedded): Given an array of characters chars
, compress it using the following algorithm:
Begin with an empty string s
. For each group of consecutive repeating characters in chars
:
- If the group’s length is
1
, append the character tos
. - Otherwise, append the character followed by the group’s length.
The compressed string s
should not be returned separately, but instead, be stored in the input character array chars
. Note that group lengths that are 10
or longer will be split into multiple characters in chars
.
After you are done modifying the input array, return the new length of the array.
You must write an algorithm that uses only constant extra space.
Basically, the problem is similar to its counterpart, only you have to do it in place, you do not include the count if the count is 1, a count with multiple digits like “13” has to be separated by digits, you have to return the size, and…well…frankly, I suspect that 3000 people found the description confusing as hell. If I were the ruler of Leetcode, I might have added the following:
- Clarification 1: The count you return represents the length of your new string. Let us suppose, for example, that you modify abbbbbbbbbbbb in-place into ab12 bbbbbbbbb but return 4. That would mean your answer is ab12, not ab12bbbbbbbbbb
- Clarification 2: Yes, if a count is greater than or equal to 10 then you are going to have to break it up
Solving the Problem
Problem: Leetcode 443
Difficulty: Medium
Topics Covered: String manipulation; Two pointers
- You solve this problem in-place. You do not create a new string like you are allowed to do in the Cracking the Coding Interview version, and you do not add new data structures.
- Use i, your first pointer, to iterate through the vector (or whatever your input is. Maybe you are using something other than C++).
- Use j, your second pointer, to walk forward as long as there is a repeat.
- j - i represents your count, but only if your count is greater than 1. Put the letter and count in your vector. I found this to be the one thing Kevin Naughton could have explained in more detail. If your original vector of chars is aabbbbccc, after you determine that the first count is 2 you turn this into aa̶ 2bbbbccc->a2bb̶ 4cbccc->a2b4cb̶ 3ccc. I will type that again without the strikethrough. You turn aabbbbccc into a2bbbbccc, then turn that into a2b4bbccc, then turn that into a2b4c3ccc, then return 6. Your answer, because you have specified the length as 6, is a2b4c3.
- Move i to the new j. Repeat until the end.
My code is below, in case you thought that was confusing. You can find it on my personal Github here.
You are welcome to check out a similar bit of code on my personal Github here, which I think more clearly shows what is going on via print statements, but the print statements also cause everything to explode on larger test cases.
Two Pointers
I think my conversation with a friend (who, for the upteenth time, wishes to be called “Smack” for some reason) went a little something like this:
Me: Is this a Two-Pointer Technique problem?
Smack: Why would you call it that?
Me: Say you were solving the problem of finding the middle element in a linked list…you would use the Two-Pointer Technique, correct?
Smack: Sure, but it is just too simple to have a name.
Me: Leetcode classifies this as a “two-pointer” problem. Do you take issue with calling this specific problem a “Two-Pointer Technique” problem, or you take issue with the term “Two-Pointer Technique” in general?
Smack: Yeah, I don’t like the term. When you stop a car, do you call that the “press-brake-to-stop technique”?
Without getting too far into the weeds, this is a list of two-pointer problems on Leetcode and this is the Leetcode explanation of the Two-Pointer Technique.
Closing Thoughts
So we just solved a Leetcode medium problem. Who cares? It has already been done, and this will do nothing to help us with the ArcGIS integration issue we will have to face again on Monday morning, or that Thursday morning standup when our junior engineer is likely going to get grilled because our senior engineer cannot stand him. Nothing is fundamentally different, now that we have done this. You are still probably going to have to read more articles, and I am going to get another angry Medium comment by someone who has 20 years of experience and wants this blog to be deleted.
But this problem establishes yet another way we can approach string compression. It is a string manipulation problem, which companies seem to love asking on the first interviews. It represents an entry-point into similar problems. This problem is more difficult than the version in which we could create a new string and append to it, which in turn was more difficult than your standard Fizzbuzz.
In other words, we are continuing to climb the skill ladder. The pace may be slow, but so were my first solution’s print statements.