Solution: Find Longest Self-Contained Substring

Let’s solve the Find Longest Self-Contained Substring problem using the Hash Maps pattern.

Statement

You are given a string, s, consisting of lowercase English letters. Your task is to find the length of the longest self-contained substring of s.

A substring t of s is called self-contained if:

  • t is not equal to the entire string s.

  • Every character in t does not appear anywhere else in s (outside of t).

In other words, all characters in t are completely unique to that substring within the string s.

Return the length of the longest self-contained substring. If no such substring exists, return -1.

Constraints:

  • 22 \leq s.length 1000\leq 1000

  • s consists only of lowercase English letters.

Solution

We iterate through the string once to record where each character first appears and where it last appears. Two separate hash maps store each character’s first and last occurrence indices. Each unique character serves as a potential starting point, defining an initial window from its first to last occurrence. The window is adjusted based on the following conditions:

  • As we iterate within this window, if we encounter a character whose last occurrence is further to the right than the current window’s end, we expand the window to include it. This ensures that the substring remains self-contained, meaning all occurrences of each character within it are included. The process continues until no more characters extend the window’s boundary. 

  • As we expand the window, every character inside it must have its first occurrence within the current window’s start boundary. If we encounter a character whose first occurrence index is before the current window’s starting position, it means that an earlier part of the string contains an instance of that character, violating the self-contained property. When this happens, the current substring is invalid, and we discard it from consideration. 

The maximum valid window length is tracked and updated accordingly, ensuring the longest valid substring is returned.

The steps of the algorithm are as follows:

  1. Create two hashmaps, first, and last, to store the first and last occurrence index of each character in the string.

  2. Iterate through the string once to populate these hash maps with each character’s first and last occurrence.

  3. Initialize maxLen = -1 to keep track of the maximum length of a valid self-contained substring found.

  4. For each unique character c1 (processed once), start from its first occurrence index:

    1. Initialize the start and end of the window by setting the starting point to the character’s first occurrence and the ending point to its last occurrence in the string.

    2. Iterate through the string from the starting position start, extending the endpoint end whenever a character’s last occurrence is beyond the current endpoint end.

    3. If a character c2 inside the window has its first occurrence before the window's start, the window is invalid.

  5. Validate the substring:

    1. When the current index j reaches the end, check if the window is valid and its length is less than the total string length.

    2. If the window is valid, update maxLen with the maximum of its current value and the window’s length (end - start + 1).

  6. After checking all potential starting characters, return the maximum valid length found. If no valid substring exists, return -1.

Let’s look at the illustration below to better understand the solution.

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.