8 minute read

Alien dictionary.

Problem description

Taken from interviewing.io.

Alien Dictionary Problem

You are given a list of lexicographically sorted words from an alien language. This language, despite sharing a character set with ASCII, has a unique order. Return the alphabetical order of all the letters found in the list of words. There may be more than one possible answer.

Example Inputs and Outputs

Example 1

Input: words = [“kaa”,”akcd”,”akca”,”cak”,”cad”]
Output: “kdac”

Example 2

Input: words = [“b”,”a”]
Output: “ba”

Example 3

Input: words = [“ab”,”a”,”b”]
Output: “”

Constraints

  • 0 <= words.length <= 100
  • 0 <= words[i].length <= 100
  • The character set for words[i] is ASCII.

Solution type

Toplogical sort, plus some hashing.

Solution

This one is on LeetCode Premium, which I don’t have and I’m not really willing to get just for the sake of practice. While googling to see if the problem was posted elsewhere, I found an article that listed the solution technique as topological sort, which kind of spoils most of the problem solving…so don’t view this one as legit. Still gonna give it a proper go, though.

Topological sort should pretty plainly give us the correct solution if we imagine drawing directed edges between our letters (which serve as vertices in our graph). If letter 1 comes before letter 2, then draw an edge from 1 to 2.

Here’s my entire test program I ran on cpp.sh before realizing I am an idiot and did not think about how human dictionaries actually work. Good lord.

#include <iostream>
#include <string>
#include <vector>
#include <unordered_set>
#include <algorithm>

using namespace std;

bool dfs(vector<vector<bool>>& adjs, vector<bool>& visited, vector<bool>& s, string& ordering, int i) {
    bool result = true;
    visited[i] = true;
    s[i] = true;
    for (int next = 0; next < 26; ++next) {
        if (adjs[i][next]) {
            if (visited[next]) {
                return false;
            }
            result = result && dfs(adjs, visited, s, ordering, next);
        }
    }
    s[i] = false;
    ordering.push_back('a' + i);
    return result;
}

string alienDictionary(vector<string>& words) {
    vector<vector<bool>> adjs(26, vector<bool>(26, false));
    unordered_set<int> used;
    for (int w = 0; w < words.size() - 1; ++w) {
        string word1 = words[w], word[2];
        for (int i = 0; i < word.size(); ++i) {
            for (int j = i + 1; j < word.size(); ++j) {
                int first = word[i] - 'a', second = word[j] - 'a';
                used.insert(first);
                used.insert(second);
                adjs[first][second] = true;
            }
        }
    }
    for (int i = 0; i < words.size(); ++i) {
        for (int j = i + 1; j < words.size(); ++j) {
            int first = words[i][0] - 'a', second = words[j][0] - 'a';
            used.insert(first);
            used.insert(second);
            adjs[first][second] = true;
        }
    }
    string ordering;
    vector<bool> visited(26, false);
    vector<bool> s(26, false);
    bool valid = true;
    for (int i = 0; i < 26; ++i) {
        if (used.find(i) != used.end() && !visited[i]) {
            valid = valid && dfs(adjs, visited, s, ordering, i);
        }
    } 

    reverse(ordering.begin(), ordering.end());
    return valid ? ordering : "NOPE";
}

int main()
{
    vector<string> test1 = {"a", "b"};
    vector<string> test2 = {"kaa","akcd","akca","cak","cad"};
    cout << alienDictionary(test1) << endl;
    cout << alienDictionary(test2) << endl;
}

I guess I lasered in on the topological sort too much. This isn’t a big deal; the way to compare two lexicographically adjacent words is to iterate through both and if you find a differing character, write an edge from the first character to the second character and break. If one word is longer than the other and the shorter word is a prefix of the longer one, then it will always come second lexicographically and there is nothing to learn there.

Well, my hour went out and this is what I had at the end. Does not pass…

#include <iostream>
#include <string>
#include <vector>
#include <unordered_set>
#include <algorithm>

using namespace std;

bool dfs(vector<vector<bool>>& adjs, vector<bool>& visited, vector<bool>& s, string& ordering, int i) {
    bool result = true;
    visited[i] = true;
    s[i] = true;
    for (int next = 0; next < 26; ++next) {
        if (adjs[i][next]) {
            if (visited[next]) {
                return false;
            }
            result = result && dfs(adjs, visited, s, ordering, next);
        }
    }
    s[i] = false;
    ordering.push_back('a' + i);
    return result;
}

string alienDictionary(vector<string>& words) {
    vector<vector<bool>> adjs(26, vector<bool>(26, false));
    unordered_set<int> used;
    for (int w = 0; w < words.size() - 1; ++w) {
        string word1 = words[w], word2 = words[w + 1];
        for (int i = 0; i < min(word1.size(), word2.size()); ++i) {
            if (word1[i] != word2[i]) {
                int first = word1[i] - 'a', second = word2[i] - 'a';
                used.insert(first);
                used.insert(second);
                adjs[first][second] = true;
                break;
            }
        }
    }

    string ordering;
    vector<bool> visited(26, false);
    vector<bool> s(26, false);
    bool valid = true;
    for (int i = 0; i < 26; ++i) {
        if (used.find(i) != used.end() && !visited[i]) {
            valid = valid && dfs(adjs, visited, s, ordering, i);
        }
    } 

    reverse(ordering.begin(), ordering.end());
    return valid ? ordering : "NOPE";
}

int main()
{
    vector<string> test1 = {"a", "b"};
    vector<string> test2 = {"kaa","akcd","akca","cak","cad"};
    cout << alienDictionary(test1) << endl;
    cout << alienDictionary(test2) << endl;
}

We could debate whether or not having this problem inside the leetcode harness would’ve made it possible for me to submit a working solution on time. I think I got tripped up by my own confusion on how to implement topological sort (spoiler alert: depth-first search is not the way you’re supposed to do it) as well as my dumb face not remembering how lexicological sort works. Maybe the two other problems I did earlier in the day took something out of me too, who knows. No point getting hung up over it, let’s see what the solution is…

Alright. I clearly didn’t know how to do a proper topological sort. Here’s a proper implementation on a generic graph (not specific to this problem):

vector<int> topologicalSort(unordered_map<int, vector<int>>& adjs, unordered_map<int, int>& inDegrees) {
    vector<int> ordering;
    queue<int> bfs;
    for (auto& it : inDegrees) {
        if (it.second == 0) {
            bfs.push(it.first);
        }
    }
    while (!bfs.empty()) {
        int current = bfs.front();
        bfs.pop();
        ordering.push_back(current);
        for (int i = 0; i < adjs[current].size(); ++i) {
            if (!--inDegrees[i]) {
                bfs.push(i);
            }
        }
    }
    // detects cycles
    if (ordering.size() != inDegrees.size()) {
        ordering.clear();
    }
    return ordering;
}

For future Kevin: take in the adjlist and indegrees, then BFS on nodes with in-degree 0. inDegrees needs to be updated on every queue pop in order for this to work properly. This will terminate even if there are cycles; it’s just that the output ordering will not include the entire graph.

Notice that we need a map of vertex : in-degree of said vertex, as otherwise we won’t know when to add something to the queue. In-degree here means the number of directed edges pointing towards a given vertex. For this problem, it doesn’t cost anything extra asymptotically to build inDegrees, since we need to pay the same asymptotic time and memory cost to build the adjacency list off of words anyway.

Making the adjlist a map instead of a 2D vector reduces memory and time costs to \(O(c)\), where \(c\) is the number of unique characters, instead of always doing things 26 times. Moreover, it’s a little easier to program and it’s not any more costly memory-wise.

Here’s what looks like a final valid solution, although I can’t submit to leetcode so I don’t know for sure. Note that this solution catches some edge cases that are missed in the original: a second word being longer than the first (which is invalid and should result in returning no ordering) and cycle detection (which is not really done properly the first time around). This includes the whole testing harness since again, the leetcode platform is not accessible for me.

#include <iostream>
#include <string>
#include <vector>
#include <unordered_set>
#include <unordered_map>
#include <algorithm>
#include <queue>

using namespace std;

string alienDictionary(vector<string>& words) {
    unordered_map<char, unordered_set<char>> adjs;
    unordered_map<char, int> inDegrees;
    bool valid = true;
    for (string& word : words) {
        for (char c : word) {
            if (inDegrees.find(c) == inDegrees.end()) {
                inDegrees[c] = 0;
            }
        }
    }   

    for (int w = 0; w < words.size() - 1; ++w) {
        string word1 = words[w], word2 = words[w + 1];
        for (int i = 0; i < min(word1.size(), word2.size()); ++i) {
            if (word1[i] != word2[i]) {
                adjs[word1[i]].insert(word2[i]);
                inDegrees[word2[i]]++;
                break;
            }
        }
        if (word1.size() > word2.size() && mismatch(word2.begin(), word2.end(), word1.begin()).first == word2.end()) {
            valid = false;
        }
    }

    string ordering;
    queue<char> bfs;
    for (auto& it : inDegrees) {
        if (it.second == 0) {
            bfs.push(it.first);
        }
    }
    while (!bfs.empty()) {
        char current = bfs.front();
        bfs.pop();
        ordering.push_back(current);
        for (auto& i : adjs[current]) {
            if (!--inDegrees[i]) {
                bfs.push(i);
            }
        }
    }
    // detects cycles
    if (ordering.size() != inDegrees.size()) {
        valid = false;
    }

    reverse(ordering.begin(), ordering.end());
    return valid ? ordering : "";
}

int main()
{
    vector<vector<string>> tests = {
        {"a", "b"},
        {"kaa","akcd","akca","cak","cad"},
        {"kaaba", "kaab"},
        {"ba", "a", "b"},
        {"caa", "aaa", "aab"},
        {"baa", "abcd", "abca", "cab", "cad"}
    };
    
    for (auto& test: tests) {
        std::cout << alienDictionary(test) << endl;
    }    
}

The unordered sets and maps are helpful in this case because we don’t know before going through words which characters we’re using and which characters have relationships. Using vectors basically means that we have adjacency matrices instead of adjlists, and that in turn requires us to deal with empty / unused cells, which just makes things harder and is not worth the time to implement around.

It’s worth noting that DFS will give you a reverse ordering, i.e. the “top” elements in the graph will come at the end instead.

I saw a variant (I think on geeksforgeeks) that asked for the number of possible valid orderings rather than just a single possible ordering. Answering that question requires some thought about how to compute the number of possible solutions given by topological sort. According to this Stack Exchange answer by Misha Lavrov (which at a brief glance seems to check out), you can solve this recursively in exponential time. Not gonna do that because I want to move on to the next problems.

Finally: this is absolutely not a problem you want to do in C++. So much annoying legwork.

Leave a comment

Your email address will not be published. Required fields are marked. *

Loading...