Skip to content

1. Data Structures & Algorithms

1.1: Unpacking a sequence into separate variables

Problem

You have a N-element tuple or sequence that you would like to unpack into a collection of N variables.

Solution
  1. Any sequence or iterable can be unpacked into variables using a simple assignment operation.
  2. The only requirement is that the the number of variables and structure of the sequence must match.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
##----------------------------------------------------------
p = (4,5)   ## create a tuple
x, y = p    ## unpack the tuple into variables 'x' & 'y'. 
            ## x=4; y=5
##----------------------------------------------------------
data = ["ACME", 50, 91.1, (2021,10,07)]
name, shares, price, date = data    ## unpack the list
                                    ## name="ACME"; shares=50
                                    ## price=91.1; date=(2021,10,07)

## another way to unpack the nested iterable or container
## name="ACME"; shares=50; price=91.1; 
## year = 2021; month=10; day=07
name, shares, price, (year, month, day) = data

If there is a mismatch in the number of elements; you will get an ERROR.

1
2
p = (4, 5)
x, y, z = p
1
2
3
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ValueError: need more than 2 values to unpack

Discussion

Unpacking actually works with any object that happens to be iterable (tuples, lists, dicts, str, files, iteratirs, generators...)

1.2: Unpacking elements from iterables of arbitrary length

Problem

You need to unpack N elements from an iterable; but the iterable may be longer than N elements (causing a too many values to unpack exception)

Solution

1.3 Keeping last N items

Problem

You want to keep a limited history of last few items seen during iteration.

Solution
  1. Keeping a limited history is a perfect use for collections.deque
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
"""
Performs a simple text match on a sequence of lines
and yields the matching line & previous N lines of context when found
"""

from collections import deque

def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

##-----------------------------------------------------------##
## Example use ona file
if __name__ == "__main__":
    with open("somefile.txt", "r") as f:
        for line, prev_lines in search(f, "my_pattern"):
            for x in lines:
                print(x, end="")
            print(line, end="")
            print("--"*10)

1.5: Priority Queue

Problem

You want to implement a queue that sorts items by a given priority & always returns the item with highest priority on each pop operation.

Solution

The following class uses heapq module to implement a simple priority queue

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import heapq

class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._idx = 0

    def __repr__(self):
        return self._queue

    def push(self, item, priority):

        ## because heappop returns the 1st item in the queue 
        ## (which has the smallest priority);
        ## So, we need to invert the priority as (-priority)
        ## Thus, we get the item with highest priority on pop

        heapq.heappush(self._queue, (-priority, self._idx, item))
        self._idx += 1

    def pop(self):
        """
        Returns item with HIGHEST priority in the priority queue
        """
        result = heapq.heappop(self._queue)     ## a tuple of type (-priority, idx, item)
        return result[-1]                       ## "item" from above line

Here is an example of how we might use the above PriorityQueue class

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Item({self.name})"

##-----------------------------------------------------------##

q = PriorityQueue()                 ## Create a priority-queue object
q.push(Item("foo"), priority=1)     ## push an Item("foo") with priority=1
q.push(Item("bar"), priority=5)     ## push an Item("bar") with priority=5
q.push(Item("spam"), priority=4)    ## push an Item("spam") with priority=4
q.push(Item("grok"), priority=1)    ## push an Item("grok") with priority=1

q.pop()                             ## Item("bar")
q.pop()                             ## Item("spam")
q.pop()                             ## Item("foo")
q.pop()                             ## Item("grok")

In the above example note that the items with same priority (Item("foo") and Item("grok")) are returned in the same order as they were inserted into the queue.

Discussion

The core of this recipe concerns with the use of heapq module. The methods heapq.heappush() and heapq.heappop() insert and remove items from self._queue in such a way that the first tem in the list has the smallest priority.

The heappop() method always returns the smallest item; so that is the key idea to make our PriorityQueue pop correct items.

In this recipe the queue consists of tuple (-priority, idx, item). The priority value is negated to get the queue to sort items from highest priority to lowest priority. This is opposite from normal heap ordering; which sorts items from smallest to highest value.

The role of the idx value is to properly order items with the same priority level. By keeping a constantly increasing index, the items will be sorted according to the order in which they were inserted. However, the idx also serves an important role in making the comparision work for items with the smae priority level. To elaborate on this, the instances of Item can't be ordered.

1
2
3
4
5
6
7
## Item class objects cannot be compared; 
## because we have not implemented __eq__, __le__ etc...methods to support it

a = Item("foo")
b = Item("bar")
a < b                   ## ERROR. 
                        ## because "Item" object cannot be compared.
1
2
3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: Item() < Item()

But, if you make (priority, item) tuple and compare them, then it works fine as long as the priorities are different.

1
2
3
4
5
6
7
## tuples (priority, item) with different priorities.
## Comparision works fine

a = (1, Item("foo"))
b = (2, Item("bar"))
a < b                   ## Returns "True"
                        ## because priority: 1 < 2
So, to handle the case of same priority value; the idx is used. The idx is always increasing and unique; so there is no chance of collision as shown below

1
2
3
4
5
6
7
## Tuples (priority, idx, item) with same priorities but unique idx.
## Comparision works fine

a = (1, 7, Item("foo"))
b = (1, 8, Item("bar"))
a < b                   ## Returns "True"
                        ## because idx: 7 < 8 

1.6: collections.defaultdict

Mapping Keys to multiple values in Dictionary

You want to make a dictionary that maps keys to more than one value (so called multdict)

Solution

A dictionary is a mapping where each key is mapped to a single value. If you want to map keys to multiple values; you need to store multiple values in another container (like list or set or other container types)

1
2
3
4
5
6
7
d = {"a": [1,2,3],
     "b": [4,5],
    }

e = {"a": {1,2,3},
     "b": {4,5},
    }
1. Use a list if you want to preserve the insertion-order of items. 2. Use a set if you want to eliminate duplicates.

To easily construct such dictionaries, you can use collections.defaultdict. A feature of defaultdict is that it automatically initializes the first value so you can simply focus on adding items.

1
2
3
4
5
6
7
from collections import defaultdict

d = defaultdict(list)               ## use "list" as a constructor to initialize values map for new keys
d["a"].append(1)                    ## d = {"a":[1]}
d["a"].append(2)                    ## d = {"a":[1,2]}
d["b"].append(5)                    ## d = {"a":[1,2], "b":[5]}
d["b"].append(5)                    ## d = {"a":[1,2], "b":[5,5]}       duplication of 5 is allowed in list
1
2
3
4
5
6
7
from collections import defaultdict

d = defaultdict(set)                ## use "set" as a constructor to initialize values map for new keys
d["a"].add(1)                       ## d = {"a": {1}}
d["a"].add(2)                       ## d = {"a": {1,2}}
d["b"].add(5)                       ## d = {"a": {1,2}, "b": {5}}
d["b"].add(5)                       ## d = {"a": {1,2}, "b": {5}}       duplicates NOT allowed in "set"

NOTE: One caution with defaultdict is that it will automatically create dictionary entries for keys accessed later on (even if they aren't found in the dictionary) i.e. instead of throwing KeyError for keys that are not found, it adds that key to the dictinary and maps its value to the empty constructor (list or set etc. as above).

If you want to avoid this behavior, its better to use usual dict() with setdefault(). This process is a bit messy and hard-to-read as shown below; but provides user-control to handle edge cases.

1
2
3
4
d = {}                              ## a regular dictionary
d.setdefault("a", []).append(1)     ## d = {"a": [1]}
d.setdefault("a", []).append(2)     ## d = {"a": [1,2]}
d.setdefault("b", []).append(5)     ## d = {"a": [1,2], "b": [5]}
Discussion

In principle, constructing a multivariate dictionary is simple. However, initialization of the first value can be messy if you try to do it yourself. Below two code-snippets show this tradeoff.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)

##--------------- v/s -----------------##

from collections import defaultdict
d = defaultdict()
for key, value in items:
    d[key].append(value)

1.9: Finding commonalities in two dictionaries

Problem

You have two dictionaries and you want to find what they have in common (like same keys, same values etc.)

Solution

Consider two dictionaries

1
2
a = {"x":1, "y":2, "z":3}
b = {"w":10, "x":11, "y":2}
To find out what the two dictionaries have in common, simply perform the common set operations using the keys() and values() methods.

For example:

1
2
## finding keys in common
a.keys() & b.keys()         ##{"x", "y"}

1
2
## finding keys in a, that are not in b
a.keys() - b.keys()         ## {"z"}
1
2
## finding (key, value) pair in common
a.items() & b.items()       ## {("y", 2)}

These kinds of operations can also be used to alter or filter dictionary contents.

For, example, suppose you want to make a new dictionary with selected keys removed. Below is a sample code using a dictionary comprehension

1
2
## make a new dictionary with certain keys removed
c = {key:a[key] for key in a.keys()-{"z", "w"}}     ## {"x":1, "y":2}

Discussion

A dictionary is a mapping between a set of keys and values

The .keys() method on a dict returns keys-view object of the dict. This keys-view object support common set operations like join, intersection, difference etc. These set operations are supported because it is guranteed that the keys of a dict are uniquely hashable.

The .items() method on a dict returns items-view object consisting of (key, value) pairs. This items-view object also supports common set operations as above.

The .values() method on the dict returns a values-view object consisting of values of the dict. BUT, this values-view object DOES-NOT SUPPORT common set operations because the values of a dict are not guranteed to be unique. Although, if necessary we can always convert this values-view object into a set and then perform required set-operations as usual.

1.10: Removing duplicates from a Sequence (maintaining order)

Problem

You want to remove duplicates from the sequence while maintaining order

Solution

If the values in a sequence are hashable, then we solve this problem by using set and generator

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def dedupe(items):
    seen = set()
    for x in items:
        if x not in seen:
            yield x
            seen.add(x)

##----------------------------------##

a = [1, 5, 2, 1, 9, 1, 5, 10]
list(dedupe(a))                     ## [1, 5, 2, 9, 10]

## NOTE: above the order of the items in the output list is maintained according to the input "a"

But this above solution is valid only if the items in the sequence are hashable.

For sequences, which have unhashable items (like dict), the below solution works:

1
2
3
4
5
6
7
def dedupe_unhashable(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(item)

Here, the purpose of the key argument is to specify a function that converts sequence items into hashable type for the purpose of duplicate detection. For example:

1
2
3
4
5
a = [{"x":1, "y":2},
     {"x":1, "y":3},
     {"X":1, "y":2},
     {"x":2, "y":4}
    ]
1
list(dedupe_unhashable(a, key=lambda d: (d["x"], d["y"])))
1
2
3
[{"x":1, "y":2},
 {"x":1, "y":3},
 {"x":2, "y":4}]
1
list(dedupe_unhashable(a, key=lambda d: (d["x"])))
1
2
[{"x":1, "y":2},
 {"x":2, "y":4}]

This solution will work for any kind of data structure and a key function that returns a hashable value

1.12: Frequency of items in a Sequence

Problem

You have a sequence of hashable items asn you would want ot count the frequency of each item in the sequence.

Solution

Use collections.Counter

Suppose we are give a list of strings words as shown below and we want to find the frequency of each word in the list

1
2
3
4
5
6
words = [
   'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
   'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
   'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
   'my', 'eyes', "you're", 'under'
]
1
2
3
4
5
from collections import Counter
word_counts = Counter(words)

top_three = word_counts.most_common(3)      ## returns the 3 most common items
print(top_three)                            ## [('eyes', 8), ('the', 5), ('look', 4)]

The Counter class has a method most_common() which returns the topn "n" most common items in the sequence.

Discussion

🎯 As input, the Counter object can be fed any sequence of hashable items. The Counter object is basically just a dictionary that holds the unique items in the input sequence as its keys and their resepective counts as its values.

🎯 If you want to increment the count manually, simply use addition op.

1
2
3
morewords = ['why','are','you','not','looking','in','my','eyes']
for word in morewords:
    word_counts[word] += 1

🏆 Alternative, you can also use update method on the Counter object to update the counter object.

1
word_counts.update(morewords)   ## does the same work as the above addiotion op.

🏆 The Counter object also supports common mathematical operations like sum, difference etc.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
from collections import Counter
words = ["hello", "world", "hello", "world", "earth"]
morewords = ["hello", "vinay", "earth"]

a = Counter(words)          ## {"hello":2, "world":2, "earth":1}
b = Counter(morewords)      ## {"hello":1, "vinay":1, "earth":1}

## adding the two counter objects together
a + b                                       ## {"hello":3, "world":2, "earth":2, "vinay":1}

## subtracting the two counter objects
a - b                                       ## {"hello":1, "world":2, "vinay":1}

1.13: Sorting a List of dicts by a common key

Problem

You have a list of dictionaries and you would like to sort the items according to one or more of the dict values

Solution

Sorting this type of structure is easy using operator module's itemgetter() function. Let's say you've queried a database table to get a listing of the members on your website, and you receive the following data structure in return.

1
2
3
4
5
6
rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]
It is fairly easy to output these rows ordered by any of the fields common to all of the rows in the dict. For example:

1
2
3
4
from operator import itemgetter

rows_by_fname = sorted(rows, key=itemgetter("fname"))
rows_by_uid = sorted(rows, key=itemgetter("uid"))
1
print(rows_by_fname)
1
2
3
4
[{'fname': 'Big', 'uid': 1004, 'lname': 'Jones'},
 {'fname': 'Brian', 'uid': 1003, 'lname': 'Jones'},
 {'fname': 'David', 'uid': 1002, 'lname': 'Beazley'},
 {'fname': 'John', 'uid': 1001, 'lname': 'Cleese'}]
1
print(rows_by_uid)
1
2
3
4
[{'fname': 'John', 'uid': 1001, 'lname': 'Cleese'},
 {'fname': 'David', 'uid': 1002, 'lname': 'Beazley'},
 {'fname': 'Brian', 'uid': 1003, 'lname': 'Jones'},
 {'fname': 'Big', 'uid': 1004, 'lname': 'Jones'}]

🏆 The itemgetter() function also accepts multiple keys as arguments

1
2
rows_by_lfname = sorted(rows, key=itemgetter("lname", "fname"))
print(rows_by_lfname)
1
2
3
4
[{'fname': 'David', 'uid': 1002, 'lname': 'Beazley'},
 {'fname': 'John', 'uid': 1001, 'lname': 'Cleese'},
 {'fname': 'Big', 'uid': 1004, 'lname': 'Jones'},
 {'fname': 'Brian', 'uid': 1003, 'lname': 'Jones'}]

Discussion

In this example, rows is passed to the built-in sorted() function which accepts a keyword argument key. The key argument is expected to be a callable that accepts a single item from rows as input and returns a single value that is used as a basis for sorting. The itemgetter() is just such a callable.

The operator.itemgetter() function takes as argument the lookup indices which is used to extract the desired values from the records in the rows. It can be a dictionary key name, a numeric list element, or any value that can be fed to the object's __getitem__() method.

🚨 If you give multiple indices to itemgetter(), the callable it produces will return a tuple with all the indexed elements into it; which is then passed to the sorted() function. This is important if you want to simulatenously sort on multiple fields (as shown in above example rows_by_lfname, ie.e sorting by lname first then by fname).

The functionality of itemgetter() is sometimes replaced by lambda expressions,

1
2
rows_by_fanme = sorted(rows, key=lambda r: r["fname"])
rows_by_lfname = sorted(rows, key=lambda r: (r["lname"], r["fname"]))

🏆 But itemgetter() is faster than lambda expressions.

🏆 Note that this technique can also be used for other functions that does their operations which require key callable function such as min(), max() etc.

1
2
3
from operator import itemgetter
min(rows, key=itemgetter("uid"))    ## {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
max(rows, key=itemgetter("uid"))    ## {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}

1.14: Sorting objects w/o native comparision support

Problem

You want to sort objects of the same class; but they don't natively support comparision operation.

Solution

The built-in sorted() function takes a key argument that can be passed a callable that will return a value in the object that sorted() will use to compare the objects.

For example, if you have sequence of User instances and you want to sort the items in the sequence using their used_id attribute; you would supply a callable that will take a User instance as input and return the user_id of that input which can then be used by sorted() function as a key to sort the instances in the sequence. This is illustrated in the below code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class User:
    def __init__(self, user_id):
    self.user_id = user_id

    def __repr__(self):
        return f"User({self.user_id})"

## ---------------------------------------------- ##

users = [User(23), User(1), User(9)]

sorted(users, key=lambda u: u.user_id)  ## [User(1), User(9), User(23)]

🏆 Instead of using lambda, you can use a faster operator.attrgetter() function

1
2
from operator import attrgetter
sorted(users, key=attrgetter("user_id"))    ## [user(1), User(9), User(23)]
Discussion

The choice of whether to use lambda or operator.attrgetter() is personal, but operator.attrgetter() is a bit faster and supports multiple attribute indexing as shown below.

Suppose the User class has two attributes user_id and user_name and we want to sort sequence of User instances based on both these attributes; then we can use operator.attrgetter() very easily.

1
2
3
4
from operator import attrgetter

## assume that "User" class has two attributes "user_id" and "user_name"
sorted(users, key=attrgetter("user_id", "user_name"))

🚨 This behavior of attrgetter() is analogous to itemgetter() function (used for dictionaries)

🏆 Similar to itemgetter(), we can use attrgetter() for other comparision operations that require a key attribute such as min(), max() etc.

1
2
min(users, key=attrgetter("user_id"))
max(users, key=attrgetter("user_id"))