1. Data Structures & Algorithms
1.1: Unpacking a sequence into separate variables
Problem
You have a N-element
tuple or sequence that you would like to unpack into a collection of N
variables.
Solution
- Any sequence or iterable can be unpacked into variables using a simple assignment operation.
- The only requirement is that the the number of variables and structure of the sequence must match.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
If there is a mismatch in the number of elements; you will get an ERROR.
1 2 |
|
1 2 3 |
|
Discussion
Unpacking actually works with any object that happens to be iterable (tuples, lists, dicts, str, files, iteratirs, generators...)
1.2: Unpacking elements from iterables of arbitrary length
Problem
You need to unpack N
elements from an iterable; but the iterable may be longer than N
elements (causing a too many values to unpack
exception)
Solution
1.3 Keeping last N items
Problem
You want to keep a limited history of last few items seen during iteration.
Solution
- Keeping a limited history is a perfect use for
collections.deque
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
1.5: Priority Queue
Problem
You want to implement a queue
that sorts items by a given priority
& always returns the item with highest priority on each pop
operation.
Solution
The following class uses heapq
module to implement a simple priority queue
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
Here is an example of how we might use the above PriorityQueue
class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
In the above example note that the items with same priority (Item("foo")
and Item("grok")
)
are returned in the same order as they were inserted into the queue.
Discussion
The core of this recipe concerns with the use of heapq
module.
The methods heapq.heappush()
and heapq.heappop()
insert and remove items
from self._queue
in such a way that the first tem in the list has the smallest priority.
The heappop()
method always returns the smallest item; so that is the key idea to make
our PriorityQueue
pop correct items.
In this recipe the queue consists of tuple (-priority, idx, item)
.
The priority
value is negated to get the queue to sort items from
highest priority to lowest priority.
This is opposite from normal heap ordering; which sorts items from smallest to highest value.
The role of the idx
value is to properly order items with the same priority level.
By keeping a constantly increasing index, the items will be sorted according to the
order in which they were inserted.
However, the idx
also serves an important role in making the comparision work for items
with the smae priority level. To elaborate on this, the instances of Item
can't be ordered.
1 2 3 4 5 6 7 |
|
1 2 3 |
|
But, if you make (priority, item) tuple and compare them, then it works fine as long as the priorities are different.
1 2 3 4 5 6 7 |
|
idx
is used.
The idx
is always increasing and unique; so there is no chance of collision as shown below
1 2 3 4 5 6 7 |
|
1.6: collections.defaultdict
Mapping Keys to multiple values in Dictionary
You want to make a dictionary that maps keys to more than one value
(so called multdict
)
Solution
A dictionary is a mapping
where each key
is mapped to a single value.
If you want to map keys to multiple values; you need to store multiple values
in another container (like list
or set
or other container types
)
1 2 3 4 5 6 7 |
|
list
if you want to preserve the insertion-order of items.
2. Use a set
if you want to eliminate duplicates.
To easily construct such dictionaries, you can use collections.defaultdict
.
A feature of defaultdict
is that it automatically initializes the first value so
you can simply focus on adding items.
1 2 3 4 5 6 7 |
|
1 2 3 4 5 6 7 |
|
NOTE: One caution with defaultdict
is that it will automatically create dictionary entries for
keys accessed later on (even if they aren't found in the dictionary) i.e. instead of throwing KeyError
for keys that are not found, it adds that key to the dictinary and maps its value to the empty constructor
(list
or set
etc. as above).
If you want to avoid this behavior, its better to use usual dict()
with setdefault()
.
This process is a bit messy and hard-to-read as shown below;
but provides user-control to handle edge cases.
1 2 3 4 |
|
Discussion
In principle, constructing a multivariate dictionary is simple. However, initialization of the first value can be messy if you try to do it yourself. Below two code-snippets show this tradeoff.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
1.9: Finding commonalities in two dictionaries
Problem
You have two dictionaries and you want to find what they have in common (like same keys
, same values
etc.)
Solution
Consider two dictionaries
1 2 |
|
keys()
and values()
methods.
For example:
1 2 |
|
1 2 |
|
1 2 |
|
These kinds of operations can also be used to alter or filter dictionary contents.
For, example, suppose you want to make a new dictionary with selected keys removed. Below is a sample code using a dictionary comprehension
1 2 |
|
Discussion
A dictionary is a mapping between a set of keys and values
The .keys()
method on a dict returns keys-view object of the dict.
This keys-view object support common set operations like join, intersection, difference etc.
These set operations are supported because it is guranteed that the keys of a dict are uniquely hashable.
The .items()
method on a dict returns items-view object consisting
of (key, value)
pairs. This items-view object also supports common set operations as above.
The .values()
method on the dict returns a values-view object consisting of values of the dict.
BUT, this values-view object DOES-NOT SUPPORT common set operations
because the values of a dict are not guranteed to be unique.
Although, if necessary we can always convert this values-view object into a set and
then perform required set-operations as usual.
1.10: Removing duplicates from a Sequence (maintaining order)
Problem
You want to remove duplicates from the sequence while maintaining order
Solution
If the values in a sequence are hashable, then we solve this problem by using set
and generator
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
But this above solution is valid only if the items in the sequence are hashable.
For sequences, which have unhashable items (like dict
), the below solution works:
1 2 3 4 5 6 7 |
|
Here, the purpose of the key
argument is to specify a function
that converts sequence items
into hashable type for the purpose of duplicate detection.
For example:
1 2 3 4 5 |
|
1 |
|
1 2 3 |
|
1 |
|
1 2 |
|
This solution will work for any kind of data structure and a key
function that returns a hashable value
1.12: Frequency of items in a Sequence
Problem
You have a sequence of hashable items asn you would want ot count the frequency of each item in the sequence.
Solution
Use collections.Counter
Suppose we are give a list of strings words
as shown below and we want to find
the frequency of each word in the list
1 2 3 4 5 6 |
|
1 2 3 4 5 |
|
The Counter
class has a method most_common()
which returns the topn "n" most common
items in the sequence.
Discussion
As input, the
Counter
object can be fed any sequence of hashable items.
The Counter
object is basically just a dictionary that holds the unique items
in the input sequence as its keys
and their resepective counts as its values
.
If you want to increment the count manually, simply use addition op.
1 2 3 |
|
Alternative, you can also use
update
method on the Counter
object to
update the counter object.
1 |
|
The
Counter
object also supports common mathematical
operations like sum, difference etc.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
1.13: Sorting a List of dicts by a common key
Problem
You have a list of dictionaries and you would like to sort the items according to one or more of the dict values
Solution
Sorting this type of structure is easy using operator
module's itemgetter()
function.
Let's say you've queried a database table to get a listing of the members on your website,
and you receive the following data structure in return.
1 2 3 4 5 6 |
|
1 2 3 4 |
|
1 |
|
1 2 3 4 |
|
1 |
|
1 2 3 4 |
|
The
itemgetter()
function also accepts multiple keys as arguments
1 2 |
|
1 2 3 4 |
|
Discussion
In this example, rows
is passed to the built-in sorted()
function which accepts a keyword
argument key
. The key
argument is expected to be a callable that accepts a single item
from rows
as input and returns a single value that is used as a basis for sorting.
The itemgetter()
is just such a callable.
The operator.itemgetter()
function takes as argument the lookup indices
which is used to extract the desired values from the records in the rows
.
It can be a dictionary key name, a numeric list element, or any value that can be fed to the
object's __getitem__()
method.
If you give multiple indices to
itemgetter()
, the callable it produces will return
a tuple with all the indexed elements into it; which is then passed to the sorted()
function.
This is important if you want to simulatenously sort on multiple fields
(as shown in above example rows_by_lfname
, ie.e sorting by lname
first then by fname
).
The functionality of itemgetter()
is sometimes replaced by lambda
expressions,
1 2 |
|
But
itemgetter()
is faster than lambda
expressions.
Note that this technique can also be used for other functions that does their operations
which require
key
callable function such as min()
, max()
etc.
1 2 3 |
|
1.14: Sorting objects w/o native comparision support
Problem
You want to sort objects of the same class; but they don't natively support comparision operation.
Solution
The built-in sorted()
function takes a key
argument that can be passed a callable
that will return a value in the object that sorted()
will use to compare the objects.
For example, if you have sequence of User
instances and you want to sort the items in the sequence
using their used_id
attribute; you would supply a callable that will take a User
instance
as input and return the user_id
of that input which can then be used by sorted()
function
as a key to sort the instances in the sequence. This is illustrated in the below code.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Instead of using
lambda
, you can use a faster operator.attrgetter()
function
1 2 |
|
Discussion
The choice of whether to use lambda
or operator.attrgetter()
is personal,
but operator.attrgetter()
is a bit faster and supports multiple attribute indexing
as shown below.
Suppose the User
class has two attributes user_id
and user_name
and we want to sort
sequence of User
instances based on both these attributes; then we can use operator.attrgetter()
very easily.
1 2 3 4 |
|
This behavior of
attrgetter()
is analogous to itemgetter()
function (used for dictionaries)
Similar to
itemgetter()
, we can use attrgetter()
for other
comparision operations that require a key
attribute such as min()
, max()
etc.
1 2 |
|