Hacker News
Anti-Patterns in Python Programming (lignos.org)
320 points by aburan28 on July 9, 2014 | 237 comments


The more frequent and dangerous pitfalls are, in my humble opinion:

- Bare except: statements (they catch everything, even Ctrl-C)

- Mutables as default function/method arguments

- Wildcard imports!
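To illustrate the first pitfall, a minimal sketch (the function names here are made up, and raising KeyboardInterrupt in code stands in for a real Ctrl-C arriving mid-try): a bare `except:` catches BaseException, Ctrl-C included, while `except Exception:` lets it propagate.

```python
def swallow_everything():
    try:
        raise KeyboardInterrupt          # simulates Ctrl-C
    except:                              # bare except: catches BaseException, Ctrl-C included
        return "swallowed"

def let_ctrl_c_through():
    try:
        raise KeyboardInterrupt
    except Exception:                    # KeyboardInterrupt is NOT an Exception subclass
        return "swallowed"

print(swallow_everything())              # swallowed: the interrupt was eaten

try:
    let_ctrl_c_through()
except KeyboardInterrupt:
    print("KeyboardInterrupt escaped")   # the interrupt propagates, as it should
```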


Couldn't agree more! One of my all-time Python interview questions trips up a surprisingly large number of developers.

Given a function like:

    def append_one(l=[]):

        l.append(1)

        return l

What does this return each time?

    >>> append_one()

    >>> append_one()

    >>> append_one()


The l (lowercase L) and the 1 (one) look really similar. Could that be the cause of some confusion? Of course, the function name helps, but most developers have learned not to trust function names to be an accurate description of what the function does, especially in tricky interview questions.

Still, I'd change this to something like:

    def append_five(l=[]):

        l.append(5)

        return l
It tests the same thing (knowledge of how default parameters work), but without the confounding problem of similar-looking characters. Of course, syntax highlighting would help the applicant out.

All of that being said, I still don't doubt that many developers don't know what they should about default parameters.


I'm not a Python expert, but iirc from various blog posts the "l" variable does not get reset between function calls, which causes undesired behavior. So calling the function 3 times without an argument would produce lists of size 1, 2, and 3 on successive calls, rather than 3 lists of size 1. Can any Python gurus confirm?


The expression presented in the parameter list is only evaluated once, and that is when the method is defined. The confusion is that people assume the expression is evaluated every time the method is called.
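You can watch that single stored object mutate across calls via the function object's `__defaults__` attribute (spelled `func_defaults` in older Python 2); a small sketch:

```python
def append_one(l=[]):
    l.append(1)
    return l

print(append_one())              # [1]
print(append_one())              # [1, 1]
print(append_one())              # [1, 1, 1] -- it's the same list every time
print(append_one.__defaults__)   # ([1, 1, 1],) -- the one list stored on the function object
```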


The confusion is that people assume the expression is evaluated every time the method is called.

Because that's how it works in a lot of other languages, such as Ruby and Javascript.


I doubt that's the only reason. I fell for it myself at first without ever seeing a line of Ruby. It initially feels intuitive, and that's why I think most fall for it.


I think it's because the arguments are bound when the function is called. It's just natural that you'd expect the default values to also be bound at the same time.


Yes that is correct. The default value gets created when the function is interpreted ("compiled").


> The default value gets created when the function is interpreted ("compiled").

No. The default value gets "created" (the expression is evaluated and stored) when the def statement is executed. Take the following example:

  In [1]: def foo():
     ...:     def append_five(l=[]):
     ...:         l.append(5)
     ...:         return l
     ...:     return append_five
     ...:

  In [2]: a = foo()

  In [3]: b = foo()

  In [4]: a()
  Out[4]: [5]

  In [5]: b()
  Out[5]: [5]

  In [6]: _4 is _5
  Out[6]: False
We only wrote one function definition, but multiple lists are created. (They are created when the "def append_five" definition executes, during the execution of foo.)


I thought that was what he meant. Is there any sharp distinction between "interpreting" and "evaluating" in Python that I am unaware of? I've always used the words more or less interchangeably. But now that I think about it, that might be a little naive, since I have no idea how it works under the hood.


You could say that interpreting is first parsing and second executing/evaluating. The parser tokenizes and does a small amount of optimization such as ignoring unassigned values.


The parent wrote "compiled", which is certainly more incorrect than either "interpreted" or "evaluated."


The key is object mutability. A list type is mutable and a tuple type is immutable.

If the candidate correctly deduces what will happen, I'll ask them to write a bug-free version, which looks like one of the below:

    def append_one(var=None):
        var = var or []
        var.append(1)
        return var

    def append_one(var=None):
        if var is None:
            var = []
        var.append(1)
        return var

Mutability is a very subtle but very important concept to understand in python. Everyone who uses python for non-trivial code should know it well: https://docs.python.org/2/reference/datamodel.html


> The key is object mutability. A list type is mutable and a tuple type is immutable.

I don't think the question has much to do with mutability; it isn't surprising to me, nor I imagine to most programmers, that a list is mutable. That's very common.

The surprising part of this question is that the default value of 'l' continues to exist outside the lexical scope of the function, the expected behavior is that the value of 'l' is initialized at function call time and is garbage collected after each call. As it sits, using default values in python is sort of like defining a global that only has a named reference inside the function block, which is very strange.


It has something to do with mutability, because if an object is immutable, the behavior of Python matches what the naive developer expects. It's only mutable objects that break those expectations.

Don't even get into unexpected behavior in classes:

    In [1]: class A(object):
       ...:     l = []
       ...:

    In [2]: a, b = A(), A()

    In [3]: a.l.append("Something")

    In [4]: a.l
    Out[4]: ['Something']

    In [5]: b.l
    Out[5]: ['Something']

    In [6]: class B(object):
       ...:     l = None
       ...:     def __init__(self):
       ...:         self.l = []
       ...:

    In [7]: c, d = B(), B()

    In [8]: c.l.append("Something")

    In [9]: c.l, d.l
    Out[9]: (['Something'], [])


The other scoping issue in python that always struck me as strange is that loop variables aren't scoped to the loop, they continue to exist after the loop completes. I can see the logic for this feature even if I don't agree with it, but what I really don't get is that the loop variables are not defined if you iterate over something that is empty:

   >>> for item in [1]:
   ...   print item
   1
   >>> item
   1

   >>> for i in []:
   ...   print i
   
   >>> i
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   NameError: name 'i' is not defined
I would expect i == None. That oddity makes it dangerous to use the feature unless you're really careful (e.g. using a for-else construct).


Then there's for-else:

    In [1]: for i in []:
       ...:     pass
       ...: else:
       ...:     print 'Else!'
       ...:
    Else!

    In [2]: for i in []:
       ...:     break
       ...: else:
       ...:    print 'Else!'
       ...:
    Else!

    In [3]: for i in range(2):
       ...:     break
       ...: else:
       ...:     print 'Else!'
       ...:

    In [4]: for i in range(2):
       ...:     pass
       ...: else:
       ...:     print 'Else!'
       ...:
    Else!
The syntax could be interpreted as:

  if len(l) == 0:
    print "Else!"
  else:
    for i in l:
      pass
The "catch cases where a `break` is triggered" case isn't common enough for this syntax feature to be encountered very often, leading to confusion when people come across it (though at least it's not a bug where a common use-case has weird behavior to new-comers).
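The intended use of `else` is as a "no break happened" clause, which makes it handy for search loops; a small hypothetical example:

```python
def first_even(nums):
    for n in nums:
        if n % 2 == 0:
            found = n
            break
    else:                # runs only when the loop exhausts without hitting break
        found = None
    return found

print(first_even([1, 3, 4, 6]))  # 4
print(first_even([1, 3, 5]))     # None
```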


> but what I really don't get is that the loop variables are not defined if you iterate over something that is empty

if you conceptualize how a for-loop has to work as a while-loop using Python's iterator protocol (which is the only way the iterator protocol itself makes sense), it seems pretty intuitive.

That is, this:

  for item in items:
      ...1
  else:
      ...2
becomes, approximately:

  __hidden_iter = iter(items)
  try:
      while True:
          try:
              item = next(__hidden_iter)
          except StopIteration:
              raise __NormalLoopExit
          ...1
  except __NormalLoopExit:
      ...2
If you have an empty loop, the first assignment doesn't complete (instead raising StopIteration in evaluating the right side, which raises the notional exception __NormalLoopExit, which invokes the else: clause, if any) so the variable never gets around to being created.


> if an object is immutable, the behavior of Python matches what the naive developer expects

If the object was immutable then append wouldn't work. That's hardly matching expectations.


Most immutable objects don't have methods that would mutate the value, but fail because the object is immutable...

I guess the clarification to what I was saying is that, in the simple case (integers, strings, None), the objects are immutable. It's only when you get into cases where the value of the object itself is mutable that you run into issues. If all objects (or all objects 'allowed' as default values) were immutable, then this behavior would not trigger.

So saying that mutability has nothing to do with it isn't entirely true. It's the immutability of the types of values used in most simple cases that hides this issue from developers until they run into a more complex case.


Read my post that has the "correct answers" which show you how to do it. The key is setting the default to None and then doing something like:

    if val is None: val = []

or the more idiomatic python way:

    val = val or []


I believe the former is more idiomatic, but I don't have a reference.

You want to explicitly check against `None` so that you're not overwriting all falsey values of `val` - even though you should generally try to enforce argument types, your second example would cause unexpected behavior in some cases, particularly those that have non-falsey 'default' assignments
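The difference becomes observable when the caller passes in an empty list; a sketch with made-up names:

```python
def append_one_or(var=None):
    var = var or []          # discards ANY falsey argument, including []
    var.append(1)
    return var

def append_one_is_none(var=None):
    if var is None:          # replaces only a genuinely missing argument
        var = []
    var.append(1)
    return var

mine = []
append_one_or(mine)
print(mine)                  # []  -- the caller's empty list was silently replaced
append_one_is_none(mine)
print(mine)                  # [1] -- the caller's list was mutated as intended
```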


I'm well aware of that way to do it, but it doesn't excuse a different way being unintuitive.


Why? You would just get a new list back.


I can see why you might think so, but remember the Zen of Python is to have only one obvious way to do something.

    >>> [1,2,3] + [4,5]
    [1, 2, 3, 4, 5]
Thus appending should do something different than addition.

    >>> x = [1,2,3]
    >>> x.append([4,5])
    >>> x
    [1, 2, 3, [4, 5]]


I'm sorry, but I don't understand why you would think of this as unexpected behaviour. For class A, the list l is a class-level attribute, hence it can be accessed via either the a or b objects, but for class B, after initialisation, l is an instance attribute, so it is different for c and d.


It's not the concept that can be confusing, it's the syntax python chose.

In most of the languages I'm familiar with, there are very clear syntax differences when working with class attributes. For example, in many languages class attributes have to be accessed via the class name instead of from an instance of the class making it clear to the programmer they are working with a class attribute, e.g. MyClass.myClassVariable not myInstance.myClassVariable. Additionally, the way you define class attributes in python is the way you define instance attributes in many languages, which just adds to the confusion. e.g. in Java or C# you can define class variables directly in the class body, but an explicit 'static' keyword is needed, undecorated definitions are assumed to be instance variables.

Finally, I think the definition of class B above is a little more nuanced, class B has both a class attribute named l AND an instance attribute named l.

    B.l == None and B().l == []
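That nuance is easy to verify: after `__init__` runs, the instance's own `l` shadows the class-level one.

```python
class B(object):
    l = None                 # class attribute, reachable via the class
    def __init__(self):
        self.l = []          # instance attribute, shadows the class one

print(B.l)                   # None
print(B().l)                 # []
print('l' in vars(B()))      # True: the instance really has its own l
```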


Ah, gotcha!

It's been a while since I've done major OOP coding in any language other than Python, so I'm a little rusty. The issues you raise are perfectly legitimate and would be understandably confusing to newcomers to the language. :)


In Python everything is an object, including a function. The default value isn't a global, it belongs to the function.

I wonder if people who weren't exposed to languages which work differently ala C++ would be as surprised?


I'm not a Python dev, but I've been meaning to learn for a while. So this is really interesting stuff. A few questions, if you don't mind.

I understand mutability and immutability in other languages (and I gave your link a quick read to make sure there weren't any weird Python-specific rules), so I understand how the list can change and still be the same object, but a tuple or string would not. But why does that mean that the default parameter object remains in existence throughout all calls, instead of being recreated each time it is called?

Is there a reason for this being the default behavior? It seems like the majority of the time you would want to use a default parameter, you'd want it to behave like your bug-free examples.


Think of the default parameter values as arguments to the initializer for the function object. If you passed a list into the constructor of a class, you wouldn't be surprised that if you modified the list outside the class that it would modify the same list inside the class.

While that explains how it works, I actually completely agree with you. This is surprising behavior and, in a language that prides itself on not being surprising, seems, well, surprising.

I have to wonder if performance isn't the big reason for it. If your default is [], it isn't a big deal to re-evaluate, but if your default is get_default_cities_from_slow_web_service(), having that re-evaluated on every function call would be catastrophic. Given the choice between two negatives, the choice they made is probably reasonable.
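As an aside, the evaluate-once semantics is occasionally exploited on purpose, using a mutable default as a crude per-function cache. A hypothetical sketch (the function and the `_cache` name are mine):

```python
def fib(n, _cache={}):       # deliberate mutable default: evaluated once, shared across calls
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(30))               # 832040, fast because results accumulate in _cache
```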


You pretty much nailed it right there.

Before I ever ask this question (I do a lot of tech interviews sadly) I always ask the candidate about object mutability vs immutability. Almost everyone knows the textbook answer, and only a few know the actual implications of it. This tests which they know :)

Default kwargs of a function are defined at function definition. However, they are only visible within the scope of said function. It is a weird but important subtle difference.


I think only the second is truly bug-free. The first only does what the user expects if they pass in a non-empty list:

  my_list = []
  append_one(my_list)
  # my_list didn't get anything appended to it
This shows up another subtle trap related to the "truthiness" (or falsiness in this case) of things like the empty list.


Since append doesn't return a value, how about:

  def append_one(var=None):
      return (var or []) + [1]
Would this take longer and/or use more storage for long lists as vars?


When you use + on two lists, a new list is created, and elements from both are copied into the new one. Whereas the append operation modifies the list and simply adds a value. Keep in mind that a python "list" is really like a C++ vector, so while the append operation sometimes allocates a new array and copies all the values, in general it is O(1). The + operation is O(n).

And besides all that, there is nothing wrong with doing an append on one line, and returning the variable on the next. It's clear and readable.
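A quick way to see the aliasing difference between `+` and `append`:

```python
x = [1, 2, 3]
y = x + [4]        # + builds a brand-new list, copying elements: O(n)
x.append(4)        # append mutates x in place: amortized O(1)
print(x == y)      # True  (same contents)
print(x is y)      # False (different objects)
```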


I like this, very elegant actually.


I'd argue that the key difference from other languages is (re)assignment rather than (im)mutability.


Yes, that's correct. The default value is only evaluated once, when the `def` statement is executed. After that point, it's completely mutable. You have to see Python functions as objects and default parameter values as object variables.


The problem is not the existence of the object variables, but that they are in such an unfortunate place. The rest of the parameter list is declaring fresh local variables. It's inconsistent that the left side of the equals sign is per-invocation, and the right side is per-def.


Not the op, but I'd accept the confused response of:

    [[]]
    [[], []]
    [[], [], []]

because the behavior is the same, whether or not they misread an 'l' as a 1.


Python doesn't seem to agree with you :^)

  In [1]: a = []

  In [2]: a.append(a)

  In [3]: a
  Out[3]: [[...]]

  In [4]: a[0]
  Out[4]: [[...]]

  In [5]: a[0][0]
  Out[5]: [[...]]

  In [6]: a[0][0][0]
  Out[6]: [[...]]

  In [7]: a[0][0][0][0]
  Out[7]: [[...]]

  In [8]: a.append(a)

  In [9]: a
  Out[9]: [[...], [...]]

  In [10]: a[0][1][0] is a
  Out[10]: True

  In [11]: id(a)
  Out[11]: 4547140064

  In [12]: id(a[0][1][0])
  Out[12]: 4547140064


Yeah, the whole infinite loop thing. Wasn't fully thinking when I wrote my reply. Good catch.


I would caution you not to interview on things you would not be happy to see in your code base.

In my experience you're much better off with people that look at odd syntax and say, "I don't know what that does," vs those who do.


Well, this is pulled from a list of common python errors.

Using the default value in some capacity isn't that uncommon... Though maybe you were speaking to a more general case? For example, decoding an obfuscated C file.


I've seen this trotted out time and time again, and at least in this simplified form it's a red herring. If you're going to mutate the argument, it doesn't make sense to give it a default value. If you're going to return a modified form of the input you need to make a copy of it. Doing both is simply absurd.


Disagree. Would say it's a decent violation of expectations for the same instance to be passed into every invocation. Of course, the counterargument is 'know your tools,' which I'm partial to, but the fact that this pops up is an indication it is counterintuitive.


I actually agree it's counterintuitive. But this particular example makes no sense, nobody should be writing real-world code that looks like this in the first place. Either modify the original or return a copy, don't try to do both.


Wow, that is really ugly semantics. Here are some notes of mine on how hard R works to avoid exposing this sort of aliasing/mutability issue to the user: http://www.win-vector.com/blog/2014/04/you-dont-need-to-unde...


Yeah i really don't understand why this is just assumed to be a common 'gotcha' to be recognized and avoided by every competent python programmer. What exactly does the python spec specify as the desired behavior here? If you have this 'broken stair' that everyone should just know to step over, shouldn't somebody actually fix the stair!?

I know python is not unique in having warts like this, but it's pretty b.s. in general that unexpected behavior is just thought to be okay, especially in a language meant to be very accessible, and most especially since it's being used as a perfectly valid metric for disqualifying new python programmers from employment.


It's not broken when you understand that functions are objects and default parameters are just members of those objects. Each time the function is executed you get local vars that point to these object members. If one is a mutable type, any changes you make to it will then obviously persist.


It leaves me wondering: are the parameters also scoped to the class (seeing as they're declared at the same time)? Wouldn't this cause an issue with concurrent access to the function?


In Python there's no such thing, because of the GIL. Maybe in Jython.


At what level would you test an interviewee with this kind of question: Python guru, Python expert, Python ninja, Python rockstar, or merely "is familiar with Python"? Your example is a very common gotcha that has been covered ad nauseam, but IMO it's still the kind of bug that would be caught immediately in code review and is very easily fixed.


I thought we were past using trick questions like this anyway! You know, since it would be easy for an experienced yet anxious programmer to get tripped up on this, while someone who just browsed "python interview questions 101" would breeze on through. Also, it selects against experienced multi-language developers, since language-specific quirks like this are not generally useful information to keep front-loaded, but are trivial to become re-familiar with in a work environment, or even gasp learn for the first time from a co-worker or helpful article.

If the industry as a whole cared about evidence-based, non-superstitious, non-monoculture-reinforcing hiring practices, we'd realize that tripping people up and judging programming capability based on minutia is as unfair as it is self-defeating.


I don't use this as a trick question. I ask them to describe mutable vs immutable objects and gotchas. Then I write this function and ask them to describe in excruciating detail what it does, why, and how.

It is simply an easy way to gauge a candidate's proficiency with the language. It also helps if they know that this is a problem. You'd be shocked how many people on the market for Python jobs don't get this question correct, but the smart ones often work it out when talking through it, even if they didn't originally.


I think the idea is to see whether the interviewee is the kind of person who always googles and reads up on "gotchas of language X" whenever he/she learns X.


Possibly the most interesting anti-pattern I saw was:

    a_list_of_words = "my list of words".split(" ")

I never enquired why, since there were bigger issues in the code, e.g. "unit testing" by running the code, taking the result, and using it as the check value: running repr(value), copying out the string, then comparing with self.assertEqual(repr(value), '[<Object1: unicode_value>, ...]')


I do that all the time in the interpreter, especially when slicing pandas DataFrame objects, e.g.:

    df_subset = df['date buyer nwidgets'.split()]
That is far easier to type than the explicit list, with all its punctuation. Now, it's definitely weird that they did a `split(" ")` rather than just using the default, but the idea is the same.

I do try to strip stuff like that out before I put it into a script, replacing it with the explicit list, but I'm never sure if that actually improves anything. It's not as if the explicit list is any easier to read.


It's not weird, it's wrong:

    In [1]: "a    string     r".split(" ")
    Out[1]: ['a', '', '', '', 'string', '', '', '', '', 'r']

    In [2]: "a    string     r".split()
    Out[2]: ['a', 'string', 'r']


In their case though that wouldn't have been a problem as each word was split on a single space.


Split without argument is equivalent to:

    >>> filter(None, " quick   hack for  split".split(" "))
    ['quick', 'hack', 'for', 'split']


Yeah, or re.split(r'\s+', ...), or something - the issue here is that you need to know about this.

From the comment a few levels up, I understood that the code which used str.split with a " " argument didn't signify that whoever wrote it knew about its semantics. If they did, and it was really what was intended, then of course it's completely OK, but if not, it can easily lead to bugs.

For example, if the user is required to input several ints separated with whitespace, this:

    map(int, input_str.split())
will raise only in expected cases, while this:

    map(int, input_str.split(" "))
can lead to rejecting correct input just because someone pressed space twice. It's very frustrating for the user, too, because extra whitespace is hard to spot visually.

So, I don't know if this qualifies as antipattern, but I think if I saw .split(" ") instead of .split() in the code I'd at the very least expect the comment explaining why it's used.


I don't mean to be pedantic, but a list (I am assuming df is a list) requires an int.

(That sentence I wrote about using hashable types need not apply, sorry!)

If you ran that code, you would get this error:

    TypeError: list indices must be integers, not list


That's a pandas dataframe (idiomatically denoted df), not a list. It has funky slicing properties, and he's selecting columns of the dataframe in a perfectly valid way.


This is actually useful: you may want to experiment with different list_of_words in the future and typing the words between [" ", " ", " "] is time consuming. It's also less readable.


How is this a real "anti-pattern"?

Inefficient or bizarre way to do it, maybe.

Anti-pattern is supposed to mean something more, though.

In this case there are no adverse effects and no ambiguity -- so I guess the programmer was just too lazy to construct the list.


perhaps that line was written by someone used to Perl, where they would have had

    @a_list_of_words = qw/my list of words/;

there


Or a Rubyist:

  a_list_of_words = %w{my list of words}


You'd be sure if they wrote:

    >>> qw = str.split
    >>> qw('my list of words')
    ['my', 'list', 'of', 'words']


Why even use a list here? Tuples are for immutable/constant data.

    a_tuple_of_words = ("my", "tuple", "of", "words")

or

    a_tuple_of_words = "my", "tuple", "of", "words"


...because it's a list? Tuples were supposed to have a structure (at least that's what all the rest of the world thinks of them), so iterating through a combination of apples, cars and languages makes no sense whatsoever.

But yes, Python misses entirely the point of tuples, treating them as read-only lists.

http://dozzie.jogger.pl/2014/04/11/python-tuples-the-useless...


> Tuples were supposed to have a structure (at least that's what all the rest of the world thinks of them)

No, a structure is, you know, a structure -- what C calls a struct. Python calls it a namedtuple. If some people call it just a tuple, well, that's a difference in terminology, but it doesn't mean Python is confused about the concepts, it's just using terminology you're not used to.

Also, if we're going to be pedantic about the meaning of data types, your blog post is wrong about lists. You say "position in the list doesn't matter", but that means ordering doesn't matter, and an unordered collection of similar objects is a set, not a list. Python makes this distinction clear: a list is ordered, a set is not.


> [...] that's a difference in terminology, but it doesn't mean Python is confused about the concepts

No, it means exactly this. The term "tuple" and its use predates Python. Sorry, no banana.

> [...] your blog post is wrong about lists. You say "position in the list doesn't matter", but that means ordering doesn't matter

Oh, so what's the difference in meaning between an element True at position 1 and an element True at position 20? Position in the list doesn't matter if we're talking about the meaning of the elements.


> The term "tuple" and its use predates Python.

References, please? And not mathematical references; programming references. C was using the keyword "struct" long before Python to refer to what you are calling a tuple.

> what's the difference in meaning of element True on position 1 and element True on position 20?

The fact that the index is 1 instead of 20. Both elements have the same type, and might well refer to the same property of some sequence of things; but the index being 1 instead of 20 means the element True is describing that property relative to the first item in some sequence, instead of the 20th item. That's why position in the list makes a difference: the ordering of the items, as well as the type of the items, carries information.

(Of course, in Python the list items don't even have to be of the same type; but most uses of Python lists in practice that I've seen do assume that all the elements are "the same kind of thing".)


> References, please? And not mathematical references; programming references.

ML has had tuples several decades before Python existed.


I'm not sure what you're getting at here, or what you are expecting tuples to be like. They can have as much "structure" as you need--they're just a collection.

Lighter-weight, immutable collections have a use case. The code in OP appears to be one where it makes sense. I follow the rule where variables are mutable IFF they need to be mutable.


Tuples are used by Pythonists as if they were mere lists, just immutable. This is clearly displayed by Python's own interface.

For the rest of the world, tuples are not immutable lists. They are tuples, i.e. collections of "objects" that could share nothing about their type. Tuples often are not even iterable! (Erlang, Haskell)

The fact that tuples in Python can have as much structure as one wants is derived from dynamic typing, not from the tuples' nature. The same you could say about Python's lists.

This is a really subtle issue. It takes knowing more languages to see it clearly.


Have you looked at named tuples? They shipped in the python standard library sometime in the last few years (they are at least in 3.3) and are clearly intended for storing structured data.

A typical rule of thumb in Python land is that heterogeneous data probably belongs in a tuple, so practice goes a little further than immutable lists.

I think you could improve your demonstration of the usage in the standard library by examining a random selection of usages to try to find out what is typical. But maybe you already looked at more than you talk about in the article (and I understand that this might not be an interesting use of your time).


No, I haven't looked at them. Python 2 has them since release 2.6, so it's out of my reach for any practical purpose at the moment (I need to preserve compatibility with Python 2.4).

> A typical rule of thumb in Python land is that heterogeneous data probably belongs in a tuple, so practice goes a little further than immutable lists.

The problem with Python tuples is that they're two things mixed: immutable lists and a container for heterogeneous data. It's the same situation as JavaScript's objects.


If it's so subtle, does it matter? This sounds like you just have a problem with the word "tuple" applied to an object that behaves differently from tuples in a statically-typed language.

Would you feel better if they named it "ImmutableList" instead?


Can't speak for GP, but I would [feel better with that name].

(Although I agree with you that statically-typed-language-tuples don't seem to make sense in Python.)

But hey... Python's weird choice of how to name the ImmutableList could be worse, right?

For example, someone could be malicious enough to call their general-purpose associative array a "hash", just because a hashmap (note: not a hash) is a good implementation for large associative arrays. Wow, that'd be hilariously misleading, wouldn't it? Good times!

Or imagine someone was silly enough to name their auto-resizing arrays "vectors", even though in all previously existing contexts a "vector" is a sort of thing which absolutely cannot be meaningfully resized/extended. Ha. Think of the tiny cognitive burden placed on generations of future programmers-who-study-math, trying to juggle these two very-similar-but-distinct concepts, multiplied by the number of such future programmers. Amazing practical joke, right?

/rant


Not really that important, but I think map is a better name than associative array.


No, it's not a problem with word "tuple" behaving differently from statically-typed language. It's a problem with word "tuple" behaving differently from all the rest of the world.

Yes, I would feel better if it was named "ImmutableList" or any other way that is not misleading about the purpose.


I'm sorry, I don't see clearly what a tuple should be. What would be different about Python tuples if they were true tuples?


In most languages you can't usually:

1. Iterate over a tuple

2. Convert a list to a tuple

3. Construct a tuple of a length not known at compile-time

Python allows these because "why not?" but it does break their "one and only one way to do it" rule and confuses beginners a hell of a lot.
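For reference, all three operations from that list are one-liners in Python:

```python
# 2. convert a list to a tuple
t = tuple([1, 2, 3])

# 3. construct a tuple whose length isn't known until runtime
n = len("hello")
u = tuple(range(n))

# 1. iterate over a tuple
total = 0
for item in t:
    total += item

print(t, u, total)    # (1, 2, 3) (0, 1, 2, 3, 4) 6
```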

There are definitely borderline cases. For instance, should a Vector be a list or a tuple? A Vec3 type is obviously a tuple, but a large Vector destined for BLAS is obviously a list.


> Python allows these because "why not?"

No, it allows them because the distinction that those restrictions are founded on is only useful in a statically-typed languages, and Python isn't statically typed.

> For instance, should a Vector be a list or a tuple?

A real vector/array should be its own data type (probably implemented in a C, or similar low-level, extension) that happens to implement the interface expected of an indexable, iterable collection, neither a list nor a tuple.


> [this] distinction [...] is only useful in statically-typed languages

...like Erlang.


Yeah, while snarkily made, it's a good point that Erlang does make use of it without being statically typed.

There is a deep difference in language approach between Python and Erlang here that goes beyond the use of tuples. Erlang, while dynamically typed, has a deep concern for types in its pattern-matching system, using them to make path decisions, while Python is very much centered on dynamic OO techniques -- how objects respond to messages -- to do that.

So I'd still say it's the same kind of deep language-approach difference at work.


So why does Python distinguish between a list and a tuple at all?


Because the distinction between a mutable list and an immutable list is still meaningful in a dynamic language like python.


One is mutable, one is not. Performance. Also, look up namedtuple. Useful for returning multiple values.
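For anyone who hasn't seen it, a quick sketch of namedtuple (the `Point` example here is made up for illustration):

```python
from collections import namedtuple

# An immutable tuple subclass with named fields; a common way to
# return multiple values without losing readability.
Point = namedtuple('Point', ['x', 'y'])

p = Point(2, 3)
print(p.x, p[1])    # fields work by name or by index
print(p == (2, 3))  # True: still compares equal to a plain tuple
```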


Python tuples are used in both ways.

Even in Haskell, though, people often write all kinds of type-class magic to allow "iterating" over a tuple. For example, a Binary instance over a tuple wants to call "put" on each element.

Haskell's (Oleg's) HList is basically a tuple with iteration/list-like operations.


The distinction between "tuple" and "immutable list" doesn't make any sense outside of a statically-typed language, since the only difference is what other values a particular value is type-compatible with.


Yes, of course it doesn't. You have just made the whole of Erlang vanish. Or is it statically typed?...


NB a tuple of one item only requires a trailing comma, and a tuple of zero items is represented as ()
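For example:

```python
single = (1,)       # the comma, not the parentheses, makes the tuple
also_single = 1,    # so the parentheses are optional here
empty = ()          # the zero-item tuple is the one exception: bare ()
not_a_tuple = (1)   # just the integer 1, parenthesized

print(type(single), type(not_a_tuple))  # tuple vs int
```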


> Mutables as default function/method arguments

It would really make sense to change the semantics of Python to fix this issue.


Change them how, to no longer have functions be first class objects? The behavior of mutable default arguments is clear if you know how Python treats function objects. Any "fix" would handicap the language.
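The function-object point can be seen directly: the default is evaluated once, when the def statement runs, and stored on the function itself (using the interview example from upthread):

```python
def append_one(l=[]):
    l.append(1)
    return l

# The default list is created once, at definition time, and lives
# on the function object:
print(append_one.__defaults__)  # ([],)

append_one()
append_one()

# Subsequent calls mutate that same stored list:
print(append_one.__defaults__)  # ([1, 1],)
```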


> to no longer have functions be first class objects?

There are other dynamic languages with functions as first class objects which don't share the "mutable default arguments" gotcha.

But having said that, any change regarding this would break backward compatibility.


Can you clarify?

    def foo(default_arg = []):
Why can't that just be shorthand for:

    def foo(default_arg = ParamNone):
        if default_arg == ParamNone:
            default_arg = []
How would that break first class functions?


As a minor point, use "default_arg is ParamNone", since "==" probably won't do the right thing.

What breaks is something like:

  def foo(default_arg = slow_f()):
    pass
Under the shorthand this gets turned into:

  ParamNone = object()

  def foo(default_arg = ParamNone):
    if default_arg is ParamNone:
        default_arg = slow_f()
    pass
This is fine, since everyone would know that the shorthand means to not put slow code there. Instead, people will start writing it as:

  _foo_arg = slow_f()
  def foo(default_arg = _foo_arg):
    pass
Of course, then what happens with:

  _foo_arg = slow_f()
  def foo(default_arg = _foo_arg):
    _foo_arg = 5
? Under expansion it becomes:

  _foo_arg = slow_f()
  def foo(default_arg = ParamNone):
    if default_arg is ParamNone:
        default_arg = _foo_arg
    _foo_arg = 5
This violates Python's scoping rules, because _foo_arg is now being used in local scope instead of global scope. Eg:

  >>> def f(x=None):
  ...   if x is None:
  ...     x = spam
  ...   spam = 3
  ... 
  >>> spam = 9
  >>> 
  >>> f()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 3, in f
  UnboundLocalError: local variable 'spam' referenced before assignment
Which means you now need a new scoping rule, just to handle default parameters without making things more confusing.

It also turns what was a simple O(1) offset into a precomputed list into a globals() lookup for many cases.


In Python 3 there's the nonlocal keyword to deal with the scoping thing.

The default arguments thing is worse than a lot of the stuff Python 3 corrected.


I don't see how nonlocal could fix this. Could you explain?

More specifically, given a better 'default arguments thing', how would you interpret:

    x = [2]
    def f(x=x*5):
        x.append(4)
With the earlier conversion it's:

    x = [2]
    def f(x=DefaultArg):
        if x is DefaultArg:
            x = x*5
        x.append(4)
This isn't going to work because the x inside of f() is different than the outside x, and you'll get the error message I mentioned.

If you add a nonlocal, as in:

    x = [2]
    def f(x=DefaultArg):
        nonlocal x
        if x is DefaultArg:
            x = x*5
        x.append(4)
then you'll get "SyntaxError: name 'x' is parameter and nonlocal".

What other solution are you thinking of?


Ah I see. Yeah nonlocal can't fix that.


Why not just memoize the result of the slow function?


I think by your use of "just memoize" that I didn't explain myself well enough. I don't mean to focus on the slow aspect, nor even the function aspect. It could be

   def foo(default_arg = [0]*(256*256)):
     ...
and still get the same namespace issues.

Memoization is not always going to be an available solution. For example, it may be that slow_f() returns a stateless object, so can be reused, while slow_f(x) returns something stateful. You can think of my examples as either using default arguments as a single element memo, or using a module variable for the same. Both premised on the idea that the developer knows enough to make the right decision.


How would the above work for duck-typed lists, for example? How would the language "know" which types are mutable?


No, you could just evaluate the expression every time the function is called (like Ruby does) instead of evaluating it once at the time the function is defined (like Python does).


Remove the `static` keyword from C (used inside a function) while you're at it...


Wildcard imports are OK, if and only if the module is specifically designed with this usage in mind.

Example: nose.tools


Agreed on all counts. However I do find myself using mutables as default arguments sometimes because the generated documentation is clearer.

For example, this is a real method in one of my projects:

  def listen(self, address, ssl=False, ssl_args={}):
      pass
I like the way this turns up in the docs because it's immediately clear that ssl_args needs to be a dict. Otherwise I have to describe it in words.


You should not ever use mutable default args. The documentation benefit is drastically outweighed by the potential for nasty bugs. http://pythonconquerstheuniverse.wordpress.com/2012/02/15/mu...
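For reference, the conventional workaround is a None sentinel, with the real default created inside the body (`listen_args` and the `'configured'` key here are invented for illustration):

```python
def listen_args(ssl_args=None):
    # Create a fresh dict per call instead of sharing one default.
    if ssl_args is None:
        ssl_args = {}
    ssl_args.setdefault('configured', True)
    return ssl_args

# Each call now gets its own dict:
a = listen_args()
b = listen_args()
print(a is b)  # False
```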


In case you're using Python 3, why not write ssl_args: dict, ssl: bool=False?


Well, that's throwing people implementing subclasses under the bus, IMO.

Why not just add @param annotations in your docstrings instead?


> Well, that's throwing people implementing subclasses under the bus, IMO.

If they need to touch this argument in an overridden method and they don't know what they are doing, then yes.

> Why not just add @param annotations in your docstrings instead?

I'm using Sphinx and it renders them separately. I want the empty dict to show up in the function signature.


A single cryptic bug due to this practice will more than negate the minor doc readability benefits you get from that. And there will likely be many more than one cryptic bug.

There are other ways to emphasize it ought to be a dict/mappable. Change its name to be suffixed as "_dict", for example?


Also, not knowing what scope a variable has when creating closures.


I agree that bare excepts are bad, however they do not catch Ctrl-C. If you _do_ want to catch Ctrl-C you have to except a KeyboardInterrupt explicitly.


This is untrue (just tested) for at least Python 2.7.x under Linux: try ... except catches Ctrl-C.


Mmm... you should always use 'if x is not None:' imo.

It's very common for libraries to make values evaluate to False, and very easy to get bugs if you just lazily test with 'if x'.

Sqlalchemy springs to mind immediately as one of the common ones where using any() and if x: is a reeeeeallly bad idea; there are plenty of others.

I'm pretty skeptical about modifying your coding behavior based on what libraries you happen to be currently using.

'If x' isn't your friend.
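A minimal illustration of the difference:

```python
values = [0, '', [], False, None]

# Truthiness throws away every falsy value:
truthy = [v for v in values if v]

# An explicit None check keeps the legitimate falsy ones:
not_none = [v for v in values if v is not None]

print(truthy)    # []
print(not_none)  # [0, '', [], False]
```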


Especially taking into account the more bizarre bugs (features?) of python:

    (bool(datetime.time(0)), bool(datetime.time(1))) == (False, True)
I always consider `if x:` a bug, unless x can only be a boolean. Furthermore, it seriously hinders readability and clarity of the code.
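(For later readers: the midnight-is-falsy behavior was eventually reversed in Python 3.5, which is exactly why an explicit comparison ages better than `if t:`.)

```python
import datetime

t = datetime.time(0)  # midnight

# Whether bool(t) is False here depends on the interpreter version
# (falsy before Python 3.5, truthy from 3.5 on), so test the actual intent:
if t is not None:
    print("got a time value:", t)
```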


> I always consider `if x:` a bug, unless x can only be a boolean.

Oh, like this? ;)

   var eventA = new Date(), eventB = new Date();

   if (!parseInt((eventA - eventB) / 1000)) {
     console.log("these events occurred simultaneously");
   } else {
     // troll harder with confusing use of 'asynchronous'
     console.log("these events occurred asynchronously");
   }


I was specifically looking through these comments for your example.

I got bit by this once, and it's certainly... strange (it's also surprising). It's considered "behavior consistent with the rest of Python" [1] (which I can agree with) even if it makes little sense in terms of immediate readability to someone who hasn't previously encountered it. Fortunately, the workaround is easy, and it is documented.

There's at least a couple of spats on the mailing list regarding this feature that are of interest to the curious or at least those who are interested in the history of such behavior.

[1] http://bugs.python.org/issue13936


Agreed. Especially if you program in more than one language, trying to remember the subtleties of each one's collection of rules for "truthiness" is a fraught exercise. And isn't it Python people who like to say, "explicit is better than implicit"?


    "Explicit is better than implicit."
If you mean is not None, you should say is not None.

It's fast and readable and there are no "just be aware that" disclaimers to tack on afterwards.


It depends.

If you're checking to see if that value is None, then yes - you should check that.

If you're merely checking if the value is truthy, then using "if x:" is completely legitimate.


I've found "if x" to be much less readable, especially when I'm looking at code written in a language I'm not familiar with. When I'm reading such code, I want to be 100% sure what something is doing, and not have to read documentation when avoidable.


I agree. The author brings this to light as well and warns about the implications of changing what is considered "truthy" in x.


Check out Raymond Hettinger's Transforming Code into Beautiful, Idiomatic Python talk on youtube [1].

Great talk on avoiding some of the common pitfalls new python developers step in. Exposes some nice language features.

[1]: https://www.youtube.com/watch?v=OSGv2VnC0go



The only thing I disagree with is "use nested comprehensions" thing. In my mind: x = [letter for word in words for letter in word]

is inside-out or backwards. I want the nested for to be the less specific case:

   x = [letter for letter in word for word in words]
makes more sense in my mind.

(It's also my first answer to the "what're some warts in your language of choice" question.)


I'm in the camp that if your list comp needs more than one for clause, it's complicated enough to be broken out into an actual for loop.


Everybody, listen to this person!


Then it turns into this:

    x = []
    for word in words:
       for letter in word:
          x.append(letter)
Which in addition to being far more verbose and less readable, is also less efficient.
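For anyone unsure of the ordering rule: the `for` clauses in a nested comprehension read left to right exactly like the nested loops, outermost first:

```python
words = ['her', 'name', 'is', 'rio']

flat_comp = [letter for word in words for letter in word]

# The equivalent explicit loops, in the same order:
flat_loop = []
for word in words:
    for letter in word:
        flat_loop.append(letter)

print(flat_comp == flat_loop)  # True
```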


I'll tell you what I tell my team: it's barely more verbose, and the readability is up for extremely serious debate (I mean, everyone understands nested for loops, but the nested list comprehension thing is weird... why is something that appears way way before the innermost "for" the same as something in that for loop's declaration?)

It may also be less efficient. When I'm shown numbers that the difference between the comprehension and the for loops (in each specific instance, or in aggregate for the program in question) is above statistical noise AND it's a significant factor in overall runtime (I won't ever worry about a millisecond when the runtime is 1s), then I'll gladly say: put them in.

Until then, just use the loops. Use of really strange language features that are surprising, not exactly idiomatic (this argument is common for this case), and not shown to be of actual benefit is detrimental in a polyglot environment.


When I see a list comprehension I can see with a single glance what it's doing. Not so with the 4 line for loop. Comprehensions aren't a strange language feature in Python either...it's one of the central features of Python.

"Don't use something until it's proven to yield a great benefit" is a very conservative approach. That may be appropriate in some cases, but I'm very glad that I am not in such a team since that would be incredibly frustrating. I much prefer an approach where you go with the choice that's most likely the better one, even if it's not 100% proven better or not a big difference.


I'm talking strictly about multi-"for" comprehensions. They're just too confusing to me and to most of the people I've ever worked with. But we also use lots (most) of Python's features fully; just that one has been the source of dozens of bugs in this one codebase, not to mention others I've worked on with other people. It is a shitty, non-intuitive syntax.

Nested for loops, flatten(), various itertools functions and chained generator expressions all suffice, and I have yet to see them provide measurable slowdown to actual code compared to good algorithms and decent factoring. Like I said, I'll even use multi-for comprehensions if there is a measurable difference over nested for-loops.

Also, I think you are intentionally misrepresenting what I said - when I said don't use "weird stuff" I explicitly excluded idiomatic language things. That includes (for Python) single-for comprehensions. The multi-for comprehension is something I rarely come across in the wild despite its long-time existence in Python - it's a weird one.


I think you are trying to justify your strange preferences after the fact. How exactly were there bugs caused by nested for loops that you encountered? It's not like if you mess up the order it will actually run without throwing an exception. Nested list comprehensions are idiomatic python. It's really strange that you don't let your team use them because you are afraid of them.


Well, Google doesn't recommend it either (http://google-styleguide.googlecode.com/svn/trunk/pyguide.ht...), but I guess they are all bloody noobs or whatever.

I mean, single-level comprehension is good. Nested list comprehension is OK only in the most trivial cases. In my opinion, if I see how a person uses list comprehensions, I can tell what kind of person this is.

There are people who, for example, do this:

    def all_is_okey_dorey(lst):
        return all([some_predicate_fn(x) for x in lst])

instead of this:

    def all_is_okey_dorey(lst):
        for x in lst:
            if not some_predicate_fn(x):
                return False
        return True

and can live with themselves somehow.

Or there are people, who refuse to acknowledge the existence of anything besides Python 3.x and when forced to write in 2.x use list comprehension instead of iterator comprehension.

Thing is, the validity of using nested list comprehension depends not on the amount of for loops you have, but on the thing you want to do with the item. If it's just selection, then it might be ok. If you want to apply some kind of function to it, then it's most probably the case of trying to be too clever.


If you think two loops is not a "simple case" of a list comprehension as Google suggests to use them for, perhaps you shouldn't be doing code reviews. It sounds like your team would be held down by your weak grasp of the language.


The only thing wrong with that list comprehension version is those [ ]

    all(some_predicate_fn(x) for x in lst)
Much better than the loop.


It's much more readable, people won't make mistakes on it the way they do when trying to be too clever with their multiple-for statements in list-comprehensions.

Trying to pack too much on a single line is one of the sins of perl, and I'm happy to read python code that is comfortable being multi-line.


For loops can often be avoided. I would write this particular example in one of these ways, that I think are readable:

    x = []
    for word in words:
        x.extend(word)

    from itertools import chain
    x = [letter for letter in chain(*words)]

    x = list(chain(*words))


Wouldn't chain(*words) require unpacking all of words before feeding it into the chain function, storing a second copy of the word list in memory?


Yes it would, but I don't care about these small efficiencies, say 97% of the time ;)

The lazy version in Python 3 would be this one:

    list(chain(*map(iter, words)))
For Python 2 one has to use itertools.imap instead of map.
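For what it's worth, itertools also ships `chain.from_iterable`, which consumes the outer iterable lazily instead of unpacking it into an argument tuple up front:

```python
from itertools import chain

words = ['her', 'name', 'is']

# chain(*words) evaluates the whole argument tuple first;
# chain.from_iterable(words) pulls each word only when needed.
eager = list(chain(*words))
lazy = list(chain.from_iterable(words))
print(eager == lazy)  # True
```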


It's not all or nothing. One, last? for loop can be list comp.


Even clearer:

    x = [for word in words: for letter in word: letter]
This also has the advantage of being readable left to right without encountering any unbound identifiers like all other constructs in Python.


As someone who spends a significant amount of time in python that is much harder to follow. There is an expectation for order in list comprehensions and what you wrote completely violates it.


> write a list comprehension (...) code just looks a lot cleaner and what you're doing is clearer.

I know how to use list comprehensions, but often avoid using them and use the standard for loops. List comprehensions look nice and clean for small examples, but they can easily get long and become mentally hard to parse. I would rather go for three 30 character lines instead of one 90 character line.


Depends on how you want to program: imperative vs functional.

Personally I think list comprehensions are the most beautiful part of Python, though sometimes I use map() when I'm trying to be explicitly functional (I realize it's allegedly slower, etc).

Generally I think list comprehensions are cleaner and allow you to write purer functions with fewer mutable variables. I disagree that deeply nested for loops are necessarily more readable.


Map is not allegedly slower, it is demonstrably slower.

    $ python -mtimeit -s'nums=range(10)' 'map(lambda i: i + 3, nums)'
    1000000 loops, best of 3: 1.61 usec per loop

    $ python -mtimeit -s'nums=range(10)' '[i + 3 for i in nums]'
    1000000 loops, best of 3: 0.722 usec per loop

Function calls have overhead in python, list comprehensions are implemented knowing this fact and avoiding it so the heavy lifting ultimately happens in C code.


As you point out though, it depends on where the function is coming from:

    $ python -mtimeit -s'nums=range(10)' '[str(i) for i in nums]'
    100000 loops, best of 3: 2.57 usec per loop

    $ python -mtimeit -s'nums=range(10)' 'map(str, nums)'
    1000000 loops, best of 3: 1.88 usec per loop

    $ python -mtimeit -s'nums=range(10)' 'import math' '[math.sqrt(i) for i in nums]'
    100000 loops, best of 3: 3.25 usec per loop

    $ python -mtimeit -s'nums=range(10)' 'import math' 'map(math.sqrt, nums)'
    100000 loops, best of 3: 2.55 usec per loop


Fair enough. I said "allegedly" because I had never personally measured the performance difference.

Even though you could construe map as "half as fast" (or twice as slow) as the equivalent comprehension, I don't see a difference of ~1 usec making any difference in my code thus far. Good to know, though.


Yup, for very large calculations, or certain use cases, it can make much larger differences. It all depends on your use case.


Deeply nested I'd usually agree, but if you end up with deeply nested comprehensions as a result, in either case, you're probably better off with a little bit of restructuring/splitting things into more functions to at least make the nesting cleaner to read.

However, I think he was referring to if the conditionals/additional modifications needed to build your list get a bit excessive, so you'd have a like... [dostuffto(A) for A in alsodostuffto(LIST) if conditional(A)] (but with more complex operations at each step).

Granted at that point you can argue that you should do just as my example shows and put the "dostuffto" into more encapsulated functions, but sometimes that doesn't seem like the right choice.


I'm a bit torn on it.

In a case where you need to do a lot of nested appends, I've found that even a long list comprehension can be easier to read. You just have to be sure to properly indent it and break it up into multiple lines. My rule is that every extra `for` starts a new line, and sometimes moving the predicate to its own line when it's too long, too.
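A sketch of that formatting rule (the data here is made up):

```python
matrix = [[1, 2], [3, 4], [5, 6]]

# Each extra `for` starts a new line; the predicate gets its own line too:
evens = [
    n
    for row in matrix
    for n in row
    if n % 2 == 0
]
print(evens)  # [2, 4, 6]
```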


For a concrete example, I was just recently converting a list-of-dicts into a dict-of-dicts. Here's an isolated snippet:

http://pastebin.com/8q46bK0v

To my eye, the list comprehension version is reasonable. But I like the imperative style better: it uses the most basic language features and at a glance you can tell what it does. My favourite is the dictionary comprehension version: it's the shortest but still conveys clearly what it's doing.


Yeah, I prefer the dict comprehension.

I use dict comprehensions quite frequently in my own code as well.


I agree. I try to use list comprehensions when I can, but the fact is that as soon as you have to do any sort of logic in your map, it quickly gets unwieldy. I'd much rather use a functional style in many of these cases, but python's syntax and crummy lambdas (as compared to what would be natural to do in say JavaScript or Ruby), means that in the majority of cases I find myself using an imperative style.


I always struggle to understand why a list comprehension

  alist = [foo(word) for word in words]
is considered more Pythonic than map

  alist = map(foo, words)


List comprehensions are more flexible and easier to read in the non-trivial case. Sure in the trivial case you show a map might be considered neater, but just adding a filter is enough to make the list comprehension more readable in my mind. Python's lambda syntax also makes using maps and filters quite ugly.

Compare:

   alist = [x**2 for x in mylist if x%3==0]
to

   alist = map(lambda x: x**2,filter(lambda x: x%3==0, mylist)

Plus python also has set comprehension and dict comprehension, which share essentially the same syntax.


Don't forget generator comprehensions which are almost identical to list comprehensions. But instead of evaluating the whole result set and returning it, they return a generator that you can then iterate over. Very neat stuff.
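For example:

```python
nums = [1, 2, 3, 4]

squares_list = [n * n for n in nums]  # built in full, immediately
squares_gen = (n * n for n in nums)   # a generator; nothing computed yet

print(next(squares_gen))  # 1, computed on demand
print(sum(squares_gen))   # 29, the remaining 4 + 9 + 16
print(sum(squares_list))  # 30
```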


In Haskell, the latter looks like:

  alist = (map (**2) . filter (\x -> x `mod` 3 == 0)) myList
Or:

  alist = (map (**2) . filter ((== 0) . (`mod` 3))) myList
If alist is a transformation, and not applied to myList, it's cleaner:

  alist = map (**2) . filter ((== 0) . (`mod` 3))
Though Haskell also has list comprehensions, with more "mathy" syntax:

  alist = [x**2 | x <- mylist, x `mod` 3 == 0]


hi, you are lacking a closing parenthesis in your second example.


    alist = [foo(word) for word in words if word.startswith('a')]
    alist = map(foo, filter(lambda word: word.startswith('a'), words))
Which reads better?


I don't use FP practices in Python much, but if I did I'd define the filter outside the map, like so:

    begins_with_a = lambda x: x.startswith('a')
    alist = map(foo, filter(begins_with_a, words))


Even with named functions, Python's use of global functions instead of methods for iterators force you to read the expression from the inside out. I think Lisp languages nailed this with their threading macros, which allow natural left-to-right reading, but Ruby's strategy is better than Python's, too, while maintaining very similar syntax.

    ;; clojure
    (let [begins-with-a #(.startsWith % "a")
          foo #(do-some-stuff-with %)]
      (->> words (filter begins-with-a) (map foo)))

    # ruby
    words.select { |e| e.start_with?(?a) }.map { |e| foo(e) }
It doesn't really make sense for things like `len` and `map` to be global functions in object-oriented languages.


Those are not equivalent. You need to wrap the map in a list() call.


In Python 3 you'd be right, but I think people still call that language "Python 3" and mean Python 2 when they say "Python".


Admittedly, my thought would be to chain the calls, not nest them:

   alist = words.filter(lambda word: word.startswith('a'))
                .map(foo)
That being said, my Python is limited and I don't know if filter/map are available as methods of a list. At the end of the day, there are cases where list comprehensions are much cleaner/understandable... and cases where the reverse is true.


> I don't know if filter/map are available as methods of a list.

They're not. Which is a shame in my opinion, because as you've written it you can clearly read the operations in the order they happen, ie. filter followed by map. Instead, you do have to do the second line of what blossoms wrote above.

And I don't think it's possible to write a list comprehension that reads in execution order, either :(


Map and filter, of course; less syntactical noise, simple function semantics, and plenty of precedent and equivalents in all other languages.


It's not like list comprehensions lack equivalents in other languages. Take Haskell for example. Python list comprehensions could use a where clause from Haskell, though, so one could really pack everything into a one-liner :)

(as it is now, list comprehensions requiring various references to the result of a function call evaluate the function each time it's used)


you can consider a list comprehension to be a sort of literal representation of the result of map. I think literals have a benefit for code readability and should be used when feasible (i.e. the literal is compact enough).

the other reason it's considered more idiomatic in Python is just because the compiler does a better job of parsing and optimizing list comprehensions.


You use list comprehensions in other places you wouldn't use map. The difference in text size as you posted is minimal, but the list comprehension - once you're used to reading them - tells you exactly what's going on.

Whereas map could be anything. It could be redefined for all you'd know.


Don't know if this is why, but the list comprehension takes an expression at the "foo(word)" location, and is therefore more general than map, which requires a function. The comprehension in that case is simpler.

  words = ['w1', 'w2', 'w3']

  [word[1] for word in words]

  ['1', '2', '3']
  
  map(lambda x: x[1], words)

  ['1', '2', '3']
I like looking at the list comprehension better. The use of lambda looks forced in this case. I also imagine there's a penalty for calling the (anonymous) function in map.


In that case I'd use

  map(itemgetter(1), words)
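e.g. (wrapping in list() for Python 3, where map returns an iterator):

```python
from operator import itemgetter

words = ['w1', 'w2', 'w3']

# itemgetter(1) is a prebuilt callable equivalent to lambda x: x[1]
print(list(map(itemgetter(1), words)))  # ['1', '2', '3']
```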


Instead of my map example, or instead of a list comprehension in general?


Instead of a list comprehension equivalent to your example.


Those are not equivalent.

You cannot index a map object, for example.


I think the main reason is the lambda syntax. List comprehensions also let you do filter and nested loops.


This is a nice little article, but I wonder about some of the design decisions. In particular:

> The simplifications employed (for example, ignoring generators and the power of itertools when talking about iteration) reflect its intended audience.

Are generators really that hard? (Not a rhetorical question!)

The article mentions problems resulting from the creation of a temporary list based on a large initial list. So, why not just replace a list comprehension "[ ... ]" with a generator expression "( ... )"? Result: minimal storage requirements, and no computation of values later than those that are actually used.

And then there is itertools. This package might seem a bit unintuitive to those who have only programmed in "C". But I think the solution to that is to give examples of how itertools can be used to create simple, readable, efficient code.
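As a sketch of that list-comprehension-to-generator-expression swap, assuming some large input:

```python
big = range(10 ** 6)

# [n for n in big if ...] would materialize a million-element list;
# the generator expression stops as soon as next() is satisfied:
first_hit = next(n for n in big if n > 100 and n % 7 == 0)
print(first_hit)  # 105
```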


Point 3 of the iteration part is not good advice. With [1:] you're making a copy of the list just to iterate over it...


You're right. I still wouldn't recommend looping over indices, but rather using itertools.islice(xs, 1, None) instead of xs[1:].
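i.e.:

```python
from itertools import islice

xs = [10, 20, 30, 40]

# xs[1:] builds a new list; islice just skips the first element
# while iterating over the original.
print(list(islice(xs, 1, None)))  # [20, 30, 40]
print(xs[1:])                     # [20, 30, 40]
```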


You're right... But at least it's a shallow copy.


I find that testing for empty this way is a bit misguided if you want to be rigorous with types. For instance:

     >>> def isempty(l):
     ...     return not bool(l)
     >>> isempty([])
     True
     >>> isempty(None)
     True
If embedded within your program logic this kind of pattern can waste precious time with debugging. You can catch your errors much more quickly if you are explicit with your comparisons.


Failing to use join is a big one.

I have seen countless instances of people writing the logic to output commas in between items (like for CSV export) that they want to concatenate into a string.

    header_line = ','.join( header for header in headers )
    csv_line    = ','.join( str(dataset[key]) for key in dataset.keys() )
Example for a case of a dictionary mapping a string to a bunch of numbers.


Proper Python would use the csv module for this operation, as your CSV export would break if `header` or `dataset[key]` contains a comma.
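For the record, the csv module handles the quoting (the data here is invented):

```python
import csv
import io  # use StringIO.StringIO on Python 2

headers = ['name', 'note']
row = ['rio', 'dances, on the sand']  # note the embedded comma

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(headers)
writer.writerow(row)

# The writer quotes the field containing the comma:
print(buf.getvalue())  # name,note / rio,"dances, on the sand"
```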


Yeah. This was a special case for a one line CSV that was requested by my client. It was a dictionary with a bunch of single measurements.


Any reason not to do the first one more compactly?

     ','.join(headers)


Oops. Good catch!


You should just use dataset.values() (or itervalues if you're using Python 2) instead of iterating over the keys, and then looking them up.


You're relying on dataset.keys() being in the same order as headers. Really, you should use the csv module, even for one-line CSV files.


> PEP 8 is the universal style guide for Python code.

> If you aren't following it, you should have good reasons beyond "I just don't like the way that looks."

Core devs and Guido have said many times that PEP 8 is not holy.

See https://mail.python.org/pipermail/python-dev/2010-November/1...

In essence, a "stupid reason" like "I don't like it" is a valid reason not to adopt PEP 8.

In fact, I don't like the PEP 8 recommendation on docstrings. I like Google's docstring style (aka napoleon in Sphinx contrib-module).

http://sphinxcontrib-napoleon.readthedocs.org/en/latest/exam...


Interesting read, a couple of things I noticed though:

1. In "Checking for contents in linear time" both examples are the same. Perhaps remove the list entirely in the second example

2. Itertools.islice helps if you need to slice a list with a bajillion elements


    # Avoid this
    lyrics_list = ['her', 'name', 'is', 'rio']
    words = make_wordlist() # Pretend this returns many words that we want to test
    for word in words:
        if word in lyrics_list: # Linear time
            print word, "is in the lyrics"

    # Do this
    lyrics_list = ['her', 'name', 'is', 'rio']
    lyrics_set = set(lyrics_list) # Linear time set construction
    words = make_wordlist() # Pretend this returns many words that we want to test
    for word in words:
        if word in lyrics_list: # Constant time
            print word, "is in the lyrics"
The second example should read ... if word in lyrics_set: ...


Just to point out, if you really will have a tiny list and that's knowable, it's possible this example would be best with a straight linear time check. It could be fewer operations than hashing a string and looking it up. Practically pedantry though.


Thanks for pointing out this mistake. It's now fixed.


I use Python for data mining, and most of my work is done exploring data in IPython.

> First, don't set any values in the outer scope that aren't IN_ALL_CAPS. Things like parsing arguments are best delegated to a function named main, so that any internal variables in that function do not live in the outer scope.

How do I inspect variables in my main function after I get unexpected results? I always have my main logic live in the outer scope because I often inspect variables "after the fact" in IPython.

How should I be doing this?


If you are using the interpreter directly then that particular bit of advice is hard to follow since you basically live in global all the time. For that reason I would say that this advice applies mainly to .py files.


Agreed.

There's a big difference between "scripting" and "writing software" in terms of best practices.

If you're writing some ETL scripts in an IPython notebook, it would be overkill to encapsulate everything to keep your global scope clean.


I'd try to test smaller chunks of code for validity. If any block of code is longer than 12 lines, I get nervous that I don't understand what it's doing. Refactor your code into functions as you confirm the code behaves as expected in the interpreter.

It's very difficult to write automated tests when all logic is in outer scope rather than chunked into functions.
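A minimal sketch of that kind of structure (the function names and the argument handling here are made up):

```python
import sys

def parse_args(argv):
    # stand-in for real argument parsing (argparse, etc.)
    return {'verbose': '-v' in argv}

def run(options):
    # the actual logic, testable (or pokeable in IPython) in isolation
    return 'verbose mode' if options['verbose'] else 'quiet mode'

def main(argv):
    options = parse_args(argv)
    print(run(options))

if __name__ == '__main__':
    main(sys.argv[1:])
```

Nothing leaks into the module's global scope, and each piece can be called on its own from an interpreter session.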


Instead of this:

    # Do this  
    lyrics_set = set(lyrics_list) # Linear time set construction  
    words = make_wordlist()  
    for word in words:  
        if word in lyrics_set: # Constant time  
            print word, "is in the lyrics"
You could do this:

    lyrics_set = set(lyrics_list)  
    words = set(make_wordlist())  
    matched_words = list(lyrics_set & words)  
    for word in matched_words:  
        print word, "is in the lyrics"


Or, off the top of my head:

    for word in (set(lyrics_list) & set(words)):
        print('{} is in the lyrics'.format(word))


Even shorter, nice. How about a one-liner? The last bit is looking a bit messy, any ideas?

    print " is in the lyrics \n".join(set(lyrics_list) & set(words)), "is in the lyrics"


How about

  >>> lyrics_list = ["her", "name", "is", "rio"]
  >>> words = ["is", "rio"]
  >>> print '\n'.join("{} is in the lyrics".format(word) for word in set(lyrics_list) & set(words))
  rio is in the lyrics
  is is in the lyrics


Heh, I just replied with pretty much this exact code to someone else trying to make it a one-liner.

Yes, it's possible. I would never, ever publish code like this, though. It's opaque.


Just because it can be written as a one-liner doesn't mean it should be written as a one-liner :)

Don't ever do this.

    print('\n'.join(['{} is in the lyrics'.format(word) for word in (set(lyrics_list) & set(words))]))


I never said it should, it was a challenge.


   matched_words = set(lyrics_list).intersection(make_wordlist())


I can't think of a single case where using sentinel values is necessary or appropriate in Python.

Generally speaking, one should just return from within the loop.


I get what you're saying, but even when sentinels are used inside a function, returning a -1 to the caller seems like a pretty bad API. It's OK to raise ValueError! I had thought the idiomatic sentinel value was an instance of object() you could compare against with `is`, anyway.
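For reference, the object() sentinel idiom looks roughly like this (the names are made up):

```python
_MISSING = object()  # private sentinel; no caller can pass it by accident

def first_match(items, predicate, default=_MISSING):
    for item in items:
        if predicate(item):
            return item
    if default is _MISSING:  # identity check against the sentinel
        raise ValueError('no matching item')
    return default
```

Unlike returning -1 or None, this distinguishes "the caller gave no default" from "the default is None".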


I agree - especially when you're returning a list index, since `[-1]` is a valid index in Python.


Funky OS or FFI APIs... but Python has a nice way of abstracting away the pattern using the second argument to iter().
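For anyone who hasn't seen it: the two-argument form iter(callable, sentinel) keeps calling the callable until it returns the sentinel. Using a StringIO as a stand-in for a real file:

```python
import io

f = io.StringIO('line one\nline two\n')  # stand-in for a real file object

# call f.read(4) repeatedly until it returns the sentinel ''
for chunk in iter(lambda: f.read(4), ''):
    print(repr(chunk))
```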


> Consider using xrange in this case.

Is xrange still a thing? Doesn't range use a generator instead of creating a list nowadays?


Yes, I'd revise as "Consider using Python 3 in this case." This is a chief reason why I now avoid Python 2. Python 3 is more than five years old. Where we have discretion to choose Python 3, it's time to exercise that discretion. Not because 3 is greater than 2. Because Python 2 has prominent pains that are healed in Python 3.


Well, I'd say no, not in "Python".

It seems to me that when people say "Python" they still mostly mean Python 2, where range returns a list and xrange a lazy xrange object. In Python 3 there is no xrange, and range returns a lazy range object rather than a list, but I think people still most often call that language "Python 3", not "Python".


I don't think there is such a thing as a pattern per language, unless the language is really unique. IMO what does exist is language bad practices, which are actually tied to the language itself.

An (anti)pattern is something abstract and can be applied to any other similar language.


Speaking of find_item, is the `for..else` loop (which can be used to write find_item in another way) considered Pythonic? I personally like `for..else` loops but I don't know where the consensus is at.


I don't see `else` used very often on loops, but it is an unambiguous language feature. That's Zen enough for me.
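For reference, a sketch of find_item written with for..else (the else clause runs only when the loop finishes without hitting break):

```python
def find_item(items, desired):
    for item in items:
        if item == desired:
            break  # found it; skips the else clause
    else:
        item = None  # loop ran to completion without a match
    return item
```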


http://en.wikipedia.org/wiki/No_true_Scotsman

Don't ask what's "more Pythonic" or "less Pythonic", Python is not a cult, it's a very practical scripting language. Ask for benefits and weaknesses of a given approach in given circumstances.


Speaking a language in the same way as other speakers of that language makes you easier to understand.


When it comes to basic formatting, symbol naming and high-level code organization, sure.

But computer languages, unlike human languages, are precise. Their intent is clear. And you'll never encounter a case where the Python 3 interpreter hasn't heard of that particular Python 3 keyword you're using.

It's also not an excuse to avoid certain features of a language, when using them leads to a better and simpler solution, just because they're less popular. Programming is not an exercise in popularity.

On the other hand, human language is fuzzy and full of phrases that consist of statements having nothing to do with their meaning. Such as me saying "your argument doesn't hold water".

Human languages also have additional layers entirely separate from the primary meaning of a conversation, such as sending social cues like "how smart am I", "do I like you", "do I fit in this group", and "am I a leader or a follower". Each layer of concern drives a certain way of expression and imitation, none of which occurs (or should occur) when writing computer code.

A better example to compare to programming code would be mathematical notation. As long as you express your intent shortly, using the available mathematical notation, people will be fine, and your intent will be clear.

I've never seen someone ask in a math forum if their formula is more Mathematic one way, or another way.


An anti-pattern is a design pattern gone bad. These aren't anti-patterns; they're just newbie mistakes and non-idiomatic expressions.


is it an anti-pattern to call something an anti-pattern because the author isn't sure what an anti-pattern is?


Read this right after using a range in a loop in order to get the index. Can't believe I went this long without knowing about enumerate.
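For anyone else in the same boat, the difference in a nutshell:

```python
lyrics = ['her', 'name', 'is', 'rio']

# Instead of this:
for i in range(len(lyrics)):
    print(i, lyrics[i])

# Do this:
for i, word in enumerate(lyrics):
    print(i, word)

# enumerate can also start counting from any index:
for i, word in enumerate(lyrics, start=1):
    print(i, word)
```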


It first mentions enumerate as a footnote after looping with a range:

https://docs.python.org/2/tutorial/controlflow.html#for-stat...

But the tutorial in the Python docs has pretty high information density and good coverage of things like this.

(I think there is some risk that this comment will be interpreted as If you don't know enumerate you need to look at the tutorial. That isn't what I intend, I just want to point out that the tutorial is a reasonably dense resource that hits on a lot of stuff like enumerate.)


Question: how do you iterate over multiple long lists (in Python 2), especially if zip itself takes a long time to zip them, for example?


You use Python 3.

J/K; while this is technically a limitation of Python 2, there actually is izip in the itertools package, which is a generator and works in a similar way to zip in Python 3.
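Roughly like this (with a fallback so the same sketch runs on Python 3, where plain zip is already lazy):

```python
try:
    from itertools import izip  # Python 2: pairs are produced on demand
except ImportError:
    izip = zip  # Python 3: zip is already a lazy iterator

xs = range(10 ** 6)  # pretend these are two very long sequences
ys = range(10 ** 6)

for x, y in izip(xs, ys):
    if x > 2:
        break  # only the pairs we consumed were ever built
    print(x, y)
```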


Awesome! I knew there was some lazy zip-ish thing for Python 2. I personally hate using range(len(foo)) as much as anyone else.


You just use for i in range(len(... and ignore this web article's opinion on it, since the situation calls for it.


I would say that the leaky outer scope problem combined with those really similar-looking names is perhaps the most difficult to spot.


> Things like parsing arguments are best delegated to a function named main, so that any internal variables in that function do not live in the outer scope.

Why a function named `main`? We're not writing C here, there's no need for a function named `main`. Let's call it something that's actually useful, like `parse_cmd_arguments`


Set membership checking is constant-time? Somehow I don't believe that.


Sets are basically hash tables that store only keys and not values. You can easily write your own Set implementation by subclassing `dict` and stripping out references to `.values()`, etc.

As such, testing for membership is an O(1) hash table lookup. If you're skeptical:

    $ python -m timeit -s 'nums=range(1000000)' '100000 in nums'
    1000 loops, best of 3: 1.4 msec per loop
    
    $ python -m timeit -s 'nums=range(1000000)' '500000 in nums'
    100 loops, best of 3: 7.15 msec per loop
    
    $ python -m timeit -s 'nums=range(1000000)' '900000 in nums'
    100 loops, best of 3: 13 msec per loop
    
    $ python -m timeit -s 'nums=set(range(1000000))' '100000 in nums'
    10000000 loops, best of 3: 0.0572 usec per loop
    
    $ python -m timeit -s 'nums=set(range(1000000))' '500000 in nums'
    10000000 loops, best of 3: 0.057 usec per loop
    
    $ python -m timeit -s 'nums=set(range(1000000))' '900000 in nums'
    10000000 loops, best of 3: 0.0584 usec per loop


It's a hash table, so constant time lookup in the average case. It could be argued that it's actually O(log(n)), but it would not matter for any practical n.


islice should be used, not xrange



