Hacker News
Anti-Patterns in Python Programming (lignos.org)
320 points by aburan28 on July 9, 2014 | 237 comments


The more frequent and dangerous pitfalls are, in my humble opinion:

- Bare except: statements (they catch everything, even Ctrl-C)

- Mutables as default function/method arguments

- Wildcard imports!
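To illustrate the first pitfall, a minimal sketch (the function names here are made up, and raising KeyboardInterrupt in code stands in for a real Ctrl-C arriving mid-try): a bare `except:` catches BaseException, Ctrl-C included, while `except Exception:` lets it propagate.

```python
def swallow_everything():
    try:
        raise KeyboardInterrupt          # simulates Ctrl-C
    except:                              # bare except: catches BaseException, Ctrl-C included
        return "swallowed"

def let_ctrl_c_through():
    try:
        raise KeyboardInterrupt
    except Exception:                    # KeyboardInterrupt is NOT an Exception subclass
        return "swallowed"

print(swallow_everything())              # swallowed: the interrupt was eaten

try:
    let_ctrl_c_through()
except KeyboardInterrupt:
    print("KeyboardInterrupt escaped")   # the interrupt propagates, as it should
```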


Couldn't agree more! One of my all-time Python interview questions trips up a surprisingly large number of developers.

Given a function like:

    def append_one(l=[]):

        l.append(1)

        return l

What does this return each time?

    >>> append_one()

    >>> append_one()

    >>> append_one()


The l (lowercase L) and the 1 (one) look really similar. Could that be the cause of some confusion? Of course, the function name helps, but most developers have learned not to trust function names to be an accurate description of what the function does, especially in tricky interview questions.

Still, I'd change this to something like:

    def append_five(l=[]):

        l.append(5)

        return l
It tests the same thing (knowledge of how default parameters work), but without the confounding problem of similar-looking characters. Of course, syntax highlighting would help the applicant out.

All of that being said, I still don't doubt that many developers don't know what they should about default parameters.


I'm not a Python expert, but iirc from various blog posts the "l" variable does not get reset between function calls, which causes undesired behavior. So calling the function 3 times without an argument would produce lists of size 1, 2, and 3 on successive calls, rather than 3 lists of size 1. Can any Python gurus confirm?


The expression presented in the parameter list is only evaluated once, and that is when the method is defined. The confusion is that people assume the expression is evaluated every time the method is called.
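You can watch that single stored object mutate across calls via the function object's `__defaults__` attribute (spelled `func_defaults` in older Python 2); a small sketch:

```python
def append_one(l=[]):
    l.append(1)
    return l

print(append_one())              # [1]
print(append_one())              # [1, 1]
print(append_one())              # [1, 1, 1] -- it's the same list every time
print(append_one.__defaults__)   # ([1, 1, 1],) -- the one list stored on the function object
```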


The confusion is that people assume the expression is evaluated every time the method is called.

Because that's how it works in a lot of other languages, such as Ruby and Javascript.


I doubt that's the only reason. I fell for it myself at first without ever seeing a line of Ruby. It initially feels intuitive, and that's why I think most fall for it.


I think it's because the arguments are bound when the function is called. It's just natural that you'd expect the default values to also be bound at the same time.


Yes that is correct. The default value gets created when the function is interpreted ("compiled").


> The default value gets created when the function is interpreted ("compiled").

No. The default value gets "created" (the expression is evaluated and stored) when the def statement is executed. Take the following example:

  In [1]: def foo():
     ...:     def append_five(l=[]):
     ...:         l.append(5)
     ...:         return l
     ...:     return append_five
     ...:

  In [2]: a = foo()

  In [3]: b = foo()

  In [4]: a()
  Out[4]: [5]

  In [5]: b()
  Out[5]: [5]

  In [6]: _4 is _5
  Out[6]: False
We only wrote one function definition, but multiple lists are created. (They are created when the "def append_five" definition executes, during the execution of foo.)


I thought that was what he meant. Is there any sharp distinction between "interpreting" and "evaluating" in Python that I am unaware of? I've always used the words more or less interchangeably. But now that I think about it, that might be a little naive, since I have no idea how it works under the hood.


You could say that interpreting is first parsing and second executing/evaluating. The parser tokenizes and does a small amount of optimization such as ignoring unassigned values.


The parent wrote "compiled", which is certainly more incorrect than either "interpreted" or "evaluated."


The key is object mutability. A list type is mutable and a tuple type is immutable.

If the candidate correctly deduces what will happen, I'll ask them to write a bug-free version, which looks like one of the below:

    def append_one(var=None):
        var = var or []
        var.append(1)
        return var

    def append_one(var=None):
        if var is None:
            var = []
        var.append(1)
        return var

Mutability is a very subtle but very important concept to understand in python. Everyone who uses python for non-trivial code should know it well: https://docs.python.org/2/reference/datamodel.html


> The key is object mutability. A list type is mutable and a tuple type is immutable.

I don't think the question has much to do with mutability; it isn't surprising to me, nor I imagine to most programmers, that a list is mutable. That's very common.

The surprising part of this question is that the default value of 'l' continues to exist outside the lexical scope of the function, the expected behavior is that the value of 'l' is initialized at function call time and is garbage collected after each call. As it sits, using default values in python is sort of like defining a global that only has a named reference inside the function block, which is very strange.


It has something to do with mutability, because if an object is immutable, the behavior of Python matches what the naive developer expects. It's only mutable objects that break those expectations.

Don't even get into unexpected behavior in classes:

    In [1]: class A(object):
       ...:     l = []
       ...:

    In [2]: a, b = A(), A()

    In [3]: a.l.append("Something")

    In [4]: a.l
    Out[4]: ['Something']

    In [5]: b.l
    Out[5]: ['Something']

    In [6]: class B(object):
       ...:     l = None
       ...:     def __init__(self):
       ...:         self.l = []
       ...:

    In [7]: c, d = B(), B()

    In [8]: c.l.append("Something")

    In [9]: c.l, d.l
    Out[9]: (['Something'], [])


The other scoping issue in python that always struck me as strange is that loop variables aren't scoped to the loop, they continue to exist after the loop completes. I can see the logic for this feature even if I don't agree with it, but what I really don't get is that the loop variables are not defined if you iterate over something that is empty:

   >>> for item in [1]:
   ...   print item
   1
   >>> item
   1

   >>> for i in []:
   ...   print i
   
   >>> i
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   NameError: name 'i' is not defined
I would expect i == None. That oddity makes it dangerous to use the feature unless you're really careful (e.g. using a for-else construct).


Then there's for-else:

    In [1]: for i in []:
       ...:     pass
       ...: else:
       ...:     print 'Else!'
       ...:
    Else!

    In [2]: for i in []:
       ...:     break
       ...: else:
       ...:    print 'Else!'
       ...:
    Else!

    In [3]: for i in range(2):
       ...:     break
       ...: else:
       ...:     print 'Else!'
       ...:

    In [4]: for i in range(2):
       ...:     pass
       ...: else:
       ...:     print 'Else!'
       ...:
    Else!
The syntax could be interpreted as:

  if len(l) == 0:
    print "Else!"
  else:
    for i in l:
      pass
The "catch cases where a `break` is triggered" case isn't common enough for this syntax feature to be encountered very often, leading to confusion when people come across it (though at least it's not a bug where a common use-case has weird behavior to new-comers).
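The intended use of `else` is as a "no break happened" clause, which makes it handy for search loops; a small hypothetical example:

```python
def first_even(nums):
    for n in nums:
        if n % 2 == 0:
            found = n
            break
    else:                # runs only when the loop exhausts without hitting break
        found = None
    return found

print(first_even([1, 3, 4, 6]))  # 4
print(first_even([1, 3, 5]))     # None
```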


> but what I really don't get is that the loop variables are not defined if you iterate over something that is empty

if you conceptualize how a for-loop has to work as a while-loop using Python's iterator protocol (which is the only way the iterator protocol itself makes sense), it seems pretty intuitive.

That is, this:

  for item in items:
      ...1
  else:
      ...2
becomes, approximately:

  __hidden_iter = iter(items)
  try:
      while True:
          try:
              item = next(__hidden_iter)
          except StopIteration:
              raise __NormalLoopExit
          ...1
  except __NormalLoopExit:
      ...2
If you have an empty loop, the first assignment doesn't complete (instead raising StopIteration in evaluating the right side, which raises the notional exception __NormalLoopExit, which invokes the else: clause, if any) so the variable never gets around to being created.


> if an object is immutable, the behavior of Python matches what the naive developer expects

If the object was immutable then append wouldn't work. That's hardly matching expectations.


Most immutable objects don't have methods that would mutate the value, but fail because the object is immutable...

I guess the clarification to what I was saying is that, in the simple case (integers, strings, None), the objects are immutable. It's only when you get into cases where the value of the object itself is mutable that you run into issues. If all objects (or all objects 'allowed' as default values) were immutable, then this behavior would not trigger.

So saying that mutability has nothing to do with it isn't entirely true. It's the immutability of the types of values used in most simple cases that hides this issue from developers until they run into a more complex case.


Read my post that has the "correct answers" which show you how to do it. The key is setting the default to None and then doing something like:

    if val is None: val = []

or the more idiomatic python way:

    val = val or []


I believe the former is more idiomatic, but I don't have a reference.

You want to explicitly check against `None` so that you're not overwriting all falsey values of `val` - even though you should generally try to enforce argument types, your second example would cause unexpected behavior in some cases, particularly those that have non-falsey 'default' assignments
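The difference becomes observable when the caller passes in an empty list; a sketch with made-up names:

```python
def append_one_or(var=None):
    var = var or []          # discards ANY falsey argument, including []
    var.append(1)
    return var

def append_one_is_none(var=None):
    if var is None:          # replaces only a genuinely missing argument
        var = []
    var.append(1)
    return var

mine = []
append_one_or(mine)
print(mine)                  # []  -- the caller's empty list was silently replaced
append_one_is_none(mine)
print(mine)                  # [1] -- the caller's list was mutated as intended
```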


I'm well aware of that way to do it, but it doesn't excuse a different way being unintuitive.


Why? You would just get a new list back.


I can see why you might think so, but remember the Zen of Python is to have only one obvious way to do something.

    >>> [1,2,3] + [4,5]
    [1, 2, 3, 4, 5]
Thus appending should do something different than addition.

    >>> x = [1,2,3]
    >>> x.append([4,5])
    >>> x
    [1, 2, 3, [4, 5]]


I'm sorry, but I don't understand why you would think of this as unexpected behaviour. For class A, the list l is a class-level attribute, hence it can be accessed via either the a or b objects, but for class B, after initialisation, l is an instance attribute, so it is different for c and d.


It's not the concept that can be confusing, it's the syntax python chose.

In most of the languages I'm familiar with, there are very clear syntax differences when working with class attributes. For example, in many languages class attributes have to be accessed via the class name instead of from an instance of the class making it clear to the programmer they are working with a class attribute, e.g. MyClass.myClassVariable not myInstance.myClassVariable. Additionally, the way you define class attributes in python is the way you define instance attributes in many languages, which just adds to the confusion. e.g. in Java or C# you can define class variables directly in the class body, but an explicit 'static' keyword is needed, undecorated definitions are assumed to be instance variables.

Finally, I think the definition of class B above is a little more nuanced, class B has both a class attribute named l AND an instance attribute named l.

    B.l == None and B().l == []
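That nuance is easy to verify: after `__init__` runs, the instance's own `l` shadows the class-level one.

```python
class B(object):
    l = None                 # class attribute, reachable via the class
    def __init__(self):
        self.l = []          # instance attribute, shadows the class one

print(B.l)                   # None
print(B().l)                 # []
print('l' in vars(B()))      # True: the instance really has its own l
```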


Ah, gotcha!

It's been a while since I've done major OOP coding in any language other than Python, so I'm a little rusty. The issues you raise are perfectly legitimate and would be understandably confusing to newcomers to the language. :)


In Python everything is an object, including a function. The default value isn't a global, it belongs to the function.

I wonder if people who weren't exposed to languages which work differently ala C++ would be as surprised?


I'm not a Python dev, but I've been meaning to learn for a while. So this is really interesting stuff. A few questions, if you don't mind.

I understand mutability and immutability in other languages (and I gave your link a quick read to make sure there weren't any weird Python-specific rules), so I understand how the list can change and still be the same object, but a tuple or string would not. But why does that mean that the default parameter object remains in existence throughout all calls, instead of being recreated each time it is called?

Is there a reason for this being the default behavior? It seems like the majority of the time you would want to use a default parameter, you'd want it to behave like your bug-free examples.


Think of the default parameter values as arguments to the initializer for the function object. If you passed a list into the constructor of a class, you wouldn't be surprised that if you modified the list outside the class that it would modify the same list inside the class.

While that explains how it works, I actually completely agree with you. This is surprising behavior and, in a language that prides itself on not being surprising, seems, well, surprising.

I have to wonder if performance isn't the big reason for it. If your default is [], it isn't a big deal to re-evaluate, but if your default is get_default_cities_from_slow_web_service(), having that re-evaluated on every function call would be catastrophic. Given the choice between two negatives, the choice they made is probably reasonable.
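As an aside, the evaluate-once semantics is occasionally exploited on purpose, using a mutable default as a crude per-function cache. A hypothetical sketch (the function and the `_cache` name are mine):

```python
def fib(n, _cache={}):       # deliberate mutable default: evaluated once, shared across calls
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(30))               # 832040, fast because results accumulate in _cache
```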


You pretty much nailed it right there.

Before I ever ask this question (I do a lot of tech interviews sadly) I always ask the candidate about object mutability vs immutability. Almost everyone knows the textbook answer, and only a few know the actual implications of it. This tests which they know :)

Default kwargs of a function are defined at function definition. However, they are only visible within the scope of said function. It is a weird but important subtle difference.


I think only the second is truly bug-free. The first only does what the user expects if they pass in a non-empty list:

  my_list = []
  append_one(my_list)
  # my_list didn't get anything appended to it
This shows up another subtle trap related to the "truthiness" (or falsiness in this case) of things like the empty list.


Since append doesn't return a value, how about:

  def append_one(var=None):
      return (var or []) + [1]
Would this take longer and/or use more storage for long lists as vars?


When you use + on two lists, a new list is created, and elements from both are copied into the new one. Whereas the append operation modifies the list and simply adds a value. Keep in mind that a python "list" is really like a C++ vector, so while the append operation sometimes allocates a new array and copies all the values, in general it is O(1). The + operation is O(n).

And besides all that, there is nothing wrong with doing an append on one line, and returning the variable on the next. It's clear and readable.
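A quick way to see the aliasing difference between `+` and `append`:

```python
x = [1, 2, 3]
y = x + [4]        # + builds a brand-new list, copying elements: O(n)
x.append(4)        # append mutates x in place: amortized O(1)
print(x == y)      # True  (same contents)
print(x is y)      # False (different objects)
```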


I like this, very elegant actually.


I'd argue that the key difference from other languages is (re)assignment rather than (im)mutability.


Yes, that's correct. The default value is only evaluated once, when the `def` statement is executed. After that point, it's completely mutable. You have to see Python functions as objects and default parameter values as object variables.


The problem is not the existence of the object variables, but that they are in such an unfortunate place. The rest of the parameter list is declaring fresh local variables. It's inconsistent that the left side of the equals sign is per-invocation, and the right side is per-def.


Not the op, but I'd accept the confused response of:

    [[]]
    [[], []]
    [[], [], []]

because the behavior is the same, whether or not they misread an 'l' as a 1.


Python doesn't seem to agree with you :^)

  In [1]: a = []

  In [2]: a.append(a)

  In [3]: a
  Out[3]: [[...]]

  In [4]: a[0]
  Out[4]: [[...]]

  In [5]: a[0][0]
  Out[5]: [[...]]

  In [6]: a[0][0][0]
  Out[6]: [[...]]

  In [7]: a[0][0][0][0]
  Out[7]: [[...]]

  In [8]: a.append(a)

  In [9]: a
  Out[9]: [[...], [...]]

  In [10]: a[0][1][0] is a
  Out[10]: True

  In [11]: id(a)
  Out[11]: 4547140064

  In [12]: id(a[0][1][0])
  Out[12]: 4547140064


Yeah, the whole infinite loop thing. Wasn't fully thinking when I wrote my reply. Good catch.


I would caution you not to interview on things you would not be happy to see in your code base.

In my experience you're much better off with people that look at odd syntax and say, "I don't know what that does," vs those who do.


Well, this is pulled from a list of common python errors.

Using the default value in some capacity isn't that uncommon... Though maybe you were speaking to a more general case? For example, decoding an obfuscated C file.


I've seen this trotted out time and time again, and at least in this simplified form it's a red herring. If you're going to mutate the argument, it doesn't make sense to give it a default value. If you're going to return a modified form of the input you need to make a copy of it. Doing both is simply absurd.


Disagree. Would say it's a decent violation of expectations for the same instance to be passed into every invocation. Of course, the counterargument is 'know your tools,' which I'm partial to, but the fact that this pops up is an indication it is counterintuitive.


I actually agree it's counterintuitive. But this particular example makes no sense, nobody should be writing real-world code that looks like this in the first place. Either modify the original or return a copy, don't try to do both.


Wow, that is really ugly semantics. Here are some notes of mine on how hard R works to avoid exposing this sort of aliasing/mutability issue to the user: http://www.win-vector.com/blog/2014/04/you-dont-need-to-unde...


Yeah i really don't understand why this is just assumed to be a common 'gotcha' to be recognized and avoided by every competent python programmer. What exactly does the python spec specify as the desired behavior here? If you have this 'broken stair' that everyone should just know to step over, shouldn't somebody actually fix the stair!?

I know python is not unique in having warts like this, but it's pretty b.s. in general that unexpected behavior is just thought to be okay, especially in a language meant to be very accessible, and most especially since it's being used as a perfectly valid metric for disqualifying new python programmers from employment.


It's not broken when you understand that functions are objects and default parameters are just members of those objects. Each time the function is executed you get local vars that point to these object members. If one is a mutable type, any changes you make to it will then obviously persist.


It leaves me wondering: are the parameters also scoped to the class (seeing as they're declared at the same time)? Wouldn't this cause an issue with concurrent access to the function?


In Python there's no such thing, because of the GIL. Maybe in Jython.


At what level would you test an interviewee with this kind of question: Python guru, Python expert, Python ninja, Python rockstar, or merely "is familiar with Python"? Your example is a very common gotcha that has been covered ad nauseam, but IMO it's still the kind of bug that would be caught immediately in code review and is very easily fixed.


I thought we were past using trick questions like this anyway! You know, since it would be easy for an experienced yet anxious programmer to get tripped up on this, while someone who just browsed "python interview questions 101" would breeze on through. Also, it selects against experienced multi-language developers, since language-specific quirks like this are not generally useful information to keep front-loaded, but are trivial to become re-familiar with in a work environment, or even gasp learn for the first time from a co-worker or helpful article.

If the industry as a whole cared about evidence-based, non-superstitious, non-monoculture-reinforcing hiring practices, we'd realize that tripping people up and judging programming capability based on minutia is as unfair as it is self-defeating.


I don't use this as a trick question. I ask them to describe mutable vs immutable objects and gotchas. Then I write this function and ask them to describe in excruciating detail what it does, why, and how.

It is simply an easy way to gauge a candidate's proficiency with the language. It also helps if they know that this is a problem. You'd be shocked how many people on the market for Python jobs don't get this question correct, but the smart ones often work it out when talking through it, even if they didn't originally.


I think the idea is to see whether the interviewee is the kind of person who always googles and reads up on "gotchas of language X" whenever he/she learns X.


Possibly the most interesting anti-pattern I saw was:

    a_list_of_words = "my list of words".split(" ")

I never enquired why, since there were bigger issues in the code, e.g. "unit testing" by running the code, taking the result, and using it as the check value: running repr(value), copying out the string, then comparing with self.assertEqual(repr(value), '[<Object1: unicode_value>, ...]')


I do that all the time in the interpreter, especially when slicing pandas DataFrame objects, e.g.:

    df_subset = df['date buyer nwidgets'.split()]
That is far easier to type than the explicit list, with all its punctuation. Now, it's definitely weird that they did a `split(" ")` rather than just using the default, but the idea is the same.

I do try to strip stuff like that out before I put it into a script, replacing it with the explicit list, but I'm never sure if that actually improves anything. It's not as if the explicit list is any easier to read.


It's not weird, it's wrong:

    In [1]: "a    string     r".split(" ")
    Out[1]: ['a', '', '', '', 'string', '', '', '', '', 'r']

    In [2]: "a    string     r".split()
    Out[2]: ['a', 'string', 'r']


In their case though that wouldn't have been a problem as each word was split on a single space.


Split without argument is equivalent to:

    >>> filter(None, " quick   hack for  split".split(" "))
    ['quick', 'hack', 'for', 'split']


Yeah, or re.split(r'\s+', ...), or something - the issue here is that you need to know about this.

From the comment a few levels up, I understood that the code which used str.split with a " " argument didn't signify that whoever wrote it knew about its semantics. If they did, and it was really what was intended, then of course it's completely OK, but if not, it can easily lead to bugs.

For example, if the user is required to input several ints separated with whitespace, this:

    map(int, input_str.split())
will raise only in expected cases, while this:

    map(int, input_str.split(" "))
can lead to rejecting correct input just because someone pressed space twice. It's very frustrating for the user, too, because extra whitespace is hard to spot visually.

So, I don't know if this qualifies as antipattern, but I think if I saw .split(" ") instead of .split() in the code I'd at the very least expect the comment explaining why it's used.


I don't mean to be pedantic, but a list (I am assuming df is a list) requires an int.

(That sentence I wrote about using hashable types need not apply, sorry!)

If you ran that code, you would get this error:

    TypeError: list indices must be integers, not list


That's a pandas dataframe (idiomatically denoted df), not a list. It has funky slicing properties, and he's selecting columns of the dataframe in a perfectly valid way.


This is actually useful: you may want to experiment with different list_of_words in the future and typing the words between [" ", " ", " "] is time consuming. It's also less readable.


How is this a real "anti-pattern"?

Inefficient or bizarre way to do it, maybe.

Anti-pattern is supposed to mean something more, though.

In this case there are no adverse effects and no ambiguity -- so I guess the programmer was just too lazy to construct the list.


perhaps that line was written by someone used to Perl, where they would have had

    @a_list_of_words = qw/my list of words/;

there


Or a Rubyist:

  a_list_of_words = %w{my list of words}


You'd be sure if they wrote:

    >>> qw = str.split
    >>> qw('my list of words')
    ['my', 'list', 'of', 'words']


Why even use a list here? Tuples are for immutable/constant data.

    a_tuple_of_words = ("my", "tuple", "of", "words")

or

    a_tuple_of_words = "my", "tuple", "of", "words"


...because it's a list? Tuples were supposed to have a structure (at least that's what all the rest of the world thinks of them), so iterating through a combination of apples, cars and languages makes no sense whatsoever.

But yes, Python misses entirely the point of tuples, treating them as read-only lists.

http://dozzie.jogger.pl/2014/04/11/python-tuples-the-useless...


> Tuples were supposed to have a structure (at least that's what all the rest of the world thinks of them)

No, a structure is, you know, a structure -- what C calls a struct. Python calls it a namedtuple. If some people call it just a tuple, well, that's a difference in terminology, but it doesn't mean Python is confused about the concepts, it's just using terminology you're not used to.

Also, if we're going to be pedantic about the meaning of data types, your blog post is wrong about lists. You say "position in the list doesn't matter", but that means ordering doesn't matter, and an unordered collection of similar objects is a set, not a list. Python makes this distinction clear: a list is ordered, a set is not.


> [...] that's a difference in terminology, but it doesn't mean Python is confused about the concepts

No, it means exactly this. The term "tuple" and its use predates Python. Sorry, no banana.

> [...] your blog post is wrong about lists. You say "position in the list doesn't matter", but that means ordering doesn't matter

Oh, so what's the difference in meaning between an element True at position 1 and an element True at position 20? Position in the list doesn't matter if we're talking about the meaning of the elements.


> The term "tuple" and its use predates Python.

References, please? And not mathematical references; programming references. C was using the keyword "struct" long before Python to refer to what you are calling a tuple.

> what's the difference in meaning of element True on position 1 and element True on position 20?

The fact that the index is 1 instead of 20. Both elements have the same type, and might well refer to the same property of some sequence of things; but the index being 1 instead of 20 means the element True is describing that property relative to the first item in some sequence, instead of the 20th item. That's why position in the list makes a difference: the ordering of the items, as well as the type of the items, carries information.

(Of course, in Python the list items don't even have to be of the same type; but most uses of Python lists in practice that I've seen do assume that all the elements are "the same kind of thing".)


> References, please? And not mathematical references; programming references.

ML has had tuples several decades before Python existed.


I'm not sure what you're getting at here, or what you are expecting tuples to be like. They can have as much "structure" as you need--they're just a collection.

Lighter-weight, immutable collections have a use case. The code in OP appears to be one where it makes sense. I follow the rule where variables are mutable IFF they need to be mutable.


Tuples are used by Pythonists as if they were mere lists, just immutable. This is clearly displayed by Python's own interface.

For the rest of the world, tuples are not immutable lists. They are tuples, i.e. collections of "objects" that could share nothing about their type. Tuples often are not even iterable! (Erlang, Haskell)

The fact that tuples in Python can have as much structure as one wants is derived from dynamic typing, not from the tuples' nature. The same you could say about Python's lists.

This is a really subtle issue. It takes knowing more languages to see it clearly.


Have you looked at named tuples? They shipped in the python standard library sometime in the last few years (they are at least in 3.3) and are clearly intended for storing structured data.

A typical rule of thumb in Python land is that heterogeneous data probably belongs in a tuple, so practice goes a little further than immutable lists.

I think you could improve your demonstration of the usage in the standard library by examining a random selection of usages to try to find out what is typical. But maybe you already looked at more than you talk about in the article (and I understand that this might not be an interesting use of your time).


No, I haven't looked at them. Python 2 has them since release 2.6, so it's out of my reach for any practical purpose at the moment (I need to preserve compatibility with Python 2.4).

> A typical rule of thumb in Python land is that heterogeneous data probably belongs in a tuple, so practice goes a little further than immutable lists.

The problem with Python tuples is that they're two things mixed: immutable lists and a container for heterogeneous data. It's the same situation as JavaScript's objects.


If it's so subtle, does it matter? This sounds like you just have a problem with the word "tuple" applied to an object that behaves differently from tuples in a statically-typed language.

Would you feel better if they named it "ImmutableList" instead?


Can't speak for GP, but I would [feel better with that name].

(Although I agree with you that statically-typed-language-tuples don't seem to make sense in Python.)

But hey... Python's weird choice of how to name the ImmutableList could be worse, right?

For example, someone could be malicious enough to call their general-purpose associative array a "hash", just because a hashmap (note: not a hash) is a good implementation for large associative arrays. Wow, that'd be hilariously misleading, wouldn't it? Good times!

Or imagine someone was silly enough to name their auto-resizing arrays "vectors", even though in all previously existing contexts a "vector" is a sort of thing which absolutely cannot be meaningfully resized/extended. Ha. Think of the tiny cognitive burden placed on generations of future programmers-who-study-math, trying to juggle these two very-similar-but-distinct concepts, multiplied by the number of such future programmers. Amazing practical joke, right?

/rant


Not really that important, but I think map is a better name than associative array.


No, it's not a problem with word "tuple" behaving differently from statically-typed language. It's a problem with word "tuple" behaving differently from all the rest of the world.

Yes, I would feel better if it was named "ImmutableList" or any other way that is not misleading about the purpose.


I'm sorry, I don't see clearly what a tuple should be. What would be different about Python tuples if they were true tuples?


In most languages you can't usually:

1. Iterate over a tuple

2. Convert a list to a tuple

3. Construct a tuple of a length not known at compile-time

Python allows these because "why not?" but it does break their "one and only one way to do it" rule and confuses beginners a hell of a lot.
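For reference, all three operations from that list are one-liners in Python:

```python
# 2. convert a list to a tuple
t = tuple([1, 2, 3])

# 3. construct a tuple whose length isn't known until runtime
n = len("hello")
u = tuple(range(n))

# 1. iterate over a tuple
total = 0
for item in t:
    total += item

print(t, u, total)    # (1, 2, 3) (0, 1, 2, 3, 4) 6
```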

There are definitely borderline cases. For instance, should a Vector be a list or a tuple? A Vec3 type is obviously a tuple, but a large Vector destined for BLAS is obviously a list.


> Python allows these because "why not?"

No, it allows them because the distinction that those restrictions are founded on is only useful in a statically-typed languages, and Python isn't statically typed.

> For instance, should a Vector be a list or a tuple?

A real vector/array should be its own data type (probably implemented in a C, or similar low-level, extension) that happens to implement the interface expected of an indexable, iterable collection, neither a list nor a tuple.


> [this] distinction [...] is only useful in statically-typed languages

...like Erlang.


Yeah, while snarkily made, it's a good point that Erlang does make use of it without being statically typed.

There is a deep difference in language approach between Python and Erlang here that goes beyond the use of tuples. Erlang, while dynamically typed, has a deep concern for types in its pattern-matching system, using them to make path decisions, while Python is very much centered on dynamic OO techniques -- how objects respond to messages -- to do that.

So I'd still say it's the same kind of deep language-approach difference at work.


So why does Python distinguish between a list and a tuple at all?


Because the distinction between a mutable list and an immutable list is still meaningful in a dynamic language like python.


One is mutable, one is not. Performance. Also, look up namedtuple. Useful for returning multiple values.
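For anyone who hasn't seen it, a quick sketch of namedtuple (the `Point` example here is made up for illustration):

```python
from collections import namedtuple

# An immutable tuple subclass with named fields; a common way to
# return multiple values without losing readability.
Point = namedtuple('Point', ['x', 'y'])

p = Point(2, 3)
print(p.x, p[1])    # fields work by name or by index
print(p == (2, 3))  # True: still compares equal to a plain tuple
```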


Python tuples are used in both ways.

Even in Haskell, though, people often write all kinds of type-class magic to allow "iterating" over a tuple. For example, a Binary instance over a tuple wants to call "put" on each element.

Haskell's (Oleg's) HList is basically a tuple with iteration/list-like operations.


The distinction between "tuple" and "immutable list" doesn't make any sense outside of a statically-typed language, since the only difference is what other values a particular value is type-compatible with.


Yes, of course it doesn't. You have just made the whole of Erlang vanish. Or is it statically typed?...


NB a tuple of one item only requires a trailing comma, and a tuple of zero items is represented as ()
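For example:

```python
single = (1,)       # the comma, not the parentheses, makes the tuple
also_single = 1,    # so the parentheses are optional here
empty = ()          # the zero-item tuple is the one exception: bare ()
not_a_tuple = (1)   # just the integer 1, parenthesized

print(type(single), type(not_a_tuple))  # tuple vs int
```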


> Mutables as default function/method arguments

It would really make sense to change the semantics of Python to fix this issue.


Change them how, to no longer have functions be first class objects? The behavior of mutable default arguments is clear if you know how Python treats function objects. Any "fix" would handicap the language.
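The function-object point can be seen directly: the default is evaluated once, when the def statement runs, and stored on the function itself (using the interview example from upthread):

```python
def append_one(l=[]):
    l.append(1)
    return l

# The default list is created once, at definition time, and lives
# on the function object:
print(append_one.__defaults__)  # ([],)

append_one()
append_one()

# Subsequent calls mutate that same stored list:
print(append_one.__defaults__)  # ([1, 1],)
```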


> to no longer have functions be first class objects?

There are other dynamic languages with functions as first class objects which don't share the "mutable default arguments" gotcha.

But having said that, any change regarding this would break backward compatibility.


Can you clarify?

    def foo(default_arg = []):
Why can't that just be shorthand for:

    def foo(default_arg = ParamNone):
        if default_arg == ParamNone:
            default_arg = []
How would that break first class functions?


As a minor point, use "default_arg is ParamNone", since "==" probably won't do the right thing.

What breaks is something like:

  def foo(default_arg = slow_f()):
    pass
Under the shorthand this gets turned into:

  ParamNone = object()

  def foo(default_arg = ParamNone):
    if default_arg is ParamNone:
        default_arg = slow_f()
    pass
This is fine, since everyone would know that the shorthand means to not put slow code there. Instead, people will start writing it as:

  _foo_arg = slow_f()
  def foo(default_arg = _foo_arg):
    pass
Of course, then what happens with:

  _foo_arg = slow_f()
  def foo(default_arg = _foo_arg):
    _foo_arg = 5
? Under expansion it becomes:

  _foo_arg = slow_f()
  def foo(default_arg = ParamNone):
    if default_arg is ParamNone:
        default_arg = _foo_arg
    _foo_arg = 5
This violates Python's scoping rules, because _foo_arg is now being used in local scope instead of global scope. Eg:

  >>> def f(x=None):
  ...   if x is None:
  ...     x = spam
  ...   spam = 3
  ... 
  >>> spam = 9
  >>> 
  >>> f()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 3, in f
  UnboundLocalError: local variable 'spam' referenced before assignment
Which means you now need a new scoping rule, just to handle default parameters without making things more confusing.

It also turns what was a simple O(1) offset into a precomputed list into a globals() lookup for many cases.


In Python 3 there's the nonlocal keyword to deal with the scoping thing.

The default arguments thing is worse than a lot of the stuff Python 3 corrected.


I don't see how nonlocal could fix this. Could you explain?

More specifically, given a better 'default arguments thing', how would you interpret:

    x = [2]
    def f(x=x*5):
        x.append(4)
With the earlier conversion it's:

    x = [2]
    def f(x=DefaultArg):
        if x is DefaultArg:
            x = x*5
        x.append(4)
This isn't going to work because the x inside of f() is different than the outside x, and you'll get the error message I mentioned.

If you add a nonlocal, as in:

    x = [2]
    def f(x=DefaultArg):
        nonlocal x
        if x is DefaultArg:
            x = x*5
        x.append(4)
then you'll get "SyntaxError: name 'x' is parameter and nonlocal".

What other solution are you thinking of?


Ah I see. Yeah nonlocal can't fix that.


Why not just memoize the result of the slow function?


I think by your use of "just memoize" that I didn't explain myself well enough. I don't mean to focus on the slow aspect, nor even the function aspect. It could be

   def foo(default_arg = [0]*(256*256)):
     ...
and still get the same namespace issues.

Memoization is not always going to be an available solution. For example, it may be that slow_f() returns a stateless object, so can be reused, while slow_f(x) returns something stateful. You can think of my examples as either using default arguments as a single element memo, or using a module variable for the same. Both premised on the idea that the developer knows enough to make the right decision.


How would the above work for duck-typed lists, for example? How would the language "know" which types are mutable?


No, you could just evaluate the expression every time the function is called (like Ruby does) instead of evaluating it once at the time the function is defined (like Python does).


Remove the `static` keyword from C (used inside a function) while you're at it...


Wildcard imports are OK, if and only if the module is specifically designed with this usage in mind.

Example: nose.tools


Agreed on all counts. However I do find myself using mutables as default arguments sometimes because the generated documentation is clearer.

For example, this is a real method in one of my projects:

  def listen(self, address, ssl=False, ssl_args={}):
      pass
I like the way this turns up in the docs because it's immediately clear that ssl_args needs to be a dict. Otherwise I have to describe it in words.


You should not ever use mutable default args. The documentation benefit is drastically outweighed by the potential for nasty bugs. http://pythonconquerstheuniverse.wordpress.com/2012/02/15/mu...
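For reference, the conventional workaround is a None sentinel, with the real default created inside the body (`listen_args` and the `'configured'` key here are invented for illustration):

```python
def listen_args(ssl_args=None):
    # Create a fresh dict per call instead of sharing one default.
    if ssl_args is None:
        ssl_args = {}
    ssl_args.setdefault('configured', True)
    return ssl_args

# Each call now gets its own dict:
a = listen_args()
b = listen_args()
print(a is b)  # False
```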


In case you're using Python 3, why not write ssl_args: dict, ssl: bool=False?


Well, that's throwing people implementing subclasses under the bus, IMO.

Why not just add @param annotations in your docstrings instead?


> Well, that's throwing people implementing subclasses under the bus, IMO.

If they need to touch this argument in an overridden method and they don't know what they are doing, then yes.

> Why not just add @param annotations in your docstrings instead?

I'm using Sphinx and it renders them separately. I want the empty dict to show up in the function signature.


A single cryptic bug due to this practice will more than negate the minor doc readability benefits you get from that. And there will likely be many more than one cryptic bug.

There are other ways to emphasize it ought to be a dict/mappable. Change its name to be suffixed as "_dict", for example?


Also, not knowing what scope a variable has when creating closures.


I agree that bare excepts are bad, however they do not catch Ctrl-C. If you _do_ want to catch Ctrl-C you have to except a KeyboardInterrupt explicitly.


This is untrue (just tested) for at least Python 2.7.x under Linux: try ... except catches Ctrl-C.


Mmm... you should always use 'if x is not None:' imo.

It's very common for libraries to make values evaluate to False, and very easy to get bugs if you just lazily test with 'if x'.

Sqlalchemy springs to mind immediately as one of the common ones where using any() and if x: is a reeeeeallly bad idea; there are plenty of others.

I'm pretty skeptical about modifying your coding behavior based on what libraries you happen to be currently using.

'If x' isn't your friend.
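A minimal illustration of the difference:

```python
values = [0, '', [], False, None]

# Truthiness throws away every falsy value:
truthy = [v for v in values if v]

# An explicit None check keeps the legitimate falsy ones:
not_none = [v for v in values if v is not None]

print(truthy)    # []
print(not_none)  # [0, '', [], False]
```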


Especially taking into account the more bizarre bugs (features?) of python:

    (bool(datetime.time(0)), bool(datetime.time(1))) == (False, True)
I always consider `if x:` a bug, unless x can only be a boolean. Furthermore, it seriously hinders readability and clarity of the code.
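(For later readers: the midnight-is-falsy behavior was eventually reversed in Python 3.5, which is exactly why an explicit comparison ages better than `if t:`.)

```python
import datetime

t = datetime.time(0)  # midnight

# Whether bool(t) is False here depends on the interpreter version
# (falsy before Python 3.5, truthy from 3.5 on), so test the actual intent:
if t is not None:
    print("got a time value:", t)
```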


> I always consider `if x:` a bug, unless x can only be a boolean.

Oh, like this? ;)

   var eventA = new Date(), eventB = new Date();

   if (!parseInt((eventA - eventB) / 1000)) {
     console.log("these events occurred simultaneously");
   } else {
     // troll harder with confusing use of 'asynchronous'
     console.log("these events occurred asynchronously");
   }


I was specifically looking through these comments for your example.

I got bit by this once, and it's certainly... strange (it's also surprising). It's considered "behavior consistent with the rest of Python" [1] (which I can agree with) even if it makes little sense in terms of immediate readability to someone who hasn't previously encountered it. Fortunately, the workaround is easy, and it is documented.

There's at least a couple of spats on the mailing list regarding this feature that are of interest to the curious or at least those who are interested in the history of such behavior.

[1] http://bugs.python.org/issue13936


Agreed. Especially if you program in more than one language, trying to remember the subtleties of each one's collection of rules for "truthiness" is a fraught exercise. And isn't it Python people who like to say, "explicit is better than implicit"?


    "Explicit is better than implicit."
If you mean is not None, you should say is not None.

It's fast and readable and there are no "just be aware that" disclaimers to tack on afterwards.


It depends.

If you're checking to see if that value is None, then yes - you should check that.

If you're merely checking if the value is truthy, then using "if x:" is completely legitimate.


I've found "if x" to be much less readable, especially when I'm looking at code written in a language I'm not familiar with. When I'm reading such code, I want to be 100% sure what something is doing, and not have to read documentation when avoidable.


I agree. The author brings this to light as well and warns about the implications of changing what is considered "truthy" in x.


Check out Raymond Hettinger's Transforming Code into Beautiful, Idiomatic Python talk on youtube [1].

Great talk on avoiding some of the common pitfalls new python developers step in. Exposes some nice language features.

[1]: https://www.youtube.com/watch?v=OSGv2VnC0go



The only thing I disagree with is "use nested comprehensions" thing. In my mind: x = [letter for word in words for letter in word]

is inside-out or backwards. I want the nested for to be the less specific case:

   x = [letter for letter in word for word in words]
makes more sense in my mind.

(It's also my first answer to the "what're some warts in your language of choice" question.)


I'm in the camp that if your list comp needs more than one for clause, it's complicated enough to be broken out into an actual for loop.


Everybody, listen to this person!


Then it turns into this:

    x = []
    for word in words:
       for letter in word:
          x.append(letter)
Which in addition to being far more verbose and less readable, is also less efficient.
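For anyone unsure of the ordering rule: the `for` clauses in a nested comprehension read left to right exactly like the nested loops, outermost first:

```python
words = ['her', 'name', 'is', 'rio']

flat_comp = [letter for word in words for letter in word]

# The equivalent explicit loops, in the same order:
flat_loop = []
for word in words:
    for letter in word:
        flat_loop.append(letter)

print(flat_comp == flat_loop)  # True
```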


I'll tell you what I tell my team: it's barely more verbose, and the readability is up for extremely serious debate (I mean, everyone understands nested for loops, but the nested list comprehension thing is weird... why is something that appears way way before the innermost "for" the same as something in that for loop's declaration?)

It may also be less efficient. When I'm shown numbers that the difference between the comprehension and the for loops (in each specific instance, or in aggregate for the program in question) is above statistical noise AND it's a significant factor in overall runtime (I won't ever worry about a millisecond when the runtime is 1s), then I'll gladly say: put them in.

Until then, just use the loops. Use of really strange language features that are surprising, not exactly idiomatic (this argument is common for this case), and not shown to be of actual benefit is detrimental in a polyglot environment.


When I see a list comprehension I can see with a single glance what it's doing. Not so with the 4 line for loop. Comprehensions aren't a strange language feature in Python either...it's one of the central features of Python.

"Don't use something until it's proven to yield a great benefit" is a very conservative approach. That may be appropriate in some cases, but I'm very glad that I am not in such a team since that would be incredibly frustrating. I much prefer an approach where you go with the choice that's most likely the better one, even if it's not 100% proven better or not a big difference.


I'm talking strictly about multi-"for" comprehensions. They're just too confusing to me and to most of the people I've ever worked with. But we also use lots (most) of Python's features fully; just that one has been the source of dozens of bugs in this one codebase, not to mention others I've worked on with other people. It is a shitty, non-intuitive syntax.

Nested for loops, flatten(), various itertools functions and chained generator expressions all suffice, and I have yet to see them provide measurable slowdown to actual code compared to good algorithms and decent factoring. Like I said, I'll even use multi-for comprehensions if there is a measurable difference over nested for-loops.

Also, I think you are intentionally misrepresenting what I said - when I said don't use "weird stuff" I explicitly excluded idiomatic language things. That includes (for Python) single-for comprehensions. The multi-for comprehension is something I rarely come across in the wild despite its long-time existence in Python - it's a weird one.


I think you are trying to justify your strange preferences after the fact. How exactly were there bugs caused by nested for loops that you encountered? It's not like if you mess up the order it will actually run without throwing an exception. Nested list comprehensions are idiomatic python. It's really strange that you don't let your team use them because you are afraid of them.


Well, Google doesn't recommend it either (http://google-styleguide.googlecode.com/svn/trunk/pyguide.ht...), but I guess they are all bloody noobs or whatever.

I mean, single-level comprehension is good. Nested list comprehension is OK only in the most trivial cases. In my opinion, if I see how a person uses list comprehensions, I can tell what kind of person this is.

There are people who, for example, do this:

    def all_is_okey_dorey(lst):
        return all([some_predicate_fn(x) for x in lst])

instead of this:

    def all_is_okey_dorey(lst):
        for x in lst:
            if not some_predicate_fn(x):
                return False
        return True

and can live with themselves somehow.

Or there are people, who refuse to acknowledge the existence of anything besides Python 3.x and when forced to write in 2.x use list comprehension instead of iterator comprehension.

Thing is, the validity of using nested list comprehension depends not on the amount of for loops you have, but on the thing you want to do with the item. If it's just selection, then it might be ok. If you want to apply some kind of function to it, then it's most probably the case of trying to be too clever.


If you think two loops is not a "simple case" of a list comprehension as Google suggests to use them for, perhaps you shouldn't be doing code reviews. It sounds like your team would be held down by your weak grasp of the language.


The only thing wrong with that list comprehension version is those [ ]

    all(some_predicate_fn(x) for x in lst)
Much better than the loop.


It's much more readable, people won't make mistakes on it the way they do when trying to be too clever with their multiple-for statements in list-comprehensions.

Trying to pack too much on a single line is one of the sins of perl, and I'm happy to read python code that is comfortable being multi-line.


For loops can often be avoided. I would write this particular example in one of these ways, that I think are readable:

    x = []
    for word in words:
        x.extend(word)

    from itertools import chain
    x = [letter for letter in chain(*words)]

    x = list(chain(*words))


Wouldn't chain(*words) require unpacking all of words before feeding it into the chain function, storing a second copy of the word list in memory?


Yes it would, but I don't care about these small efficiencies, say 97% of the time ;)

The lazy version in Python 3 would be this one:

    list(chain(*map(iter, words)))
For Python 2 one has to use itertools.imap instead of map.
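For what it's worth, itertools also ships `chain.from_iterable`, which consumes the outer iterable lazily instead of unpacking it into an argument tuple up front:

```python
from itertools import chain

words = ['her', 'name', 'is']

# chain(*words) evaluates the whole argument tuple first;
# chain.from_iterable(words) pulls each word only when needed.
eager = list(chain(*words))
lazy = list(chain.from_iterable(words))
print(eager == lazy)  # True
```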


It's not all or nothing. One, last? for loop can be list comp.


Even clearer:

    x = [for word in words: for letter in word: letter]
This also has the advantage of being readable left to right without encountering any unbound identifiers like all other constructs in Python.


As someone who spends a significant amount of time in python that is much harder to follow. There is an expectation for order in list comprehensions and what you wrote completely violates it.


> write a list comprehension (...) code just looks a lot cleaner and what you're doing is clearer.

I know how to use list comprehensions, but often avoid using them and use the standard for loops. List comprehensions look nice and clean for small examples, but they can easily get long and become mentally hard to parse. I would rather go for three 30 character lines instead of one 90 character line.


Depends on how you want to program: imperative vs functional.

Personally I think list comprehensions are the most beautiful part of Python, though sometimes I use map() when I'm trying to be explicitly functional (I realize it's allegedly slower, etc).

Generally I think list comprehensions are cleaner and allow you to write purer functions with fewer mutable variables. I disagree that deeply nested for loops are necessarily more readable.


Map is not allegedly slower, it is demonstrably slower.

    $ python -mtimeit -s'nums=range(10)' 'map(lambda i: i + 3, nums)'
    1000000 loops, best of 3: 1.61 usec per loop

    $ python -mtimeit -s'nums=range(10)' '[i + 3 for i in nums]'
    1000000 loops, best of 3: 0.722 usec per loop

Function calls have overhead in python, list comprehensions are implemented knowing this fact and avoiding it so the heavy lifting ultimately happens in C code.


As you point out though, it depends on where the function is coming from:

    $ python -mtimeit -s'nums=range(10)' '[str(i) for i in nums]'
    100000 loops, best of 3: 2.57 usec per loop

    $ python -mtimeit -s'nums=range(10)' 'map(str, nums)'
    1000000 loops, best of 3: 1.88 usec per loop

    $ python -mtimeit -s'nums=range(10)' 'import math' '[math.sqrt(i) for i in nums]'
    100000 loops, best of 3: 3.25 usec per loop

    $ python -mtimeit -s'nums=range(10)' 'import math' 'map(math.sqrt, nums)'
    100000 loops, best of 3: 2.55 usec per loop


Fair enough. I said "allegedly" because I had never personally measured the performance difference.

Even though you could construe map as "half as fast" (or twice as slow) as the equivalent comprehension, I don't see a difference of ~1 usec making any difference in my code thus far. Good to know, though.


Yup, for very large calculations, or certain use cases, it can make much larger differences. It all depends on your use case.


Deeply nested I'd usually agree, but if you end up with deeply nested comprehensions as a result, in either case, you're probably better off with a little bit of restructuring/splitting things into more functions to at least make the nesting cleaner to read.

However, I think he was referring to if the conditionals/additional modifications needed to build your list get a bit excessive, so you'd have a like... [dostuffto(A) for A in alsodostuffto(LIST) if conditional(A)] (but with more complex operations at each step).

Granted at that point you can argue that you should do just as my example shows and put the "dostuffto" into more encapsulated functions, but sometimes that doesn't seem like the right choice.


I'm a bit torn on it.

In a case where you need to do a lot of nested appends, I've found that even a long list comprehension can be easier to read. You just have to be sure to properly indent it and break it up into multiple lines. My rule is that every extra `for` starts a new line, and sometimes moving the predicate to its own line when it's too long, too.
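A sketch of that formatting rule (the data here is made up):

```python
matrix = [[1, 2], [3, 4], [5, 6]]

# Each extra `for` starts a new line; the predicate gets its own line too:
evens = [
    n
    for row in matrix
    for n in row
    if n % 2 == 0
]
print(evens)  # [2, 4, 6]
```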


For a concrete example, I was just recently converting a list-of-dicts into a dict-of-dicts. Here's an isolated snippet:

http://pastebin.com/8q46bK0v

To my eye, the list comprehension version is reasonable. But I like the imperative style better: it uses the most basic language features and at a glance you can tell what it does. My favourite is the dictionary comprehension version: it's the shortest but still conveys clearly what it's doing.


Yeah, I prefer the dict comprehension.

I use dict comprehensions quite frequently in my own code as well.


I agree. I try to use list comprehensions when I can, but the fact is that as soon as you have to do any sort of logic in your map, it quickly gets unwieldy. I'd much rather use a functional style in many of these cases, but python's syntax and crummy lambdas (as compared to what would be natural to do in say JavaScript or Ruby), means that in the majority of cases I find myself using an imperative style.


I always struggle to understand why a list comprehension

  alist = [foo(word) for word in words]
is considered more Pythonic than map

  alist = map(foo, words)


List comprehensions are more flexible and easier to read in the non-trivial case. Sure in the trivial case you show a map might be considered neater, but just adding a filter is enough to make the list comprehension more readable in my mind. Python's lambda syntax also makes using maps and filters quite ugly.

Compare:

   alist = [x**2 for x in mylist if x%3==0]
to

   alist = map(lambda x: x**2,filter(lambda x: x%3==0, mylist)

Plus python also has set comprehension and dict comprehension, which share essentially the same syntax.


Don't forget generator comprehensions which are almost identical to list comprehensions. But instead of evaluating the whole result set and returning it, they return a generator that you can then iterate over. Very neat stuff.
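For example:

```python
nums = [1, 2, 3, 4]

squares_list = [n * n for n in nums]  # built in full, immediately
squares_gen = (n * n for n in nums)   # a generator; nothing computed yet

print(next(squares_gen))  # 1, computed on demand
print(sum(squares_gen))   # 29, the remaining 4 + 9 + 16
print(sum(squares_list))  # 30
```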


In Haskell, the latter looks like:

  alist = (map (**2) . filter (\x -> x `mod` 3 == 0)) myList
Or:

  alist = (map (**2) . filter ((== 0) . (`mod` 3))) myList
If alist is a transformation, and not applied to myList, it's cleaner:

  alist = map (**2) . filter ((== 0) . (`mod` 3))
Though Haskell also has list comprehensions, with more "mathy" syntax:

  alist = [x**2 | x <- mylist, x `mod` 3 == 0]


hi, you are lacking a closing parenthesis in your second example.


    alist = [foo(word) for word in words if word.startswith('a')]
    alist = map(foo, filter(lambda word: word.startswith('a'), words))
Which reads better?


I don't use FP practices in Python much, but if I did I'd define the filter outside the map, like so:

    begins_with_a = lambda x: x.startswith('a')
    alist = map(foo, filter(begins_with_a, words))


Even with named functions, Python's use of global functions instead of methods for iterators force you to read the expression from the inside out. I think Lisp languages nailed this with their threading macros, which allow natural left-to-right reading, but Ruby's strategy is better than Python's, too, while maintaining very similar syntax.

    ;; clojure
    (let [begins-with-a #(.startsWith % "a")
          foo #(do-some-stuff-with %)]
      (->> words (filter begins-with-a) (map foo)))

    # ruby
    words.select { |e| e.start_with?(?a) }.map { |e| foo(e) }
It doesn't really make sense for things like `len` and `map` to be global functions in object-oriented languages.


Those are not equivalent. You need to wrap the map in a list() call.


In Python 3 you'd be right, but I think people still call that language "Python 3" and mean Python 2 when they say "Python".


Admittedly, my thought would be to chain the calls, not nest them:

   alist = words.filter(lambda word: word.startswith('a'))
                .map(foo)
That being said, my Python is limited and I don't know if filter/map are available as methods of a list. At the end of the day, there are cases where list comprehensions are much cleaner/understandable... and cases where the reverse is true.


> I don't know if filter/map are available as methods of a list.

They're not. Which is a shame in my opinion, because as you've written it you can clearly read the operations in the order they happen, ie. filter followed by map. Instead, you do have to do the second line of what blossoms wrote above.

And I don't think it's possible to write a list comprehension that reads in execution order, either :(


Map and filter, of course; less syntactical noise, simple function semantics, and plenty of precedent and equivalents in all other languages.


It's not like list comprehensions lack equivalents in other languages. Take Haskell for example. Python list comprehensions could use a where clause from Haskell, though, so one could really pack everything into a one-liner :)

(as it is now, list comprehensions requiring various references to the result of a function call evaluate the function each time it's used)


you can consider a list comprehension to be a sort of literal representation of the result of map. I think literals have a benefit for code readability and should be used when feasible (i.e. the literal is compact enough).

the other reason it's considered more idiomatic in Python is just because the compiler does a better job of parsing and optimizing list comprehensions.


You use list comprehensions in other places you wouldn't use map. The difference in text size as you posted is minimal, but the list comprehension - once you're used to reading them - tells you exactly what's going on.

Whereas map could be anything. It could be redefined for all you'd know.


Don't know if this is why, but the list comprehension takes an expression at the "foo(word)" location, and is therefore more general than map, which requires a function. The comprehension in that case is simpler.

  words = ['w1', 'w2', 'w3']

  [word[1] for word in words]

  ['1', '2', '3']
  
  map(lambda x: x[1], words)

  ['1', '2', '3']
I like looking at the list comprehension better. The use of lambda looks forced in this case. I also imagine there's a penalty for calling the (anonymous) function in map.


In that case I'd use

  map(itemgetter(1), words)
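e.g. (wrapping in list() for Python 3, where map returns an iterator):

```python
from operator import itemgetter

words = ['w1', 'w2', 'w3']

# itemgetter(1) is a prebuilt callable equivalent to lambda x: x[1]
print(list(map(itemgetter(1), words)))  # ['1', '2', '3']
```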


Instead of my map example, or instead of a list comprehension in general?


Instead of a list comprehension equivalent to your example.


Those are not equivalent.

You cannot index a map object, for example.


I think the main reason is the lambda syntax. List comprehensions also let you do filter and nested loops.


This is a nice little article, but I wonder about some of the design decisions. In particular:

> The simplifications employed (for example, ignoring generators and the power of itertools when talking about iteration) reflect its intended audience.

Are generators really that hard? (Not a rhetorical question!)

The article mentions problems resulting from the creation of a temporary list based on a large initial list. So, why not just replace a list comprehension "[ ... ]" with a generator expression "( ... )"? Result: minimal storage requirements, and no computation of values later than those that are actually used.

And then there is itertools. This package might seem a bit unintuitive to those who have only programmed in "C". But I think the solution to that is to give examples of how itertools can be used to create simple, readable, efficient code.
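As a sketch of that list-comprehension-to-generator-expression swap, assuming some large input:

```python
big = range(10 ** 6)

# [n for n in big if ...] would materialize a million-element list;
# the generator expression stops as soon as next() is satisfied:
first_hit = next(n for n in big if n > 100 and n % 7 == 0)
print(first_hit)  # 105
```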


Point 3 of the iteration part is not good advice. With [1:] you're making a copy of the list just to iterate over it...


You're right. I still wouldn't recommend looping over indices, but rather using itertools.islice(xs, 1, None) instead of xs[1:].
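i.e.:

```python
from itertools import islice

xs = [10, 20, 30, 40]

# xs[1:] builds a new list; islice just skips the first element
# while iterating over the original.
print(list(islice(xs, 1, None)))  # [20, 30, 40]
print(xs[1:])                     # [20, 30, 40]
```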


You're right... But at least it's a shallow copy.


I find that testing for empty this way is a bit misguided if you want to be rigorous with types. For instance:

     >>> def isempty(l):
     ...     return not bool(l)
     >>> isempty([])
     True
     >>> isempty(None)
     True
If embedded within your program logic this kind of pattern can waste precious time with debugging. You can catch your errors much more quickly if you are explicit with your comparisons.


Failing to use join is a big one.

I have seen countless instances of people writing the logic to output commas in between items (like for CSV export) that they want to concatenate into a string.

    header_line = ','.join( header for header in headers )
    csv_line    = ','.join( str(dataset[key]) for key in dataset.keys() )
Example for a case of a dictionary mapping a string to a bunch of numbers.


Proper Python would use the csv module for this operation, as your CSV export would break if `header` or `dataset[key]` contains a comma.
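For the record, the csv module handles the quoting (the data here is invented):

```python
import csv
import io  # use StringIO.StringIO on Python 2

headers = ['name', 'note']
row = ['rio', 'dances, on the sand']  # note the embedded comma

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(headers)
writer.writerow(row)

# The writer quotes the field containing the comma:
print(buf.getvalue())  # name,note / rio,"dances, on the sand"
```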


Yeah. This was a special case for a one line CSV that was requested by my client. It was a dictionary with a bunch of single measurements.


Any reason not to do the first one more compactly?

     ','.join(headers)


Oops. Good catch!


You should just use dataset.values() (or itervalues if you're using Python 2) instead of iterating over the keys, and then looking them up.


You're relying on dataset.keys() being in the same order as headers. Really, you should use the csv module, even for one-line CSV files.


> PEP 8 is the universal style guide for Python code.

> If you aren't following it, you should have good reasons beyond "I just don't like the way that looks."

Core devs and Guido have said many times that PEP 8 is not holy.

See https://mail.python.org/pipermail/python-dev/2010-November/1...

In essence, a "stupid reason" like "I don't like it" is a valid reason not to adopt PEP 8.

In fact, I don't like the PEP 8 recommendation on docstrings. I like Google's docstring style (aka napoleon in Sphinx contrib-module).

http://sphinxcontrib-napoleon.readthedocs.org/en/latest/exam...


Interesting read, a couple of things I noticed though:

1. In "Checking for contents in linear time" both examples are the same. Perhaps remove the list entirely in the second example

2. Itertools.islice helps if you need to slice a list with a bajillion elements


    # Avoid this
    lyrics_list = ['her', 'name', 'is', 'rio']
    words = make_wordlist() # Pretend this returns many words that we want to test
    for word in words:
        if word in lyrics_list: # Linear time
            print word, "is in the lyrics"

    # Do this
    lyrics_list = ['her', 'name', 'is', 'rio']
    lyrics_set = set(lyrics_list) # Linear time set construction
    words = make_wordlist() # Pretend this returns many words that we want to test
    for word in words:
        if word in lyrics_list: # Constant time
            print word, "is in the lyrics"
The second example should read ... if word in lyrics_set: ...


Just to point out, if you really will have a tiny list and that's knowable, it's possible this example would be best with a straight linear time check. It could be fewer operations than hashing a string and looking it up. Practically pedantry though.


Thanks for pointing out this mistake. It's now fixed.


I use Python for data mining, and most of my work is done exploring data in IPython.

> First, don't set any values in the outer scope that aren't IN_ALL_CAPS. Things like parsing arguments are best delegated to a function named main, so that any internal variables in that function do not live in the outer scope.

How do I inspect variables in my main function after I get unexpected results? I always have my main logic live in the outer scope because I often inspect variables "after the fact" in IPython.

How should I be doing this?


If you are using the interpreter directly then that particular bit of advice is hard to follow since you basically live in global all the time. For that reason I would say that this advice applies mainly to .py files.


Agreed.

There's a big difference between "scripting" and "writing software" in terms of best practices.

If you're writing some ETL scripts in an IPython notebook, it would be overkill to encapsulate everything to keep your global scope clean.


I'd try to test smaller chunks of code for validity. If any block of code is longer than 12 lines, I get nervous that I don't understand what it's doing. Refactor your code into functions as you confirm the code behaves as expected in the interpreter.

It's very difficult to write automated tests when all logic is in outer scope rather than chunked into functions.
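A minimal sketch of that kind of structure (the function names and the argument handling here are made up):

```python
import sys

def parse_args(argv):
    # stand-in for real argument parsing (argparse, etc.)
    return {'verbose': '-v' in argv}

def run(options):
    # the actual logic, testable (or pokeable in IPython) in isolation
    return 'verbose mode' if options['verbose'] else 'quiet mode'

def main(argv):
    options = parse_args(argv)
    print(run(options))

if __name__ == '__main__':
    main(sys.argv[1:])
```

Nothing leaks into the module's global scope, and each piece can be called on its own from an interpreter session.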


Instead of this:

    # Do this  
    lyrics_set = set(lyrics_list) # Linear time set construction  
    words = make_wordlist()  
    for word in words:  
        if word in lyrics_set: # Constant time  
            print word, "is in the lyrics"
You could do this:

    lyrics_set = set(lyrics_list)  
    words = set(make_wordlist())  
    matched_words = list(lyrics_set & words)  
    for word in matched_words:  
        print word, "is in the lyrics"


Or, off the top of my head:

    for word in (set(lyrics_list) & set(words)):
        print('{} is in the lyrics'.format(word))


Even shorter, nice. How about a one-liner? The last bit is looking a bit messy, any ideas?

    print " is in the lyrics \n".join(set(lyrics_list) & set(words)), "is in the lyrics"


How about

  >>> lyrics_list = ["her", "name", "is", "rio"]
  >>> words = ["is", "rio"]
  >>> print '\n'.join("{} is in the lyrics".format(word) for word in set(lyrics_list) & set(words))
  rio is in the lyrics
  is is in the lyrics


Heh, I just replied with pretty much this exact code to someone else trying to make it a one-liner.

Yes, it's possible. I would never, ever publish code like this, though. It's opaque.


Just because it can be written as a one-liner doesn't mean it should be written as a one-liner :)

Don't ever do this.

    print('\n'.join(['{} is in the lyrics'.format(word) for word in (set(lyrics_list) & set(words))]))


I never said it should, it was a challenge.


   matched_words = set(lyrics_list).intersection(make_wordlist())


I can't think of a single case where using sentinel values is necessary or appropriate in Python.

Generally speaking, one should just return from within the loop.


I get what you're saying, but even when sentinels are used inside a function, returning a -1 to the caller seems like a pretty bad API. It's OK to raise ValueError! I had thought the idiomatic sentinel value was an instance of object() you could compare against with `is`, anyway.
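For reference, the object() sentinel idiom looks roughly like this (the names are made up):

```python
_MISSING = object()  # private sentinel; no caller can pass it by accident

def first_match(items, predicate, default=_MISSING):
    for item in items:
        if predicate(item):
            return item
    if default is _MISSING:  # identity check against the sentinel
        raise ValueError('no matching item')
    return default
```

Unlike returning -1 or None, this distinguishes "the caller gave no default" from "the default is None".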


I agree - especially when you're returning a list index, since `[-1]` is a valid index in Python.


Funky OS or FFI APIs... but Python has a nice way of abstracting away the pattern using the second argument to iter().
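For anyone who hasn't seen it: the two-argument form iter(callable, sentinel) keeps calling the callable until it returns the sentinel. Using a StringIO as a stand-in for a real file:

```python
import io

f = io.StringIO('line one\nline two\n')  # stand-in for a real file object

# call f.read(4) repeatedly until it returns the sentinel ''
for chunk in iter(lambda: f.read(4), ''):
    print(repr(chunk))
```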


> Consider using xrange in this case.

Is xrange still a thing? Doesn't range use a generator instead of creating a list nowadays?


Yes, I'd revise as "Consider using Python 3 in this case." This is a chief reason why I now avoid Python 2. Python 3 is more than five years old. Where we have discretion to choose Python 3, it's time to exercise that discretion. Not because 3 is greater than 2. Because Python 2 has prominent pains that are healed in Python 3.


Well, I'd say no, not in "Python".

It seems to me that when people say "Python" they still mostly mean Python 2, where range returns a list and xrange a lazy xrange object. In Python 3 there is no xrange, and range returns a lazy range object rather than a list, but I think people still most often call that language "Python 3", not "Python".


I don't think there is such a thing as a pattern per language, unless the language is really unique. IMO what does exist is language bad practices, which are actually tied to the language itself.

An (anti)pattern is something abstract and can be applied to any other similar language.


Speaking of find_item, is the `for..else` loop (which can be used to write find_item in another way) considered Pythonic? I personally like `for..else` loops but I don't know where the consensus is at.


I don't see `else` used very often on loops, but it is an unambiguous language feature. That's Zen enough for me.
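For reference, a sketch of find_item written with for..else (the else clause runs only when the loop finishes without hitting break):

```python
def find_item(items, desired):
    for item in items:
        if item == desired:
            break  # found it; skips the else clause
    else:
        item = None  # loop ran to completion without a match
    return item
```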


http://en.wikipedia.org/wiki/No_true_Scotsman

Don't ask what's "more Pythonic" or "less Pythonic", Python is not a cult, it's a very practical scripting language. Ask for benefits and weaknesses of a given approach in given circumstances.


Speaking a language in the same way as other speakers of that language makes you easier to understand.


When it comes to basic formatting, symbol naming and high-level code organization, sure.

But computer languages, unlike human languages, are precise. Their intent is clear. And you'll never encounter a case where the Python 3 interpreter hasn't heard of that particular Python 3 keyword you're using.

It's also not an excuse to avoid certain features of a language, when using them leads to a better and simpler solution, just because they're less popular. Programming is not an exercise in popularity.

On the other hand, human language is fuzzy and full of phrases that consist of statements having nothing to do with their meaning. Such as me saying "your argument doesn't hold water".

Human languages also have additional layers entirely separate from the primary meaning of a conversation, such as sending social cues like "how smart am I", "do I like you", "do I fit in this group", and "am I a leader or a follower". Each layer of concern drives a certain way of expression and imitation, none of which occurs (or should occur) when writing computer code.

A better example to compare to programming code would be mathematical notation. As long as you express your intent shortly, using the available mathematical notation, people will be fine, and your intent will be clear.

I've never seen someone ask in a math forum if their formula is more Mathematic one way, or another way.


An anti-pattern is a design pattern gone bad. These aren't anti-patterns; they're just newbie mistakes and non-idiomatic expressions.


is it an anti-pattern to call something an anti-pattern because the author isn't sure what an anti-pattern is?


Read this right after using a range in a loop in order to get the index. Can't believe I went this long without knowing about enumerate.
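For anyone else in the same boat, the difference in a nutshell:

```python
lyrics = ['her', 'name', 'is', 'rio']

# Instead of this:
for i in range(len(lyrics)):
    print(i, lyrics[i])

# Do this:
for i, word in enumerate(lyrics):
    print(i, word)

# enumerate can also start counting from any index:
for i, word in enumerate(lyrics, start=1):
    print(i, word)
```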


It first mentions enumerate as a footnote after looping with a range:

https://docs.python.org/2/tutorial/controlflow.html#for-stat...

But the tutorial in the Python docs has pretty high information density and good coverage of things like this.

(I think there is some risk that this comment will be interpreted as If you don't know enumerate you need to look at the tutorial. That isn't what I intend, I just want to point out that the tutorial is a reasonably dense resource that hits on a lot of stuff like enumerate.)


Question: how do you iterate over multiple long lists (in Python 2), especially if zip itself takes a long time to zip them, for example?


You use Python 3.

J/K; while this is technically a limitation of Python 2, there actually is izip in the itertools package, which is a generator and works in a similar way to zip in Python 3.
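Roughly like this (with a fallback so the same sketch runs on Python 3, where plain zip is already lazy):

```python
try:
    from itertools import izip  # Python 2: pairs are produced on demand
except ImportError:
    izip = zip  # Python 3: zip is already a lazy iterator

xs = range(10 ** 6)  # pretend these are two very long sequences
ys = range(10 ** 6)

for x, y in izip(xs, ys):
    if x > 2:
        break  # only the pairs we consumed were ever built
    print(x, y)
```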


Awesome! I knew there was some lazy zip-ish thing for Python 2. I personally hate using range(len(foo)) as much as anyone else.


You just use for i in range(len(... and ignore this web article's opinion on it, since the situation calls for it.


I would say that the leaky outer scope problem combined with those really similar-looking names is perhaps the most difficult to spot.


> Things like parsing arguments are best delegated to a function named main, so that any internal variables in that function do not live in the outer scope.

Why a function named `main`? We're not writing C here, there's no need for a function named `main`. Let's call it something that's actually useful, like `parse_cmd_arguments`


Set membership checking is constant-time? Somehow I don't believe that.


Sets are basically hash tables that store only keys and not values. You can easily write your own Set implementation by subclassing `dict` and stripping out references to `.values()`, etc.

As such, testing for membership is an O(1) hash table lookup. If you're skeptical:

    $ python -m timeit -s 'nums=range(1000000)' '100000 in nums'
    1000 loops, best of 3: 1.4 msec per loop
    
    $ python -m timeit -s 'nums=range(1000000)' '500000 in nums'
    100 loops, best of 3: 7.15 msec per loop
    
    $ python -m timeit -s 'nums=range(1000000)' '900000 in nums'
    100 loops, best of 3: 13 msec per loop
    
    $ python -m timeit -s 'nums=set(range(1000000))' '100000 in nums'
    10000000 loops, best of 3: 0.0572 usec per loop
    
    $ python -m timeit -s 'nums=set(range(1000000))' '500000 in nums'
    10000000 loops, best of 3: 0.057 usec per loop
    
    $ python -m timeit -s 'nums=set(range(1000000))' '900000 in nums'
    10000000 loops, best of 3: 0.0584 usec per loop


It's a hash table, so constant time lookup in the average case. It could be argued that it's actually O(log(n)), but it would not matter for any practical n.


islice should be used, not xrange



