Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A surprising number of people either make N queries because they don't what their code is doing (really easy for a beginner to do with Django models when not using prefetch) or, more commonly, they make queries that get N rows back.

You almost never need N rows. N rows are bad because Python has to parse them. Odds are, there's a way to get a constant number of rows back for every query in every view of your webapp.



I've definitely seen the first - probably the first major performance issue that everyone runs into when using an ORM.

By "N rows", do you mean having a view that only renders a constant number of rows, but your database query returns an larger collection that's then filtered in Python code? That seems like an obvious bug that can be usually fixed with a LIMIT clause.


LIMIT/OFFSET isn't the optimal way to do pagination either, because the database has to scan to offset before fetching the limit. I'm having a hard time trying to find the particular resource that dives into it, but here are the basics:

  SELECT *
  FROM t
  OFFSET 1000 LIMIT 10; -- does not use index, scans table
vs

  SELECT *
  FROM t
  WHERE id between 1000 and 1010; -- uses index

Edit: http://www.slideshare.net/Eweaver/efficient-pagination-using... see slides 12-16


I was wondering what you do when say record 1001 gets deleted...your second query will only return 9 rows.

It's explained in your linked slides that LIMIT is ok, it's the offset that you need to worry about so your query looks like it would be better written:

  SELECT *
  FROM t
  WHERE id > 1000 LIMIT 10;


Interesting, I wasn't aware of this. Thanks for the explanation - I'll have to keep this in mind the next time I'm implementing pagination for a very large table.


A simple way to implement fast pagination is to set item.idx whenever you insert an item into a collection. It starts at 0 and gets incremented each time.

From there, given the number of items per page, you can trivially determine which items to display given a page number or which page to display given an item.idx. And the query uses the efficient WHERE.

Of course, the last_seen pagination is only usable for pagination where users aren't deep linking into pages (like HN).


In many cases it's trickier than that too: if you're using your database's UUID's as the primary key, you'll need to find another column to use BETWEEN on and index that to make this work.

Plus, as stated in the slideshow, you can't provide jumps to arbitrary pages. In some cases this doesn't matter, but worth taking note of.

(pagination is frustrating!)


Side note, don't you need an ORDER BY when using LIMIT and OFFSET, as the database doesn't guarantee the order unless you specify it?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: