Nuitka: a Python compiler (nuitka.net)
267 points by woadwarrior01 on Dec 19, 2014 | 135 comments


I built a small app that retrieves data from a remote server, builds a summary and then pushes it to a local html file.

Number of people that tried it when I told them that to make it work they had to download python, pip, paramiko and install the pycrypto binary? Zero. Number of people that tried it after I just gave them a zip file with an executable? No longer zero.

I tried different libraries but Nuitka was the only one that made everything work seamlessly. I owe this guy a beer.

Edit: I no longer owe Kay a beer. I found the donation link on his website.


This issue has been addressed multiple times with considerable success by pyinstaller, py2app, and cxfreeze. You can get your single-zip-file distribution package without nuitka, although nuitka may have other advantages.


I recently ended up rewriting large parts of a python project in common-lisp due to this exact issue; in this case it was one stage of a pipeline that I prototyped with an existing python library that could output to xml, but delivery of python applications on windows is painful.


I think this is one of the reasons Go was a breath of fresh air to a lot of Python and Ruby programmers. The ability to quickly generate a static binary is a great feature.


If you packaged it correctly, you could get by with just requiring Python and pip to be installed. But the zip with a single executable is still faster.


This project looks completely misguided. The talk focused on the trivialities of mapping Python to C++ rather than on the interesting problems to be encountered when trying to optimize Python while maintaining its extremely dynamic semantics. Also the benchmarking effort is laughable; pystone is not to be taken seriously (only exercises a tiny part of the language) and pybench does microbenchmarks, which are optimized away. You should try the "real-world" benchmarks from the PyPy and Unladen Swallow projects. And what is the size of the generated code? (E.g. how big would the binary for the entire standard library be?) In your blog, please use less boring subjects than "version x.y.z released". ~ Guido van Rossum

Find Guido's quote in the first comment.

https://ep2013.europython.eu/conference/talks/nuitka-the-pyt...


Well, that is harsh and somewhat unwarranted ("In your blog, please use..." Who gave him that authority?). I like Hayen's comment on all this:

> The good thing [GvR] did to Python is to make his opinion be just that. I can do Nuitka without and beyond his control.[1]

[1]: https://ep2013.europython.eu/conference/talks/nuitka-the-pyt...


If you posted that comment to somehow discredit this project, then I'd say you miscalculated (if you had other motives, then thanks for posting it, I guess?). Most of the folks on HN know better than to accept the word of an authority figure -- particularly one so harshly worded -- and so will check out Nuitka on their own.

Regardless of whether or not Hayen succeeds here, good on him for at least trying. GvR hasn't even made an effort to improve Python's performance (indeed, by most accounts, Python 3 is even slower than Python 2). In fact, he seems intent on actively discouraging such efforts. For example, why is CPython still using a stack-based interpreter, when several people have already worked toward implementing a register-based one and were only met with derision?

I generally agree with GvR's approach of regarding premature optimization as a mistake; developer time is usually more important to optimize for than processor time.

On the other hand, sometimes you have to optimize code, and dropping down to C is unnecessarily risky (not to mention tedious, although that comes with the territory). The fact that there's even an alternative today with Numpy and Cython seems to have happened in spite of GVR, not because of him.

I'm sad that someone as influential as GVR would be so consistently rude and dismissive toward an effort at improving Python, no matter how misguided he viewed the effort. This is the kind of attitude that makes leading open source projects suck.

Edit: By the way, I don't mean to imply that I share the pessimism some have about Python's future. Between Python's considerable rate of adoption in higher education, numerics, machine learning, bioinformatics, and various other scientific computing fields, and novel approaches to the language like PyPy, Pyston from Dropbox, Nuitka, and Numba, I'm overall pretty optimistic. But it's becoming clear that the committee-driven approach of CPython, driven/impeded by the BDFL, isn't bearing the fruit it has in the past.


It's sad to me to compare my sense of the Python community ten years ago to what it is now. There was a time in the old days when Python was meant to be fun (and even the documentation matched that attitude), whereas between version politics, its wider adoption for 'serious' work, and GvR's hostile attitude these days, it really doesn't seem like that spark is there anymore.

I miss the Monty Python jokes and the freewheeling 'BASIC, evolved' feel of old 1.x sometimes even.


I wasn't trying to discredit the project. I think the project is neat and I wanted to see what other people on HN would say about GvR's comment.


For me, this is not about execution speed AT ALL. I wouldn't even mind if it were slower.

It is about having an easy, dead-simple way to provide Windows users with an executable, without changing my code and without weird build systems. Like it or not, Windows users are still the majority out there.


I like the Lisp world's term "delivery" for this part, where you package up an application to ship to end-users. Optimization can be one part of delivery, but not necessarily the most important part.


Is that not py2exe's job?


In theory. In reality, Python freezers have a lot of problems. It can be a real hassle to get a single file, and even when you do, it turns out massive. I've had nothing but problems with them in the past.


I've had good experiences with PyInstaller, if you haven't tried it yet (or recently).


Seconded. I recently moved a work project from py2exe to pyInstaller and was very happy with the results.


I'd be happy if the setuptools bdist installer could create a single file that works on 32- and 64-bit Windows and would take care of running 2to3 during the install. The situation now is too painful for me to bother making Windows installers any more.


Exactly. I recently wanted to build a task tracking app for the company I work at. I initially decided on writing it in Python as a server-based app (Flask + SQLAlchemy) since I am familiar with it. Once I found out how damn difficult it is for end users to actually deploy the app, I opted for node-webkit (Backbone.js + nedb) instead. I definitely have no regrets.


Similar to Go IIRC, the binaries aren't slim, but it's one push away from deployment and was advertised as a feature.


I agree. In fact, I think that the slowness of Python implementations is a feature. It forces developers to use standard libraries, which in turn makes the program more concise. This is certainly the case in MATLAB where vectorization is pretty much needed for non-trivial programs, and this leads to improved readability.
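The same dynamic shows up in Python: pushing the loop into builtins or the standard library is usually both faster and more readable. A tiny illustrative sketch (function names are made up for the example):

```python
# Summing squares: an explicit loop vs. pushing the work into
# builtins (the Python analogue of MATLAB-style vectorization).
def sum_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total

def sum_squares_builtin(values):
    # The iteration runs inside sum(), and reads closer to the math.
    return sum(v * v for v in values)

print(sum_squares_loop(range(10)))     # 285
print(sum_squares_builtin(range(10)))  # 285
```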


Slowness may have unintended benefits, but it also forces you to use C code where you otherwise might have been OK.


Which makes it hard to upgrade the language .. especially if you change the C interface at the same time as you change the language.


For implementing Python, you have at least five options:

- Naive interpreter (CPython). Everything is a dict. Slow.

- Transliterate to some hard-compiled language, but all data is still one kind of dynamically typed object (Nuitka). A little faster, and compatible, but has limited optimization potential.

- Infer types and try to create appropriate code in a faster language (Shed Skin). Hard to do, but promising. (Shed Skin has one implementor.)

- Restrict the language (RPython). Potentially much faster, but incompatible.

- Build a JIT compiler/interpreter combo and handle all the hard cases that require recompiling during execution (PyPy). Hard to do, and results in a huge system, but almost compatible. After 10 years of work, it's finally happening.

If you're willing to restrict the language, it's much easier. RPython was written only to help build PyPy, but the concept could be extended to allow most of Python. Both Shed Skin and RPython insist that type inference succeed at disambiguating types. If you're willing to accept using an "any" type when type inference fails, you can handle more of the language.

The big boat-anchor feature of Python is "setattr", combined with the ability to examine and change most of the program and its objects. This isn't just reflection; it's insinuation; you can get at things which should be private to other objects or threads and mess with them. By string name, no less. This invalidates almost all compiler optimizations. It's not a particularly useful feature. It just happens to be easy to implement given CPython's internal dictionary-based model. If "setattr" were limited to a class of objects derived from a "dynamic object" class, most of the real use cases (like HTML/XML parsers where the tags appear as Python attributes) would still work, while the rest of the code could be compiled with hard offsets for object fields.
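As a small illustration of the point (with a made-up `Point` class, not from any real codebase), any attribute can be grafted onto any instance by string name, which is exactly what rules out hard field offsets:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# Attributes can be added to any instance by string name at runtime,
# so a compiler cannot assume a fixed field layout for Point.
setattr(p, "z", 3)
print(p.z)  # 3

# They land in an ordinary per-instance dict, which is what makes
# this easy to implement in CPython and hard offsets impossible.
print(p.__dict__)  # {'x': 1, 'y': 2, 'z': 3}
```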

The other big problem with Python is that its threading model is no better than C's. Everything is implicitly shared. To make this work, the infamous Global Interpreter Lock is needed. Some implementations split this into many locks, but because the language has no idea of which thread owns what, there's lots of unnecessary locking.
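A minimal sketch of the "everything is implicitly shared" problem: even with the GIL, a read-modify-write like `counter += 1` is not atomic (the GIL can be released between bytecodes), so explicit locking is still needed:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write below could interleave
        # with another thread; the GIL does not make += atomic.
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```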

Python is a very pleasant language in which to program. If it ever gets rid of the less useful parts of the "extremely dynamic semantics", sometimes called the Guido van Rossum Memorial Boat Anchor, it could be much more widely useful.


> The big boat-anchor feature of Python is "setattr", combined with the ability to examine and change most of the program and its objects. [...] If "setattr" were limited to a class of objects derived from a "dynamic object" class, most of the real use cases [...] would still work, while the rest of the code could be compiled with hard offsets for object fields.

Isn't __slots__ made for that specific use case? (when you want to optimize your code by specifying fixed attribute names)

I think the "dynamic" behavior is a sane default (since most people don't need that optimization).
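For illustration, a toy sketch of what `__slots__` does (the class names are made up): it fixes the attribute set and drops the per-instance `__dict__`, which is essentially the "hard offsets" layout the parent comment wants:

```python
class Open:
    pass

class Slotted:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance __dict__
    def __init__(self):
        self.x, self.y = 0, 0

o, s = Open(), Slotted()
o.anything = 1          # fine: lands in o.__dict__

try:
    s.anything = 1      # rejected: "anything" is not in __slots__
except AttributeError as exc:
    print("rejected:", exc)

print(hasattr(s, "__dict__"))  # False
```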


setattr and similar dynamic features do make Python harder to optimize, but they're not that different from what you see in JavaScript. JavaScript has had an incredible amount of work spent optimizing it, but the result is a bunch of pretty damn fast language JITs that implement the full language, usually only slowing down in cases where actual use of those features requires it. Is it really that hard to come up with something like that for Python?

(Threading is a separate issue, though.)


> Is it really that hard to come up with something like that for Python?

Nope, we just need a large company willing to spend tons of money funding the effort.

PyPy has made incredible strides in this area, especially for long running processes where the JIT has time to warm up. But they need a lot more funding if people ever want Python to get fast.


I'm very impressed that PyPy got their JIT working. But it took 10 years from initial funding by the European Union. It's a hard problem. They had to come up with some new, elegant solutions to make it work. See "https://pypy.readthedocs.org/en/release-2.3.x/jit/pyjitpl5.h...

JavaScript is a bit easier because it doesn't have shared-memory concurrency. In Python, you can change a method of a class while an instance of that class is executing it in another thread, so you have to worry about invalidating code currently being executed asynchronously.
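A deterministic, single-threaded sketch of the same hazard (threading aside): replacing a method on a class immediately changes the behaviour of live instances, so any compiled or inlined version of the old method would have to be invalidated:

```python
class Worker:
    def step(self):
        return "old"

w = Worker()
calls = [w.step()]

# Replace the method on the class; the already-created instance sees
# the new behaviour on its very next call, because method lookup
# happens at call time. A JIT that inlined Worker.step would have to
# deoptimize here.
Worker.step = lambda self: "new"
calls.append(w.step())

print(calls)  # ['old', 'new']
```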


Guido's response, which is entirely unprovoked and rude (given that it is for a small volunteer effort, that has already achieved something admirable, doesn't ask anything from him, and doesn't harm his CPython in any way), seems to me worse than anything I've read from Linus (who's just a dog that barks but doesn't bite, and just uses the insults for emphasis).

This is pure condescending tone...


Looks like the response I would expect from someone who has seen tons of "Why not just compile it and see it get magically faster?!?!" queries.

It's easy to compile it, but making the compilation actually useful for a dynamic language is HARD. There's a reason that most dynamic languages will either interpret or JIT, and it's not because the JIT writers overlooked something obvious.

A naive translation of the code will remove a little bit of overhead from the bytecode dispatch, but the resulting code bloat will blow out instruction caches for any reasonable sized code. In small programs with one translation unit, analysis can sometimes work to speed things up, but it quickly becomes either undecidable or intractable.
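You can see the dispatch in question with the `dis` module; each opcode printed below is one trip through the interpreter's dispatch loop, and a naive Python-to-C translation mostly just unrolls that loop into native code:

```python
import dis

def add(a, b):
    return a + b

# Print the bytecode instructions the interpreter dispatches for
# this two-line function (names differ slightly across versions,
# e.g. BINARY_ADD before 3.11, BINARY_OP after).
for instr in dis.get_instructions(add):
    print(instr.opname)
```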

Basically, with dynamic languages you can often see naive static compilation doing worse than the interpreter, especially once the interpreter's hot path fits in cache but the generated native code no longer does. Wanting to see results on real-world benchmarks is entirely reasonable.


I don't know. Lua (the non JIT version) runs circles around Python -- without compilation.

And when adding compilation into the game, I don't see CL as any less dynamic (probably more) than Python, and yet it goes to near C speeds too.


CL goes near C speeds with an INCREDIBLE amount of work to make your program as static and C-like as possible. It also requires intimate knowledge of your particular implementation.

Sometimes it's not even possible (!) to get C speed because of things like float boxing across function boundaries.


I'll give you that -- but, while not C-speed, it still runs circles around Python when coded in an idiomatic style too.


Looks like typical impulsive defensiveness; it doesn't portray him well as a person.


Seriously? I think Python is fantastic, and IMHO this is due to Guido. But seriously, how can he possibly comment on someone else's attempt to improve the performance of Python? He should look in a mirror when saying that ;-)

It's a hard problem, I wish more people would attempt to take care of it.


I like Guido but that comment seems too harsh and commenting on how boring his blog titles are is completely unnecessary. The talk was very interesting and is very neat for only a spare time project.


I don't know why (especially key) people in development (Torvalds, now van Rossum, etc.) seem to have a hard time phrasing their thoughts with a little more consideration. Would it hurt to put it more like "this project is not of interest to me"? What is so misguided about working on something and trying things? Or to restrain oneself from bashing the effort as "laughable" and instead offer ideas for improvement, or present the alternatives without discrediting the whole thing? And if he is bored, why not just skip the whole thing?


People in these positions (e.g. Torvalds) are generally extremely opinionated (which is good, because it can provide direction to a project in its early days), but have also spent years listening to people without sufficient skill or experience attempting to provide ideas, patches, or commentary that they're not qualified for.

At some point, the effort of letting someone down gently fifty times a day gives way to a curt but efficient form of communication.

In this case, GvR is just laying out what he sees as issues, take them or leave them. If you don't care for his opinion, fine, if you do, there you go.


W/r/t language design, if you start designing without really caring about it, you probably won't finish. And that means that the designers of popular languages typically have unreasonable opinions. How that manifests itself depends a lot on the person, though.


Wow, I'm really surprised. Guido is usually incredibly nice, but he comes across as very condescending in that comment.

I love Python, it's my favorite language by far - but I love all those different tools (numba, pypy, hope, nuitka, cython etc) that make you sacrifice a small amount of dynamic magic in exchange for significant speed ups.

I don't have to use them - but when I need to write fast code, it's really nice to be able to do so in Python.


Are the names on the comments verified? This seems very much out-of-character for Guido, he's generally been quite supportive of efforts to improve Python's speed (eg. Unladen Swallow, PyPy) or add static typing as a library. It occurs to me that with this blog interface, I could post as "Ken Thompson" and nobody would be the wiser.


I can verify that van Rossum's comments are by him. I saw him walk out of the aforementioned Nuitka presentation at EuroPython a few years back, clearly annoyed with it.

See also https://groups.google.com/forum/#!topic/python-static-type-c... if you wish additional evidence.

That Google Groups thread shows that he's irritated because he thinks that the Nuitka author doesn't understand the issues: "... he is incredibly naive about what kind of optimizations he'd like to apply. (He basically doesn't seem aware of the difficulties arising with static analysis of Python.)" While the Unladen Swallow and PyPy developers do understand the issues.

I think it's in character. I gave a talk at another EuroPython on measuring performance. Partway through he asked "isn't this just a repeat of the timeit module"? My response was "yes, I'm explaining why it works the way it does." I think that was sufficient to mollify his irritation.


Stefan Behnel (one of the Cython leads) posted a more detailed critique of the project after the EuroPython talk too:

http://blog.behnel.de/posts/indexp241.html

Nuitka's author's reply here:

https://webcache.googleusercontent.com/search?q=cache:fB1tgJ...

I sympathise with the developer because I think the tone of the criticism he's been getting is very far short of nice.

...But...On a purely technical level...I see a lot of problems with his response.

I think there's real merit in this project on the distribution side of things --- not the optimisation side. I hope he pivots the project in that direction.


That post by Behnel is a good find. FWIW, I was at the talk, and my memory is that I too thought the speaker didn't know what he was talking about regarding optimizations and took far too long to talk about it.

There are previous projects, like Michael Salib's "Starkiller: a static type inferencer and compiler for Python" ( http://dspace.mit.edu/handle/1721.1/16688 ), and John Aycock's "Converting Python Virtual Machine Code to C" (http://legacy.python.org/workshops/1998-11/proceedings/paper...) which explored that optimization space, and no doubt others.

So when the talk is titled "for the first time, there is a consequently executed approach to statically translate the full language extent of Python, with all its special cases, without introducing a new or reduced version of Python", a listener would expect it to be more advanced than the previous work. To be fair, neither Aycock's nor Salib's work was complete, but they did enough groundwork to show that static analysis of the type Nuitka was exploring would not be able to achieve its stated goals.

But all this pointing to a discussion from 2012 is pretty pointless, as people do use it for distribution - something which wasn't touched on during the talk, as I recall - while the author did talk about aiming for type inference at compile time, when all the evidence is that the speaker had little idea of the actual issues, as previously shown by multiple earlier attempts.

I honestly don't know what to do when someone presents a talk as a pure hobbyist, puttering around on a project with little interest in what others have done, and who doesn't understand the audience enough to gauge which details are of interest and which aren't. There's a culture mismatch, certainly, but that's what makes it harder to assign fault.

Should Hayen have realized that the project wasn't at the right level for EuroPython, or that the future project goals were overstated? Did EuroPython get more advanced over time? Did van Rossum not allow that weekend hobbyists don't have the experience to judge things? Should van Rossum never say anything negative in public about a project (and if so, at what level of fame does one need to bear that in mind)? I certainly have no answers to those.


That is a very good point, and it would be a shame if Nuitka's developer (and people in this thread) would have gotten a wrong impression on GvR from something like that.


Apologies if this is ignorant, but how can we be sure that this was actually Guido van Rossum commenting?


Because he basically said the same thing live in the crowd. He's vocally against this project.


Oh, okay. I can believe that. I was just curious.


So this would be the same Guido van Rossum who made a completely misguided analysis of the language he had birthed, tried to make it "grow up" in version 3.x, and browbeat us all about the dubious benefits of this "new" cruft-encrusted language, causing strategic drift for the entire ecosystem? The best thing that could happen to Python would be a benevolent, and mercifully silent, retirement of GvR. And I speak as a long-term Pythonista.

Kudos to anybody who is trying to move Python into a performance envelope similar to Lua and Javascript, not to mention Golang, Julia, or Clojure.


GvR fading away would help but not cure the problem he started, i.e. Python design by committee. The Python culture results in substandard implementations of new features after waiting years for them to come to fruition. (asyncore/asyncio.. :( )

Of all the current dynamic languages, Python is the slowest moving on almost every front. Ruby became popular around the time Python 3.x was coming out; at the time it was much slower and riddled with 1.8/1.9 issues. Since then Ruby has surpassed Python almost everywhere, whereas Python 3.x, which should have had the freedom to do great things considering it broke backwards compat with 2.x, has languished with its slow performance.

Sure, there are great things happening in and around the numpy community, but that is tiny compared to the great things happening in Julia, Rust, Ruby, Golang and Javascript. (I include the last 2 despite my personal opinion being that they are not as good, but one can't deny they have made progress.)

I loved Python but it's standing still and I can't afford to do that anymore. I said my very solemn goodbyes to programming Python full time a while ago and I think it's one of the best decisions I have made.

That being said. Good on this dude for doing something about Python performance without doing Cython style stuff.

/rant


...what's wrong with asyncio? Why the frowny face there? I've found it to be a joy to use, and the killer feature of 3.4.


Clarification: it's the only killer feature of the entire 3.x release lineage. And it's late (hence, if I understand correctly, the frowny face).


It's not the only killer feature. Where Python 2 lacks consistency Python 3 fixes it. Where Python 2 core libs have stagnated, the Python 3 ones have been improved. All future development from core CPython developers is going into Python 3. That's a pretty killer feature.


Slow, incremental improvement is not a killer feature. Compiled Python 10x faster. That would be killer. Proper multi-core concurrency. Killer. Modest (yet breaking) improvement over 6 years while all the action goes on in other languages? Not killer.

But you cite a very important point: all the improvement, modest as it is for the majority of users, is going to 3.x and yes, that's why I've moved. But I can tell you it's only because of the constant nagging and threats about abandoning 2.x. Nobody would have moved for any actual "feature" of 3 were it not for the fear of being abandoned. It's a stick-only strategy. No carrot.


In my book, these are only killer features compared to Python 2, not in comparison to the rest of the world, and more along the lines of minimum necessary to justify the pains of a breaking change.


How is a library like asyncio a killer feature in 3, when you have twisted in version 2?


asyncio has support from the syntax of Python, a nicer API, and better documentation.
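For anyone curious what that syntax support looks like, here's a minimal sketch (written with the later async/await syntax for brevity; the Python 3.4-era equivalent used `@asyncio.coroutine` and `yield from` - the coroutine names here are made up):

```python
import asyncio

async def fetch(name, delay):
    # Coroutines suspend at each await, so both "requests" overlap
    # instead of running back to back.
    await asyncio.sleep(delay)
    return name

async def main():
    results = await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))
    print(results)  # ['a', 'b']

asyncio.run(main())
```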


Agreed on all points, though I think you underestimate to what extent Numpy is "carrying" Python. Without Numpy, Python would be dead and buried, in my view. Sure, tons of stuff is "pure" Python, but the core of its cred comes from the huge quality of some libraries built around Numpy (Pandas being but one example).

Separately though, sometimes when you want to break the dead hand of the committee-driven, value-destroying hold of a small group of individuals, you have to mercifully push aside the figurehead that provides their credibility. That figurehead is Guido van Rossum, and his post says more than I ever could about how his time is over.


I have a hard time believing that NumPy is "carrying" Python to the extent you suggest.

While a large number of people use Python because of NumPy, my work in chemical structure analysis and search doesn't depend on it. I use the package about once a year, and few of the chemistry analysis tools I use depend on it. I used to be involved with the Biopython project, and while NumPy is strongly recommended for a couple of the modules, it's not a requirement.

Then there's all the people who use it because of Django, and Zope before that. (I remember that the 2000 Python conference seemed to be half Zope developers.) Plus the win32 extensions get about 7,000 downloads per week, implying a pretty active base in that area.

Checking the PyCon talks, only about 5% of them seem to deal with numeric work. (Then again, there's SciPy ... but there's also DjangoCon and local non-NumPy meetings; certainly few in our local user group meeting work with NumPy.)

How then did you draw your conclusion?

Python-the-language did make three changes to support numeric work: multi-dimensional array slicing, the Ellipsis notation, and most recently the infix operator for matrix multiplication. But I believe that any language which couldn't handle a NumPy-like module would not have been that successful in the first place.


I think you're overestimating the number of people using Numpy. There's no way it's more popular than Django; scientific computing is mainly academic anyway.


In addition, the majority of scientists don't install numpy from PyPI, because it almost never works. Many use the numpy that comes with their OS distribution (e.g. Ubuntu), a Python distribution (ActiveState, PythonXY, Enthought, Anaconda), or binary installers on Windows. So I would say that numpy usage numbers are a lot higher than what was shown in the other comment.


...not by as much as you might imagine.

django

    Downloads (All Versions):
    25082 downloads in the last day
    171415 downloads in the last week
    743271 downloads in the last month
numpy

    Downloads (All Versions):
    12920 downloads in the last day
    80640 downloads in the last week
    327163 downloads in the last month


I am sorry, and at the risk of downvotes: web dev is a much faster-moving (dare I say more fickle) crowd than science. Python is not a whole lot better than Golang, Ruby and of course Node.js at this use case. Absolute numbers are not persuasive to me here. There are far fewer scientists than web devs (your stats imply ~1:2). Indeed, I would venture to suggest that your data means numpy punches well above its weight relative to other languages' science:webdev ratios. Let's not forget that Numpy and its direct linear ancestor, Numeric, has been dominant for more than a decade. Show me the Python web framework which can say the same. Flask/Bottle/Pyramid/Django/Tornado - pick your fashion.

Another point (to the grandparent posts): putting Numpy into a "science" pigeonhole is erroneous. It is massively used in finance, engineering, bioinformatics, and statistics.


> There are far fewer scientists than web devs

... and there can still be more scientists using Python than web devs using it.

Python web development is niche, despite the good frameworks. If you look at job postings, it's eclipsed by RoR, .NET, Java... maybe even PHP.

> Let's not forget that Numpy and its direct linear ancestor, Numeric, has been dominant for more than a decade. Show me the Python web framework which can say the same.

Django was released almost 10 years ago, and I remember it was already popular, a lot of people migrated from Plone.


The point is that Python's USP is Numpy. Python's USP is not web dev. If Python were to disappear, there are many, many credible web dev alternatives. Not true of Numpy. Take that from an R, Julia, and C user. Numpy has the perfect combo of speed wrapped in an expressive language. I don't see that anywhere else.


And I think that's part of why django tends to have more downloads than numpy. For django I tend to have a separate virtualenv so I know what to deploy on the server (so that's at least 2 downloads, plus one more for every new project I start). When I use python to calculate something or analyze data, I don't care about deploying, and I think the scientific crowd is less likely to care about best practices in general.


PyPI download numbers are bogus. More than 90% are from mirror bots, not actual users.


Ah yes, the 250,000 mirrors.


Just a thought, and I'm not so certain myself how important it really is, but: the web dev world might also mostly abandon Python very fast, whereas once you've got a foothold in academia there is a good chance you're going to keep it for years. Having a language in universities' curricula gives exposure for a long time.


Exactly. Web dev is a fickle business. See Backbone -> Angular -> React in just 18 months. Python Numpy is much more entrenched.


scientific computing is mainly academic anyways.

...and just about every aspect of engineering and a non-trivial part of finance.


It's not entirely academic. I, unfortunately, have to rely on it for game development.


Python 4


yes please.


> The best thing that could happen to Python would be a benevolent, and mercifully silent, retirement of GvR

Just out of curiosity, who do other pythonistas think would be the best candidate for new BDF"L"? Travis Oliphant is one who comes to my mind.


Why is one needed at all?


I think python has succeeded despite van Rossum, not because of him.


It's like asking Linus for his opinion of the App Store.


I think this is a really solid approach. It's the same thing I was trying to do for PHP with phc (http://phpcompiler.org/) - compiling down to C using the built in stdlib and C API, then using full program optimization.

Long term, it won't be as fast as a full JIT, as the Facebook HPHP team showed, but Python doesn't have a JIT of the same caliber, so this is probably useful for tons of people.

Dunno if the author is around, but they might find some of the stuff from my PhD relevant, especially how the static analysis worked and some of the challenges of compiling using the C API: http://paulbiggar.com/research/#phd-dissertation


This is probably a better link, for those of us (like me) who had never heard of Nuitka: http://nuitka.net/pages/overview.html



Checked the contributors list on Github[0] and most of the code is written by a single person, Kay. Isn't that amazing? A project of this size and value, done by a single person in his own time. I hope this project gets more popularity.

Bought Kay a beer :-)

[0] - https://github.com/kayhayen/Nuitka/graphs/contributors


Great project. Compiling a simple PySide application worked fine on the first try. If you bundle all dependencies you get a binary about the same size as a binary produced by py2exe. I will do some speed comparisons next.


It is somewhat faster than CPython already; it doesn't yet make all the possible optimizations, but a 258% factor on pystone is a good start (number is from version 0.3.11).

Holy cow. And here I'd set aside Python in part because of the difficulty in getting performant executables (I was getting visible lag in a turn-based SDL game ...)


Use Cython.

It's a hassle getting your head around it, and getting it set up, but you'll never look back once you do.

If you rely on PyPy, you get what you're given and then you're stuck. You can't really guess how to rewrite your code to make it faster.

A small blog post I wrote on this: https://honnibal.wordpress.com/2014/10/21/writing-c-in-cytho...

Here's a non-trivial example: https://github.com/honnibal/thinc/blob/master/thinc/learner....

This code is driving a library of very fast NLP tools that I'm writing.


Unfortunately Cython is even harder to package up than normal Python because it generates one shared library per module.

...not really suitable for games.


You may need to fiddle around with how modules are loaded by Python, but you can do whatever you want with the C files that Cython generates.

kivy-ios should have all the details for building everything into one big Python blob, if that's what you need.


Amen to Cython. When used well (eg. with C types for variables and a few compiler directives, which is not very painful at all), Cython emits code that is extremely similar to idiomatic C for my scientific computing workloads.


Might want to take "somewhat faster" with a grain of salt. I just tried it on a smallish program (CPU-bound, mostly string operations) and got a 20x slowdown.


They have a site showing benchmarks http://speedcenter.nuitka.net/ (down at the moment), but here's a google cache of one, showing also the generated C code: https://webcache.googleusercontent.com/search?q=cache:speedc...


But how does it compare to PyPy?


To be honest, I found that in my use case (game dev, supported by bindings to a C library), PyPy was actually slower than CPython. I'd be curious to dig up my code and compare performance to a Nuitka-compiled executable.


out of curiosity, which bindings did you use?


libtcod. It was a roguelike.


Those Python bindings seem to use ctypes, which is indeed slower[1] on PyPy. They recommend[2] using cffi instead.

[1] http://pypy.org/performance.html

[2] http://pypy.readthedocs.org/en/latest/extending.html


For games, other soft-realtime uses, and run-once code, PyPy is slower. The newer GC is better, but the JIT still causes pauses. The slow warm-up of the JIT, the lack of type-hint saving, and the slower interpreter are why it is not good on run-once code. Of course, if your Python extensions are calling optimized assembly routines, or hardware, then they will be faster than PyPy code written in Python.

Also, using a JIT is not possible on some platforms (iOS, or if the CPU architecture isn't supported by pypy).

Not to say that pypy isn't better for certain tasks of course.


The solution is easy then, PyPy should have an AOT compilation option, like ART does.


That's pretty much what Nuitka is. Unfortunately, PyPy is only fast because it can take advantage of runtime information in the JIT. AOT compilation loses that advantage.


Not easy at all. ART generally compiles quite static languages (Java). Python is an incredibly dynamic language. The challenges for AOT compilation are vastly different.


It seems to be, by and large, the work of a single person on his spare time. Very impressive!


On a tangentially related note: here is a comparison of various Python runtimes (interpreters and compilers including Nuitka and PyPy) on a fairly complex scientific code: http://arxiv.org/abs/1404.6388

The shootout, however, is from August 2013 so getting a bit old by now.


I would love to make distributing games built on Python easier. Jessica McKeller suggested this should be a priority[0] to make the language more accessible by getting kids involved in making games and being able to share them, easily, with their friends. These days Javascript is kicking our butt in this area.

[0] https://www.youtube.com/watch?v=d1a4Jbjc-vU

I've been writing little helper libraries on top of Python 3.4 + pysdl2 as I've been working on games and demos for my (unfortunately cancelled) Pycon talk. One area I've only played with, unsuccessfully, is getting packaging going on Nuitka or some other compiler. If anyone wants to get together to make it awesome get in touch.


I noticed that PyPy does not show any real speed improvement over CPython in these benchmarks: https://www.techempower.com/benchmarks/

If you filter by Python, you can compare some results running on both CPython and PyPy. I would be curious to know what it is about these benchmarks that makes PyPy perform poorly. I would also be interested to see how Nuitka performs.

At the moment I'm also very excited about Pyston from Dropbox: https://github.com/dropbox/pyston


I'm supremely sceptical about those benchmarks. Go take a look at the code being tested. I would welcome a serious look at development time versus resources used, using code that is plausible in production. I read the code used to test Django. It's not reasonable code.


It's an open sourced benchmarking comparison. If you can improve them, shoot them a pull request!

https://github.com/TechEmpower/FrameworkBenchmarks


I think PyPy is of interest because of pypy-stm, which attempts to circumvent the GIL, and not only for speed benefits.


Correct me if I'm wrong here, but isn't that only helpful if you have multithreaded Python programs? I have found that if my process is too slow, I can consider porting it to numpy/numba or Cython, using PyPy, or dividing up the work using multiprocessing. multiprocessing is barely more work than using the threading module, and completely avoids the GIL AFAIK.
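A minimal sketch of that last option, assuming the work can be expressed as a module-level function (all names here are made up for illustration):

```python
from multiprocessing import Pool

def square(n):
    # Must be a module-level function: Pool pickles the callable
    # by name and sends it to the worker processes.
    return n * n

def parallel_squares(values, workers=4):
    # Each worker is a separate process with its own interpreter
    # and its own GIL, so CPU-bound work runs truly in parallel.
    with Pool(workers) as pool:
        return pool.map(square, values)

if __name__ == "__main__":
    print(parallel_squares(range(5)))  # [0, 1, 4, 9, 16]
```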


Well, multiprocessing (the module) is problematic because not everything can be pickled. If you are working with simple functions this can be fine, but often we use third-party libraries that use lambdas (which can't be pickled). Often you don't know why something can't be pickled.
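A quick illustration of the pickling limitation (the exact exception type varies by Python version):

```python
import pickle

def top_level(x):
    return x + 1

# Module-level functions pickle fine: they are stored by name
# and looked up again in the worker process.
pickle.dumps(top_level)

# Lambdas (and nested functions) have no importable name, so
# pickling them fails -- and with it, multiprocessing:
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("cannot pickle lambda:", type(exc).__name__)
```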

What STM provides (as I understand from blog posts by the PyPy team) is real threading without having to worry about pickling.

And yes, it only helps if you have multithreaded programs. However, multithreaded and parallel programs are very relevant today.


I found that PyPy sometimes has unexpected slowdowns. When we were porting some offline processing tools from Python to PyPy, the craziest ones were building strings via += and flattening via sum(arrays, []), which are much slower than on CPython.


There was a good blog post about this by Armin. Basically, if you have to concatenate strings, don't do so via += but use "".join([]).
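The usual pattern, for anyone who hasn't seen it: collect the pieces in a list and join once at the end.

```python
words = ["spam", "eggs", "ham"]

# Repeated += can be quadratic: on runtimes without CPython's
# in-place append trick, each += may copy the whole string so far.
slow = ""
for w in words:
    slow += w

# "".join builds the result in a single pass over the parts.
fast = "".join(words)

assert slow == fast == "spameggsham"
```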


I find that unexpected. Java has had the string-builder optimization for a long time, and CPython is much, much faster in this respect. It's not always easy to use "".join when building a string, so you end up having to build a separate array of strings in some cases. And building arrays isn't always that fast either. And [].join doesn't exist, so summing arrays is always kinda slow.

Anyway, all that is to say: I really like PyPy, and we use it a lot, but those _unexpected_ crazy slowdowns are unfortunate.
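For the flattening case, itertools.chain is the linear-time stand-in for the missing [].join:

```python
import itertools

lists = [[1, 2], [3], [4, 5]]

# sum(lists, []) re-allocates the accumulator on every step: O(n^2).
flat_slow = sum(lists, [])

# chain.from_iterable walks the sublists in a single pass: O(n).
flat_fast = list(itertools.chain.from_iterable(lists))

assert flat_slow == flat_fast == [1, 2, 3, 4, 5]
```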


Why are people so quick to trust a new compiler?

Have you audited that compiler's source and built the compiler from it with a known-good compiler? Are you inspecting the resulting .pyc files, and in this case, the resulting PEs? It's a super easy way to inject a compromise into a package that will probably get widely distributed.

full disclosure: link is down for me, so haven't read the article. Been a comment that has been building up for a while for me, and not specific to Nuitka. Same goes for new frameworks/languages/etc.


>Have you audited that compiler's source and built the compiler from it with a known-good compiler? Are you inspecting the resulting .pyc files, and in this case, the resulting PEs? It's a super easy way to inject a compromise into a package that will probably get widely distributed.

Because nobody is that paranoid?


And if they are, they probably aren't in the business of distributing precompiled binaries.


I think it's unlikely that someone would design something as complicated as a new compiler / programming language, open-source it, and attach their real name to the project just to hide an exploit in it.

There's much lower hanging fruit.


Why limit yourself to new compilers? Every new version of an old compiler could be suspect, every line of code in general. There's no point in being this paranoid, nobody has time for all the audits that might be theoretically desirable. Almost everything you do on a computer relies on trusting an untold number of components.


Mirror: http://web.archive.org/web/20141218012211/http://nuitka.net/

BTW how much faster is this than PyPy or CPython? That is the question that I think is most on people's minds.


Seems to be 2x faster than CPython, but as I understand it, work on optimizations is still ongoing.


As others have pointed out, these claims have to be taken very carefully and the main value (at least for now) seems to be the easy packaging and deploying.


I program Django websites and work in an environment where we like to deploy often. I'm not sure we could tolerate the extra wait in compile time for deployment. At least we would need to take a serious look at the speed/resource gain before considering it.


Deployment in a controlled environment is not the ideal use case here - the benefits come from when you are shipping to non-technical users.


It's been said before but it's fairly unlikely that Python is the bottleneck for typical web-apps.


For all that people may think this isn't a good idea, it's very similar to what Unity3D is pursuing with IL2CPP (http://blogs.unity3d.com/2014/05/20/the-future-of-scripting-...).

This seems immediately relevant to addressing Python's biggest problems: it's slow, and it's hard to distribute. Relevant quote:

    All code generation is done to C++ rather than architecture 
    specific machine code. The cost of porting and maintenance 
    of architecture specific code generation is now more amortised


Git repo is unresponsive. I think we killed the poor guys box.

EDIT: https://github.com/kayhayen/Nuitka


I'm very interested in some benchmarks on this and various other compilers/packagers/runtimes for Python.

Sidenote: A very long time ago I had some great luck with Perl's packagers: PAR, PerlApp, etc. but they all basically just bundle in the basic Perl runtime with the guts of your program so there's no real benchmark difference. I suspect the same is not true of efforts like this for Python.


This is awesome! I tested it on some code that uses some not-completely-trivial Python features like metaclasses, etc. and it worked like a charm.

I couldn't get it to work against this obfuscated hello world though: http://benkurtovic.com/2014/06/01/obfuscating-hello-world.ht... :)


That one's relying on CPython implementation details, the author says so in his detailed description.


I really like the idea that it compiles Python but is (99.9%) compatible with CPython -- that way you can still use the many C extensions (I don't want to do my projects any more without my own Python extensions written in C).


I don't know much about compilers and runtimes, but how come there are SO MANY of them for Python? Is there something about Python that makes it so easy to write runtimes for it?


I think it's because lots of people write lots of code in Python, but then run into performance problems at some point and try to tackle those. Having a nice, regular language makes it relatively easy to write a new runtime to scratch your itch.

I suspect we would have seen this more with languages like Perl, except Perl is virtually impossible to parse.


I think it's a matter of numbers. There are a lot of people who use Python. (https://blog.pythonanywhere.com/67/ estimates in the low millions.) Some of them like to work on alternate implementations of Python.

I think Python is average in this respect. By comparison, here's an incomplete list of C compilers: http://en.wikipedia.org/wiki/List_of_compilers#C_compilers and you can also look at the list of Pascal compilers.


I think it's largely because while a lot of people really love the language, they struggle with the performance and scaling issues that arise with the standard CPython implementation.


Python is the best "glue" language. Large third-party libs, easily readable/writable (even by non-programmers), old (it predates Java), established, great user community.

So, people in different realms (MSoft: IronPython; JVM/enterprise: Jython; the rest of us: CPython; scientists/performance: Cython, PyPy, Stackless, Psyco; academics/experimenters: many more) all want to use it and make versions that integrate with their tools.


Popularity and a syntax that can be parsed, plus being widely taught. Pretty much the same reason why we had plenty of Pascal compilers back in the days.

And quite unlike why we have that many JavaScript transpilers.

A few of the compilers might be running on the AST, and thus have no complete alternative implementation. Good module support helps there, although it's still not close to VM-based languages like Java.


My immediate question, which I don't see an answer to on the site or Github page: is it self-hosting yet?


Python is a fun language; it has as many web frameworks as it has runtimes/interpreters.



