I built a small app that retrieves data from a remote server, builds a summary and then pushes it to a local html file.
Number of people that tried it when I told them that to make it work they had to download python, pip, paramiko and install the pycrypto binary? Zero. Number of people that tried it after I just gave them a zip file with an executable? No longer zero.
I tried different libraries but Nuitka was the only one that made everything work seamlessly. I owe this guy a beer.
Edit: I no longer owe Kay a beer. I found the donation link on his website.
This issue has been addressed multiple times with considerable success by pyinstaller, py2app, and cx_Freeze. You can get your single-zip-file distribution package without Nuitka, although Nuitka may have other advantages.
I recently ended up rewriting large parts of a Python project in Common Lisp due to this exact issue; in this case it was one stage of a pipeline that I prototyped with an existing Python library that could output to XML, but delivery of Python applications on Windows is painful.
I think this is one of the reasons Go was a breath of fresh air to a lot of Python and Ruby programmers. The ability to quickly generate a static binary is a great feature.
If you packaged it correctly, you could get by just requiring Python and pip to be installed. But the zip with a single executable is still faster.
This project looks completely misguided. The talk focused on the trivialities of mapping Python to C++ rather than on the interesting problems to be encountered when trying to optimize Python while maintaining its extremely dynamic semantics. Also the benchmarking effort is laughable; pystone is not to be taken seriously (only exercises a tiny part of the language) and pybench does microbenchmarks, which are optimized away. You should try the "real-world" benchmarks from the PyPy and Unladen Swallow projects. And what is the size of the generated code? (E.g. how big would the binary for the entire standard library be?) In your blog, please use less boring subjects than "version x.y.z released". ~ Guido van Rossum
If you posted that comment to somehow discredit this project, then I'd say you miscalculated (if you had other motives, then thanks for posting it, I guess?). Most of the folks on HN know better than to accept the word of an authority figure -- particularly one so harshly worded -- and so will check out Nuitka on their own.
Regardless of whether or not Hayen succeeds here, good on him for at least trying. GVR hasn't even made an effort to improve Python's performance (indeed, by most accounts, Python 3 is even slower than Python 2). In fact, he seems intent on actively discouraging such efforts. For example, why is CPython still using a stack-based interpreter, when several people have already worked toward implementing a register-based one and were only met with derision?
I generally agree with GVR's approach of regarding premature optimization as a mistake, and developer time is usually more important to optimize for than processor time.
On the other hand, sometimes you have to optimize code, and dropping down to C is unnecessarily risky (not to mention tedious, although that comes with the territory). The fact that there's even an alternative today with Numpy and Cython seems to have happened in spite of GVR, not because of him.
I'm sad that someone as influential as GVR would be so consistently rude and dismissive toward an effort at improving Python, no matter how misguided he viewed the effort. This is the kind of attitude that makes leading open source projects suck.
Edit: By the way, I don't mean to imply that I share the pessimism some have about Python's future. Between Python's considerable rate of adoption in upper education, numerics, machine learning, bioinformatics, and various other scientific computing fields, combined with novel approaches to the language like PyPy, Pyston from Dropbox, Nuitka, and Numba, I'm overall pretty optimistic. But it's becoming clear that the committee-driven approach of CPython, driven/impeded by the BDFL, isn't bearing the fruits it has in the past.
It's sad to me to compare my sense of the Python community ten years ago to what it is now. It seems like there was a time in the old days when Python was meant to be fun (and even the documentation matched that attitude), whereas between version politics, its wider adoption for 'serious' work, and GvR's hostile attitude these days, it really doesn't seem like that spark is even there anymore.
I miss the Monty Python jokes and the freewheeling 'BASIC, evolved' feel of old 1.x sometimes even.
For me, this is not about execution speed AT ALL. I wouldn't even mind if it was slower.
It is about having an easy, dead-simple way to provide Windows users with an executable. Without changing my code, without weird build systems. Like it or not, Windows users are still the majority out there.
I like the Lisp world's term "delivery" for this part, where you package up an application to ship to end-users. Optimization can be one part of delivery, but not necessarily the most important part.
In theory. In reality, python freezers have a lot of problems. It can be a real hassle to get a single file, and even if you do they turn out massive. I've had nothing but problems with them in the past.
I'd be happy if the setuptools bdist installer could create a single file that works in 32 and 64-bit Windows and would take care of running 2to3 during the install. The situation now is too painful for me to bother making Windows installers any more.
Exactly. I recently wanted to build a task tracking app for the company I work at. I initially decided on writing it in Python as a server-based app (Flask + SQLAlchemy) since I am familiar with it. Once I found out how damn difficult it is for end users to actually deploy the app, I opted for node-webkit (Backbone.js + nedb) instead. I definitely have no regrets.
I agree. In fact, I think that the slowness of Python implementations is a feature. It forces developers to use standard libraries, which in turn makes the program more concise. This is certainly the case in MATLAB where vectorization is pretty much needed for non-trivial programs, and this leads to improved readability.
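A toy sketch of that effect using only builtins (numpy's vectorization is the same trade at a much larger scale: the loop moves into compiled code):

```python
values = list(range(10))

total = 0
for v in values:     # explicit loop: interpreted bytecode on every iteration
    total += v

# The library call is both faster (the loop runs in C) and more concise:
assert total == sum(values)
```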
For implementing Python, you have at least five options:
- Naive interpreter (CPython). Everything is a dict. Slow.
- Transliterate to some hard-compiled language, but all data is still one kind of dynamically typed object (Nuitka). A little faster, and compatible, but has limited optimization potential.
- Infer types and try to create appropriate code in a faster language (Shed Skin). Hard to do, but promising. (Shed Skin has one implementor.)
- Restrict the language (RPython). Potentially much faster, but incompatible.
- Build a JIT compiler/interpreter combo and handle all the hard cases that require recompiling during execution (PyPy). Hard to do, and results in a huge system, but almost compatible. After 10 years of work, it's finally happening.
If you're willing to restrict the language, it's much easier. RPython was written only to help build PyPy, but the concept could be extended to allow most of Python. Both Shed Skin and RPython insist that type inference succeed at disambiguating types. If you're willing to accept using an "any" type when type inference fails, you can handle more of the language.
The big boat-anchor feature of Python is "setattr", combined with the ability to examine and change most of the program and its objects. This isn't just reflection; it's insinuation; you can get at things which should be private to other objects or threads and mess with them. By string name, no less. This invalidates almost all compiler optimizations. It's not a particularly useful feature. It just happens to be easy to implement given CPython's internal dictionary-based model. If "setattr" were limited to a class of objects derived from a "dynamic object" class, most of the real use cases (like HTML/XML parsers where the tags appear as Python attributes) would still work, while the rest of the code could be compiled with hard offsets for object fields.
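A minimal sketch of the problem, using a hypothetical `Point` class:

```python
class Point:                # hypothetical class for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# A compiler would like to turn p.x into a load at a fixed offset,
# but any code anywhere can change the object's layout at runtime,
# by string name, through setattr:
setattr(p, "z", 3)               # inject a brand-new field
setattr(p, "x", "not a number")  # change the type of an existing one

print(p.z)  # 3
```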
The other big problem with Python is that its threading model is no better than C's. Everything is implicitly shared. To make this work, the infamous Global Interpreter Lock is needed. Some implementations split this into many locks, but because the language has no idea of which thread owns what, there's lots of unnecessary locking.
Python is a very pleasant language in which to program. If it ever gets rid of the less useful parts of the "extremely dynamic semantics", sometimes called the Guido van Rossum Memorial Boat Anchor, it could be much more widely useful.
> The big boat-anchor feature of Python is "setattr", combined with the ability to examine and change most of the program and its objects. [...] If "setattr" were limited to a class of objects derived from a "dynamic object" class, most of the real use cases [...] would still work, while the rest of the code could be compiled with hard offsets for object fields.
Isn't __slots__ made for that specific use case? (when you want to optimize your code by specifying specific attribute names)
I think the "dynamic" behavior is a sane default (since most people don't need that optimization).
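For reference, a small `__slots__` sketch (hypothetical `Vector` class): declaring the slots fixes the set of attribute names up front, which is what lets CPython store the values at fixed offsets instead of in a per-instance dict:

```python
class Vector:
    __slots__ = ("x", "y")   # the full set of attribute names, fixed up front

    def __init__(self, x, y):
        self.x = x
        self.y = y

v = Vector(1, 2)
v.x = 10                      # fine: "x" is a declared slot
try:
    v.z = 3                   # no per-instance __dict__, so this fails
except AttributeError:
    print("no new attributes allowed")
```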
setattr and similar dynamic features do make Python harder to optimize, but they're not that different from what you see in JavaScript. JavaScript has had an incredible amount of work spent optimizing it, but the result is a bunch of pretty damn fast language JITs that implement the full language, usually only slowing down in cases where actual use of those features requires it. Is it really that hard to come up with something like that for Python?
> Is it really that hard to come up with something like that for Python?
Nope, we just need a large company willing to spend tons of money funding the effort.
PyPy has made incredible strides in this area, especially for long running processes where the JIT has time to warm up. But they need a lot more funding if people ever want Python to get fast.
I'm very impressed that PyPy got their JIT working. But it took 10 years from initial funding by the European Union. It's a hard problem. They had to come up with some new, elegant solutions to make it work. See https://pypy.readthedocs.org/en/release-2.3.x/jit/pyjitpl5.h...
JavaScript is a bit easier because it doesn't have shared-memory concurrency. In Python, you can change a method of an object while that method is executing in another thread, so you have to worry about invalidating code that is currently running asynchronously.
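A toy sketch of the rebinding in question (no actual threads needed to see it; in a real program the rebinding line could run on another thread while a loop elsewhere keeps calling the method):

```python
class Worker:
    def step(self):
        return "old"

w = Worker()
results = [w.step()]              # "old"

# Rebind the method at runtime. A JIT that compiled callers of
# Worker.step must notice this and invalidate that compiled code.
Worker.step = lambda self: "new"

results.append(w.step())          # "new": lookup happens on every call
print(results)  # ['old', 'new']
```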
Guido's response, which is entirely unprovoked and rude (given that it is for a small volunteer effort, that has already achieved something admirable, doesn't ask anything from him, and doesn't harm his CPython in any way), seems to me worse than anything I've read from Linus (who's just a dog that barks but doesn't bite, and just uses the insults for emphasis).
Looks like the response I would expect from someone who has seen tons of "Why not just compile it and see it get magically faster?!?!" queries.
It's easy to compile it, but making the compilation actually useful for a dynamic language is HARD. There's a reason that most dynamic languages will either interpret or JIT, and it's not because the JIT writers overlooked something obvious.
A naive translation of the code will remove a little bit of overhead from the bytecode dispatch, but the resulting code bloat will blow out instruction caches for any reasonable sized code. In small programs with one translation unit, analysis can sometimes work to speed things up, but it quickly becomes either undecidable or intractable.
Basically, on dynamic languages, you can often see static compilation being worse than a naive attempt at native code, especially once the hot path for the interpreter fits in cache, but the generated native code no longer does. Wanting to see results on real world benchmarks is entirely reasonable.
CL goes near C speeds with an INCREDIBLE amount of work to make your program as static and C-like as possible. It also requires intimate knowledge of your particular implementation.
Sometimes it's not even possible (!) to get C speed because of things like float boxing across function boundaries.
Seriously? I think Python is fantastic, and IMHO this is due to Guido. But seriously, how can he possibly comment on someone else's attempt to improve the performance of Python? He should look in a mirror when saying that ;-)
It's a hard problem, I wish more people would attempt to take care of it.
I like Guido but that comment seems too harsh and commenting on how boring his blog titles are is completely unnecessary. The talk was very interesting and is very neat for only a spare time project.
I don't know why (especially key) people in development (Torvalds, now van Rossum, etc.) seem to have a hard time phrasing their thoughts with a little more consideration.
Would it hurt to put it more like "this project is not of interest to me"? What is so completely misguided about working on something and trying things? Or to restrain oneself from bashing the effort as "laughable" and instead offer ideas for improvement, or present the alternatives without discrediting the whole thing? And if he is bored, why not just skip the whole thing?
People in these positions (e.g. Torvalds) are generally extremely opinionated (which is good, because it can provide direction to a project in its early days), but have also spent years listening to people without sufficient skill or experience attempting to provide ideas, patches, or commentary that they're not qualified for.
At some point, the effort of letting someone down gently fifty times a day gives way to a curt but efficient form of communication.
In this case, GvR is just laying out what he sees as issues, take them or leave them. If you don't care for his opinion, fine, if you do, there you go.
With regard to language design, if you start designing without really caring about it, you probably won't finish. And that means that the designers of popular languages typically have unreasonable opinions. How that manifests itself depends a lot on the person, though.
Wow, I'm really surprised. Guido is usually incredibly nice, but he comes across as very condescending in that comment.
I love Python, it's my favorite language by far - but I love all those different tools (numba, pypy, hope, nuitka, cython etc) that make you sacrifice a small amount of dynamic magic in exchange for significant speed ups.
I don't have to use them - but when I need to write fast code, it's really nice to be able to do so in Python.
Are the names on the comments verified? This seems very much out-of-character for Guido, he's generally been quite supportive of efforts to improve Python's speed (eg. Unladen Swallow, PyPy) or add static typing as a library. It occurs to me that with this blog interface, I could post as "Ken Thompson" and nobody would be the wiser.
I can verify that van Rossum's comments are by him. I saw him walk out of the aforementioned Nuitka presentation at EuroPython a few years back, and clearly out of annoyance with it.
That Google Groups thread shows that he's irritated because he thinks that the Nuitka author doesn't understand the issues: "... he is incredibly naive about what kind of optimizations he'd like to apply. (He basically doesn't seem aware of the difficulties arising with static analysis of Python.)" While the Unladen Swallow and PyPy developers do understand the issues.
I think it's in character. I gave a talk at another EuroPython on measuring performance. Partway through he asked "isn't this just a repeat of the timeit module"? My response was "yes, I'm explaining why it works the way it does." I think that was sufficient to mollify his irritation.
I sympathise with the developer because I think the tone of the criticism he's been getting is very far short of nice.
...But...On a purely technical level...I see a lot of problems with his response.
I think there's real merit in this project on the distribution side of things --- not the optimisation side. I hope he pivots the project in that direction.
That post by Behnel is a good find. FWIW, I was at the talk, and my memory is that I also felt the speaker didn't know what he was talking about regarding optimizations, and took far too long to talk about it.
So when the talk is titled "for the first time, there is a consequently executed approach to statically translate the full language extent of Python, with all its special cases, without introducing a new or reduced version of Python", a listener would expect that it be more advanced than previous work. To be fair, neither Aycock's nor Salib's work was complete, but they did enough groundwork to show that static analysis of the type that Nuitka was exploring would not be able to achieve its stated goals.
But all of this pointing to a discussion from 2012 is pretty pointless, as people do use it for distribution - something which wasn't touched on during the talk, as I recall - while the author did talk about aiming for type inference at compile time, when all evidence is that the speaker had little idea of the actual issues, as previously reported from multiple earlier attempts.
I honestly don't know what to do when someone presents a talk as a pure hobbyist, puttering around on a project with little interest in what others have done, and who doesn't understand the audience enough to gauge which details are of interest and which aren't. There's a culture mismatch, certainly, but that's what makes it harder to assign fault.
Should Hayen have realized that the project wasn't at the right level for EuroPython, or that the future project goals were overstated? Did EuroPython get more advanced over time? Did van Rossum not allow that weekend hobbyists don't have the experience to judge things? Should van Rossum never say anything negative in public about a project (and if so, at what level of fame does one need to bear that in mind)? I certainly have no answers to those.
That is a very good point, and it would be a shame if Nuitka's developer (and people in this thread) would have gotten a wrong impression on GvR from something like that.
So this would be the same Guido van Rossum who made a completely misguided analysis of the language he had birthed, tried to make it "grow up" in version 3.x, browbeat us all about the dubious benefit of this "new" cruft-encrusted language, causing strategic drift for the entire ecosystem? The best thing that could happen to Python would be a benevolent, and mercifully silent, retirement of GvR. And I speak as a long term Pythonista.
Kudos to anybody who is trying to move Python into a performance envelope similar to Lua and Javascript, not to mention Golang, Julia, or Clojure.
GvR fading away would help but not cure the problem he started, i.e. Python design by committee. The Python culture results in substandard implementations of new features after waiting years for them to come to fruition. (asyncore/asyncio... :( )
Of all the current dynamic languages, Python is the slowest moving on almost every front. Ruby became popular around the time Python 3.x was coming out; at the time it was much slower and riddled with 1.8/1.9 issues. Since then Ruby has surpassed Python almost everywhere, whereas Python 3.x, which should have had the freedom to do great things considering it broke backwards compat with 2.x, has languished with its slow performance.
Sure, there are great things happening in and around the numpy community, but that is tiny compared to the great things happening in Julia, Rust, Ruby, Golang and Javascript. (I include the last two despite my personal opinion that they are not as good, but one can't deny they have made progress.)
I loved Python but it's standing still and I can't afford to do that anymore. I said my very solemn goodbyes to programming Python full time a while ago and I think it's one of the best decisions I have made.
That being said. Good on this dude for doing something about Python performance without doing Cython style stuff.
It's not the only killer feature. Where Python 2 lacks consistency Python 3 fixes it. Where Python 2 core libs have stagnated, the Python 3 ones have been improved. All future development from core CPython developers is going into Python 3. That's a pretty killer feature.
Slow, incremental improvement is not a killer feature. Compiled Python 10x faster. That would be killer. Proper multi-core concurrency. Killer. Modest (yet breaking) improvement over 6 years while all the action goes on in other languages? Not killer.
But you cite a very important point: all the improvement, modest as it is for the majority of users, is going to 3.x and yes, that's why I've moved. But I can tell you it's only because of the constant nagging and threats about abandoning 2.x. Nobody would have moved for any actual "feature" of 3 were it not for the fear of being abandoned. It's a stick-only strategy. No carrot.
In my book, these are only killer features compared to Python 2, not in comparison to the rest of the world, and more along the lines of minimum necessary to justify the pains of a breaking change.
Agreed on all points, though I think you underestimate to what extent Numpy is "carrying" Python. Without Numpy, Python would be dead and buried in my view. Sure, tons of stuff is "pure" Python, but the core of its cred comes from the high quality of some libraries which are built around Numpy (Pandas being but one example).
Separately though, sometimes when you want to break the dead hand of the committee-driven, value-destroying hold of a small group of individuals, you have to mercifully push aside the figurehead that provides their credibility. That figurehead is Guido van Rossum, and his post speaks more than what I could about how his time is over.
I have a hard time believing that NumPy is "carrying" Python to the extent you suggest.
While a large number of people use Python because of NumPy, my work in chemical structure analysis and search doesn't depend on it. I use the package about once a year, and few of the chemistry analysis tools I use have a dependency on it. I used to be involved with the Biopython project, and while NumPy is strongly recommended for a couple of the modules, it's not a requirement.
Then there's all the people who use it because of Django, and Zope before that. (I remember that the 2000 Python conference seemed to be half Zope developers.) Plus the win32 extensions get about 7,000 downloads per week, implying a pretty active base in that area.
Checking the PyCon talks, only about 5% of them seem to deal with numeric work. (Then again, there's SciPy ... but there's also DjangoCon and local non-NumPy meetings; certainly few in our local user group meeting work with NumPy.)
How then did you draw your conclusion?
Python-the-language did make three changes to support numerics: multi-dimensional array slicing, the Ellipsis notation, and most recently the infix operator for matrix multiplication. But I believe that any language which couldn't handle a NumPy-like module would not be that successful in the first place.
I think you're overestimating the number of people using Numpy. There's no way it's more popular than Django; scientific computing is mainly academic anyway.
In addition, the majority of scientists don't install numpy from PyPI, because it almost never works. Many use the numpy that comes with the OS distribution (e.g. Ubuntu), a Python distribution (ActiveState, PythonXY, Enthought, Anaconda), or binary installers on Windows. So I would say that numpy usage numbers are a lot higher than what was shown in the other comment.
I am sorry, and at risk of downvotes, web dev is a much faster (dare I say it more fickle) crowd than science. Python is not a whole lot better than Golang, Ruby and of course, Node.js at this use case. Absolute numbers are not persuasive to me here. There are far fewer scientists than web devs (your stats imply ~ 1:2). Indeed I would venture to suggest that your data means that numpy punches well above its weight relative to other languages' science:webdev ratios. Let's not forget that Numpy and its direct linear ancestor, Numeric, has been dominant for more than a decade. Show me the Python web framework which can say the same. Flask/Bottle/Pyramid/Django/Tornado - pick your fashion.
Another point (to the grandparent posts): putting Numpy into a "science" pigeonhole is erroneous. It is massively used in finance, engineering, bioinformatics, and statistics.
... and there can still be more scientists using Python than web devs using it.
Python web development is niche, despite the good frameworks. If you look at job postings, it's eclipsed by RoR, .NET, Java... maybe even PHP.
> Let's not forget that Numpy and its direct linear ancestor, Numeric, has been dominant for more than a decade. Show me the Python web framework which can say the same.
Django was released almost 10 years ago, and I remember it was already popular, a lot of people migrated from Plone.
The point is that Python's USP is Numpy. Python's USP is not web dev. If Python were to disappear, there are many, many credible web dev alternatives. Not true of Numpy. Take that from an R, Julia, and C user. Numpy has the perfect combo of speed wrapped in an expressive language. I don't see that anywhere else.
And I think django tends to have more downloads than numpy. For django I tend to have a separate virtualenv so I know what to deploy on the server (so that's at least 2 downloads, plus one more for every new project I start). When I use python to calculate something/analyze data I don't care about deploying, and I think the scientific crowd is less likely to care about best practices in general.
Just a thought, not so certain myself how important it really is, but:
The web dev world also might mostly abandon Python very fast, whereas once you've got a foothold in academia there is a good chance you're going to keep it for years. Having a language in universities' curricula gives exposure for a long time.
I think this is a really solid approach. It's the same thing I was trying to do for PHP with phc (http://phpcompiler.org/) - compiling down to C using the built in stdlib and C API, then using full program optimization.
Long term, it won't be as fast as a full JIT, as the Facebook HPHP team showed, but Python doesn't have a JIT of the same caliber, so this is probably useful for tons of people.
Dunno if the author is around, but they might find some of the stuff from my PhD relevant, especially how the static analysis worked and some of the challenges of compiling using the C API: http://paulbiggar.com/research/#phd-dissertation
Checked the contributors list on Github[0] and most of the code is written by a single person, Kay. Isn't that amazing? A project this large and valuable, done by a single person in his own time. I wish this project gets more popularity.
Great project. Compiling a simple PySide application worked fine on the first try.
If you bundle all dependencies you get a binary about the same size as a binary produced by py2exe. I will do some speed comparisons next.
It is somewhat faster than CPython already; it doesn't yet perform all the possible optimizations, but a 258% factor on pystone is a good start (number is from version 0.3.11).
Holy cow. And here I'd set aside Python in part because of the difficulty in getting performant executables (I was getting visible lag in a turn-based SDL game ...)
Amen to Cython. When used well (eg. with C types for variables and a few compiler directives, which is not very painful at all), Cython emits code that is extremely similar to idiomatic C for my scientific computing workloads.
Might want to take "somewhat faster" with a grain of salt. I just tried it on a smallish program (CPU-bound, mostly string operations) and got a 20x slowdown.
To be honest, I found that in my use case (game dev, supported by bindings to a C library), PyPy was actually slower than CPython. I'd be curious to dig up my code and compare performance to a Nuitka-compiled executable.
For games, other soft realtime, and run once code, pypy is slower. The newer GC is better, but the JIT still causes pauses. The slow warm up of the jit, no type hint saving, and slower interpreter is why it is not good on run once code. Of course if your python extensions are calling optimized assembly routines, or hardware, then it will be faster than pypy code written in python.
Also, using a JIT is not possible on some platforms (iOS, or if the CPU architecture isn't supported by pypy).
Not to say that pypy isn't better for certain tasks of course.
That's pretty much what Nuitka is. Unfortunately, PyPy is only fast because it can take advantage of runtime information in the JIT. AOT compilation loses that advantage.
Not easy at all. ART generally compiles quite static languages (Java). Python is an incredibly dynamic language. The challenges for AOT compilation are vastly different.
On a tangentially related note: here is a comparison of various Python runtimes (interpreters and compilers including Nuitka and PyPy) on a fairly complex scientific code: http://arxiv.org/abs/1404.6388
The shootout, however, is from August 2013 so getting a bit old by now.
I would love to make distributing games built on Python easier. Jessica McKellar suggested this should be a priority[0] to make the language more accessible by getting kids involved in making games and being able to share them, easily, with their friends. These days Javascript is kicking our butt in this area.
I've been writing little helper libraries on top of Python 3.4 + pysdl2 as I've been working on games and demos for my (unfortunately cancelled) Pycon talk. One area I've only played with, unsuccessfully, is getting packaging going on Nuitka or some other compiler. If anyone wants to get together to make it awesome get in touch.
If you filter by Python, you can compare some results running on both CPython and PyPy. I would be curious to know what it is about these benchmarks that makes PyPy perform poorly. I would also be interested to see how Nuitka performs.
I'm supremely sceptical about those benchmarks. Go take a look at the code being tested. I would welcome a serious look at development time versus resources used, with code that is plausible in production. I read the code used to test Django. It's not reasonable code.
Correct me if I'm wrong here, but isn't that only helpful if you have multithreaded python programs? I have found that if my process is too slow, I can consider porting it to numpy/numba, cython, using pypy or dividing up the work using multiprocessing. multiprocessing is barely more work than using the threading module, and completely avoids the GIL AFAIK.
Well, multiprocessing (the module) is problematic because not everything can be pickled. If you are working with simple functions this can be fine, but often we use third-party libraries that use lambdas (which can't be pickled). Often you don't know why something can't be pickled.
What STM provides (as I understand from the PyPy team's blog posts) is real threading without having to worry about pickling.
And yes, it only helps if you have multithreaded programs. However, multithreaded and parallel programs are very relevant today.
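A small sketch of the pickling limitation, using the stdlib `pickle` module directly (which is what multiprocessing uses to ship work to worker processes):

```python
import pickle  # multiprocessing serializes work items with pickle

# A named, importable function pickles fine (it is stored by reference):
pickle.dumps(len)

# A lambda has no importable name, so it can't be shipped to a worker:
try:
    pickle.dumps(lambda x: 2 * x)
    lambda_picklable = True
except Exception:
    lambda_picklable = False

print("lambda picklable?", lambda_picklable)  # False
```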
I found that PyPy sometimes has unexpected slowdowns. When we were porting some offline processing tools from Python to PyPy, the craziest one was building strings via += and flattening lists via sum(arrays, []), both of which are much slower than on CPython.
I find that unexpected. Java has had string builder optimization for a long time, and CPython is much much faster in this respect. It's not always easy to use "".join when using a string, so you end up having to build a separate array of strings in some cases. And building arrays isn't always that fast either. And [].join doesn't exist, so summing arrays is always kinda slow.
Anyway, all that is to say: I really like PyPy, and we use it a lot, but those _unexpected_ crazy slowdowns are unfortunate.
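For reference, the usual workarounds for both slowdowns discussed above look like this (toy data, just to show the pattern): accumulate pieces and join once instead of repeated +=, and flatten with itertools.chain instead of sum(arrays, []), which is quadratic on any implementation:

```python
# Sketch of the standard alternatives to += string building and
# sum(arrays, []) list flattening.
from itertools import chain

# Instead of: s = ""; for i in range(5): s += str(i)
pieces = [str(i) for i in range(5)]
s = "".join(pieces)                # single linear-time concatenation

# Instead of: flat = sum(arrays, [])  -- quadratic copying
arrays = [[1, 2], [3], [4, 5]]
flat = list(chain.from_iterable(arrays))   # linear-time flatten
```

Neither workaround depends on the interpreter, which is part of the complaint: on CPython the naive versions are often tolerable, so the cost only shows up after porting.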
Why are people so quick to trust a new compiler?
Have you audited that compiler's source and built the compiler from it with a known-good compiler? Are you inspecting the resulting .pyc files, and in this case, the resulting PEs? It's a super easy way to inject a compromise into a package that will probably get widely distributed.
Full disclosure: the link is down for me, so I haven't read the article. This is a comment that has been building up in me for a while, and it's not specific to Nuitka. The same goes for new frameworks/languages/etc.
>Have you audited that compiler's source and built the compiler from it with a known-good compiler? Are you inspecting the resulting .pyc files, and in this case, the resulting PEs? It's a super easy way to inject a compromise into a package that will probably get widely distributed.
I think it's unlikely that someone would design something as complicated as a new compiler / programming language, open-source it, and attach their real name to the project just to hide an exploit in it.
Why limit yourself to new compilers? Every new version of an old compiler could be suspect, every line of code in general. There's no point in being this paranoid, nobody has time for all the audits that might be theoretically desirable. Almost everything you do on a computer relies on trusting an untold number of components.
As others have pointed out, these claims have to be taken very carefully and the main value (at least for now) seems to be the easy packaging and deploying.
I program Django websites and work in an environment where we like to deploy often. I'm not sure we could tolerate the extra wait in compile time for deployment. At least we would need to take a serious look at the speed/resource gain before considering it.
This seems immediately relevant to addressing Python's biggest problems: it's slow, and it's hard to distribute. Relevant quote:
All code generation is done to C++ rather than architecture
specific machine code. The cost of porting and maintenance
of architecture specific code generation is now more amortised
I'm very interested in some benchmarks on this and various other compilers/packagers/runtimes for Python.
Sidenote: A very long time ago I had some great luck with Perl's packagers: PAR, PerlApp, etc. but they all basically just bundle in the basic Perl runtime with the guts of your program so there's no real benchmark difference. I suspect the same is not true of efforts like this for Python.
I really like the idea that it compiles Python but is (99.9%) compatible with CPython -- that way you can still use the many C extensions (I don't want to do my projects any more without my own Python extensions written in C).
I don't know much about compilers and runtimes, but how come there are SO MANY of them for Python? Is there something about Python that makes it so easy to write runtimes for it?
I think it's because lots of people write lots of code in Python, but then hit performance problems at some point and try to tackle those. Having a nice, regular language makes it relatively easy to rewrite a runtime to scratch your itch.
I suspect we would have seen this more with languages like Perl, except Perl is virtually impossible to parse.
I think it's a matter of numbers. There are a lot of people who use Python. (https://blog.pythonanywhere.com/67/ estimates in the low millions.) Some of them like to work on alternate implementations of Python.
I think it's largely because while a lot of people really love the language, they struggle with the performance and scaling issues that arise with the standard CPython implementation.
Python is the best "glue" language. Large third-party libs, easily readable/writable (even by non-programmers), old (it predates Java), established, great user community.
So people in different realms (Microsoft: IronPython; JVM/enterprise: Jython; the rest of us: CPython; scientists/performance: Cython, PyPy, Stackless, Psyco; academics/experimenters: many more) all want to use it and make versions that integrate with their tools.
Popularity and a syntax that can be parsed, plus being widely taught. Pretty much the same reason why we had plenty of Pascal compilers back in the days.
And quite unlike why we have that many JavaScript transpilers.
A few of the compilers might run on the AST, and thus have no complete alternative implementation. Good module support helps with that, although it's still not close to VM-based languages like Java.