This Week in Mojo 2023-06-02
Mojo Playground Release
These Python packages are now available on Mojo Playground:
An optimization has been implemented to move an object instead of copying, if it's not used again.
- Issue #231: Unexpected error when a Python expression raises an exception
- Issue #199: The REPL fails when a python variable is redefined
- Blog post titled Accelerating AI model serving with the Modular AI Engine detailing the integration with NVIDIA Triton and performance improvements compared to the Tensorflow and Pytorch backends.
- Maxim made a post about Parsing CSV in Mojo and used SIMD to get speedups
- Tech With Tim did a high level introduction to the language
- Elemento wrote an article on The Story of Modular
- Gautam Ethiraj expressed his Mojo FOMO (Fear Of Missing Out)
Guido van Rossum, Python creator and CPython core developer said this about Mojo🔥 on python.org:
It’s definitely early days for Mojo. I plan to talk to Chris about how he plans it to become a true Python superset that will run existing Python code out of the box.
So far, Mojo is a separate language (they have struct but not class, Int but not int, etc.), and their “CPython compatibility” strategy involves calling a helper function that you pass the name of a module, and it will import that module and execute it in CPython, returning a Mojo proxy object. It then treats CPython as a separate language runtime and when values are passed between Mojo and CPython they are “marshalled” (boxing/unboxing Int/int, etc.). This is syntactically very smooth, because Mojo and Python have compatible syntax (so you can write e.g. x+y where x is a Mojo value and y a CPython object), but doesn’t speed up running CPython at all.
I presume that the reported speedups on PyTorch etc. are obtained by rewriting key parts of the PyTorch kernel in Mojo, applying the optimizations shown in the notebooks on the Mojo site.
Lex Fridman Interview
Lex Fridman did a 3.5 hour interview with Chris Lattner, these are answers directly related to Mojo that give new information or more context for Mojo Team Answers, but the whole interview is worth watching for more general philosophy and Modular specific information.
Curly brace languages are typically run through formatting tools now which automatically indents them, so you end up with indentation and curly braces, why not get rid of the clutter and have a more beautiful thing? Some languages allow you to use braces or indentation which adds a complicated design space, that you don't need if you just use Python style indentation.
Compile Time Metaprogramming
One of the things that makes Python very beautiful is it's very dynamic and expressive through the powerful dynamic metaprogramming features. But you can't use those features on things like a GPU due to performance costs, so we take the interpreter and allow it to run at compile time. This gives us Python style expressive API's that enable libraries like PyTorch, with no performance penalty.
This is similar to newer languages like Zig, which allow you use the core language during compile time the same way you would during runtime. As opposed to C++ templating where it's a completely different language and universe, one of the goals of Mojo is to make things really easy to use and learn so there's a natural stepping stone.
Autotune and adaptive compilation
Libraries like PyTorch have pushed ML towards an abstract specification of a compute problem, which then gets mapped in a whole bunch of different ways, this is why it has become a metaprogramming problem.
Hardware systems and algorithms are really complicated, most programmers don't want to know the intricate details of how the hardware actually works, so how do we allow people to build more abstracted and portable code, a lot of the algorithms are the same but parameters like cache size, vector length or tail size might need to change to optimize for different hardware.
Instead of having humans go and test all these things with different parameters which can grow to complex multidimensional spaces, why don't we have computers do that for us. So you can specify different options and have the compiler empirically test what the fastest implementation is for the target you're compiling to.
Speedup moving from CPython to Mojo
Interpreters have an extra layer of bytecode that they have to go and read and interpret, and it makes them slow from this perspective. The first thing that converting your code to Mojo does is get a 2x to 10x speedup without changing the code.
In Python everything's an object, the memory layout of all objects is the same, so you're always passing around a pointer to the data which has overhead from allocation and reference counting, so you can move that out of the heap and into registers and that's another 10x speedup.
Modern CPU's allow you to do Single Instruction Multiple Data (SIMD) to run the same operation on a vector of data which Python doesn't expose, Mojo builds it into the language and this can lead to more huge speedups.
Python also has the Global Interpret Lock (GIL) due to reference counting and other implementation details, in Mojo you can take direct advantage of threads.
There's even more performance improvements via things like memory hierarchy etc. Mojo allows you to take advantage of all these powerful things that have been built into hardware over time.
Python has types like strings, integers, dictionaries etc. but they all live at runtime, in Mojo you specify what the actual types are which allows the compiler to do way better optimizations, gets rid of the expensive indirections, and gives you code completion. You can progressively adopt types where you want them, but you don't have to use them if you don't want to. Our opinion is not that types are the wrong or right thing, but they're a useful thing.
Mojo is aiming to be a full superset of Python, the world went through a very long painful migration from Python 2 to Python 3, I don't want people to have to go through that if they want to move to Mojo, they shouldn't have to rewrite all their code.
Built in types like C++
I want to get magic out of the compilers and put it into the libraries, we can build an
Int that's beautiful and has an amazing API and does all the things you expect an integer to do, but maybe you don't like it and want to build a
BigInt, you can do that and it's not a second class citizen. This is opposed to a language like C++ where the builtins have special promotion rules and other things that are hacked into the compiler.
Standard Library Completeness
We provide integers, floats, buffers, tensors and other things you'd expect in an ML context, honestly we need to keep designing, redesigning, and working with the community to build that out and make it better, it's not our strength right now. But the power of putting it in the library means we can have teams of experts that aren't compiler engineers that can help us design, refine, and drive this forward.
Mojo is very useful but only if you're a super low level programmer right now, and we're working our way up the stack. Mojo is currently like a
0.1, and in a year from now it will be way more useful to a wider variety of people, but we decided to launch it early so we can build it with the community. We have a roadmap that's transparent about the current state we're optimizing for building the right thing the right way. There is a dynamic where everyone wants it yesterday, but I still think it's the right thing.
We're tapping into some deep long held pressures in the Python, AI and hardware worlds, so people want us to move faster! We decided to release early because in my experience you get something way better when you build in the open and work with the community.
Mojo doesn't force, but enables the use of value semantics. This allows you to pass objects like collections and strings into a function, and it doesn't copy any data unless you mutate it. This is the best of both worlds, where the original object won't be modified if you mutate it inside a function, but it also won't copy the data leading to a performance hit if it doesn't need to. The language allows the type author to express this behaviour without forcing it into all cases.
It's related to work done in the Rust, Swift, C++ and many other communities, it's a body of work that's been developing over many years. Mojo takes the best ideas and remixes them so you get the power of the Rust ownership model, but you don't have to deal with it when you don't want to, which is a major help in usability and teaching.
A tensor is like an abstraction around a gigantic parralelizable data set, using frameworks like PyTorch and Tensorflow you can also represent the operations over those data sets, which you can then map onto parralelizable cores or machines. This has an enabled an explosion in AI and accelerators, but also an explosion in complexity.
Researchers are smart in various domains like calculus but they don't want to know anything about the hardware or C++, so they train the model and then you have teams who are specialized in deploying the model which might have to split out onto various machines so the complexity explodes, making changes takes weeks or months because all these teams with different expertise need to coordinate which is always a huge problem.
Why is it so difficult that it takes a team of 45 people to deploy a model when it's so easy to train? If you dig into this, every layer is problematic. PyTorch and Tensorflow weren't really designed for this complicated world we have today, they were designed for when models could be trained and fit onto a single GPU. Tensorflow can scale to many machines, but most researchers are using PyTorch which doesn't have those capabilities.
The main thing that Modular is fighting against is all this complexity:
- Hardware accelerators and software stacks to interact with the hardware
- Modeling constantly changing with new research and huge amounts of diversity
- Serving technology like zero copy networking, asyncio etc. that hasn't made it into machine learning
These things have been built up over many years in their own silos and there hasn't been a first principles rethink, Modular has an amazing team to create a unified platform that solves this issue because we've worked on a lot of these silos including Tensorflow, TPUs, PyTorch Core, ONNXRuntime, Apple accelerators etc.
Our joking mission statement is to "Save the world from terrible AI software", so we need a syntax, and we wouldn't have to build the programming language if it already existed, if Python was already good enough we would have just used it, we're not doing very large scale expensive engineering projects for the sake of it, it's to solve a problem.
In the early days of PyTorch and Tensorflow things were basically CPU and CUDA, so for a dense layer with matrix multiplication you could kick off a CUDA kernel on GPU, or use something like Intel MKL for CPU. Now you have an explosion of hardware on one end with thousands of different types of hardware, and explosion of development in AI models on the other end with thousands of different operators. From giant TPU stacks to CPU's on mobile devices, whenever someone releases new hardware they need teams of people rewriting the compiler and kernel technology, which keeps out the little competitors. There is only a handful of people compiler experts out there which excludes a tonne of people. Mojo and the Modular stack brings programmability back into this world, allowing more general programmers to extend the stack without having to go hack the compiler itself. This opens it up to researchers and hardware innovators and people who know things that compiler people don't know.
There have been a number of languages that have attempted to speed up Python, with Mojo we're trying to understand what the limit of the hardware and physics is, and how to express that in software. Typically that ends up being a memory problem, you can do a lot of math inside these accelerators, but you get bottlenecked sending data back and forth to memory as the training size gets large. Typically engineers that are really familiar with the hardware and specific models would optimize for a single use case, but these models are getting so large that details can't fit in one humans head, so we need to generalize. What the Modular stack allows is someone to use it for a new problem and it'll generally work quite well.
We're not working forwards from making Python a little bit better, we're working backwards from the limits of physics.
We use the CPython existing interpreter which runs Python bytecode, Mojo uses the CPython objects to make it fully compatible, as well as the ability to use all the C packages, so Mojo keeps the objects in that representation when they're coming from CPython.
Static Metaprogramming Features
Python has amazing dynamic metaprogramming features, and they translate to beautiful static metaprogramming features which has profound implications. People use Python for things it wasn't meant to do, because it was designed very thoughtfully in this space.
Guido van Rossums's ThoughtsPython creator
We talked before Mojo launched, he found it very interesting. I have a tonne of respect for Guido in how he steered such a gigantic community towards what it is today. It was really important to get his eyes and feedback on Mojo, what he's most concerned about is how to avoid fragmenting the community. It's really important we're a good member of the community, we think Guido is interested in the path out of the reasons why Python is slow. Python can suddenly go all the places it's never been able to go before, so it can have even more impact on the world.
Python using Mojo code
We learnt a bunch of tricks along the way converting an entire community of programmers from Objective-C to Swift, we built a lot of machinery to deeply integrate with the Objective-C runtime, we're doing the same thing with Python. When a new library gets built in Mojo people should be able to use it in Python. We need to vend Python interfaces to the Mojo types, that's what we did in Swift and it worked great, it's a huge challenge to implement for the compiler people, but it benefits millions of users and really helps adoption.
If you look at companies like OpenAI building huge ML models, they're innovating in the data collection and model architecture side, but they're spending a lot of time writing CUDA kernels. How much faster could all that progress go if they weren't hand writing all those CUDA kernels. There are projects trying to solve subsets of this problem but it's fragmenting the space, Mojo provides a
Unifying Theory to stop this problem slowing people down.
I think Julia is a great language with a lovely community, but it's a different angle to Mojo, our goal is to take something great in Python and make it even better, so programmers don't have to learn an entirely new language.
The thing that will most help adoption is you don't have to rewrite all your Python code, you can learn a new trick, and grow your knowledge that way. You can start with the world you know, and progressively learn and adopt new things where it makes sense.
Hardware is getting very complicated, part of my thesis is that it's going to get a lot more complicated, part of what's exciting about what we're building is the universal platform to support the world as we get more exotic hardware, and they don't have to rewrite their code every time a new device comes out.
A lot of the feedback we've received is that people want to run it locally, so we're working on that right now, we just want to make sure we do it right.
Releasing Source Code
When we launched Swift, we had worked on it for four years in secrecy, we launched at a big event saying developers would be able to deploy code using Swift to the app store in 3 months. We had way more bugs than we expected, it wasn't actually production quality, and it was extremely stressful and embarrassing. Pushing major versions became super painful and stressful for the engineering team, and the community was very grumpy about it, there was a lot of technical debt in the compiler. I don't want to do that again, we're setting expectations saying don't use this for production yet, we'll get there but lets do it in the right way. We want to build the worlds best thing, if we do it right and it lifts the industry it doesn't matter if it takes an extra two months. Doing it right and not being overwhelmed with technical debt is absolutely the right thing to do.
A lot of people have very big pain points with Python packages, it becomes a huge disaster when code is split between Python and building C code, Mojo solves that part of the problem directly. One of the things we can do with the community, is we'll have an opportunity to reevaluate packaging, we have an entirely new compiler stack so maybe we can innovate in this area.
It uses variants to avoid performance cost and allows it to run on various hardware, e.g. a function can return a variant of None/Error, but it maps to Python try / except syntax.
We want to avoid this after learning the hard way from Swift that it distracts from building the core abstractions for the language, and we want to be a good member of the Python community. We want to be able to evolve Mojo with Python.
Mojo Team Answers
OS Kernel development
Yeah just to clarify, when modular-ites use the word
kernel they typically mean high performance numeric kernel which may be targeted at an accelerator or GPU or CPU etc. Secondary meanings are
OS kernel or
Jupyter kernel, because the word is overloaded.
Mojo is a general purpose language and can be used to replace C use cases like Rust does etc, but that isn't where we're focusing initial development. That doesn't mean we're excluding it, just that the libraries etc aren't the focus for us to build. We hope the community will be interested in filling that in and building out the use cases in time though.
Paid Licenses like MATLAB
Broadly speaking, we see Mojo as a technology, not a product. We have AI based products, and mojo is something that is very important to those products, but it also stands alone for other uses. Mojo is still young and building the right thing for the long term is the priority for us right now.
Mojo generators happen in SSA form, we haven't enabled full imperative reflection over the MLIR representation, but would like to build towards that. This is the "ultimate python decorator at compile time" after all
MLIR Dialect for Unique Requirements
I worked on Google TPUs (which have several public architecture papers), I'm familiar with difficult to program accelerators w funky requirements 🙂.
One of the major ideas in Mojo wrt MLIR and hardware is to expose "compiler engineering" to library developers instead of having to hack the compiler. That said, we have great ambitions and plans, and I don't want to get us over our skiis. We need to get lifetimes and traits (and numerous other smaller features) explained in the roadmap done before we can go out and play. The architecture is in place though.
Triton Compiler Tech
GPUs are very important to our work obviously, and we'll have something more to share about that later this year. Zooming out though, your point about "Triton had to build a compiler in order to express a new programming model" is really a key observation. One of our goals is to enable building programming models like this
as a library using the metaprogramming features in the language.
Folks shouldn't have to design an entirely new compiler/EDSL to achieve such a thing
Implicit Type Declaration
Within a function, implicitly declared variables get the type of their first value assigned into them. This is probably not the right thing - within a def, we will need to maintain dynamic typing (including type transformations like python has) for compatibility. Our base object isn't super built out and set up for this yet, which is why we have a "default to the first type" approach.
This is mostly just a placeholder for now. This has known problems and will need to be reworked when we get traits/typeclasses/protocols filled in. Do you have a specific interest/concern in mind? One problem with AnyType is that we will need to decide if it is implicitly copyable/movable, if that is trivial, etc. There are lots of properties we'll want to be able to express elegantly; none of this has been designed, but there is a lot of prior art in rust/swift/haskell/etc.
Thank you for filing this. This is known (to me) to not be supported. We have the infrastructure to do this now, but we need to decide whether we want it. There are various folks (incl on this forum) that are proposing that we eliminate 'let' declarations to simplify things, and I'd rather resolve that direction before investing more time into let declarations.
Incidentally, this discussion will come up "real soon now" as it is all tangled into the lifetime proposal. This should be coming to the community for discussion in the next two weeks.
What do we call Mojo users?
I'm fond of mojician 🪄
Generics for non trivial types
This is going to be tricky to address in the immediate term. In the absence of traits/protocols (which is scheduled to start soon) we can't reason about what members a generic AnyType has, nor can we constrain that type. This is actually a pretty big deal, because we don't have the infra to map back to what a substituted type's destructors are. As a consequence of this, it is only possible to use trivial types like Int/FP with generic algorithms. This is incredibly constraining right now 🙁
There is a separate issue where register_passable and memory-only types have different concrete ABIs / conventions. This is solvable in a simple way (just treat register passable types as memory abi when generic) or a fancier way (delay binding of ABI until type substitution)... but until we solve the trait issue, we'll still only be able to express generic algorithms over trivial types, even if they are memory only. So solving this in the immediate term isn't much of a relief.
The best workarounds right now are pretty ugly:
- Limit your generic code to trivial register passable types; e.g. add an explicit delete() method that you manually manage instead of a del method that is automatically invoked.
- Copy and paste things to make them non-generic.
sorry, this is pretty annoying to me too. I really want to get on top of this of course.