PyXL

🧪 GPIO round-trip at 480 ns

Python, in hardware. 480ns GPIO. No interpreter. No C. Just PyXL.

TL;DR

What is PyXL?

PyXL is a custom hardware processor that executes Python directly: no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon.

A custom toolchain compiles a .py file into CPython bytecode, translates it to a custom assembly, and produces a binary that runs on a pipelined processor built from scratch.

What PyXL is not

It's a real processor for Python, built for determinism and speed.

Where does it run?

PyXL runs on a Zynq-7000 FPGA (Arty-Z7-20 dev board). The PyXL core runs at 100MHz. The ARM CPU on the board handles setup and memory, but the Python code itself is executed entirely in hardware.

The toolchain is written in Python and runs on a standard development machine using unmodified CPython.

Wait, what's a GPIO?

GPIO stands for General Purpose Input/Output. It's a simple hardware pin that software can read from or write to, a way to control the outside world: LEDs, buttons, sensors, motors, and more.

In MicroPython (like on the PyBoard), your Python code interacts with C functions that handle hardware registers underneath. It's reasonably fast, but still goes through a Python VM and a software stack before reaching the pin.

PyXL skips all of that. The Python bytecode is executed directly in hardware, and GPIO access is physically wired to the processor: no interpreter, no function call, just native hardware execution.

Now for the GPIO test. What was the video?

I connected two pins on the Arty board with a jumper cable.

Then I wrote a Python program that measures the time from when GPIO pin 1 is set to 1 until a 1 is read on the other pin connected to it.

The video shows a comparison between PyXL and a PyBoard running the MicroPython VM.
Let's focus on how PyXL does its thing.

The program

from compiler.intrinsics import *


def main():
    pyxl_write_gpio_pin1(0)              # Reset output pin

    c1 = pyxl_get_cycle_counter()        # Cycle counter (100 MHz)

    pyxl_write_gpio_pin1(1)              # Set output pin
    while pyxl_read_gpio_pin2() == 0:    # Wait until input pin is set to 1
        continue

    c2 = pyxl_get_cycle_counter()        # Cycle counter (100 MHz)

    return (c2 - c1) * 10                # Return result in nanoseconds (each cycle is 10 ns)

As you can see, this is a regular Python program, but it also has some unfamiliar function calls.
These functions come from the compiler.intrinsics module.

pyxl_get_cycle_counter()

Gets the current cycle counter from the PyXL CPU. This counter advances by 1 on every clock tick.
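The conversion from counter ticks to time used in the program can be sketched on a host machine as follows. The helper name cycles_to_ns is hypothetical and not part of PyXL; the only fact taken from the article is the 100 MHz core clock, which makes each cycle 10 ns.

```python
CLOCK_HZ = 100_000_000  # PyXL core clock stated in the article: 100 MHz


def cycles_to_ns(cycles: int, clock_hz: int = CLOCK_HZ) -> int:
    """Convert a cycle-counter delta to nanoseconds (hypothetical helper)."""
    # Integer math keeps the result exact when the clock divides 1e9 evenly.
    return cycles * 1_000_000_000 // clock_hz


# At 100 MHz, the reported 480 ns round trip corresponds to 48 cycles.
print(cycles_to_ns(48))
```

Working in cycles and converting only at the end keeps the measurement exact and independent of the clock the core eventually runs at.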

pyxl_write_gpio_pin1()

Writes a value (0/1) to a GPIO pin. These are low-level intrinsics exposed by the compiler. They are currently hardcoded for this test, but will evolve into a more general pyxl_gpio_write(pin, value) API.

pyxl_read_gpio_pin2()

Reads the value from pin 2. The same API note applies here as well.
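To experiment with the shape of the test on an ordinary desktop Python, the intrinsics can be replaced by a small stand-in. The sketch below is an assumption for illustration only: FakeBoard, its methods, and the per-call cycle cost are all made up, and it does not model PyXL's real timing. It only mimics the jumper (pin 2 mirrors pin 1) and a counter that advances on every call.

```python
class FakeBoard:
    """Host-side stand-in for the hardware. Hypothetical, NOT part of PyXL."""

    def __init__(self, cycles_per_call: int = 1):
        self.cycle = 0                           # simulated cycle counter
        self.pins = {1: 0, 2: 0}                 # pin 2 is jumpered to pin 1
        self.cycles_per_call = cycles_per_call   # made-up cost of each call

    def _tick(self) -> None:
        self.cycle += self.cycles_per_call

    def get_cycle_counter(self) -> int:
        self._tick()
        return self.cycle

    def gpio_write(self, pin: int, value: int) -> None:
        self._tick()
        self.pins[pin] = value
        if pin == 1:
            self.pins[2] = value                 # the jumper cable

    def gpio_read(self, pin: int) -> int:
        self._tick()
        return self.pins[pin]


def round_trip_cycles(board: FakeBoard) -> int:
    """Same structure as main() in the article, run against the stand-in."""
    board.gpio_write(1, 0)                       # reset output pin
    c1 = board.get_cycle_counter()
    board.gpio_write(1, 1)                       # set output pin
    while board.gpio_read(2) == 0:               # wait for the input pin
        continue
    c2 = board.get_cycle_counter()
    return c2 - c1


print(round_trip_cycles(FakeBoard(cycles_per_call=4)))
```

The number it prints reflects only the invented per-call cost, so it says nothing about real hardware latency; the point is that the measurement logic itself is plain Python.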

Wait, why isn't there a call to the main function?

The main function is defined but never invoked. Why?

At this stage, PyXL calls the main function automatically when it runs a program.

This is just a convenience feature (for development) and will change in the future.

So how does it work?

As described above, the program is compiled to CPython bytecode and then compiled again to PyXL assembly. It is then linked, and a binary is generated.

This binary is sent over the network to the Arty board, where an ARM CPU receives the application, copies it to memory shared with the PyXL hardware, and starts running it.

A typical Python runtime (CPython, or MicroPython in the case of the PyBoard and embedded Python in general) carries a large overhead from running the bytecode on a software-based VM. In PyXL there's no VM; the hardware does everything.
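The first compilation stage (Python source to CPython bytecode) can be inspected on any host with the standard dis module. This is only a sketch of that first step: the function wait_for_pin is a made-up stand-in for the polling loop, the exact opcodes depend on the CPython version, and PyXL's own translation from bytecode to its assembly is not shown here.

```python
import dis


def wait_for_pin(read_pin):
    # Same shape as the polling loop in the test program.
    while read_pin() == 0:
        continue
    return 1


# List the bytecode instructions CPython generates for the loop.
instructions = list(dis.get_instructions(wait_for_pin))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

Each printed line is one instruction that a software VM would fetch, decode, and dispatch in a loop; this is the work that the article says PyXL moves into hardware.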

As for reading and writing the GPIO: the GPIO headers are mapped directly to FPGA pins and physically wired into PyXL's core top-level module. Think of the top-level module as the main function of the hardware.

In this test, all code and data reside in predictable low-latency memory, ensuring deterministic behavior (real-time behavior). This means that for the same input, it'll take the exact same time to run.

So how do these platforms compare?

GPIO round-trip latency (ns). Lower is better.

PyXL: 480 ns
MicroPython (PyBoard): 14,741 ns

As you can see, PyXL is about 30x faster than the PyBoard.

Also, remember that PyXL's clock speed is lower than the PyBoard's.

PyXL runs at a lower clock because it is prototyped on an FPGA, while the PyBoard's microcontroller is an ASIC. This is not a limitation of PyXL itself, and higher clocks can be achieved.

Since a higher clock is achievable, an apples-to-apples comparison should normalize the clock frequencies.
That brings PyXL's normalized advantage to ~50x over the PyBoard.
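The normalization can be worked through in a few lines. The two latencies and the 100 MHz PyXL clock come from the article; the 168 MHz PyBoard clock is an assumption (the article does not state it), so treat the normalized figure as a sketch of the reasoning and not an official number.

```python
PYXL_NS = 480          # measured PyXL round trip (from the article)
PYBOARD_NS = 14_741    # measured PyBoard round trip (from the article)
PYXL_MHZ = 100         # PyXL core clock (from the article)
PYBOARD_MHZ = 168      # ASSUMPTION: PyBoard clock, not stated in the article

# Raw speedup: how many times shorter the PyXL round trip is.
raw_speedup = PYBOARD_NS / PYXL_NS

# Normalized speedup: scale by the clock ratio, i.e. compare cycles, not time.
normalized_speedup = raw_speedup * (PYBOARD_MHZ / PYXL_MHZ)

print(f"raw: {raw_speedup:.1f}x, normalized: {normalized_speedup:.1f}x")
```

Under that assumption the script prints a raw speedup of 30.7x and a normalized speedup of 51.6x, which lines up with the ~30x and ~50x figures quoted above.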

Why don't both tests run the exact same code?

The keen-eyed among you may have noticed in the video that the PyBoard code and the PyXL code aren't the same.

Both are Python, obviously, but there are two main differences:

1. API calls for measuring time and reading/writing GPIO pins. Neither platform is CPython running on a host; each is aware of the underlying hardware and brings its own runtime environment.
Each platform has its own hardware access API calls, but regular Python code is still portable between the platforms (as long as they support whatever Python feature you want to use).

2. The PyBoard runs the test in a tight loop to compensate for jitter and cold cache.
MicroPython running on the PyBoard has runtime jitter. The results ranged between 14 and 25 microseconds in my test, so I compared against the PyBoard after significant warm-up to show how much better PyXL is even in that case.
PyXL, by contrast, is fully deterministic. So long as the jumper is connected, PyXL returns a consistent 480 ns every time.
This makes PyXL suitable for real-time use cases.
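One simple way to put a number on jitter is the spread between the slowest and fastest run. In the sketch below, jitter_ns is a hypothetical helper, and the MicroPython samples are invented values chosen only to fall inside the 14 to 25 microsecond range reported above; they are not measurements.

```python
def jitter_ns(samples):
    """Spread between the slowest and fastest measurement, in nanoseconds."""
    return max(samples) - min(samples)


# Article: PyXL returns a consistent 480 ns every run.
pyxl_samples = [480, 480, 480, 480]

# HYPOTHETICAL values within the reported 14-25 microsecond range,
# used only to illustrate the calculation.
micropython_samples = [14_741, 18_200, 25_000, 16_350]

print(jitter_ns(pyxl_samples))          # spread for a fully deterministic run
print(jitter_ns(micropython_samples))   # spread for the illustrative samples
```

A spread of zero is what "fully deterministic" means in practice: the worst case and the best case are the same number.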

Big deal, who cares about making a signal go a bit faster?

This isn't just a performance boost; it's an unlock. PyXL brings a level of responsiveness and determinism that Python has never had in embedded or real-time contexts.

Python VMs, even those designed for microcontrollers, are still built around software interpreters. That introduces overhead and complexity between your code and the hardware.

PyXL removes this barrier. Your Python code is executed directly in hardware. GPIO access is physical. Control flow is predictable. Execution is tight and consistent by design.

With this unlock, PyXL can be further developed and adapted to these use cases:

With PyXL, you can write performance-critical code once, in Python, and ship it as-is.

Sounds interesting? Let's talk.

Reach out if you're curious.