Mars - a kernel programming language
2023-10-28

This language has been sort of nebulously defined in my head for a while. I’ve tried and failed to make a grammar I’m happy with (fuck you Backus-Naur form) and I’d like to dump my brain out and make some sense of it. I’m heavily influenced by Odin lately; I think it is a wonderful language in terms of syntax and language features. I’ll be stealing, er, borrowing many of the features I enjoy from it.

But Odin isn’t a kernel programming language. It is possible to create a kernel with it, but it is first and foremost a language for application (mainly graphics) programming on modern systems. I want something that goes deeper. I’ll be calling this language Mars for the rest of the article.

Mars will be explicitly created for the Aphelion ISA. I don’t have plans to target other architectures.

Goals

  • Minimal Runtime - The only code generated is code that the programmer writes. Exceptions to this rule should be visible and explicit. The language and compiler should never go behind the programmer’s back.
  • No Standard Library - Standard libraries can often be monolithic and opaque. Some languages avoid this and have very neat, tidy libs, like Odin. I’ve decided to avoid the problem altogether by not providing one at all. I believe this incentivises creativity, code-sharing, and independent library creation among users of the language. Writing a standard library also sounds like ass.
  • Extensive Control - There should be very little loss of capability or control between Mars and writing direct assembly. Almost anything that assembly can do, Mars should be able to do. The programmer should be able to control how the hardware is used as much as possible.

Types

Mars’ types are pretty much wholesale borrowed from Odin, but I leave behind some of its more complex ones for simplicity’s sake.

Mars will support all the regular integer types, following traditional naming conventions (no not yours, C).

i8, i16, i32, i64, int (alias for i64)
u8, u16, u32, u64, uint (alias for u64)

Mars has built-in boolean types, with sizes to match the integers.

b8, b16, b32, b64, bool (alias for b64)

It will support traditional size floating point types as well.

f16, f32, f64, float (alias for f64)

Not all platforms (versions of Aphelion) will support floating point operations in hardware. If the target does not have floating point instructions for a specific type, a software floating point implementation for these operations will be included in the runtime library. More on this later.

Mars will support basic composite types: struct, union, and enum. These all operate pretty much equivalently to their C implementations, with the exception of enums, which more closely follow their Odin implementation (albeit without any runtime type information).

There will also be arrays and slices. Arrays must have a compile-time constant length, and that length is part of the type itself: arrays of different sizes are fundamentally different types. Slices are a language-defined struct containing a base pointer and a length. These fields are directly accessible. Slices are a generalization of arrays, but they function more like a pointer than a concrete array. They do not own the memory they point to, only reference it. Principles for pointers apply to slices.
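
To make the "base pointer plus length, no ownership" idea concrete, here is a rough Python model. It is not Mars code: the class name Slice, the field names, and the use of an index in place of a real pointer are all assumptions made for illustration, and the bounds check is a convenience of the model rather than something Mars is specified to do.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    """Hypothetical model of a slice: a view into memory it does not own."""
    memory: bytearray  # backing storage, owned by someone else
    base: int          # stands in for the base pointer (an index here)
    length: int        # number of elements visible through the slice

    def _check(self, i: int) -> None:
        # Bounds check added for the model only.
        if not 0 <= i < self.length:
            raise IndexError("slice index out of range")

    def __getitem__(self, i: int) -> int:
        self._check(i)
        return self.memory[self.base + i]

    def __setitem__(self, i: int, value: int) -> None:
        self._check(i)
        self.memory[self.base + i] = value


array = bytearray([10, 20, 30, 40, 50])   # plays the role of a 5-element u8 array
window = Slice(array, base=1, length=3)   # views elements 1 through 3

window[0] = 99                            # writes through to the array
first_after_write = array[1]
```

Writing through the slice changes the array underneath it, which is the sense in which a slice behaves like a pointer rather than a copy.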

Syntax

I’ll be borrowing from Odin here. I love its declaration syntax.

// name : type = value
x : int = 3
x :     = 3 // implicit type "int" from integer literal
x := 3 // same as above

x : int // variables can be declared without a value. 
        // they are automatically initialized to their type's "zero" value.

Constants must be determined at compile time.

// name : type : value
x : int : 3
x :     : 3
x :: 3 // same as above

Literals have special “untyped” types.

c :: 1    // untyped_int
c :: 1.0  // untyped_float
c :: true // untyped_bool
c :: null // untyped_null

Untyped literals can be implicitly converted to related concrete types. Integer literals can work for any numeric type, float literals can work for any float type (or an integer type, if the underlying value is equivalent to an integer, e.g. 1.0), bool literals can convert to any boolean type, and null can convert to any pointer type.
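
The conversion rules can be sketched as a small Python predicate. This is only a model of the rules as described here: the function name can_convert is made up, pointer types are represented by the placeholder string "pointer" because no pointer syntax has been defined yet, and range checks (whether 300 fits in a u8, say) are ignored.

```python
INT_TYPES = {"i8", "i16", "i32", "i64", "int", "u8", "u16", "u32", "u64", "uint"}
FLOAT_TYPES = {"f16", "f32", "f64", "float"}
BOOL_TYPES = {"b8", "b16", "b32", "b64", "bool"}


def can_convert(untyped_kind, value, target):
    """Return True if an untyped literal may implicitly become `target`."""
    if untyped_kind == "untyped_int":
        # Integer literals work for any numeric type.
        return target in INT_TYPES or target in FLOAT_TYPES
    if untyped_kind == "untyped_float":
        if target in FLOAT_TYPES:
            return True
        # A float literal fits an integer type only if it is a whole number.
        return target in INT_TYPES and float(value).is_integer()
    if untyped_kind == "untyped_bool":
        return target in BOOL_TYPES
    if untyped_kind == "untyped_null":
        # "pointer" is a placeholder for any pointer type (assumption).
        return target == "pointer"
    return False


whole_float_to_int = can_convert("untyped_float", 1.0, "i32")       # True
fractional_float_to_int = can_convert("untyped_float", 1.5, "i32")  # False
```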

Constants also inherit untyped types from literals. This means that with c :: 10, c can be used anywhere that a literal 10 can be used.

Functions are represented as function pointers (sizeof(func) == sizeof(pointer)). As an example, a “find” function might look like this:

find :: fn(haystack, needle: []u8) -> (index: int, found: bool) {
    // ...
}

Functions support multiple returns as well as named return variables.

Strings Fucking Suck

Strings are a headache. I’ve decided not to deal with them at all. String literals are converted into utf-8 encoded u8 slices at compile time and are treated exactly as such.

This means "hello" is the exact same as {0x68, 0x65, 0x6C, 0x6C, 0x6F}.
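
For anyone who wants to check the bytes, the same conversion can be reproduced in Python (used here only as a calculator, not as anything Mars-specific). The second literal is an extra example, not from the text above, showing that the slice length counts bytes rather than characters.

```python
# UTF-8 bytes of the literal "hello".
hello_bytes = list("hello".encode("utf-8"))

# A single non-ASCII character (e with an acute accent, U+00E9)
# becomes two bytes, so the resulting slice would have length 2.
accented_bytes = list("\u00e9".encode("utf-8"))
```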

Inline Assembly

Mars handles inline assembly in a different way than many other languages. I’ll give an example, and then describe how that example works.

x := 1  // implicit type for integers is int (i64)
y := 2
z := 3

asm (x -> "ra", y <- "rb", z <-> "rc") {
    "add ra, ra, 1",  // ra = ra + 1
    "mul rb, ra, 10", // rb = ra * 10
    "shl rc, rc, 4",  // rc = rc << 4
}

// x == 1
// y == 20
// z == 48

The block is fairly straightforward: it is simply multiple comma-separated strings, each a line of assembly. The interesting mechanics happen in the parameters.

The parameters describe how values should be arranged in registers before and after the assembly block.

  • x -> "ra" : The value in variable x must be placed into register ra immediately before the assembly block. Any changes made to ra do not affect x.
  • y <- "rb" : The variable y is set to the value in register rb immediately after the assembly block.
  • z <-> "rc" : The variable z is placed in register rc before the assembly block, and set to the value of rc immediately after the assembly block.

To recap, -> copies values into registers, <- copies values out of registers, and <-> does both. The compiler trusts that the registers listed in the parameters are the only ones the block modifies. If you need a register for storing an intermediate value, you can mark it as used with the blank identifier: _ <-> "rd".
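
The copy-in / copy-out behaviour can be modeled in a few lines of Python. This is a sketch of the semantics only, not of how a compiler would implement it: run_asm and its argument shapes are invented for the example, the assembly lines are replaced by a Python function that edits a register dict, and 64-bit wraparound is ignored.

```python
def run_asm(variables, bindings, body):
    """Model an asm block: copy in, run the body, copy out.

    bindings is a list of (variable, direction, register) tuples, where
    direction is "->", "<-" or "<->" and variable is None for the blank
    identifier. body is a function that mutates a dict of registers.
    """
    registers = {}
    for variable, direction, register in bindings:
        copies_in = direction in ("->", "<->") and variable is not None
        # Registers that are not copied into start at 0 in this model;
        # on real hardware they would hold whatever was already there.
        registers[register] = variables[variable] if copies_in else 0

    body(registers)

    for variable, direction, register in bindings:
        if direction in ("<-", "<->") and variable is not None:
            variables[variable] = registers[register]
    return variables


def example_body(r):
    r["ra"] = r["ra"] + 1    # add ra, ra, 1
    r["rb"] = r["ra"] * 10   # mul rb, ra, 10
    r["rc"] = r["rc"] << 4   # shl rc, rc, 4


values = run_asm(
    {"x": 1, "y": 2, "z": 3},
    [("x", "->", "ra"), ("y", "<-", "rb"), ("z", "<->", "rc")],
    example_body,
)
```

Running this leaves x at 1, y at 20, and z at 48, matching the example: x is never copied back out, so the increment to ra is only visible through y.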

I believe this, compared to other languages, is an improvement in both syntax and mechanics.

Register Allocation

I’m not entirely set on this, but it’s a feature I think might be cool for a language that interacts with hardware.

When your code is compiled in a traditional language, the compiler chooses which registers to use and which values to place in them. I think it would be interesting to take some of this control back from the compiler and be able to manually allocate registers.

val : int = 3

register val "ra";

// variable val now lives in register ra
val += 1; // regular operations still work

release val;
// variable may now be placed wherever, and register ra is free for the compiler

This guarantees that from the time it is allocated until it is released, register ra will hold the value of val.

Registers cannot be doubly allocated (use in an asm block counts as allocation) and must be allocated and deallocated in the same scope. Here are a few examples:

val : int = 10
register val "ra";

if val > some_other_value {
    release val;
}
// the location of val here is unpredictable at compile time, no good

val : int = 10
if val > some_other_value {
    register val "ra";
    // ... some code here
    release val;
}
// this is fine! register is properly deallocated in the same scope.

val : int = 10
register val "ra";
register val "rb"; // variables cannot be allocated to multiple registers

val : int = 10
other_val : int = 30
register val "ra";
register other_val "ra"; // value collision! this isn't allowed either

Allocated values are temporarily deallocated during function calls, because the calling convention uses specific registers for argument passing. After the function returns, however, the register is reallocated.
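
The allocation rules are simple enough to check mechanically. Below is a rough Python sketch of such a check, under heavy assumptions: the program is flattened into a list of made-up events ("open" and "close" for scopes, "register" and "release" for the statements), the function name check_registers is invented, and neither asm blocks nor function calls are modeled.

```python
def check_registers(events):
    """Return a list of rule violations for a flat list of events."""
    errors = []
    depth = 0
    held = {}  # variable -> (register, scope depth where it was allocated)
    for event in events:
        kind = event[0]
        if kind == "open":
            depth += 1
        elif kind == "close":
            # Anything allocated in this scope must be released before it ends.
            for var, (reg, allocated_at) in list(held.items()):
                if allocated_at == depth:
                    errors.append(f"{var} still holds {reg} at end of scope")
                    del held[var]
            depth -= 1
        elif kind == "register":
            _, var, reg = event
            if var in held:
                errors.append(f"{var} is already allocated to {held[var][0]}")
            elif reg in {r for r, _ in held.values()}:
                errors.append(f"{reg} is already allocated")
            else:
                held[var] = (reg, depth)
        elif kind == "release":
            _, var = event
            if var not in held:
                errors.append(f"{var} is not allocated")
            elif held[var][1] != depth:
                errors.append(f"{var} released in a different scope")
            else:
                del held[var]
    return errors


# The four examples above, in order.
released_in_branch = check_registers(
    [("register", "val", "ra"), ("open",), ("release", "val"), ("close",)])
same_scope = check_registers(
    [("open",), ("register", "val", "ra"), ("release", "val"), ("close",)])
two_registers = check_registers(
    [("register", "val", "ra"), ("register", "val", "rb")])
collision = check_registers(
    [("register", "val", "ra"), ("register", "other_val", "ra")])
```

Only the second example comes back with no errors; the other three each report exactly one violation.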

I’m not sure about all this; it might present some problems for multithreaded code. Let me know what you think if you have an idea!

Runtime Library + Generated Code

Compiler flags control how the runtime library is included.

  • -runtime:include : This is the default. The runtime library is linked as a module with the name “mars_runtime”.
  • -runtime:none : This disables the runtime and disallows inclusion of runtime functions. With this, an error is generated anywhere a runtime function is needed.
  • -runtime:external : This treats runtime functions as external functions. References to runtime functions are generated but the actual runtime module is not linked.
  • -runtime:replace : A custom module may be substituted for the runtime library. The module must implement all runtime functions.
  • -runtime:inline : Every instance of a runtime function is inlined. Not recommended as it may balloon your code, but it’s an option.

Anyways…

This is more of an idea dump than a coherent design document. I might write about this later, but for now it’s a set of ideas and a shitty lexer/parser. Mars is subject to change a LOT as my preferences (and the Aphelion project as a whole) change.