Online / 5 & 6 February 2022

visit

Fuzion Language Update

The marathon run 🏃🏃‍♀️ 🏃‍♂️ from a language prototype to a full implementation and toolchain.


Fuzion is a modern general purpose programming language that unifies concepts found in structured, functional and object-oriented programming languages into the concept of a Fuzion feature. It combines a powerful syntax and safety features based on the design-by-contract principle with a simple intermediate representation that enables powerful optimizing compilers and static analysis tools to verify correctness aspects.

This talk will present the advances in the Fuzion languages since its first public announcement at FOSDEM 2021. This includes a simplified and cleaned-up syntax, improved type inference, its safety features, foreign language interface to Java and an overview of the existing and planned toolchain.

Fuzion is not only about the language itself, but just as well about the intermediate representation that is the basis for static analysis, optimization and back-ends. The talk will give an overview of the format of intermediate code.

Introduction

Fuzion is a modern general purpose programming language that unifies concepts found in structured, functional and object-oriented programming languages into the concept of a Fuzion feature. It combines a powerful syntax and safety features based on the design-by-contract principle with a simple intermediate representation that enables powerful optimizing compilers and static analysis tools to verify correctness aspects.

Fuzion was influenced by many other languages including Java, Python, Eiffel, Rust, Go, Lua, Kotlin, C#, F#, Nim, Julia, Clojure, C/C++, and many more. The goal of Fuzion is to define a language that has the expressive power present in these languages and allow high-performance implementations and powerful analysis tools. Furthermore, Fuzion addresses requirements for safety-critical applications by adding support for contracts that enable formal specification and enable detailed control over run-time checks.

Many current programming language are getting more and more overloaded with new concepts and syntax to solve particular development or performance issues. Languages like Java/C# provide classes, interfaces, methods, packages, anonymous inner classes, local variables, fields, closures, etc. And these languages are currently further extended by the introductions of records/structs, value types, etc. The possibility of nesting these different concepts results in complexity for the developer and the tools (compilers, VMs) that process and execute the code.

For example, the possibility to access a local variable as part of the closure of a lambda expression may result in the compiler allocating heap space to hold the contents of that local variable. Hence, the developer has lost control over the allocation decisions made by the compiler.

In Fuzion, the concepts of classes, interfaces, methods, packages, fields and local variables are unified in the concept of a Fuzion feature. The decision where to allocate the memory associated with a feature (on the heap, the stack or in a register) is left to the compiler just as well as the decision if dynamic type information is needed. The developer is left with the single concept of a feature, the language implementation takes care of all the rest.

Fuzion Feature Declarations

A Fuzion feature has a name, similar to the name of a class or a function. The main operation that can be performed on a feature is a feature call. The constituents of a feature declaration are as follows:

Formal Arguments

Features may have a list of formal arguments, which are themselves features implemented as fields. On a call to a feature with formal arguments, actual arguments have to be provided to the call, unless the list of formal arguments is empty.

Feature Result

The result of a feature call is an instance of the feature. Alternatively, a feature may declare a different result type, then it must return a value of that type on a call.

Closures

Features are nested, i.e., every feature is declared within the context of an outer feature. The only exception is the universe, which is the outermost feature in Fuzion. A feature can access features declared in its outer feature or, recursively, any outer feature of these outer features. This means, a feature declaration also defines a closure of the feature and its context.

When calling a feature f1 declared as an inner feature of f2, the call must include a target value which is the result of a call to f2, e.g., f2.f1.

Generics

Features may have generic type parameters. E.g. a feature declaration may leave the actual type used within that feature open and to be defined by the user of the feature.

The list of generic type parameters may be open, i.e., the number of actual generic type parameters is not fixed at feature declaration. This turns out to be useful in the declaration of choice types and functions as explained below.

Inheritance

Fuzion features can inherit from one or several other features. When inheriting from an existing features, all inner features of the parent automatically become inner features of the heir feature. It is possible to redefine inherited features. In particular, when inheriting from a feature with abstract inner features, one can implement the inherited abstract features.

A redefinition of an inherited feature may implement an inherited feature as a routine or as a field. An inherited feature that is implemented as a field, however, cannot be redefined as something else since fields might be mutable.

Inheritance may result in conflicts. An example would be two features with the same name that are inherited from two different parents. In this case, the heir must resolve the conflict either by redefining the inherited features and providing a new implementation or by renaming the inherited features resulting in two inner features in the heir feature.

Inheritance and redefinition in Fuzion does not require dynamic binding. By default, the types defined by features are value types and no run-time overhead for dynamic binding is imposed by inheritance.

A Contract

A feature may declare a contract that specifies what the features does and under which conditions the feature may be called.

An implementation

Features must have one of the following implementations

  • a routine is a feature implementation with code that is executed on a call

  • a field is a memory slot that stores a value and whose contents are returned on a call

  • an abstract feature has no implementation and cannot be called directly, but can be implemented by heir features

  • an intrinsic feature is a low-level feature implemented by the compiler or run-time system, e.g., the infix + operator to add two 32-bit integer values may be an intrinsic operation.

A feature implemented as a routine can contain inner feature declarations.

Feature examples

Here is an example that declares a feature point that functions similar to a struct or record in other languages:

point(x, y i32) is # empty
p1 := point 3 4
say "p1.x is {p1.x}"    # will print "p1.x is 3"
say "p1.y is {p1.y}"    # will print "p1.y is 4"

The next example shows a feature base that provides an inner feature plus that adds its argument to the value passed to the enclosing base:

base(v i32) is
  plus(w i32) => v + w

b1 := base 30
b2 := base 100
say (b1.plus 23)    # will print "53"
say (b2.plus 23)    # will print "123"

Fuzion Syntax Evolution

Fuzion provides two main syntax alternatives, a classic once using semicolons, braces and parentheses and a modern one using white-space and indentation. Both are equivalent, there should be tools between these two representations of source code. The following explains the main ideas how white-space is used instead of special symbols

Separating Statements

Flat line feeds instead of semicolons

The classic way to separate statements is by using a semicolon as in

stmt1; stmt2; stmt3

or

stmt1;
stmt2;
stmt3;

The Fuzion grammar knows a symbol called 'flat line feed', which is a line feed with the next line starting with white space up to the previous line's indentation level. A 'flat line feed' is considered equivalent to a semicolon, so the sequence of three statements above can be written without semicolons as follows:

stmt1
stmt2
stmt3

Indenting line feeds to form blocks

Code blocks in Fuzion can be build using braces similar to many other languages

if cond { stmnt1 } else { stmnt2 }

if cond {
  stmnt1
} else {
  stmnt2
}

if cond
  {
    stmnt1
  }
else
  {
    stmnt2
  }

An 'indenting line feed' in Fuzion is a line feed with the next line starting at a higher indentation level. The parser treats an 'indenting line feed' like a left brace '{'. Correspondingly, a linefeed that reduces the indentation level back to the original level is treated like a right brace '}'. The example above is hence equivalent to

if cond
  stmnt1
else
  stmnt2

Finally, an optional keyword 'then' may as well be used to separate an expression like the condition in an 'if' from a following expression without the need of braces:

if cond then stmnt1 else stmnt2

Separating calls, arguments and operator expressions

calls without parentheses

Fuzion calls do not need parentheses or commas to separate the called feature and its arguments, i.e., a call

f(a, b, c)

can be written as

f a b c

Parameters are then separated by white space. Line breaks, either flat or indenting as explained above, end the argument list.

Parentheses may be needed for nesting calls with arguments, e.g., the code

f (g x y) (h z)

is equivalent to

f(g(x, y), h(z))

If placed in parentheses, operator expressions may extend over several lines using an indenting line feed followed by additional flat line feeds, e.g.,

f (1 + 2 + 3 + 4 +
   5 + 6 + 7 + 8 +
   9 + 10 + 11 + 12)

Tuples of values such as '(a, b)' syntactically look like argument lists. A list of expressions enclosed in parentheses is treated as an argument list only if it immediately follows the name of a called feature. The code

f(a, b)

hence calls 'f' with two arguments 'a' and 'b'. In case there is white space after the name of the called feature as in

f (a, b)

the expression in parentheses is passed as an argument, so, in this case 'f' will be called with a single argument: the tuple '(a, b)'.

Array indexing vs. array creation

The square brackets '[' and ’]’ are used in Fuzion for two purposes: A pre-initialized array of fixed size can be created using an expression like '[x, y, z]', while an array can be indexed using 'a[i]'. Again, white space can be used to distinguish these two cases:

a[i]

read an element from an array while

f [x, y, z]

create array with elements x, y and z and passes it as an argument in a call to 'f'.

Operator expressions

Operators are parsed as prefix- or postfix operators if the are not separated by white space from their target operand, but they are separated by white space on the other side. This means that the call

f -x

is equivalent to

f (-x)

while

f - x
f-x

both subtract 'x' from 'f'. Furthermore,

f- x

is parsed as

(f-) x

i.e., it calls 'postfix -' on 'f' and, assuming the result is a function, calls this result with one argument 'x'.

The precedence of operators that are not separated by white space is stronger than that of a call, so

f a-b

is equivalent to

f (a-b)

while the precedence of calls is higher than that of operators separated by white space, i.e,

f a -b
f a - b

are equivalent to

f (a) (-b)
(f a) - b

respectively.

White space vs. explicit tokens

Attaching semantics to white space might appear dangerous and error-prone to those not used to it. However, more and more languages (Python, Scala-3, Nim, Raku, ...) make use of white-space and the experience is generally very positive, the code is cleaner, easier to read an even easier to maintain. Compiler errors complaining about unbalanced parentheses or braces are gone and mismatch between indentation and semantics may no longer occur.

Furthermore, even languages like Java allow code to have hugely different semantics if just a single white space is added, e.g., the Java code

if (cc) x(); elsey();
if (cc) x(); else y();

changes completely by insertion of a single space character.

Fuzion Type Inference

Fuzion is statically typed, every expression, every field, every routine result has a static type known at compile time. Type inference is used to avoid the need to specify these types explicitly in many situations, reducing the boilerplate code while keeping the safety of a statically types language.

Type inference in Fuzion occurs at explicit or implicit assignments. An explicit assignment is a field declaration with an initial value such as

v := expr

while an implicit assignment occurs, e.g., when a routine returns the value of its last expression as its result

sum (x, y u32) =>
  x + y

or when an argument is passed to a function

sum 1.234e6 567890

Type inference occurs in two directions: From the value to the field that the value is assigned to and from the type of the assigned field to the value. Field declarations using ':=' without an explicit type use the type inferred from the value, the same holds for routines defined using '=>'.

On the other hand, the type of expressions such as lambdas or numeric literals are inferred from what they are assigned to.

Type inference from expressions to feature result type

When declaring either a field using ':=' or a routine using '=>', the result type is inferred from the expression that is assigned or returned, respectively:

v := 123                    # type i32, default type for integer literal
v := "hello"                # type string
highNibble(b u8) => b >> 4  # type u8
origin => point 0 0         # type point

Type inference from expressions to type arguments

Type arguments are inferred from actual arguments as shown by the following example: Imagine a generic routine square as follows

square<T: integer<T>> (a T) => a * a

This could be called using explicit type arguments as in

say (square<i32> 23)

or the type can be inferred from the argument type

say (square 23)

As a consequence, type arguments can be omitted in many cases.

Type inference for lambdas

In Fuzion, inline functions (lambdas) are defined using '->' with the input variables on the left and the expression on the right. Types for a lambda are implicitly taken from the target the lambda is assigned to. E.g., the following example takes a function argument

print(s string, f (string, i32) -> string) =>
  say (f s 3)

print(f (i32, i32) -> i32) =>
  say (f 10 3)

print "Alice" (s,i -> s * i)   # lambda type is (string, i32) -> string
print "Bob"   (s,i -> s + i)
print (s,i -> s * i)           # lambda type is (i32, i32) -> i32
print (s,i -> s + i)

Type inference for numeric constants

Numeric constants that are assigned to a given type inherit that type. There is no immediate distinction between plain numeric literals like '123' or ones using a fractional parts and exponents such as '-0.1234E4'. The target type determines the type of the constant.

v1 u32 := 123456
v2 i8 := -128
v3 i16 := 32000
v4 u64 := 1000000000000000000
v5 i64 := -1000000000000000000
v6 u16 := 4711
v7 f32 := 123456
v8 f64 := 123.456E6
v9 f64 := 1.0E3

In case the constant is too large for the target type, or it has a fractional part not representable by the type, a compile-time error occurs:

v1 u32 := -123456       # error: negative value assigned to unsigned
v2 i32 := 3000000000    # error: value too large
v3 i8  := 128           # error: value too large
v4 u16 := 65536         # error: value too large
v5 f32 := 123.456E39    # error: overflow
v7 f64 := 123.456E309   # error: overflow
v8 f32 := 7E-46         # error: underflow

Fuzion Intermediate Files

Fuzion uses intermediate files during different stages of compilation: module files that contain library code, application files for whole applications and Fuzion intermediate representation files that serve as input to the back ends.

Fuzion uses a simple intermediate code to represent pre-compiled modules and whole applications. This intermediate code serves as the input for static analysis tools, optimizers and for different back-ends that produce executable applications. The goal in the design of the intermediate file format was high performance and simplicity for tools using this code.

The intermediate code is a binary format containing features and types in a way that may be mapped to memory and used directly, so overhead of parsing this format into an in-memory representation is avoided. In particular, if only parts of a pre-compiled module are used by an application, there is no need to read and unpack parts of the module intermediate representation that are not used.

For features containing code, a very simple stack-based format is used. There are currently only ten different instructions:

  • Unit -- produce a value of unit type
  • Current -- produce the current instance as a value
  • Constant -- produce a constant value
  • Assign -- perform an assignment to a field
  • Call -- perform a call to a given feature
  • Tag -- convert a value into a tagged value of a choice type
  • Match -- match a choice type
  • Box -- convert a value type to a ref type
  • Unbox -- extract the value from a ref type
  • Pop -- drop the top element from the stack

The intermediate code uses indices to refer to features and types within intermediate files. This means that lookup is very efficient, but it also means that a change in a library module requires recompilation of all dependent modules. Any incompatibilities would be found at compile time instead of resulting in something like Java's 'IncompatibleClassChangeError' at run-time.

Foreign language interface

Fuzion provides a tool 'fzjava' that takes a Java module file and converts it into Fuzion features. In the spirit of Fuzion, Java's packages, classes, interfaces, methods, constructors, static methods and fields are all converted into Fuzion features. Java methods that may throw an exception are converted into features resulting in a choice type that is either the exception type or the result type of that method.

A few intrinsic functions in the Java interpreter back-end use Java's reflection API to access the corresponding Java code.

Here is a example how Java code can be used from Fuzion:

javaString := java.lang.String.new "Hello Java 🌍!"                          # create Java string, type Java.java.lang.String
javaBytes  := javaString.getBytes "UTF8"                                     # get its UTF8 bytes, type is fuzion.java.Array<i8>
match javaBytes
  err error => say "got an error: $err"
  bytes fuzion.java.Array =>
    say "string has {bytes.count} bytes: $bytes"
    javaString2 := java.lang.String.new bytes 6 bytes.count-6                # create Java string from bytes subset,
    say "Hello "+javaString2                                                 # append Java string to Fuzion string and print it

C back-end

Fuzion now has a back-end that produces C source code to be compiled to machine code using clang and LLVM.

Fuzion safety

A main goal of Fuzion is to provide a development tool for safety-critical applications. Fuzion brings a number of features that address safety issues at different levels:

  • Language features

    • static typing
    • language simplicity
    • pre- and post-conditions for design by contract
    • no wrap-around for standard operations +, -, *, /, etc.
    • mutability restricted to local contexts
  • Run-time features

    • dynamic checks where static checks are not sufficient
    • (real-time) garbage collection
  • Environment features

    • static linking to specific library versions
    • static analysis tools
    • no dynamic code loading
    • simple intermediate representation, simpler tools

Conclusion and Next Steps

The Fuzion language definition and implementation are far from stable, but are getting closer to become useful. Big improvements come from the ability to pre-compile modules and from the foreign language interface for Java, which makes a giant code base accessible for Fuzion applications to build on.

Additionally, new projects such as the language server implementation for Fuzion by Michael Lill help by integrating Fuzion support in popular IDEs and editors.

Main points that are missing right now are

  • a powerful standard library
  • additional library modules for all sorts of application needs
  • low-level foreign language interface for C
  • actual implementations of static analyzers and optimizers
  • highly optimizing back-ends
  • garbage collection for the C back-end
  • documentation, tutorials
  • enthusiastic contributors and users!

Please feel free to contact me in case you want to use Fuzion or want to help making it a success!

Speakers

Photo of Fridtjof Siebert Fridtjof Siebert

Links