Introduction

Welcome to the documentation for the Mercury project!

This is the main user-facing documentation for the project as a whole. It contains docs for developers as well as end users.

If you would like to use the operating system, or help with development, this is where to get your information!

What is Mercury?

Mercury, or the Mercury Project, refers to the collection of libraries, crates, and other code under this organization. "Organization" is a loose term, and really it's anyone who contributes and follows the Code of Conduct.

Mercury OS refers specifically to the full Operating System built off of the ferrite kernel and all the other binaries.

Status

Right now, Mercury is mainly in the planning stage. We're still setting things up, learning, and planning out how we're going to do this. Actual code development is likely not going to take place for a while, but feel free to come back later to help with that!

Development

If you're wanting to help develop Mercury, you've come to the right place! Again, right now we're a heavy work-in-progress. But eventually this will be useful!

Contributor Agreements

By submitting resources to this project (code, art, writing, etc.), you agree to the following terms:

  1. Resources will be licensed under the CNPLv7+ license
  2. You must follow the Code of Conduct
  3. You should follow the Design Goals and Best Practices when possible

Otherwise, feel free to start contributing!

Best Practices

These are various "best practices" for code written for the Mercury project. They should be followed when reasonable, to the best of your ability/understanding. Feel free to contact a maintainer with questions!

  • No compromises will be made for compatibility. If there is a better, if unusual, way to do things, it should be done that way.
  • Everything must be fully accessible. Everything else can be sacrificed for this.
  • unsafe code should be avoided when possible.
  • Everything should be documented as it is written.
  • Nesting should be avoided as much as possible.
  • If something can be excluded from the main kernel, it should be. It's a microkernel!
  • Features should be opt-in rather than opt-out.
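
As a small illustration of the nesting guideline, here's a sketch that uses early returns instead of nested `if` blocks. The function name and its rules are made up for this example and are not part of any Mercury crate.

```rust
/// Hypothetical validation of a partition label, written with early
/// returns so that no check is nested inside another.
fn validate_label(label: &str) -> Result<&str, &'static str> {
    if label.is_empty() {
        return Err("label is empty");
    }
    if label.len() > 128 {
        return Err("label is too long");
    }
    if !label.is_ascii() {
        return Err("label must be ASCII");
    }
    Ok(label)
}
```

Each failure case exits immediately, so the "happy path" stays at the outermost indentation level.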

Source Code

All of the source code for Mercury is on a self-hosted Gitea, courtesy of Lavender Software. Sign up there, and contact one of the maintainers to get access to the repositories.

The source for this site and for our website is available there as well.

Design

All crates/libraries are in a no-std environment. This means we only have access to the libcore functionality. However, we will be using the alloc crate to access the heap, and its collections to have access to data structures like Vec.

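We should, however, have basic support for async in core:: (core::future and core::task). Threading is not part of core, so it will have to be built on the kernel's own primitives.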
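
To make the no-std plus alloc setup a bit more concrete, here's a minimal sketch of library code that only touches `alloc` types. The function names are invented for the example. In a real library crate, `#![no_std]` would sit at the top of lib.rs; it's left out here so the snippet also builds as an ordinary program.

```rust
// Pull in the `alloc` crate explicitly, as a no-std crate would.
extern crate alloc;

use alloc::collections::BTreeMap;
use alloc::string::String;
use alloc::vec::Vec;

/// Count how often each word appears, using only `alloc` types.
fn count_words(words: &[&str]) -> BTreeMap<String, u32> {
    let mut counts = BTreeMap::new();
    for word in words {
        *counts.entry(String::from(*word)).or_insert(0) += 1;
    }
    counts
}

/// Collect the distinct words in sorted order.
fn distinct(words: &[&str]) -> Vec<String> {
    count_words(words).into_keys().collect()
}
```

Note that on a real bare-metal target, anything that allocates also needs a global allocator to be registered by the kernel.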

Learning

Before jumping in, I highly recommend learning some stuff about Rust and embedded development with it. A thorough series of steps might be:

  1. Read through the Rust Book
  2. Work through the Interactive Rust Book
  3. Complete the rustlings exercises
  4. Take a quick look through the Embedded Rust Book
  5. Read the RISC-V Guide/RISC-V Bytes to learn more about the RISC-V architecture
  6. Read the OSDev Wiki entries on Microkernels and Message Passing
  7. Read the Async Book
  8. This has some good information about performance

Additionally you might want to learn about Vulkan if you're going to be hacking on the GUI:

  1. Go through the Vulkan Tutorial (Rust) to learn some of the basics
  2. Read through the Vulkano docs. (Vulkano is a safe wrapper around the Vulkan API. It's likely what we will be using)

Understanding the Design Goals

Mercury has several novel design decisions that make it radically different from other general-purpose operating systems.

Warning

A lot of these designs will likely change and shift as work gets done on the project.

First off, it's written in Rust, which allows for several nice features, including:

  • Memory safety
  • Easy dependency and build management with cargo
  • Great performance and reliability
  • Several compilation targets, with simple cross-compilation

It also uses a microkernel architecture. This allows us to keep the base kernel code small, and have additional features be modular and easy to integrate into other projects. This also allows for a smaller attack surface, less bloat, smaller code, etc.

Additionally, Mercury is designed for ARM/RISC-V architecture machines. This is not only because they are simpler, but also because I believe they are the future of computing. For the future, I do not see myself wanting or attempting to implement x86 functionality.

We may also use Rhai for scripting, for easy user control & modification of the system.

It will also use a global configuration - similar to Guix or NixOS. This allows it to be easily set up. It will likely use RON for configuration.

Note: figlet-rs can be used for cool ASCII art!

Further design decisions are covered in detail in the next few chapters.

Code Organization

Info

These names and layout are all WIP.

All of the code will live in separate repositories. Information on actually committing, pulling, etc. is in the Workflow chapter.

Most of the code will be implemented as libraries, enabling them to be used across systems and worked on separately. Similarly, drivers will be libraries in git submodules.

Overall Design

Connections

erDiagram
    BOOTLOADER ||--|| KERNEL: runs
    KERNEL ||..o{ GRAVITAS: uses
    KERNEL }|..o{ METEOR: uses
    GRAVITAS }|..o{ HAL: uses
    GRAVITAS ||--o{ DISK: IO
    KERNEL ||--|{ MEMORY: maps
    KERNEL ||--o{ EXE: runs
    EXE }|..o{ METEOR: uses
    EXE }|..o{ KERNEL: msg

Startup Flow

flowchart TD
    boot[Bootloader] --> kern(Kernel)
    kern --> disk(Read Disk) --> ind(Index Filesystem) -->
    parse(Parse Configuration) --> run(Run Startup Programs)
    parse -.-> sh([Interactive Shell])
    kern --> mem(Map Memory) -.-> ind
    run ==> actor([Create Actors])

Actor System

Why?

Actors work as an abstraction over data storage and messaging. They allow all systems (GUI, programs, etc.) to work together and rely on the same features. This reduces implementation work, since every system can reuse the same functions.

Features

  • Petnames
  • OCAP security
  • HMAC message verification

Format

#![allow(unused)]
fn main() {
// Different possible types of actors (more to be added)
enum ActorType {
    GUI(photon::Widget),
    ProgramInterface,
}

// Possible states an actor can be in
enum ActorState {
    Receive,
    Send,
    Work,
    Idle,
}

// Cryptographic keypair
struct KeyPair {
    privkey: u128,
    pubkey: u128,
}

// The actor itself
struct Actor<D: DataInterface> {
    petname: Option<String>, // Human-meaningful petname (explored further down)
    uuid: Uuid, // Unique identifier
    namespace: Uuid, // Parent namespace of this actor
    actor_type: ActorType,
    state: ActorState,
    keys: Option<KeyPair>, // Cryptographic keypair
    creation_date: DateTime,
    modified_date: DateTime,
    data: Option<D>, // Optional data of the generic D type
}

impl<D: DataInterface> Actor<D> {
    fn new(namespace: Uuid, a_type: ActorType) -> Self {
        Actor {
            petname: None,
            uuid: Uuid::new(),
            namespace,
            actor_type: a_type,
            state: ActorState::Idle,
            keys: None,
            creation_date: now(),
            modified_date: now(),
            data: None,
        }
    }
}

impl KeyPair {
    async fn generate_keypair(&mut self) -> Self { todo!() } // Generate a public/private keypair (threaded)
    fn get_pubkey(&self) -> u128 { self.pubkey } // Return the public key of this keypair
    async fn sign(&self, data: &[u8]) -> Result<&[u8], Error> { todo!() } // Sign some data with a private key (threaded)
    async fn verify_signature(data: &[u8], pubkey: u128) -> Result<(), Error> { todo!() } // Verify signed data (threaded)
}

trait FilesystemInterface { // Interfacing with the filesystem
    async fn read(&mut self) -> Result<(), Error>; // Read the data from the disk into the Actor using the Uuid as a search key
    async fn write(&self) -> Result<(), Error>; // Write the data to the disk using the Uuid as a key
}

trait DataInterface { // Necessary data functions
    async fn to_bytes(&self) -> Result<&[u8], Error>; // Convert the data into a byte array
}

trait MessageInterface { // Sending & receiving messages
    async fn send_message(&self, message: MessageType, recipient: Uuid) -> Result<(), Error>; // Send a message to a recipient
    async fn receive_message(&self, channel: Channel) -> Message; // Asynchronously wait for an incoming message, and deal with the first one we get
}
}

OCAP

TODO

Messages

  • postcard for message passing
  • Priority Queue for processing multiple messages, while dealing with higher-priority ones first

Messages will be fully modelled so an actor can know exactly what they have to deal with, and what they can send. Different channels are used to make each one less clogged up, and used only for a specific purpose. Actors can read from/write to a specific channel, allowing them to ignore the others. They can then also deal with channels in different ways, maybe deprioritizing the Test channel.

#![allow(unused)]
fn main() {
enum Channel { // Channels for sending/receiving messages on
    Graphics, // Low-latency graphics updates
    Test, // Designated channel for testing messages
    Filesystem, // Batch filesystem operations
    Print, // Printing text
    Executable, // Executable-related messages
}

enum ProcessCode {
    Exit, // Exit the process
    Save, // Save data
    Clear, // Clear data
    Restart, // Restart process
}

enum MessageType {
    Ping(String), // Simple test if we can send/receive a message
    FilesystemUpdate(gravitas::FileOperation), // We want to operate on the filesystem
    GraphicsUpdate(photon::GraphicsOperation), // Update a graphics window
    TextUpdate(String), // Send some text (text mode only)
    ProcessUpdate(ProcessCode), // Send some info about an operation to be done on the current process. Usually kernel -> exe
}

struct Message {
    id: Uuid, // UUID of the message itself
    m_type: MessageType, // Message type & content
    priority: u8, // For priority queueing
    sender: Uuid, // Who is sending the message
    recipient: Uuid, // Who the message is meant for
}
}
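
The priority queue idea can be sketched roughly like this. It's a simplified stand-in, not the real design: the message body is just a `String`, the `Inbox` name is hypothetical, and "higher number means more urgent" is an assumption. It uses `std` here, but `BinaryHeap` is also available from `alloc::collections` in a no-std crate.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A per-channel inbox that hands out the most urgent message first.
#[derive(Default)]
struct Inbox {
    // Entries are (priority, Reverse(arrival order), body). BinaryHeap is a
    // max-heap, so the highest priority comes out first, and `Reverse` makes
    // the earliest arrival win among messages of equal priority.
    heap: BinaryHeap<(u8, Reverse<u64>, String)>,
    next_sequence: u64,
}

impl Inbox {
    /// Queue a message with the given priority.
    fn push(&mut self, priority: u8, body: &str) {
        let sequence = self.next_sequence;
        self.next_sequence += 1;
        self.heap.push((priority, Reverse(sequence), body.to_string()));
    }

    /// Take the most urgent message, if there is one.
    fn pop(&mut self) -> Option<(u8, String)> {
        self.heap.pop().map(|(priority, _, body)| (priority, body))
    }
}
```

Keeping one inbox per `Channel` would let an actor drain, say, the Graphics inbox before it ever looks at Test.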

An example message handling loop may look like this:

#![allow(unused)]
fn main() {
loop { // Continuously loop through message sending & receiving
    actor.send_message(MessageType::Ping("hello!".to_string()), recipient).await; // Await until we can send the test message (`recipient` is assumed to be the target actor's Uuid)
    match actor.receive_message(Channel::Test).await.m_type { // Match on a message type
        MessageType::Ping(s) => println!("We got pinged! {}", s), // Print if we got pinged
        _ => {}, // Ignore other message types
    }
}
}

Latency

Security Features

Mercury is designed with security in mind from the beginning.

  • First, we will be using Orion - a pure Rust crypto library.
  • There is built in support for checksums and AES encryption in the filesystem.
  • HMAC[1] will be used to authenticate messages during message passing. Encryption of message contents can be layered on top of this.
  • nanorand RNG
  • HighwayHash is used for checksums
  • Argon2id is used for key-derivation

Isolation

To-Do

Microkernel

The core kernel of Mercury will be highly limited, implementing only necessary portions. This allows other functionality to be simply run in userspace.

Additionally, most code should be put into separate libraries then pulled into the kernel code. This will likely be done via git submodules.

Initially, it will be built for RISC-V, then ARM (focused on running in a VM), then on a Raspberry Pi. Afterwards, we can put focus towards building out various features.

Support for multiple targets will be done via Cargo.toml targets, cross-compilation, and conditional compilation.
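
Conditional compilation for multiple targets might look roughly like the sketch below. The function names are hypothetical; only the `cfg(target_arch = ...)` mechanism itself is standard Rust.

```rust
/// Name of the architecture this build targets, chosen at compile time.
#[cfg(target_arch = "riscv64")]
fn arch_name() -> &'static str {
    "riscv64"
}

#[cfg(target_arch = "aarch64")]
fn arch_name() -> &'static str {
    "aarch64"
}

// Fallback so the sketch still builds on a development machine.
#[cfg(not(any(target_arch = "riscv64", target_arch = "aarch64")))]
fn arch_name() -> &'static str {
    "unsupported (host build)"
}

/// Hypothetical banner printed early during boot.
fn boot_banner() -> String {
    format!("Mercury on {}", arch_name())
}
```

Only one of the three `arch_name` definitions is compiled for any given target, so architecture-specific code never ends up in the wrong binary. The target itself would be picked at build time, for example with `cargo build --target riscv64gc-unknown-none-elf`.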

Concurrency

For performance, we will be using various concurrency programming techniques.

For IO intensive operations, async will be used. This will include the filesystem, actors, and GUI.

CPU-bound operations are better suited to individual threads, however. This might include operations like hashing, encryption, and indexing.

Boot Process

TODO

Memory Management

TODO

Processes

TODO

Error Handling

All errors must be handled gracefully by the kernel. If possible, they should simply log an error. If not, they can display it to the user, preferably in a simple format, maybe using something like const_panic or snafu.
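
Here's a rough sketch of the "log if possible, otherwise show the user" rule. The error variants and the `Handling` type are invented for illustration, and plain `core::fmt` is used instead of const_panic or snafu to keep it dependency-free.

```rust
use core::fmt;

/// Hypothetical kernel error type; the variants are made up for this sketch.
#[derive(Debug, PartialEq)]
enum KernelError {
    ChecksumMismatch { chunk: u64 },
    MissingCapability(&'static str),
    OutOfMemory,
}

impl KernelError {
    /// Whether logging is enough, or the user needs to be told.
    fn is_recoverable(&self) -> bool {
        !matches!(self, KernelError::OutOfMemory)
    }
}

impl fmt::Display for KernelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KernelError::ChecksumMismatch { chunk } => {
                write!(f, "checksum mismatch in chunk {}", chunk)
            }
            KernelError::MissingCapability(name) => write!(f, "missing capability: {}", name),
            KernelError::OutOfMemory => write!(f, "out of memory"),
        }
    }
}

/// What the kernel decided to do with an error.
#[derive(Debug, PartialEq)]
enum Handling {
    Logged(String),
    ShownToUser(String),
}

/// Recoverable errors are only logged; the rest are shown in a simple format.
fn handle(err: &KernelError) -> Handling {
    let text = err.to_string();
    if err.is_recoverable() {
        Handling::Logged(text)
    } else {
        Handling::ShownToUser(text)
    }
}
```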

GUI

Eventually, programs will be able to use the photon library to have access to a graphics API. This will initialize various actors to represent parts of the UI.

Performance

The GUI is one of the systems where latency is far more important than throughput. There are several things that aid with performance of this system:

  1. Each window is drawn in a separate buffer, allowing for easy concurrency
  2. Messages use the latency bus
  3. All draws are based on optimized and simple low-level operations
  4. Only changes are re-rendered

Drawing

When a GUI element wants to update, it first sends a message to the kernel. The kernel then calculates the overlaying of each window, writes each window to its own buffer, then updates the screen buffer with ones that have changed, which is then drawn to the screen. This ensures that only necessary parts are re-rendered, and the rendering can be done asynchronously/threaded.
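
The "only re-render what changed" step could be sketched like this. It's heavily simplified: the screen is a flat array of pixels, windows never overlap, and the `Window` and `Compositor` names are hypothetical rather than photon's actual API.

```rust
/// One window with its own private pixel buffer.
struct Window {
    offset: usize,    // Where this window starts in the screen buffer
    pixels: Vec<u32>, // The window's own buffer
    dirty: bool,      // Set when the buffer changed since the last present
}

struct Compositor {
    windows: Vec<Window>,
}

impl Compositor {
    /// An application asks to fill window `id` with a solid colour.
    fn update(&mut self, id: usize, colour: u32) {
        if let Some(window) = self.windows.get_mut(id) {
            window.pixels.fill(colour);
            window.dirty = true;
        }
    }

    /// Copy only the changed windows into the screen buffer and
    /// return how many windows were redrawn.
    fn present(&mut self, screen: &mut [u32]) -> usize {
        let mut redrawn = 0;
        for window in self.windows.iter_mut().filter(|w| w.dirty) {
            let end = window.offset + window.pixels.len();
            screen[window.offset..end].copy_from_slice(&window.pixels);
            window.dirty = false;
            redrawn += 1;
        }
        redrawn
    }
}
```

Because every window owns its buffer, the per-window drawing could happen concurrently, with only `present` touching the shared screen buffer.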

The photon library will not only provide a high-level API for applications to use, but also lower-level drawing methods for the kernel to use. These may include line, rectangle, triangle, and circle drawing methods, as well as being able to render text.

Flow

flowchart LR
    app(Application) --> kern(Kernel)
    kern --> buf([Buffer])
    kern --> app
    buf --> dis((Display))

Styling

Styling of GUI elements is done via a global configuration. The kernel parses this information, and uses it to actually style the widgets provided to it.

Widgets created by the program/library contain no styling data - only information such as text, size, callbacks, etc. The kernel does all the display work.

Design

erDiagram
    WINDOW ||--|{ SECTION: holds
    WINDOW ||--|| TITLE: has
    SECTION ||--|| TITLE: has
    SECTION ||--o{ SECTION: holds 
    SECTION ||--o{ CANVAS: holds
    SECTION ||--o{ TEXTBOX: holds

Filesystem

Warning

I have no idea what I'm doing here. If you do, please let me know, and fix this! This is just some light brainstorming of how I think this might work.

Prelude

Right now, actors are stored in RAM only. But, what if we want them to be persistent on system reboot? They need to be saved to the disk.

I don't want to provide a simple filesystem interface to programs like UNIX does however. Instead, all data should be just stored in actors, then the actors will decide whether or not they should be saved. They can save at any time, save immediately, or just save on a shutdown signal.

Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the kernel to use. Actors will simply make requests to save.

Performance

I believe that this format should be fairly fast, but only implementation and testing will tell for sure. Throughput is the main concern here, rather than latency. We can be asynchronous and wait for many requests to finish, rather than worrying about when each one finishes. This is also better for SSD performance.

  1. Minimal data needs to be read in - bit offsets can be used, and only fixed-size metadata must be known
  2. serde is fairly optimized for deserialization/serialization
  3. BTreeMap is a very fast and simple data structure
  4. Async and multithreading will allow for concurrent access, and splitting of resource-intensive tasks across threads.
  5. hashbrown is quite high-performance
  6. Batch processing increases throughput

Buffering

The kernel will hold two read/write buffers in-memory and will queue reading & writing operations into them. They can then be organized and batch processed, in order to optimize HDD speed (not having to move the head around), and SSD performance (minimizing operations).
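
A write buffer along these lines might look like the sketch below. The `WriteBuffer` type is hypothetical, and "the last write to a chunk wins" is an assumption made for the example.

```rust
use std::collections::BTreeMap;

/// Queue of pending writes, each one a (chunk index, data) pair.
#[derive(Default)]
struct WriteBuffer {
    pending: Vec<(u64, Vec<u8>)>,
}

impl WriteBuffer {
    /// Queue a write without touching the disk yet.
    fn queue(&mut self, chunk: u64, data: Vec<u8>) {
        self.pending.push((chunk, data));
    }

    /// Drain the queue as one batch, sorted by chunk index.
    /// If a chunk was written more than once, only the latest data is kept.
    fn flush(&mut self) -> Vec<(u64, Vec<u8>)> {
        let mut merged = BTreeMap::new();
        for (chunk, data) in self.pending.drain(..) {
            merged.insert(chunk, data); // Later writes replace earlier ones
        }
        merged.into_iter().collect() // BTreeMap iterates in ascending key order
    }
}
```

Sorting by chunk index keeps an HDD head moving in one direction, and merging repeated writes cuts down the number of operations an SSD has to perform.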

Filesystem Layout

Name            Size     Header
Boot Sector     128 B    None
Kernel Sector   4096 KB  None
Index Sector    u64      PartitionHeader
Config Sector   u64      PartitionHeader
User Sector(s)  u64      PartitionHeader
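
Since the first two sectors have fixed sizes, the start of the Index Sector can be computed directly. The sketch below assumes that 1 KB means 1024 bytes and that the sectors are laid out back to back with no padding; neither is confirmed yet.

```rust
const BOOT_SECTOR_SIZE: u64 = 128; // 128 B
const KERNEL_SECTOR_SIZE: u64 = 4096 * 1024; // 4096 KB, assuming 1 KB = 1024 B

/// The kernel sector starts right after the boot sector.
const fn kernel_sector_offset() -> u64 {
    BOOT_SECTOR_SIZE
}

/// The index sector starts right after the kernel sector.
const fn index_sector_offset() -> u64 {
    BOOT_SECTOR_SIZE + KERNEL_SECTOR_SIZE
}
```

Under those assumptions, the Index Sector begins at byte 128 + 4,194,304 = 4,194,432.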

Partition

A virtual section of the disk. Additionally, it has a UUID generated via lolid to enable identifying a specific partition.

binary-layout can be used to parse data from raw bytes on the disk into a structured format, with no-std.

#![allow(unused)]
fn main() {
use binary_layout::prelude::*;
const LABEL_SIZE: u16 = 128; // Example number of characters that can be used in the partition label

define_layout!(partition_header, BigEndian, {
    partition_type: u8, // Which type of partition it is (a PartitionType discriminant)
    num_chunks: u64, // Chunks in this partition
    uuid: u128, // Partition UUID, stored as a 128-bit integer
});

#[repr(u8)]
enum PartitionType {
    Index, // Used for FS indexing
    Config, // Used for system configuration
    User, // User-defined partition
}

fn parse_data(partition_data: &mut [u8]) -> partition_header::View<&mut [u8]> {
    let mut view = partition_header::View::new(partition_data);

    let id: u128 = view.uuid().read(); // Read some data
    view.num_chunks_mut().write(10); // Write data

    view
}
}

Chunk

Small pieces that each partition is split into. Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.

binary-layout is similarly used to parse the raw bytes of a chunk.

#![allow(unused)]
fn main() {
use binary_layout::prelude::*;
const CHUNK_SIZE: usize = 4096; // Example static chunk size (in bytes); array lengths must be usize

define_layout!(chunk, BigEndian, {
    checksum: u64,
    modified: u64, // Timestamp of last modified
    uuid: u128,
    data: [u8; CHUNK_SIZE],
});
}

This struct is then encoded into bytes and written to the disk. Drivers for the disk are to be implemented. It should be possible to do autodetection, and maybe for Actors to specify which disk/partition they want to be saved to.

AES encryption can be used, and this allows for only specific chunks to be encrypted.[1]

Reading

On boot, we start executing code from the Boot Sector. This contains the assembly instructions, which then jump to the kernel code in the Kernel Sector. The kernel then reads in bytes from the first partition (as the sectors are fixed-size, we know when this starts) into memory, parsing it into a structured form.

From here, as we have a fixed CHUNK_SIZE, and know how many chunks are in our first partition, we can read from any chunk on any partition now. On startup, an Actor can request to read data from the disk. If it has the right capabilities, we find the chunk it's looking for from the index, parse the data, and send it back.
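
With a fixed chunk size, locating a chunk is plain arithmetic. Here's a small sketch; the function name and parameters are hypothetical, and `CHUNK_SIZE` just mirrors the example value used earlier.

```rust
const CHUNK_SIZE: u64 = 4096; // Example chunk size in bytes

/// Byte offset of chunk `index` inside a partition whose first chunk
/// starts at `first_chunk_offset`. Returns `None` if the index is out of
/// range or the arithmetic would overflow.
fn chunk_offset(first_chunk_offset: u64, index: u64, num_chunks: u64) -> Option<u64> {
    if index >= num_chunks {
        return None;
    }
    index
        .checked_mul(CHUNK_SIZE)?
        .checked_add(first_chunk_offset)
}
```

For example, chunk 3 of a partition whose chunks start at byte 1000 sits at 1000 + 3 × 4096 = 13288.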

Also, we are able to verify data. Before passing off the data, we re-hash it using HighwayHash to see if it matches. If it does, we simply pass it along like normal. If not, we refuse, and send an error message.
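
The verification step might look something like this. To keep the sketch dependency-free, it swaps in a hand-written FNV-1a hash in place of HighwayHash, and the `StoredChunk` and `ReadError` types are invented for the example.

```rust
/// FNV-1a (64-bit), used here only as a small stand-in for HighwayHash.
fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// A chunk as read from disk: the stored checksum plus the payload.
struct StoredChunk {
    checksum: u64,
    data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    ChecksumMismatch { expected: u64, actual: u64 },
}

/// Re-hash the payload and only hand it out if it matches the stored checksum.
fn read_verified(chunk: &StoredChunk) -> Result<&[u8], ReadError> {
    let actual = checksum(&chunk.data);
    if actual != chunk.checksum {
        return Err(ReadError::ChecksumMismatch {
            expected: chunk.checksum,
            actual,
        });
    }
    Ok(&chunk.data)
}
```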

Writing

Writing uses a similar process. An Actor can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk, and write to it. We hash the data first to generate a checksum, and set the proper metadata.
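
The allocation part of a write could be sketched as follows. The `FreeList` type is hypothetical, and taking chunks from the end of the list is just one possible strategy.

```rust
/// How many chunks are needed to hold `len` bytes of payload.
/// `chunk_payload` must be greater than zero.
fn chunks_needed(len: usize, chunk_payload: usize) -> usize {
    (len + chunk_payload - 1) / chunk_payload // Round up
}

/// List of chunk indices that are currently unused.
struct FreeList {
    free: Vec<u64>,
}

impl FreeList {
    /// Take `count` chunks from the end of the list,
    /// or nothing at all if there aren't enough left.
    fn allocate(&mut self, count: usize) -> Option<Vec<u64>> {
        if count > self.free.len() {
            return None;
        }
        let start = self.free.len() - count;
        Some(self.free.split_off(start))
    }

    /// Give chunks back, e.g. after an Actor's data is deleted.
    fn release(&mut self, chunks: Vec<u64>) {
        self.free.extend(chunks);
    }
}
```

Allocation is all-or-nothing here, so a write that doesn't fit leaves the free list untouched.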

Permissions

Again, whether actors can:

  • Write to a specific disk/partition
  • Write to disk at all
  • Read from disk

will be determined via capabilities.
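
Just to illustrate the three checks, here's a rough sketch using bit flags. All names are hypothetical, and a real object-capability design would more likely hand out unforgeable references than check flags like this.

```rust
/// Hypothetical capability set stored as bit flags.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Capabilities(u8);

impl Capabilities {
    const READ_DISK: Capabilities = Capabilities(0b01);
    const WRITE_DISK: Capabilities = Capabilities(0b10);

    /// Combine two capability sets.
    const fn union(self, other: Self) -> Self {
        Capabilities(self.0 | other.0)
    }

    /// True if every flag in `required` is present.
    const fn contains(self, required: Self) -> bool {
        self.0 & required.0 == required.0
    }
}

/// What an actor was granted: flags, plus an optional partition restriction.
struct Grant {
    caps: Capabilities,
    partition: Option<u128>, // `None` means "any partition"
}

/// Can this actor read from disk?
fn may_read(grant: &Grant) -> bool {
    grant.caps.contains(Capabilities::READ_DISK)
}

/// Can this actor write to the given partition?
fn may_write(grant: &Grant, partition: u128) -> bool {
    grant.caps.contains(Capabilities::WRITE_DISK)
        && grant.partition.map_or(true, |allowed| allowed == partition)
}
```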

Indexing

Created in-memory on startup, modified directly whenever the filesystem is modified. It's saved in the Index Sector (which is at a known offset), allowing it to be read in easily on boot.

The index is simply an alloc::collections::BTreeMap. (If not, try scapegoat).

We also have a simple Vec of the chunks that are free, which we modify in reverse.

#![allow(unused)]
fn main() {
use alloc::{collections::BTreeMap, vec, vec::Vec};

struct Location {
    partition: Uuid, // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition it is
}

let mut index: BTreeMap<Uuid, Location> = BTreeMap::new(); // Basic Actor index
let mut free_index: Vec<u64> = Vec::new(); // Index of free chunks

let new_data_location = Location {
    partition: Uuid::new(),
    chunks: vec![5, 8], // 5th & 8th chunk in that partition
};

for i in &new_data_location.chunks {
    free_index.retain(|chunk| chunk != i); // Remove used chunks from the free chunks list
}
index.entry(actor.uuid).or_insert(new_data_location); // Insert an Actor's storage location if it's not already stored

index.contains_key(&actor.uuid); // Check if the index contains an Actor's data
index.get(&actor.uuid); // Get the Location of the actor
if let Some(old_location) = index.remove(&actor.uuid) { // Remove an Actor's data from the index (e.g. on deletion)
    for i in old_location.chunks {
        free_index.push(i); // Add back the now free chunks
    }
}
}

This then allows the index to be searched easily to find the data location of a specific Uuid. Whenever an actor makes a request to save data to its Uuid location, this can be easily found. It also allows us to tell if an actor hasn't been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.

To-Do

  • Snapshots
  • Isolation
  • Journaling
  • Resizing
  • Atomic Operations

Executable Format

Programs written in userspace will need to follow a specific format. First, users will write a program in Rust, using the Mercury libraries, and with no-std. They'll use Actors to communicate with the kernel. Then, they'll compile it for the proper platform and get a pure binary.

This will be run through an executable packer program, the output of which can be downloaded by the package manager, put on disk, etc. It'll then be parsed via bincode, and the core is run by the kernel in userspace. Additionally, the raw bytes will be compressed.

Then, whether reading chunks from memory or disk, we can know whether it will run on the current system, how long to read for, and where the compressed bytes start (due to the fixed-length header). It is then simple to decompress the raw bytes and run them from the kernel.

#![allow(unused)]
fn main() {
enum Architecture {
    RiscV,
    Arm,
}

struct PackedExecutable {
    arch: Architecture,
    size: u64,
    compressed_bytes: [u8],
}
}
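
To show why a fixed-length header makes this easy, here's a hand-rolled parsing sketch. The byte layout (one architecture tag byte, then an 8-byte big-endian size, then the compressed bytes) is purely an assumption for illustration; the real format would come from bincode and may look different.

```rust
#[derive(Debug, PartialEq)]
enum Architecture {
    RiscV,
    Arm,
}

const HEADER_LEN: usize = 9; // 1 byte architecture tag + 8 byte size (assumed layout)

#[derive(Debug, PartialEq)]
enum ParseError {
    TooShort,
    UnknownArchitecture(u8),
    SizeMismatch,
}

/// Split a packed executable into its architecture and compressed payload.
fn parse_header(bytes: &[u8]) -> Result<(Architecture, &[u8]), ParseError> {
    if bytes.len() < HEADER_LEN {
        return Err(ParseError::TooShort);
    }
    let arch = match bytes[0] {
        0 => Architecture::RiscV,
        1 => Architecture::Arm,
        other => return Err(ParseError::UnknownArchitecture(other)),
    };
    let mut size_bytes = [0u8; 8];
    size_bytes.copy_from_slice(&bytes[1..HEADER_LEN]);
    let size = u64::from_be_bytes(size_bytes);

    // Everything after the header is the compressed program.
    let payload = &bytes[HEADER_LEN..];
    if payload.len() as u64 != size {
        return Err(ParseError::SizeMismatch);
    }
    Ok((arch, payload))
}
```

The kernel could reject a binary built for the wrong architecture after reading a single byte, before decompressing anything.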
[1] Specific details to be figured out later

Configuring a Build Environment

Info

All of this build information is for Linux right now, as I don't want to mess around with getting stuff working on Windows. Just use WSL if you want. BSD should work similarly, but if not, please let us know!

Of course, first you will need to install Rust. The best way to do this is through rustup. Additionally, you will need to install clippy and rustfmt. You probably will also want to cargo install sccache, and export the RUSTC_WRAPPER='sccache' environment variable however your shell does it.

Then, you'll want to git clone whatever repository you're wanting to work on.

Building

TODO: Figure out how building actually works.

When you're just testing, cargo will use the dev profile. This will allow for debugging, and faster compilation speeds.

However, all releases will use the release profile, which is much slower to compile, but better optimized. It's unlikely you'll want to use this on your computer.

Development Workflow

1. Pull Down Code

git clone the repository of whatever code you're wanting to work on. Make a new branch for the feature you want to do.

Maybe it's a new feature, or fixing a bug (please file it in the issue tracker!).

2. Do Coding

Work on your code however you like. Make sure to run cargo fmt before each commit, sign your commits, and commit fairly often.

3. Test

Clippy

Run this clippy command, and try and ensure there are no warnings:

cargo clippy -- -W clippy::pedantic -W clippy::suspicious -W clippy::complexity -W clippy::perf -W clippy::cargo -W clippy::nursery -W clippy::unwrap_used -D warnings

Automated Tests

Run cargo test to ensure all of the tests still pass. If needed, add your own tests for your new code.
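
If you haven't written Rust tests before, a unit test usually sits right next to the code it covers. The helper below is made up for the example and isn't part of any Mercury crate.

```rust
/// Convert a size in KB to bytes, assuming 1 KB = 1024 bytes.
///
/// Returns `None` if the result would overflow a `u64`.
fn kb_to_bytes(kb: u64) -> Option<u64> {
    kb.checked_mul(1024)
}

// Only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn converts_kernel_sector_size() {
        assert_eq!(kb_to_bytes(4096), Some(4_194_304));
    }

    #[test]
    fn reports_overflow() {
        assert_eq!(kb_to_bytes(u64::MAX), None);
    }
}
```

The `///` comments double as rustdoc, so the same function is documented and tested in one place.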

Run

Of course, manually run the code in a VM and see if everything works how it should.

4. Document

Use inline rustdoc to document your code.

5. Push

Push the code to a new branch in the repository. If it's ready and fully working, make a pull request to merge it into main.

6. CI

Eventually, I want to have some sort of CI system, to allow automated tests, checking, and releases.

Debugging

Debugging code and understanding what's going wrong is highly important, especially in a complex setup such as this. We will try to make it as easy as possible, with tools such as tracing.

There are several external tools that can be used as well.

Using the OS

Once we actually have a working kernel, information on running it will be added here!