Introduction
Welcome to the documentation for the Mercury project!
This is the main user-facing documentation for the project as a whole. It consists of docs for Developers, as well as end-users.
If you would like to use the operating system, or help with development, this is where to get your information!
What is Mercury?
Mercury, or the Mercury Project, refers to the collection of libraries, crates, and other code under this organization.
"Organization" is a loose term, and really it's anyone who contributes and follows the Code of Conduct.
Mercury OS refers specifically to the full Operating System built off of the `ferrite` kernel and all the other binaries.
Status
Right now, Mercury is mainly in the planning stage. We're still setting things up, learning, and planning out how we're going to do this. Actual code development is likely not going to take place for a while, but feel free to come back later to help with that!
Development
If you're wanting to help develop Mercury, you've come to the right place! Again, right now we're a heavy work-in-progress. But eventually this will be useful!
Contributor Agreements
By submitting resources to this project (code, art, writing, etc.), you must agree to the following terms:
- Resources will be licensed under the CNPLv7+ license
- You must follow the Code of Conduct
- You should follow the Design Goals and Best Practices when possible
Otherwise, feel free to start contributing!
Best Practices
These are various "best practices" for code written for the Mercury project. They should be followed when reasonable, to the best of your ability/understanding. Feel free to contact a maintainer with questions!
- No compromises will be made for compatibility. If there is a better, if unusual, way to do things, it should be done that way.
- Everything must be fully accessible. Everything else can be sacrificed for this.
- `unsafe` code should be avoided when possible.
- Everything should be documented as it is written.
- Nesting should be avoided as much as possible.
- If something can be excluded from the main `kernel`, it should be. It's a microkernel!
- Features should be opt-in rather than opt-out.
Source Code
All of the source code for Mercury is on a self-hosted Gitea, courtesy of Lavender Software. Sign up there, and contact one of the maintainers to get access to the repositories.
The source for this site and our website is available there as well.
Design
All crates/libraries are in a `no-std` environment. This means we only have access to the `libcore` functionality.
However, we will be using the `alloc` crate to access the heap, and the collections it provides to have access to data structures like `Vec`.
We should, however, have basic support for async and threading in `core`.
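As a rough sketch of what that environment looks like in practice, a crate might start out like this (a real build would also need a `#[global_allocator]`, which the final kernel binary has to provide):

```rust
#![no_std]

extern crate alloc; // Pulls in heap-backed types without the full standard library

use alloc::{string::String, vec::Vec};

fn crate_names() -> Vec<String> {
    // `Vec`, `String`, `BTreeMap`, etc. come from `alloc` instead of `std`
    let mut names: Vec<String> = Vec::new();
    names.push(String::from("ferrite"));
    names
}
```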
Learning
Before jumping in, I highly recommend learning some stuff about Rust and embedded development with it. A thorough series of steps might be:
- Read through the Rust Book
- Work through the Interactive Rust Book
- Complete the rustlings exercises
- Take a quick look through the Embedded Rust Book
- Read the RISC-V Guide/RISC-V Bytes to learn more about the RISC-V architecture
- Read the OSDev Wiki entries on Microkernels and Message Passing
- Read the Async Book
- This has some good information about performance
Additionally you might want to learn about Vulkan if you're going to be hacking on the GUI:
- Go through the Vulkan Tutorial (Rust) to learn some of the basics
- Read through the Vulkano docs. (Vulkano is a safe wrapper around the Vulkan API. It's likely what we will be using)
Understanding the Design Goals
Mercury has several novel design decisions that make it radically different from other general-purpose operating systems.
First off, it's written in Rust, which allows for several nice features, including:
- Memory safety
- Easy dependency and build management with `cargo`
- Great performance and reliability
- Several compilation targets, with simple cross-compilation
It also uses a microkernel architecture. This allows us to keep the base kernel code small, and to have additional features be modular and easy to integrate into other projects. It also means a smaller attack surface, less bloat, smaller code, etc.
Additionally, Mercury is designed for ARM/RISC-V architecture machines.
This is not only because they are simpler, but also because I believe they are the future of computing.
For the future, I do not see myself wanting or attempting to implement x86 functionality.
We may also use Rhai for scripting, for easy user control & modification of the system.
It will also use a global configuration, similar to Guix or NixOS. This allows it to be set up easily. It will likely use RON for configuration.
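As a loose sketch of what parsing that configuration could look like, assuming the `ron` and `serde` crates and a completely made-up set of fields:

```rust
use serde::Deserialize;

// Hypothetical fields, purely for illustration; the real schema is undecided.
#[derive(Debug, Deserialize)]
struct SystemConfig {
    hostname: String,
    startup_programs: Vec<String>,
    gui_enabled: bool,
}

fn load_config(raw: &str) -> SystemConfig {
    // e.g. raw = r#"(hostname: "mercury", startup_programs: ["shell"], gui_enabled: true)"#
    ron::from_str(raw).expect("invalid system configuration")
}
```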
Note: figlet-rs can be used for cool ASCII art!
Further design decisions are covered in detail in the next few chapters.
Code Organization
All of the code will live in separate repositories. Information on actually committing, pulling, etc. is in the Workflow chapter.
Most of the code will be implemented as libraries, enabling them to be used across systems and worked on separately.
Similarly, drivers will be libraries in `git submodules`.
- ferrite - The core microkernel code w/ bootloader
- hermes - The package manager
- meteor - The actors library/implementation
- gravitas - The library for working with storage
- pulsar - Networking code
- photon - GUI library
Overall Design
Connections
```mermaid
erDiagram
    BOOTLOADER ||--|| KERNEL: runs
    KERNEL ||..o{ GRAVITAS: uses
    KERNEL }|..o{ METEOR: uses
    GRAVITAS }|..o{ HAL: uses
    GRAVITAS ||--o{ DISK: IO
    KERNEL ||--|{ MEMORY: maps
    KERNEL ||--o{ EXE: runs
    EXE }|..o{ METEOR: uses
    EXE }|..o{ KERNEL: msg
```
Startup Flow
```mermaid
flowchart TD
    boot[Bootloader] --> kern(Kernel)
    kern --> disk(Read Disk) --> ind(Index Filesystem) --> parse(Parse Configuration) --> run(Run Startup Programs)
    parse -.-> sh([Interactive Shell])
    kern --> mem(Map Memory) -.-> ind
    run ==> actor([Create Actors])
```
Actor System
Actors work as an abstraction over data storage and messaging. They allow all systems (GUI, programs, etc.) to work together and rely on the same features. This reduces implementation work, and all implementations can use the same functions.
Features
- Petnames
- OCAP security
- HMAC message verification
Format
```rust
// Different possible types of actors (more to be added)
enum ActorType {
    GUI(photon::Widget),
    ProgramInterface,
}

// Possible states an actor can be in
enum ActorState {
    Receive,
    Send,
    Work,
    Idle,
}

// Cryptographic keypair
struct KeyPair {
    privkey: u128,
    pubkey: u128,
}

// The actor itself
struct Actor<D: DataInterface> {
    petname: Option<String>, // Human-meaningful petname (explored further down)
    uuid: Uuid,              // Unique identifier
    namespace: Uuid,         // Parent namespace of this actor
    actor_type: ActorType,
    state: ActorState,
    keys: Option<KeyPair>,   // Cryptographic keypair
    creation_date: DateTime,
    modified_date: DateTime,
    data: Option<D>,         // Optional data of the generic D type
}

impl<D: DataInterface> Actor<D> {
    fn new(namespace: Uuid, a_type: ActorType) -> Self {
        Actor {
            petname: None,
            uuid: Uuid::new(),
            namespace,
            actor_type: a_type,
            state: ActorState::Idle,
            keys: None,
            creation_date: now(),
            modified_date: now(),
            data: None,
        }
    }
}

impl KeyPair {
    async fn generate_keypair(&mut self) -> Self;              // Generate a public/private keypair (threaded)
    fn get_pubkey(&self) -> u128;                              // Return the public key of an Actor
    async fn sign(&self, data: &[u8]) -> Result<&[u8], Error>; // Sign some data with a private key (threaded)
    async fn verify_signature(data: &[u8], pubkey: u128) -> Result<(), Error>; // Verify signed data (threaded)
}

trait FilesystemInterface { // Interfacing with the filesystem
    async fn read(&mut self) -> Result<(), Error>; // Read the data from the disk into the Actor, using the Uuid as a search key
    async fn write(&self) -> Result<(), Error>;    // Write the data to the disk, using the Uuid as a key
}

trait DataInterface { // Necessary data functions
    async fn to_bytes(&self) -> Result<&[u8], Error>; // Convert the data into a byte array
}

trait MessageInterface { // Sending & receiving messages
    async fn send_message(&self, message: MessageType, recipient: Uuid) -> Result<(), Error>; // Send a message to a recipient
    async fn receive_message(&self, channel: Channel) -> Message; // Asynchronously wait for an incoming message, and deal with the first one we get
}
```
OCAP
TODO
Messages
- postcard for message passing
- Priority Queue for processing multiple messages, while dealing with higher-priority ones first (a sketch of this follows the example loop below)
Messages will be fully modelled so an actor can know exactly what it has to deal with, and what it can send.
Different channels are used so that each one is less clogged, and used only for a specific purpose.
Actors can read from/write to a specific channel, allowing them to ignore the others.
They can then also deal with channels in different ways, maybe deprioritizing the `Test` channel.
```rust
// Channels for sending/receiving messages on
enum Channel {
    Graphics,   // Low-latency graphics updates
    Test,       // Designated channel for testing messages
    Filesystem, // Batch filesystem operations
    Print,      // Printing text
    Executable, // Executable-related messages
}

enum ProcessCode {
    Exit,    // Exit the process
    Save,    // Save data
    Clear,   // Clear data
    Restart, // Restart process
}

enum MessageType {
    Ping(String),                              // Simple test of whether we can send/receive a message
    FilesystemUpdate(gravitas::FileOperation), // We want to operate on the filesystem
    GraphicsUpdate(photon::GraphicsOperation), // Update a graphics window
    TextUpdate(String),                        // Send some text (text mode only)
    ProcessUpdate(ProcessCode),                // Send some info about an operation to be done on the current process. Usually kernel -> exe
}

struct Message {
    id: Uuid,            // UUID of the message itself
    m_type: MessageType, // Message type & content
    priority: u8,        // For priority queueing
    sender: Uuid,        // Who is sending the message
    recipient: Uuid,     // Who the message is meant for
}
```
An example message handling loop may look like this:
```rust
loop { // Continuously loop through message sending & receiving
    // Block and await until we can send the test message (`recipient` is the target actor's Uuid)
    actor.send_message(MessageType::Ping("hello!".to_string()), recipient).await;
    match actor.receive_message(Channel::Test).await.m_type { // Match on the message type
        MessageType::Ping(s) => println!("We got pinged! {}", s), // Print if we got pinged
        _ => {}, // Ignore other message types
    }
}
```
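The priority handling mentioned above could sit on top of `alloc`'s `BinaryHeap`. A sketch (with a placeholder `handle` function standing in for the actor's real dispatch logic):

```rust
use alloc::collections::BinaryHeap;
use core::cmp::Ordering;

// Wrapper so the heap compares pending messages by their priority field alone
struct Pending(Message);

impl PartialEq for Pending {
    fn eq(&self, other: &Self) -> bool {
        self.0.priority == other.0.priority
    }
}
impl Eq for Pending {}
impl PartialOrd for Pending {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Pending {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.priority.cmp(&other.0.priority)
    }
}

fn drain_backlog(incoming: impl IntoIterator<Item = Message>) {
    // BinaryHeap is a max-heap, so the highest-priority message pops first
    let mut queue: BinaryHeap<Pending> = incoming.into_iter().map(Pending).collect();
    while let Some(Pending(message)) = queue.pop() {
        handle(message); // `handle` is a placeholder for the actor's real dispatch logic
    }
}
```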
Latency
Security Features
Mercury is designed with security in mind from the beginning.
- First, we will be using Orion - a pure Rust crypto library.
- There is built-in support for checksums and AES encryption in the filesystem.
- HMAC will be used for message passing, which additionally allows for encrypted messages (see the sketch after this list).
- nanorand RNG
- HighwayHash is used for checksums
- Argon2id is used for key-derivation
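As a rough illustration of the message-verification idea, assuming orion's high-level `auth` module (how keys are actually provisioned and stored between actors is still an open question):

```rust
use orion::auth::{authenticate, authenticate_verify, SecretKey};
use orion::errors::UnknownCryptoError;

fn tag_and_check(message: &[u8]) -> Result<(), UnknownCryptoError> {
    let key = SecretKey::default();            // Shared secret between the two actors (randomly generated here)
    let tag = authenticate(&key, message)?;    // Sender attaches this tag to the outgoing message
    authenticate_verify(&tag, &key, message)?; // Receiver rejects the message if the tag doesn't match
    Ok(())
}
```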
Isolation
To-Do
Microkernel
The core `kernel` of Mercury will be highly limited, implementing only necessary portions.
This allows other functionality to be simply run in userspace.
Additionally, most code should be put into separate libraries, then pulled into the `kernel` code.
This will likely be done via `git submodules`.
Initially, it will be built for RISC-V, then ARM (focused on running in a VM), then on a Raspberry Pi.
Afterwards, we can put focus towards building out various features.
Support for multiple targets will be done via `Cargo.toml` targets, cross-compilation, and conditional compilation.
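For the conditional-compilation piece, the standard `target_arch` `cfg` attributes are the likely tool. A minimal sketch:

```rust
// Architecture-specific entry points, selected at compile time.
#[cfg(target_arch = "riscv64")]
fn arch_init() {
    // RISC-V specific setup (e.g. trap vector, hart bookkeeping)
}

#[cfg(target_arch = "aarch64")]
fn arch_init() {
    // ARM specific setup (e.g. exception vectors)
}

fn kernel_main() {
    arch_init(); // The rest of the kernel stays architecture-agnostic
}
```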
Concurrency
For performance, we will be using various concurrency programming techniques.
For IO-intensive operations, `async` will be used.
This will include the filesystem, actors, and GUI.
CPU-bound operations are better suited to individual `threads`, however.
This might include operations like hashing, encryption, and indexing.
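A hedged sketch of that split, using placeholder names (`disk::write`, `kernel::thread`, `checksum`) since the real runtime primitives don't exist yet:

```rust
// IO-bound: filesystem, actor, and GUI work is written as async functions, so the
// executor can interleave many operations while each one waits on the hardware.
async fn save_to_disk(bytes: &[u8]) -> Result<(), Error> {
    disk::write(bytes).await // `disk::write` is a placeholder async call
}

// CPU-bound: hashing, encryption, and indexing each get their own thread, so they
// never stall the executor that is juggling the IO tasks.
fn checksum_in_background(bytes: alloc::vec::Vec<u8>) -> kernel::thread::JoinHandle<u64> {
    kernel::thread::spawn(move || checksum(&bytes)) // `kernel::thread` and `checksum` are placeholders too
}
```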
Boot Process
TODO
Memory Management
TODO
Processes
TODO
Error Handling
All errors must be handled gracefully by the `kernel`. If possible, they should simply log an error.
If not, they can display it to the user, preferably in a simple format, maybe using something like const_panic or snafu.
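A tiny sketch of that policy, with placeholder `klog` and `display` facilities standing in for whatever logging and output the kernel ends up having:

```rust
fn handle_kernel_error(err: Error) {
    // Preferred path: just record the error and keep going.
    if klog::log(&err).is_err() {
        // Fallback: surface it to the user in a simple, readable format.
        display::show_error(&err);
    }
}
```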
GUI
Eventually, programs will be able to use the `photon` library to have access to a graphics API.
This will initialize various actors to represent parts of the UI.
Performance
The GUI is one of the systems where latency is far more important than throughput. There are several things that aid with performance of this system:
- Each window is drawn in a separate buffer, allowing for easy concurrency
- Messages use the latency bus
- All draws are based on optimized and simple low-level operations
- Only changes are re-rendered
Drawing
When a GUI element wants to update, it first sends a message to the `kernel`.
The `kernel` then calculates the overlaying of each window, writes each window to its own buffer, then updates the screen buffer with the ones that have changed, which is then drawn to the screen.
This ensures that only the necessary parts are re-rendered, and that the rendering can be done asynchronously/threaded.
The `photon` library will not only provide a high-level API for applications to use, but also lower-level drawing methods for the `kernel` to use.
These may include line, rectangle, triangle, and circle drawing methods, as well as being able to render text.
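Those lower-level methods might end up looking something like the trait below; this is purely a sketch, and the names, coordinate units, and `Color` type are all made up for illustration:

```rust
// Placeholder color type, purely for illustration
struct Color(u8, u8, u8);

// Hypothetical low-level drawing surface used by the kernel's compositor.
trait DrawSurface {
    fn line(&mut self, from: (u32, u32), to: (u32, u32), color: Color);
    fn rectangle(&mut self, top_left: (u32, u32), size: (u32, u32), color: Color);
    fn triangle(&mut self, a: (u32, u32), b: (u32, u32), c: (u32, u32), color: Color);
    fn circle(&mut self, center: (u32, u32), radius: u32, color: Color);
    fn text(&mut self, top_left: (u32, u32), text: &str, color: Color);
}
```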
Flow
```mermaid
flowchart LR
    app(Application) --> kern(Kernel)
    kern --> buf([Buffer])
    kern --> app
    buf --> dis((Display))
```
Styling
Styling of GUI elements is done via a global configuration.
The `kernel` parses this information, and uses it to actually style the widgets provided to it.
Widgets created by the program/library contain no styling data - only information such as text, size, callbacks, etc.
The `kernel` does all the display work.
Design
```mermaid
erDiagram
    WINDOW ||--|{ SECTION: holds
    WINDOW ||--|| TITLE: has
    SECTION ||--|| TITLE: has
    SECTION ||--o{ SECTION: holds
    SECTION ||--o{ CANVAS: holds
    SECTION ||--o{ TEXTBOX: holds
```
Filesystem
I have no idea what I'm doing here. If you do, please let me know, and fix this! This is just some light brainstorming of how I think this might work.
Prelude
Right now, actors are stored in RAM only. But, what if we want them to be persistent on system reboot? They need to be saved to the disk.
However, I don't want to provide a simple filesystem interface to programs like UNIX does. Instead, all data should just be stored in actors, and the actors will decide whether or not they should be saved. They can save at any time, save immediately, or just save on a shutdown signal.
Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the `kernel` to use.
Actors will simply make requests to save.
Performance
I believe that this format should be fairly fast, but only implementation and testing will tell for sure. Throughput is the main concern here, rather than latency. We can be asynchronous and wait for many requests to finish, rather than worrying about when they finish. This is also better for SSD performance.
- Minimal data needs to be read in - bit offsets can be used, and only fixed-size metadata must be known
- `serde` is fairly optimized for deserialization/serialization
- `BTreeMap` is a very fast and simple data structure
- Async and multithreading will allow for concurrent access, and splitting of resource-intensive tasks across threads
- `hashbrown` is quite high-performance
- Batch processing increases throughput
Buffering
The `kernel` will hold two read/write buffers in memory and will queue reading & writing operations into them.
They can then be organized and batch processed, in order to optimize HDD speed (not having to move the head around), and SSD performance (minimizing operations).
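A loose sketch of the idea, with placeholder `ReadRequest`/`WriteRequest` types and `disk` calls:

```rust
// Two in-memory queues; operations accumulate here instead of hitting the disk immediately.
struct IoBuffers {
    reads: Vec<ReadRequest>,   // Pending reads (ReadRequest is a placeholder type)
    writes: Vec<WriteRequest>, // Pending writes (WriteRequest is a placeholder type)
}

impl IoBuffers {
    async fn flush_writes(&mut self) {
        // Sort by on-disk location so an HDD head sweeps in one direction,
        // and an SSD sees fewer, larger operations.
        self.writes.sort_by_key(|w| w.chunk);
        for write in self.writes.drain(..) {
            disk::write_chunk(write.chunk, &write.bytes).await; // placeholder disk call
        }
    }
}
```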
Filesystem Layout
| Name | Size | Header |
|---|---|---|
| Boot Sector | 128 B | None |
| Kernel Sector | 4096 KB | None |
| Index Sector | u64 | PartitionHeader |
| Config Sector | u64 | PartitionHeader |
| User Sector(s) | u64 | PartitionHeader |
Partition
A virtual section of the disk. Additionally, it has a UUID generated via lolid to enable identifying a specific partition.
binary-layout can be used to parse data from raw bytes on the disk into a structured format, with `no-std`.
```rust
use binary_layout::prelude::*;

const LABEL_SIZE: u16 = 128; // Example number of characters that can be used in the partition label

// Note: custom field types like PartitionType and Uuid need a `LayoutAs` impl in binary-layout
define_layout!(partition_header, BigEndian, {
    partition_type: PartitionType, // Which type of partition it is
    num_chunks: u64,               // Chunks in this partition
    uuid: Uuid,
});

enum PartitionType {
    Index,  // Used for FS indexing
    Config, // Used for system configuration
    User,   // User-defined partition
}

fn parse_data(partition_data: &mut [u8]) -> partition_header::View<&mut [u8]> {
    let mut view = partition_header::View::new(partition_data);
    let id = view.uuid().read();     // Read some data
    view.num_chunks_mut().write(10); // Write data
    view
}
```
Chunk
Small pieces that each partition is split into. Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.
`binary-layout` is similarly used to parse the raw bytes of a chunk.
```rust
use binary_layout::prelude::*;

const CHUNK_SIZE: usize = 4096; // Example static chunk size (in bytes)

define_layout!(chunk, BigEndian, {
    checksum: u64,
    modified: u64, // Timestamp of last modification
    uuid: u128,
    data: [u8; CHUNK_SIZE],
});
```
This struct is then encoded into bytes and written to the disk. Drivers for the disk are to be implemented. It should be possible to do autodetection, and maybe for Actors to specify which disk/partition they want to be saved to.
AES encryption can be used, and this allows for only specific chunks to be encrypted.
Reading
On boot, we start executing code from the Boot Sector. This contains the assembly instructions, which then jump to the `kernel` code in the Kernel Sector.
The `kernel` then reads bytes from the first partition (as the sectors are fixed-size, we know where it starts) into memory, parsing them into a structured form.
From here, as we have a fixed `CHUNK_SIZE` and know how many chunks are in our first partition, we can read from any chunk on any partition.
On startup, an Actor can request to read data from the disk. If it has the right capabilities, we find the chunk it's looking for from the index, parse the data, and send it back.
Also, we are able to verify data. Before passing off the data, we re-hash it using HighwayHash to see if it matches. If it does, we simply pass it along like normal. If not, we refuse, and send an error message.
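Using the `highway` crate, that verification step might look roughly like this (the key handling is simplified; where the hash key actually lives is still undecided):

```rust
use highway::{HighwayHash, HighwayHasher, Key};

// Re-hash the chunk's data and compare against the checksum stored in its metadata.
fn verify_chunk(data: &[u8], stored_checksum: u64) -> bool {
    let key = Key([1, 2, 3, 4]); // Simplified fixed key, purely for illustration
    let mut hasher = HighwayHasher::new(key);
    hasher.append(data);
    hasher.finalize64() == stored_checksum
}
```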
Writing
Writing uses a similar process. An Actor can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk, and write to it. We hash the data first to generate a checksum, and set the proper metadata.
Permissions
Again, whether actors can:
- Write to a specific disk/partition
- Write to disk at all
- Read from disk
will be determined via capabilities.
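Illustratively (the real capability model is part of the OCAP work and hasn't been designed yet), the checks could be as simple as:

```rust
// Purely hypothetical shape for per-actor filesystem capabilities.
struct FsCapabilities {
    read: bool,                     // May read from disk at all
    write: bool,                    // May write to disk at all
    writable_partitions: Vec<Uuid>, // Specific partitions this actor may write to
}

fn may_write(caps: &FsCapabilities, partition: &Uuid) -> bool {
    caps.write && caps.writable_partitions.contains(partition)
}
```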
Indexing
Created in-memory on startup, modified directly whenever the filesystem is modified. It's saved in the Index Sector (which is at a known offset), allowing it to be read in easily on boot.
The index is simply an `alloc::BTreeMap`. (If not, try scapegoat.)
We also have a simple `Vec` of the chunks that are free, which we modify in reverse.
```rust
use alloc::{collections::BTreeMap, vec, vec::Vec};

struct Location {
    partition: Uuid,  // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition the data occupies
}

let mut index: BTreeMap<Uuid, Location> = BTreeMap::new(); // Basic Actor index
let mut free_index: Vec<u64> = Vec::new();                 // Index of free chunks

let new_data_location = Location {
    partition: Uuid::new(),
    chunks: vec![5, 8], // 5th & 8th chunk in that partition
};

free_index.retain(|c| !new_data_location.chunks.contains(c)); // Remove the now-used chunks from the free list
index.entry(actor.uuid).or_insert(new_data_location);         // Insert an Actor's storage location if it's not already stored

index.contains_key(&actor.uuid); // Check if the index contains an Actor's data
index.get(&actor.uuid);          // Get the Location of the actor

if let Some(old) = index.remove(&actor.uuid) { // Remove an Actor's data from the index (e.g. on deletion)
    for chunk in old.chunks {
        free_index.push(chunk); // Add the now-free chunks back to the free list
    }
}
```
This then allows the index to be searched easily to find the data location of a specific `Uuid`.
Whenever an actor makes a request to save data to its `Uuid` location, this can be easily found.
It also lets us tell if an actor hasn't been saved yet, so we know whether we need to allocate new space for writing, or if there's actually something to read.
To-Do
- Snapshots
- Isolation
- Journaling
- Resizing
- Atomic Operations
Executable Format
Programs written in userspace will need to follow a specific format.
First, users will write a program in Rust, using the Mercury libraries, and with `no-std`.
They'll use Actors to communicate with the `kernel`.
Then, they'll compile it for the proper platform and get a pure binary.
This will be run through an executable packer program, the output of which can be downloaded by the package manager, put on disk, etc.
It'll then be parsed in via `bincode`, and the core is run by the `kernel` in userspace.
Additionally, the raw bytes will be compressed.
Then, whether reading chunks from memory or disk, we can know whether they will run on the current system, how long to read for, and where the compressed bytes start (due to the fixed-length header).
It is then simple to decompress the raw bytes and run them from the `kernel`.
```rust
// Architectures a packed executable can target
enum Architecture {
    RiscV,
    Arm,
}

// Fixed-length header followed by the compressed program bytes
struct PackedExecutable {
    arch: Architecture,     // Which architecture the binary was built for
    size: u64,              // Length of the compressed payload
    compressed_bytes: [u8], // The compressed program itself (unsized trailing field)
}
```
Specific details to be figured out later
Configuring a Build Environment
All of this build information is for Linux right now, as I don't want to mess around with getting stuff working on Windows.
Just use WSL if you want.
BSD should work similarly, but if not, please let us know!
Of course, first you will need to install Rust. The best way to do this is through rustup.
Additionally, you will need to install clippy and rustfmt.
You probably will also want to `cargo install sccache`, and export the `RUSTC_WRAPPER='sccache'` environment variable however your shell does it.
Then, you'll want to `git clone` whatever repository you're wanting to work on.
Building
TODO: Figure out how building actually works.
When you're just testing, `cargo` will use the `dev` profile.
This allows for debugging and faster compilation speeds.
However, all releases will use the `release` profile, which is much slower to compile, but better optimized.
It's unlikely you'll want to use this on your computer.
Development Workflow
1. Pull Down Code
`git clone` the repository of whatever code you're wanting to work on.
Make a new branch for the change you want to make.
Maybe it's a new feature, or a bug fix (please file it in the issue tracker!).
2. Do Coding
Work on your code however you do it; make sure to `cargo fmt` before each commit, sign your commits, and commit fairly often.
3. Test
Clippy
Run this `clippy` command, and try to ensure there are no warnings:
cargo clippy -- -W clippy::pedantic -W clippy::suspicious -W clippy::complexity -W clippy::perf -W clippy::cargo -W clippy::nursery -W clippy::unwrap_used -D warnings
Automated Tests
Run `cargo test` to ensure all of the tests still pass.
If needed, add your own tests for your new code.
Run
Of course, manually run the code in a VM and see if everything works how it should.
4. Document
Use inline rustdoc to document your code.
5. Push
Push the code to a new branch in the repository.
If it's ready and fully working, make a pull request to merge it into `main`.
6. CI
Eventually, I want to have some sort of CI system, to allow automated tests, checking, and releases.
Debugging
Debugging code and understanding what's going wrong is highly important, especially in a complex setup such as this. We will try to make it as easy as possible, with tools such as tracing.
There are several external tools that can be used as well.
Using the OS
Once we actually have a working `kernel`, information on running it will be added here!