Dynamic Userland Application Loading #3941

viswajith-g · 2024-03-28T22:18:28Z

Pull Request Overview

This pull request adds dynamic userland application loading to the kernel.

There are three stages to this:

Setup Phase
Flash Phase
Load Phase

Setup Phase

During the setup phase, a userland application passes the size of the new binary to the app_loader capsule.
This capsule forwards the size to the kernel which determines if there is enough space in the flash to write the new app and what address the app should be written to. On success, the kernel returns the application size it has set up for to the capsule.
The capsule returns a success to the userland app.
On Failure, the kernel passes the reason for failure to the capsule, and the capsule passes a FAIL error to the userland app.

Flash Phase

After the userland app receives the Ok from the capsule, the app sends the binary of the new app 512 bytes at a time along with the offset.
The capsule checks that these values do not violate the bounds dictated by the kernel and then passes the binary data to the kernel.
Upon receiving the binary data, the kernel once again checks if the data is within the bounds specified during the setup phase and then writes the data to flash. On success, the kernel passes success to the app via the capsule. On failure, the same is propagated to the userland application.

Load Phase

Once the userland app receives confirmation that the final set of writes is completed, the app sends a load request to the kernel via the capsule. The kernel looks for the process binary at the address we just finished writing and converts the app binary into a process object. Upon success, the kernel tries to add the process object to the processes array. Once this is done, the process is officially active, and the kernel writes padding after the newly installed app to preserve the continuation of the linked list for future app loading.

Testing Strategy

This pull request was tested by running the app_loader application, a test userland app to verify if the new app (blink) was written and loaded correctly on nRF52840DK. This was tested in three configurations:

When only the app_loader user app is installed on the device.
- In this case, the dynamic app loader writes the new app after the app_loader user app and then loads it.
When there is another app in addition to app_loader.
- The device had two apps installed on it, one was adc and the other was app_loader. In this case, the dynamic process loader finds available flash after the two apps and writes blink there after which it loads the new app.
When there is padding in between applications.
- In this case, the device had padding, then app_loader and more padding. Dynamic process loader fits the app in the padding area before app_loader, and writes new padding between blink and app_loader.

A previous implementation of the process loader without the process binaries is available at this repo. This version was tested on the Imix board, and given not too much has changed, I think the new implementation should work on Imix.

TODO or Help Wanted

More testing.
Feedback and merge.

Documentation Updated

✅ Updated the relevant files in /docs, or no updates are required.

Formatting

✅ Ran make prepush.

This version works by writing the app first and then creates the process and adds it to the processes array using the synchronous process loading methods. Finally it writes the padding app. There are new overheads introduced however in the form of load_processes_return_memory() and load_processes_from_flash_return() in process_loading.rs. This is because, when the board initially does setup, we want it to pass the remaining RAM available to the dynamic process loader to load new apps. Currently load_processes() does not return the remaining memory, so those two new methods were added.

fixed some warnings causing CI fail

kernel/src/dynamic_process_loading.rs

bradjc · 2024-03-28T22:40:59Z

kernel/src/dynamic_process_loading.rs

+ Ok(())
+ }
+
+ fn find_next_available_address(&self, new_app: &mut NewApp) -> Result<(), ErrorCode> {


I think this function should take a slice and return the values, not take in a mutable object.

I think I've implemented it, please check.

bradjc · 2024-03-28T22:48:51Z

kernel/src/dynamic_process_loading.rs

+ // reset the global new app start address and length values
+ self.new_app_start_addr.set(0);
+ self.new_app_length.set(0);


I think this needs to happen even if an error occurs above.

I think I fixed that now.

bradjc · 2024-03-28T22:50:09Z

kernel/src/dynamic_process_loading.rs

+ // thererfore we need to add the physical address of the new app
+ // for each write
+ } else {
+ offset // for padding, the kernel passes the address, so no need to add an offset


This is confusing. Maybe a separate function?

I created a separate function for the bounds check and the physical address calculation. It is in a place that I am slightly unhappy with, but it gets the job done.

bradjc · 2024-03-28T22:54:45Z

kernel/src/dynamic_process_loading.rs

+ if !padding_flag {
+ match command {
+ NonvolatileCommand::UserspaceRead | NonvolatileCommand::UserspaceWrite => {
+ if offset >= self.new_app_start_addr.get()


What does this check do?

I think this should compare to length.

bradjc · 2024-03-28T22:57:38Z

kernel/src/dynamic_process_loading.rs

+ NonvolatileCommand::UserspaceRead | NonvolatileCommand::UserspaceWrite => {
+ if offset >= self.new_app_start_addr.get()
+ || length > self.new_app_length.get()
+ || offset + length > self.new_app_length.get()


This can be problematic if offset and/or length are very large: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ad881cb6ed56bfa015fcd78983987b6f

Previous checks should mitigate this.

added checked_add() to make sure we catch overflows.

bradjc · 2024-03-28T22:58:09Z

kernel/src/dynamic_process_loading.rs

+ NonvolatileCommand::UserspaceWrite => {
+ self.driver.write(user_buffer, physical_address, active_len)
+ }


Where do we validate that we always end up with a valid TBF header in flash?

I'm going to work on this next. While at it, I also have some ideas on improving the state machine to ensure that the user app always tries to write to valid and successive address blocks.

bradjc · 2024-03-29T19:06:11Z

kernel/src/dynamic_process_loading.rs

+ buffer: &'static mut [u8],
+ offset: usize,
+ app_size: usize,


Can you replace buffer and app_size with a SubSliceMut?

Do we want the capsule to do the sub slicing?

bradjc · 2024-03-29T19:08:13Z

kernel/src/dynamic_process_loading.rs

+ }
+ }
+
+ // Needs to be set separately, or breaks grants allocation


While I agree this needs to be a function, what does breaking grant allocation mean?

Explained better.

bradjc · 2024-03-29T19:08:55Z

kernel/src/dynamic_process_loading.rs

+
+ /******************************************* NVM Stuff **********************************************************/
+
+ // This function checks whether the new app will fit in the bounds dictated by the start address and length provided


Use /// comments for function comments.

kernel/src/dynamic_process_loading.rs

changed it so that the capsule sends a subslice of the buffer to the kernel so the kernel does not have to compare lengths that the capsule provided vs the length of the buffer the capsule actually sent. Generally a safer approach.

Improved the state machine to better track if a userland app is requesting write to the bounds allocated to it during the setup phase. This state machine also ensures that when the process loader is busy, another request cannot be made.

Improved the state machine to better track if a userland app is requesting write to the bounds allocated to it during the setup phase. Also tracks if the dynamic process loader is currently busy. Performed some code cleanup.

The kernel now validates the header before writing the app into flash. In addition, upon failure at any stage, the flash region will be reclaimed for future use. Added a Fail state to help track failure modes better.

fixed two instances in app write when resets were not taking place. also fixed an improper condition check for declared length vs length in header.

An app that was successfully could not be made into a process object because of errant state tracking. That was fixed. Additionally, removed unnecessary state and parameter clears during Busy state.

alistair23 · 2024-04-02T04:06:36Z

kernel/src/process_loading.rs

+// dynamic process loader to find the right spot to load the new app because otherwise
+// we are not privy to this information.
+
+#[inline(always)]


The current return type for load_processes is Result<(), ProcessLoadError>. We want the function to return the remaining memory when the board initializes the processes in main.rs. Currently, changing the return type breaks the build for all other boards.

What we require:

.unwrap_or_else(|(err, remaining_memory)| { debug!("Error loading processes!"); debug!("{:?}", err); remaining_memory });

What we currently have:

.unwrap_or_else(|err| { debug!("Error loading processes!"); debug!("{:?}", err); });

I have duplicated the functions needed to return the memory for this implementation, the other fix is to change the return type for all the supported boards.

I meant why #[inline(always)]

But in terms of duplicating a function, it would probably make more sense to just change the boards to ignore the return type instead of duplicating a function just to return something new

I removed the inline, forgot to remove it after copying the functions.

This is just to test if the dynamic process loading works, so I think it is less work to duplicate the functions to prove the idea than to change the return types for all boards.

It's a lot more work to maintain duplicate functions

Right, but the synchronous functions will go away when we move to async loading. That will remove the need to change the return types. If we make the change now, we will have to revert later which is pointless. The current implementation takes a step forward in enabling dynamic process loading, but is not the end goal.

alistair23 · 2024-04-02T04:09:14Z

kernel/src/dynamic_process_loading.rs

+ self.load_processes_from_flash(app_flash, app_memory)?;
+
+ if config::CONFIG.debug_process_credentials {
+ debug!("Checking: no checking, load and run all processes");


I don't follow, how do we know there is no checking?

I have not implemented credential checking yet.

Can we not just use the existing process loader, which should do the checking for us? It seems strange to duplicate the loading

No, the current synchronous process loader does not perform process checking anyway. Plus we lose access to the current processes array after loading the new app if we do not duplicate the functions.

Why can't we just modify the existing ones to not loose access?

alistair23 · 2024-04-02T04:10:38Z

kernel/src/process_loading.rs

+// load_processes_return_memory() and load_processes_from_flash_return are copies
+// of load_processes_return_memory() and load_processes_from_flash(). These functions
+// are created so that they return the remaining memory to main.rs after loading
+// all the apps initially. This remaining_memory value is then used by the
+// dynamic process loader to find the right spot to load the new app because otherwise


Why not just modify the existing ones instead of creating a copy?

Apps were previously aligned but we were not writing padding before new apps to match the linkedlist with the alignment. This is now fixed. A new enum was added to track the padding requirement. Additionally, the setup phase is now asynchronous to accomodate cases where you may or may not have to write padding after an app depending on if there is an app stored beyond our newly written app. This change propogates to the capsule as well.

check_padding_requirements was called twice once during postpad and during prepad to return the next_app_start_addr and previous_app_end_addr respectively. Changed the function so that the values are stored and the function does not have to be called twice.

changed the way the app address is identified during setup phase to align the app to the power of 2 based on its size and start address.

bradjc · 2024-04-11T20:22:47Z

capsules/extra/src/app_loader.rs

+ // Internal buffer for copying appslices into.
+ buffer: TakeCell<'static, [u8]>,
+ // What issued the currently executing call.
+ current_user: OptionalCell<NonvolatileUser>,


Suggested change

current_user: OptionalCell<NonvolatileUser>,

current_user: OptionalCell<ProcessId>,

bradjc · 2024-04-11T20:27:51Z

capsules/extra/src/app_loader.rs

+ let result = self
+ .buffer
+ .take()
+ .map_or(Err(ErrorCode::RESERVE), |buffer| {
+ let mut write_buffer = SubSliceMut::new(buffer);
+ write_buffer.slice(..write_buffer.len());
+ let res = self.driver.write_app_data(write_buffer, offset, pid);
+ match res {
+ Ok(()) => Ok(()),
+ Err(e) => Err(e),
+ }
+ });


Why doesn't enqueue() just handle this?

bradjc · 2024-04-11T20:30:17Z

capsules/extra/src/app_loader.rs

+
+ // And then signal the app.
+ kernel_data
+ .schedule_upcall(upcall::WRITE_DONE, (length, 0, 0))


First argument to userpsace should always be an errorcode (or 0 for success).

bradjc · 2024-04-11T20:31:24Z

capsules/extra/src/app_loader.rs

+ if self.current_user.is_none() {
+ // No app is currently using the underlying storage.
+ // Mark this app as active, and then execute the command.
+ self.current_user.set(NonvolatileUser::App {
+ processid: processid,
+ });


Shouldn't this check if the processid matches the current user?

capsules/extra/src/app_loader.rs

bradjc · 2024-04-11T20:46:39Z

kernel/src/dynamic_process_loading.rs

+ // the app is trying to write to an address that is not the immediate continuation of where
+ // the previous write left off. Return an Invalid error.
+ if offset != self.next_offset.get() {
+ // check if the app is trying to write to the correct offset
+ self.reset_process_loading_metadata(); // clear out new app size, start address, current offset and state
+ return Err(ErrorCode::INVAL);
+ }


Oh, I see. But what if the loader is doing compression where it doesn't need to write every byte?

I did not think of that

bradjc · 2024-04-11T20:47:04Z

kernel/src/dynamic_process_loading.rs

+ Some(State::LastWrite)
+ | Some(State::Setup)
+ | Some(State::Load)
+ | Some(State::PaddingWrite)
+ | Some(State::Fail)
+ | None => Err(ErrorCode::BUSY),


Just _ => Err() is fine.

bradjc · 2024-04-11T20:48:13Z

kernel/src/dynamic_process_loading.rs

+ self.previous_app_end_addr.set(previous_app_end_addr);
+ self.next_app_start_addr.set(next_app_start_addr);


This function is called check_*, but it modifies internal state. That is somewhat confusing.

renamed it to compute_padding_requirements_and_neighbors()

bradjc · 2024-04-11T20:50:22Z

kernel/src/dynamic_process_loading.rs

+ fn find_next_available_address(
+ &self,
+ flash: &'static [u8],
+ app_length: usize,
+ ) -> Result<usize, ErrorCode> {


I think it would more straightforward to just have this function return more, and encompass all of the work of determining where an app can go and what padding requirements there are. So this would return the address and the padding needs.

I think padding needs should not be returnable because we are setting it using an OptionalCell. This way, we don't have to call the function again during load to check if we need to write pre-pad. (I know I am calling it right now, I should not be doing that because the value is set the first time anyway, so I removed the check_padding_req() from load).

bradjc · 2024-04-11T20:51:11Z

kernel/src/dynamic_process_loading.rs

+ self.new_app_length.set(app_length);
+
+ // check what kind of padding we have to write, no padding, pre padding, post padding or both pre and post padding
+ self.check_padding_requirements();


With check_padding_reqs integrated into find_next_available this code is a little more straightforward.

so we compute the requirements in find_next_address, but still write the padding in setup right? I find it odd otherwise that the function trying to find the next address and padding requirement also writes the padding. It makes sense that the main setup logic should take care of this.

Moved some logic to improve flow. Changed write so that we don't track write validity based on offset increments. Added checks to make sure the header is not manipulated without changing the whole header.

bradjc · 2024-05-09T21:41:13Z

capsules/extra/src/app_loader.rs

+//! | userspace |
+//! | |
+//! +-----------------------------------------------------------------+
+//! kernel::Driver


Suggested change

//! kernel::Driver

//! kernel::SyscallDriver

bradjc · 2024-05-09T21:42:26Z

capsules/extra/src/app_loader.rs

+ /// - Returns ErrorCode::FAIL if:
+ /// - The kernel is unable to create a process object for the application
+ /// - The kernel fails to write a padding app (thereby potentially breaking the linkedlist)
+


Suggested change

bradjc · 2024-05-09T21:45:13Z

capsules/extra/src/app_loader.rs

+ // Shorten the length if the application gave us nowhere to
+ // put it.
+ let active_len = cmp::min(length, allow_buf_len);


Suggested change

// Shorten the length if the application gave us nowhere to

// put it.

let active_len = cmp::min(length, allow_buf_len);

// Shorten the length if the application did not give us

// enough bytes in the allowed buffer.

let active_len = cmp::min(length, allow_buf_len);

bradjc · 2024-05-09T21:49:05Z

kernel/src/dynamic_process_loading.rs

+pub struct DynamicProcessLoader<'a, C: 'static + Chip> {
+ kernel: &'static Kernel,
+ chip: &'static C,
+ fault_policy: &'static dyn ProcessFaultPolicy,
+ procs: MapCell<&'static mut [Option<&'static dyn process::Process>]>,
+ flash: Cell<&'static [u8]>,
+ app_memory: OptionalCell<&'static mut [u8]>,
+ new_process_flash: OptionalCell<&'static [u8]>,
+ flash_driver: &'a dyn NonvolatileStorage<'a>,
+ buffer: TakeCell<'static, [u8]>,
+ new_app_start_addr: Cell<usize>,
+ new_app_length: Cell<usize>,
+ previous_app_end_addr: Cell<usize>,
+ next_app_start_addr: Cell<usize>,
+ client: OptionalCell<&'static dyn DynamicProcessLoadingClient>,
+ state: Cell<State>,
+ padding_requirement: OptionalCell<PaddingRequirement>,
+}


Can you move all of the state that is temporary (ie is only valid/set when an app is being loaded) to its own struct, and just have that entire struct in a cell (optionalcell)?

This is complicated enough it would help to be able to tell what is transient and what isn't.

bradjc · 2024-05-09T21:56:49Z

kernel/src/dynamic_process_loading.rs

+ match self.find_next_available_address(flash, app_length) {
+ Ok(new_app_start_address) => {
+ let offset = new_app_start_address - flash_start;
+ let new_process_flash = self
+ .flash
+ .get()
+ .get(offset..offset + app_length)
+ .ok_or(ErrorCode::FAIL)?;
+
+ self.new_process_flash.set(new_process_flash);
+ self.new_app_start_addr.set(new_app_start_address);
+ self.new_app_length.set(app_length);
+
+ match self.padding_requirement.get() {


This is too confusing. find_next_available_address() should just do a find and return everything (including the need for padding). That state is mutated deep in here makes it hard to reason about how this works.

bradjc · 2024-05-09T22:01:14Z

kernel/src/dynamic_process_loading.rs

+ None | Some(PaddingRequirement::PrePad) => {
+ self.state.set(State::AppWrite);
+ self.client.map(|client| {
+ client.setup_done();
+ });


We can't do this. We can't call a callback from a downcall (ie from DynamicProcessLoading.setup).

This needs to just return something that says "setup complete" (as opposed to "setup is still pending") or this file needs a deferred call to signal the callback.

bradjc · 2024-05-09T22:03:40Z

kernel/src/dynamic_process_loading.rs

+ // There is no header here
+ // if we fail here, let us erase the app we just wrote
+ self.state.set(State::PaddingWrite);
+ let _ = self.write_padding_app(


I assume this won't work if we are already writing a padding app from line 1111?

It works, the padding app from line 1111 only writes prepad. This line erases the app that was written and postpad by extending the prepad to the next app's start address. Essentially we rewrite the prepad such that it reclaims the memory allocated to the new app.

viswajith-g added 5 commits March 28, 2024 12:44

fixed warnings

a331d70

fixed some warnings causing CI fail

fixed some format issues

e4073f1

clippy fixes

b575bec

fixed header check issue

12d6e60

github-actions bot added kernel component labels Mar 28, 2024

bradjc reviewed Mar 28, 2024

View reviewed changes

resolved some pr comments

aa51671

bradjc reviewed Mar 29, 2024

View reviewed changes

viswajith-g added 3 commits March 31, 2024 19:27

pr comments changes

2b54b13

changed it so that the capsule sends a subslice of the buffer to the kernel so the kernel does not have to compare lengths that the capsule provided vs the length of the buffer the capsule actually sent. Generally a safer approach.

Improved State Machine

8181efc

Improved the state machine to better track if a userland app is requesting write to the bounds allocated to it during the setup phase. This state machine also ensures that when the process loader is busy, another request cannot be made.

Improved State Machine

32b288b

Improved the state machine to better track if a userland app is requesting write to the bounds allocated to it during the setup phase. Also tracks if the dynamic process loader is currently busy. Performed some code cleanup.

viswajith-g mentioned this pull request Apr 1, 2024

Dynamic App Loader Helper tock/libtock-c#387

Open

viswajith-g added 3 commits April 1, 2024 19:03

Header validation + Erasing app upon failure

05f4207

The kernel now validates the header before writing the app into flash. In addition, upon failure at any stage, the flash region will be reclaimed for future use. Added a Fail state to help track failure modes better.

bug fixes

a2156ed

fixed two instances in app write when resets were not taking place. also fixed an improper condition check for declared length vs length in header.

states bug fix

8c9e836

An app that was successfully could not be made into a process object because of errant state tracking. That was fixed. Additionally, removed unnecessary state and parameter clears during Busy state.

viswajith-g requested a review from bradjc April 2, 2024 02:40

alistair23 reviewed Apr 2, 2024

View reviewed changes

github-actions bot assigned bradjc Apr 6, 2024

viswajith-g added 3 commits April 5, 2024 22:28

alignment improvement

e67b478

changed the way the app address is identified during setup phase to align the app to the power of 2 based on its size and start address.

bradjc reviewed Apr 11, 2024

View reviewed changes

Better flow and documentation

c7fe16b

Moved some logic to improve flow. Changed write so that we don't track write validity based on offset increments. Added checks to make sure the header is not manipulated without changing the whole header.

bradjc reviewed May 9, 2024

View reviewed changes

alevy added needs-rebase waiting-on-author labels Jun 6, 2024


		/***************************************** NVM Stuff ********************************************************/

		// This function checks whether the new app will fit in the bounds dictated by the start address and length provided

	current_user: OptionalCell<NonvolatileUser>,
	current_user: OptionalCell<ProcessId>,

		self.previous_app_end_addr.set(previous_app_end_addr);
		self.next_app_start_addr.set(next_app_start_addr);

Dynamic Userland Application Loading #3941

Are you sure you want to change the base?

Dynamic Userland Application Loading #3941

Conversation

viswajith-g commented Mar 28, 2024 • edited

Pull Request Overview

Setup Phase

Flash Phase

Load Phase

Testing Strategy

TODO or Help Wanted

Documentation Updated

Formatting

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

viswajith-g Apr 2, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

viswajith-g commented Mar 28, 2024 •

edited

viswajith-g Apr 2, 2024 •

edited