Known Limitations#
This page details undesired behaviours or missing pieces of the current system, pending future development.
Scalability#
Badge Scalability#
We use the 64-bit badge value of endpoint capabilities to track their purpose, either as RDEs or as resources. For convenience and efficiency, all relevant information is stored in and retrieved from the badge itself by masking. However, this introduces a scalability problem, in particular because we cannot have more than 255 resource types or resource spaces. Eventually, we should replace the badge value with a unique ID that can be used to find a corresponding data structure. For example, the badge value could be the key into a hash table maintained by the corresponding resource server (or root task), where each value is a structure containing the cap type, permissions, space ID, client ID, and object ID, with fields large enough for scalability.
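The proposed replacement can be sketched as follows. This is a minimal illustration, not the system's implementation: every name here (badge_info_t, badge_table_t, the field widths, and the 1024-slot capacity) is an assumption made for the example, and entry removal is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record that a badge ID resolves to. The field widths are
 * illustrative; the point is that they are no longer limited to 8 bits. */
typedef struct {
    uint32_t cap_type;
    uint32_t perms;
    uint32_t space_id;
    uint32_t client_id;
    uint64_t object_id;
} badge_info_t;

#define BADGE_TABLE_SLOTS 1024u /* assumed capacity; must be a power of two */
#define BADGE_TABLE_MASK (BADGE_TABLE_SLOTS - 1u)
static_assert((BADGE_TABLE_SLOTS & BADGE_TABLE_MASK) == 0, "capacity must be a power of two");

/* Open-addressing hash table keyed by badge ID, using linear probing. */
typedef struct {
    uint64_t id[BADGE_TABLE_SLOTS]; /* 0 marks an empty slot */
    badge_info_t info[BADGE_TABLE_SLOTS];
    uint64_t next_id;
    size_t used;
} badge_table_t;

static void badge_table_init(badge_table_t *t)
{
    memset(t, 0, sizeof(*t));
    t->next_id = 1; /* 0 is reserved as the "empty" marker */
}

/* Stores a record and returns the ID to use as the badge value,
 * or 0 if the table is full. */
static uint64_t badge_table_insert(badge_table_t *t, const badge_info_t *info)
{
    if (t->used == BADGE_TABLE_SLOTS) {
        return 0;
    }
    uint64_t id = t->next_id++;
    size_t i = (size_t)(id & BADGE_TABLE_MASK);
    while (t->id[i] != 0) {
        i = (i + 1u) & BADGE_TABLE_MASK;
    }
    t->id[i] = id;
    t->info[i] = *info;
    t->used++;
    return id;
}

/* Resolves a badge value received on an endpoint back to its record. */
static const badge_info_t *badge_table_lookup(const badge_table_t *t, uint64_t badge)
{
    if (badge == 0) {
        return NULL;
    }
    size_t i = (size_t)(badge & BADGE_TABLE_MASK);
    for (size_t probes = 0; probes < BADGE_TABLE_SLOTS; probes++) {
        if (t->id[i] == badge) {
            return &t->info[i];
        }
        if (t->id[i] == 0) {
            return NULL;
        }
        i = (i + 1u) & BADGE_TABLE_MASK;
    }
    return NULL;
}
```

With this layout the badge carries only an opaque ID, so widening a field such as the resource type changes the record definition rather than the badge encoding. The cost is one table lookup per invocation instead of a mask.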
Resource Directory Scalability#
The resource directory is maintained as a simple, static array in the PD’s shared data page. It has a static maximum of 8 non-core resource types, and 8 spaces per resource type, for a particular PD’s resource directory. A dynamic data structure would be more difficult since the shared data page is used by two PDs, so pointers would need to be adjusted across address spaces.
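To make the trade-off concrete, the following sketch shows what a static, pointer-free directory with these limits might look like. The struct layout, the names (rde_entry_t, resource_directory_t, rd_add), the 4 KiB page size, and the convention that type ID 0 marks an unused row are all assumptions for illustration; only the two limits of 8 come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RD_MAX_NONCORE_TYPES 8u   /* limit stated in the text */
#define RD_MAX_SPACES_PER_TYPE 8u /* limit stated in the text */
#define RD_SHARED_PAGE_SIZE 4096u /* assumed size of the shared data page */

/* One directory entry. It contains no pointers, so the same bytes are
 * meaningful in both address spaces that map the shared page. */
typedef struct {
    uint32_t in_use;
    uint32_t space_id;
    uint64_t ep_slot; /* hypothetical: CSpace slot holding the RDE endpoint */
} rde_entry_t;

typedef struct {
    uint32_t type_id[RD_MAX_NONCORE_TYPES]; /* 0 marks an unused row (assumption) */
    rde_entry_t entry[RD_MAX_NONCORE_TYPES][RD_MAX_SPACES_PER_TYPE];
} resource_directory_t;

static_assert(sizeof(resource_directory_t) <= RD_SHARED_PAGE_SIZE,
              "directory must fit in the shared data page");

/* Adds an entry; returns false once either static limit is reached. */
static bool rd_add(resource_directory_t *rd, uint32_t type_id,
                   uint32_t space_id, uint64_t ep_slot)
{
    if (type_id == 0) {
        return false;
    }
    /* Find the row for this type, or the first unused row. */
    size_t row = RD_MAX_NONCORE_TYPES;
    for (size_t i = 0; i < RD_MAX_NONCORE_TYPES; i++) {
        if (rd->type_id[i] == type_id) {
            row = i;
            break;
        }
        if (rd->type_id[i] == 0 && row == RD_MAX_NONCORE_TYPES) {
            row = i;
        }
    }
    if (row == RD_MAX_NONCORE_TYPES) {
        return false; /* ninth resource type */
    }
    for (size_t j = 0; j < RD_MAX_SPACES_PER_TYPE; j++) {
        rde_entry_t *e = &rd->entry[row][j];
        if (!e->in_use) {
            rd->type_id[row] = type_id;
            e->in_use = 1;
            e->space_id = space_id;
            e->ep_slot = ep_slot;
            return true;
        }
    }
    return false; /* ninth space for this type */
}
```

A dynamic variant would have to replace every pointer with an offset from the start of the shared page, since the page is generally mapped at different virtual addresses in the two PDs.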
Garbage Accumulation#
We have tried to eliminate garbage accumulation for long-term use of the system, but there are a small number of known sources of garbage.
ASID Pools#
On aarch64, an ASID pool contains enough space for up to 512 VSpaces. If we run out of space in the default ASID pool due to a large number of address spaces in the system, we can create up to 128 ASID pools (on aarch64), each taking a 4 KiB page of memory. If address spaces are also being destroyed, some of these pools may become unused. We would need to introduce some reference tracking to identify when an ASID pool can be destroyed. Note that this source of garbage cannot currently occur in the system, since the badge scalability issue already prevents us from creating more than 0xFE = 254 address spaces.
As a side note, we learned that destroying a VSpace does free its assigned ASID. However, destroying all the VSpaces assigned to an ASID pool will not automatically destroy the pool.
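The reference tracking mentioned above could be as simple as a per-pool count of live VSpaces. The sketch below is a bookkeeping model only: it makes no kernel calls, and the names (asid_tracker_t, asid_tracker_assign, asid_tracker_release) and the choice to never reclaim pool 0 as the default pool are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ASID_POOL_CAPACITY 512u /* VSpaces per pool on aarch64, per the text */
#define ASID_MAX_POOLS 128u     /* pool limit on aarch64, per the text */
#define ASID_DEFAULT_POOL 0     /* assumption: index 0 is the default pool */

typedef struct {
    bool created;
    uint16_t live_vspaces; /* VSpaces currently assigned to this pool */
} asid_pool_ref_t;

typedef struct {
    asid_pool_ref_t pool[ASID_MAX_POOLS];
} asid_tracker_t;

static void asid_tracker_init(asid_tracker_t *t)
{
    memset(t, 0, sizeof(*t));
    t->pool[ASID_DEFAULT_POOL].created = true;
}

/* Records a new VSpace and returns the index of the pool it was assigned to,
 * creating a pool if every existing one is full. Returns -1 if no pool
 * can be used or created. */
static int asid_tracker_assign(asid_tracker_t *t)
{
    int first_uncreated = -1;
    for (int i = 0; i < (int)ASID_MAX_POOLS; i++) {
        asid_pool_ref_t *p = &t->pool[i];
        if (p->created && p->live_vspaces < ASID_POOL_CAPACITY) {
            p->live_vspaces++;
            return i;
        }
        if (!p->created && first_uncreated < 0) {
            first_uncreated = i;
        }
    }
    if (first_uncreated < 0) {
        return -1;
    }
    /* A real system would allocate the pool's page and create the pool here. */
    t->pool[first_uncreated].created = true;
    t->pool[first_uncreated].live_vspaces = 1;
    return first_uncreated;
}

/* Records that a VSpace in the given pool was destroyed. Returns true when
 * the pool has become empty and its memory could be reclaimed. */
static bool asid_tracker_release(asid_tracker_t *t, int pool_idx)
{
    asid_pool_ref_t *p = &t->pool[pool_idx];
    assert(p->created && p->live_vspaces > 0);
    p->live_vspaces--;
    if (p->live_vspaces == 0 && pool_idx != ASID_DEFAULT_POOL) {
        p->created = false;
        return true;
    }
    return false;
}
```

The caller would destroy the pool and free its page whenever asid_tracker_release returns true.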
Revoked Slots#
When we revoke a resource from a PD, we do not free the slot in its CSpace. This is so that the slot will not get filled with some other resource, potentially causing the PD to use the new resource unknowingly while it tries to use the old resource. If the system has a lot of revoked resources, these empty revoked slots could eventually fill up a CSpace. The alternative would be to have a handler in each PD to be notified when resources are revoked.
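The slot behaviour described above can be modelled with a three-state slot table. This is a toy model under stated assumptions, not the system's CSpace allocator: the names (cspace_model_t, cspace_alloc, cspace_revoke, cspace_ack_revoke) and the 16-slot size are hypothetical, and cspace_ack_revoke stands in for the alternative in which a PD-side handler acknowledges the revocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SLOT_FREE = 0, /* may be handed out by the allocator */
    SLOT_IN_USE,   /* holds a live capability */
    SLOT_REVOKED   /* capability revoked; slot deliberately not reused */
} slot_state_t;

#define CSPACE_MODEL_SLOTS 16u /* small size chosen for the example */

typedef struct {
    slot_state_t state[CSPACE_MODEL_SLOTS];
} cspace_model_t;

/* Returns the first free slot index, or -1 if the CSpace is full.
 * Revoked slots are skipped, which is the source of the garbage. */
static int cspace_alloc(cspace_model_t *cs)
{
    for (size_t i = 0; i < CSPACE_MODEL_SLOTS; i++) {
        if (cs->state[i] == SLOT_FREE) {
            cs->state[i] = SLOT_IN_USE;
            return (int)i;
        }
    }
    return -1;
}

/* Current behaviour: the slot is emptied but stays reserved. */
static void cspace_revoke(cspace_model_t *cs, int slot)
{
    assert(cs->state[slot] == SLOT_IN_USE);
    cs->state[slot] = SLOT_REVOKED;
}

/* Hypothetical alternative: once the PD's handler has been told about the
 * revocation, the slot can safely be returned to the allocator. */
static bool cspace_ack_revoke(cspace_model_t *cs, int slot)
{
    if (cs->state[slot] != SLOT_REVOKED) {
        return false;
    }
    cs->state[slot] = SLOT_FREE;
    return true;
}
```

Without the acknowledgement path, every revocation permanently consumes one slot, so a CSpace of N slots tolerates at most N revocations over its lifetime.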
Ramdisk Server Bound Page#
The ramdisk server keeps a shared memory page with each of its clients. The client can notify the server to unbind the page using ramdisk_client_unbind. However, if the client terminates without calling unbind, the ramdisk server will not be notified, and the MO will remain attached to the server's address space. This has not been an issue in our test scenarios, since we have a small number of clients using the same ramdisk, and the ramdisk server is restarted between tests. A general solution to this problem would be to introduce a new type of async work task, which notifies a resource server when a client is disconnected (or equivalently, when an RDE is removed).
Model State Extraction#
Partial State#
The system is currently only intended to extract the full system’s model state, and not just a subgraph centered on a particular PD. When a PD requests a model extraction, the root task iterates over all PDs in the system and extracts their state. Alternatively, we might want the ability to extract the model state of only one PD. This process could proceed as follows:
The root task iterates over the PD’s resource directory entries
Each entry is added to the model state
The root task iterates over the PD’s held resources
If the resource is a core resource, then the root task can add all relevant information to the model state
If the resource is not a core resource, the root task reaches out to the corresponding resource server for a subgraph
If the subgraph includes resources from other resource servers, the root task will need to recursively reach out to those resource servers as well
If the resource is a PD, it is still an open question whether we should recursively dump that PD as well
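The steps above can be sketched as a recursive traversal. This is a simplified model under stated assumptions: all types and names (resource_t, pd_t, model_state_t, extract_pd_subgraph) are hypothetical, the maps_to array stands in for the subgraph that a resource server would return over IPC, and RDE IDs and resource IDs are assumed to share one ID space. For the open question about PD resources, the sketch records the PD as a node and does not recurse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RES_CORE, RES_SERVED, RES_PD } res_kind_t;

typedef struct resource {
    int id;
    res_kind_t kind;
    /* For RES_SERVED: resources that the owning server's subgraph maps to,
     * possibly belonging to other resource servers. */
    const struct resource *const *maps_to;
    size_t n_maps_to;
} resource_t;

typedef struct {
    const int *rde_ids;
    size_t n_rdes;
    const resource_t *const *held;
    size_t n_held;
} pd_t;

#define MODEL_MAX_NODES 64u

typedef struct {
    int nodes[MODEL_MAX_NODES];
    size_t n_nodes;
    size_t server_queries; /* how many times a resource server was consulted */
} model_state_t;

static bool model_contains(const model_state_t *m, int id)
{
    for (size_t i = 0; i < m->n_nodes; i++) {
        if (m->nodes[i] == id) {
            return true;
        }
    }
    return false;
}

/* Adds a node; returns false if it was already present or the model is full. */
static bool model_add(model_state_t *m, int id)
{
    if (model_contains(m, id) || m->n_nodes == MODEL_MAX_NODES) {
        return false;
    }
    m->nodes[m->n_nodes++] = id;
    return true;
}

static void extract_resource(model_state_t *m, const resource_t *r)
{
    if (!model_add(m, r->id)) {
        return; /* already visited, which also guards against cycles */
    }
    if (r->kind == RES_SERVED) {
        m->server_queries++; /* stands in for a request to the resource server */
        for (size_t i = 0; i < r->n_maps_to; i++) {
            extract_resource(m, r->maps_to[i]); /* may reach other servers */
        }
    }
    /* RES_CORE: the root task already holds all relevant information.
     * RES_PD: only the node is recorded; recursing is the open question. */
}

static void extract_pd_subgraph(model_state_t *m, const pd_t *pd)
{
    for (size_t i = 0; i < pd->n_rdes; i++) {
        model_add(m, pd->rde_ids[i]);
    }
    for (size_t i = 0; i < pd->n_held; i++) {
        extract_resource(m, pd->held[i]);
    }
}
```

The visited check keeps the traversal linear in the size of the subgraph even when several held resources map to the same underlying resource.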
Runtime Metrics#
The system does not currently support calculating the model metrics (RSI & FR) at runtime. It would be possible to do so if we modify the model extraction utility to store the graph in a traversable data structure and implement the calculation algorithms.
Resource Space Metadata#
The system does not currently track where metadata is stored for particular resource spaces. For example, an address space is not associated with the structure containing its metadata. Another example is that the file system does not show an association between the file space and the blocks used for file system metadata (i.e., the superblock and inodes). At a finer granularity, we might even want to track where metadata is stored for particular edges of the model.
Resource / PD Cleanup#
Fault Handler Dependencies#
The implementation does not explicitly track a “fault edge” between a PD and its fault handler, so it is unable to clean up a PD if its fault handler crashes. We expect that the fix would recursively follow fault edges and clean up all PDs along the path, and that this behaviour would not be configurable.
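If fault edges were tracked, the recursive cleanup might look like the sketch below. It is one possible reading of the idea, not the system's design: the pd_entry_t table, the NO_FAULT_HANDLER marker, and the function name are hypothetical, and the sketch assumes that a PD must be cleaned up whenever the PD handling its faults is cleaned up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_FAULT_HANDLER (-1) /* hypothetical marker: faults go to the root task */

typedef struct {
    bool alive;
    int fault_handler; /* index of the PD handling this PD's faults */
} pd_entry_t;

/* Cleans up the crashed PD and, recursively, every live PD whose fault edge
 * leads to it. Returns the number of PDs cleaned up. Clearing the alive flag
 * before recursing keeps the traversal finite even if fault edges form a cycle. */
static size_t cleanup_after_crash(pd_entry_t *pds, size_t n_pds, size_t crashed)
{
    if (!pds[crashed].alive) {
        return 0;
    }
    pds[crashed].alive = false;
    size_t cleaned = 1;
    for (size_t i = 0; i < n_pds; i++) {
        if (pds[i].alive && pds[i].fault_handler == (int)crashed) {
            cleaned += cleanup_after_crash(pds, n_pds, i);
        }
    }
    return cleaned;
}
```

For example, if PD 1 relies on PD 0 as its fault handler and PD 2 relies on PD 1, a crash of PD 0 cleans up all three, while a PD with no fault edge into that chain is untouched.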
Cleaning up Resource Space Metadata#
As noted in the section above, we don’t track where metadata is stored for resource spaces. If we tracked a map relation from resource spaces to metadata, then we would count this relation as another potential dependency for resource space cleanup.
PD Creation#
Thread-PD Capability Space Synchronization#
Thread-PDs each have their own capability spaces, which are currently not synchronized. Upon creating a new thread, the PD creation module does not attempt to copy caps from the creator thread into the same slots in the new thread, and any new caps allocated by threads within the same process-PD are not synchronized between them. The developer must ensure that references to caps across thread-PDs are valid by manually sending caps allocated for a certain thread to the other threads.
Sample Apps#
File System#
We have made some simplifying assumptions for the file system:
The file system assumes that only one thread accesses it (which will remain true as long as the file server is the only PD that may access it). This means that any logic related to file locking or mutual exclusion has been removed.
The file system’s logging mechanism is disabled, due to the assumption that the file system is unrecoverable if the system crashes (since it is stored in memory).
The file server’s API does not allow for any operations related to file ownership or permissions, as we expect this functionality to be handled by capability permissions in the future.
Since there are no permissions, any PD with an RDE for a file namespace is assumed to have full permissions on any file within the namespace.
Another limitation is that the file server does not yet show file names in the model state, and instead it shows namespaces as file resource spaces that map to the default file resource space. It should be possible to show the file names in the model state without explicitly tracking them as resources with capabilities in the system.