Model State#
This section describes how to extract, view, and use the system’s model state, as generated by CellulOS and /proc.
Set up the Python venv#
The scripts in scripts/proc/ process the raw CSV, upload it to Neo4j, and calculate the RSI / FR metrics. (This assumes that Python 3.10 and virtualenv are already installed.)
cd scripts/proc
# 1. Create the virtualenv (needed once)
python -m venv venv
# 2. Activate the virtualenv
source ./venv/bin/activate
# Install requirements:
pip install -r requirements.txt
CellulOS#
In CellulOS, the tests are the only way to run scenarios.
Run a scenario & Extract Model State#
During a test, print the model state to the console using the pd_client_dump API call. When running the system tests, you can enable model state extraction with the GPIExtractModel configuration option.
Once the test completes, copy the printed model state to a CSV file in the same directory as the scripts. Ensure the CSV filename is prefixed with raw_.
Processing Model State#
TODO: Double-check whether this processing step is still needed.
Processing elevates the model state from implementation-level to model-level. For instance, in implementation one PD may switch between two address spaces, but in the model state this should appear as two separate PDs. The processing currently splits PDs with access to more than one ADS or CPU.
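To make the idea concrete, here is a hedged sketch of the splitting step in Python. It is not the actual csv_processing.py implementation: the column names PD_ID and ADS_ID are assumptions (check the raw CSV header for the real ones), and the real script also splits on CPU, not just ADS.
# Sketch of the PD-splitting idea; NOT the actual csv_processing.py code.
# Column names ("PD_ID", "ADS_ID") are hypothetical.
import csv
from collections import defaultdict

def split_pds(rows):
    """Give each (PD, ADS) pair its own model-level PD ID, e.g. PD_3 -> PD_3.0 and PD_3.1."""
    ads_of_pd = defaultdict(list)  # implementation-level PD -> ordered distinct ADS IDs
    for row in rows:
        if row["ADS_ID"] not in ads_of_pd[row["PD_ID"]]:
            ads_of_pd[row["PD_ID"]].append(row["ADS_ID"])
    for row in rows:
        suffix = ads_of_pd[row["PD_ID"]].index(row["ADS_ID"])
        row["PD_ID"] = f'{row["PD_ID"]}.{suffix}'  # matches IDs like "PD_3.0" used below
    return rows

with open("raw_example.csv") as f:
    for row in split_pds(list(csv.DictReader(f)))[:3]:
        print(row)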
Run python csv_processing.py. This will process all files in the current directory of the form raw_<name>.csv into <name>.csv.
Examples#
Process#
Running the test and extracting the model state for a simple scenario.
cd BUILD_DIR
# Configure CMake to build the GPI* test of interest
ccmake .
ninja && ./simulate
Parse the output and save the model state as a CSV file.
Then import it into Neo4j as explained below.
Virtual Machine#
Running the test and extracting the model state for a scenario with a virtual machine.
This requires multiple steps, such as waiting for the VM to boot, running and extracting the model state of the hello process in the guest, and then extracting the model state of the root task and the VM PD on the host.
Using a combination of pexpect and other scripting techniques, all of this can be done with one script.
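A minimal pexpect sketch of the boot-wait-and-run pattern follows; the prompt strings and the exact commands are assumptions, and vm_model.py is the authoritative version.
# Minimal sketch of "wait for boot, run a command, capture output".
# Prompt strings and commands are assumptions, not taken from vm_model.py.
import pexpect

child = pexpect.spawn("./simulate", encoding="utf-8", timeout=300)
child.expect("buildroot login:")   # wait for the guest to finish booting
child.sendline("root")             # log in as root, no password
child.expect("# ")                 # shell prompt (assumed)
child.sendline("python3 /root/proc/proc_model.py --csv guest.csv --os linux")
child.expect("# ")
print(child.before)                # output printed before the prompt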
This script assumes that:
the OSmosis directory is ~/OSmosis
the build directory is ~/OSmosis/qemu-build
the buildroot image to run inside the VM is at ~/buildroot/cellulos/qemu/buildroot-arm-cellulos-with-everything
The script does the following:
Ensures that the *.py scripts in the buildroot directory are the same as the ones in the OSmosis repo. This is needed because some of the same scripts are run inside the VM.
Ensures the rootfs.cpio in the buildroot is not newer than the rootfs.cpio in the OSmosis repo.
Outputs the following CSV files in ./outputs/{vmm}/{datetime}:
host.csv: Model state as generated by CellulOS.
guest.csv: Model state as generated by proc_model.py run inside the VM using pexpect. For now it is hard-coded to run the hello program once.
g2h.csv: Mappings between guest PA --> host VA.
# Generate CSV
sudo -E env PATH="$HOME/.local/lib:$PATH" ./vm_model.py \
--vmm cellulos \
--clean
# Import CSV (assumes Neo4j is set up, see section below)
sudo -E env PATH="$HOME/.local/lib:$PATH" python ./import_csv.py \
--files outputs/cellulos/2024_11_02_160915/g2h_file.csv \
outputs/cellulos/2024_11_02_160915/guest.csv \
outputs/cellulos/2024_11_02_160915/host.csv
Proc Model State#
To demonstrate the extraction of model state from an entirely different system, we can build a model state from the contents of Linux’s /proc virtual filesystem. We run some sample programs and fetch the corresponding information from /proc. Currently, this extracts the following information (see the sketch after this list):
Virtual memory regions, their permissions, and their purpose (heap, stack, file, etc.).
Physical memory regions and their mappings from virtual.
Devices which the physical memory regions originate from.
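For a flavor of where the first item comes from, here is a minimal Python sketch that parses /proc/<pid>/maps. The actual proc_model.py uses the pfs library (built below) and covers more than this sketch does.
# Minimal sketch: virtual memory regions, their permissions, and their
# purpose (heap, stack, file) from /proc/<pid>/maps.
def read_vmrs(pid="self"):
    regions = []
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            parts = line.split()
            addr_range, perms = parts[0], parts[1]
            pathname = parts[5] if len(parts) > 5 else ""  # e.g. [heap], [stack], or a file
            start, end = (int(x, 16) for x in addr_range.split("-"))
            regions.append((hex(start), hex(end), perms, pathname))
    return regions

for region in read_vmrs()[:5]:
    print(region)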
Setup on Ubuntu Host#
cd ./scripts/proc
# Activate the virtualenv:
source ./venv/bin/activate
# Build the `pfs` module: `pfs` is a C++ library, so we use a `pybind` wrapper to generate a Python module from it.
cd pfs
cmake . && make
# This should generate a Python module: `/pfs/lib/pypfs.[...].so`.
# Copy the example files `cp pfs/out/* ../`
Run a scenario & Extract Model State#
In proc_model.py, choose the configuration of programs to run. You can choose an existing configuration by setting to_run = run_configs[<idx>] with the index of the chosen configuration.
To add a new configuration and/or programs, ensure that the programs are built by pfs/osmosis_examples, add them to the program_names and run_configs variables, and copy them to the proc directory.
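As a purely hypothetical sketch of what such an entry might look like (the real structure is defined in proc_model.py and may differ):
# Hypothetical sketch only; see proc_model.py for the real structure.
program_names = ["hello", "kvstore"]   # binaries built by pfs/osmosis_examples

run_configs = [
    ["hello"],              # config 0: run hello once
    ["hello", "kvstore"],   # config 1: run both programs
]

to_run = run_configs[1]     # pick a configuration by index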
Hello Example#
# Activate the virtualenv:
source ./venv/bin/activate
# Run: We need to include the regular `$PATH` (or `/usr/bin/`) for access to `sudo` for the namespace example.
sudo -E env PATH="./venv/bin:$PATH" python proc_model.py
The resulting model state is saved to the proc_model.csv file, which can be imported into Neo4j for visualization following the steps below.
VMM Example#
This uses the same script as the CellulOS Virtual Machine example above.
This script assumes that:
the buildroot image to run inside the VM is at ~/buildroot/qemu/buildroot-x86
The script does the following:
Ensures that the *.py scripts in the buildroot directory are the same as the ones in the OSmosis repo. This is needed because some of the same scripts are run inside the VM.
Outputs the following CSV files in ./outputs/{vmm}/{datetime}:
host.csv: Model state as generated by CellulOS.
guest.csv: Model state as generated by proc_model.py run inside the VM using pexpect. For now it is hard-coded to run the hello program once.
g2h.csv: Mappings between guest PA --> host VA, obtained by querying the QEMU monitor via telnet (see the sketch after this list).
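QEMU's human monitor provides a gpa2hva command for exactly this lookup. A hedged sketch follows, assuming the monitor is reachable via telnet on localhost:4444; the port and the prompt handling are assumptions, not taken from vm_model.py.
# Sketch: ask the QEMU monitor for a guest-PA -> host-VA mapping.
# Port 4444 and the "(qemu) " prompt handling are assumptions.
import telnetlib

def gpa2hva(guest_pa, host="localhost", port=4444):
    tn = telnetlib.Telnet(host, port, timeout=5)
    tn.read_until(b"(qemu) ")                       # wait for the monitor prompt
    tn.write(f"gpa2hva {hex(guest_pa)}\n".encode())
    reply = tn.read_until(b"(qemu) ").decode()      # echoed command plus the answer line
    tn.close()
    return reply

print(gpa2hva(0x40000000))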
# Generate CSV
sudo -E env PATH="$HOME/.local/lib:$PATH" ./vm_model.py \
--vmm qemu \
--clean
# Import CSV (assumes Neo4j is set up, see section below)
sudo -E env PATH="$HOME/.local/lib:$PATH" python ./import_csv.py \
--files outputs/qemu-x86/2024_11_02_160915/g2h_file.csv \
outputs/qemu-x86/2024_11_02_160915/guest.csv \
outputs/qemu-x86/2024_11_02_160915/host.csv
Visualizing Model State (Common)#
The neo4j_docker.sh script will spin up a docker container that runs a local Neo4j instance.
NOTE: These instructions have only been tested on Linux.
Starting the Neo4j container#
We are using the Neo4j enterprise container, since we want to use Bloom for visualization.
cd scripts/proc
source ./venv/bin/activate
./neo4j_docker.sh start
Navigate to http://localhost:7474 to access the local Neo4j console.
The username and password to the console will be output when the script completes.
NOTE: This will overwrite any existing config.txt
files in the directory that the script is run from.
To connect to Bloom#
To use Bloom, it is better to install the Neo4j Desktop app from https://neo4j.com/download/.
Additional options#
Neo4j data directory#
The script will, by default, create a neo4j directory in your home directory to store the local instance’s data. You can change where this should be created by supplying the path as the third argument to the script: ./neo4j_docker.sh start <neo4j_dir>
Neo4j docker container name#
The script reuses the same docker container across invocations. The default name for this container is neo4j-osm. You can change its name by providing a fourth argument to the script: ./neo4j_docker.sh start <csv_file> <neo4j_dir> <neo4j_container_name>
Stopping the Container#
Run ./neo4j_docker.sh stop. If you’ve used a custom Neo4j directory or docker container name, you must provide them as arguments: ./neo4j_docker.sh stop <neo4j_dir> <neo4j_container_name>
Cleaning Up the Container and All Local Data#
Run ./neo4j_docker.sh clean. This will delete the container and the Neo4j directory associated with it; you may be prompted for sudo permissions.
If you’ve used a custom Neo4j directory or docker container name, you must provide them as arguments: ./neo4j_docker.sh clean <neo4j_dir> <neo4j_container_name>
Importing Data to Neo4j#
To import the CSV generated by CellulOS or /proc, use the following script.
This script assumes that the neo4j docker instance with the name neo4j-osm is running on the same machine.
The script first converts the model state CSV to the form that the neo4j-admin tool expects.
We use the neo4j-admin tool as it is faster than using LOAD CSV with a schema directly.
The script interacts with the docker instance using docker exec and subprocess.run.
It copies the new CSV file into the ~/neo4j/import folder, which is mounted inside the docker container.
The data is imported into the database named test1, and the existing data in that DB is deleted.
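The core of that interaction looks roughly like this; the converted file names and the exact neo4j-admin arguments are assumptions (they depend on the Neo4j version), so treat this as a sketch rather than import_csv.py itself.
# Rough sketch of driving the container; the file names and neo4j-admin
# flags are assumptions, not copied from import_csv.py.
import os
import shutil
import subprocess

import_dir = os.path.expanduser("~/neo4j/import")  # mounted inside the container
shutil.copy("nodes.csv", import_dir)               # converted node CSV (hypothetical name)
shutil.copy("rels.csv", import_dir)                # converted relationship CSV (hypothetical name)

subprocess.run(
    ["docker", "exec", "neo4j-osm",                # container name from above
     "neo4j-admin", "database", "import", "full", "test1",
     "--nodes=/import/nodes.csv",
     "--relationships=/import/rels.csv",
     "--overwrite-destination"],
    check=True,
)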
cd scripts/proc
source ./venv/bin/activate
sudo -E env PATH="$HOME/.local/lib:$PATH" \
python ./import_csv.py \
--files ./file1.csv ./file2.csv
Sample Queries#
Everything: Not recommended when the graph is large.
MATCH p=(()-[]-())
RETURN p
Everything But: Show everything, excluding certain resource types and/or PDs.
// Specify resource types to exclude, and PD IDs to exclude
WITH ["VMR", "BLOCK"] AS ignore_types, ["PD_0", "PD_1"] AS ignore_pds
MATCH p=((a)-[]-(b))
WHERE ((a:PD AND NOT a.ID IN ignore_pds) OR ((a:RESOURCE OR a:RESOURCE_SPACE) AND NOT a.DATA IN ignore_types))
AND ((b:PD AND NOT b.ID IN ignore_pds) OR ((b:RESOURCE OR b:RESOURCE_SPACE) AND NOT b.DATA IN ignore_types))
RETURN p
PDs only: Shows PD nodes and the relationships between them.
MATCH pdpaths=((:PD)-[]->(:PD))
RETURN pdpaths
PDs and resource spaces: Shows PD nodes, resource space nodes, and the relationships between them.
// Get PD & Resource Space relations
MATCH pd_pd_paths=((:PD)-[]->(:PD))
RETURN pd_pd_paths AS paths
UNION DISTINCT
MATCH pd_rs_paths=((:PD)-[]->(:RESOURCE_SPACE))
RETURN pd_rs_paths AS paths
UNION DISTINCT
MATCH rs_rs_paths=((:RESOURCE_SPACE)-[]->(:RESOURCE_SPACE))
RETURN rs_rs_paths AS paths
Files Overview: Shows PDs, resource spaces, files, and relations to files.
// Get PD & Resource Space relations
MATCH pd_pd_paths=((:PD)-[]->(:PD))
RETURN pd_pd_paths AS paths
UNION DISTINCT
MATCH pd_rs_paths=((:PD)-[]->(:RESOURCE_SPACE))
RETURN pd_rs_paths AS paths
UNION DISTINCT
MATCH rs_rs_paths=((:RESOURCE_SPACE)-[]->(:RESOURCE_SPACE))
RETURN rs_rs_paths AS paths
UNION DISTINCT
// Get 1 edge incoming to files
MATCH p1=(()-[]->(:RESOURCE {DATA: 'FILE'}))
RETURN p1 AS paths
UNION DISTINCT
// Get 2 edges outgoing from files
MATCH p2=((:RESOURCE {DATA: 'FILE'})-[*0..2]->())
RETURN p2 AS paths
UNION DISTINCT
// Get 1 edge incoming to nodes 1 edge outgoing from files
MATCH p3=((:RESOURCE {DATA: 'FILE'})-[*0..1]->()<-[]-())
RETURN p3 AS paths
Visualize RSI: Shows resources of a particular type shared between two PDs, at any depth.
WITH "PD_3.0" as pd1, "PD_4.0" as pd2, "FILE" as type
// Find all accessible resources of the type
MATCH p1=((:PD {ID: pd1})-[:HOLD|MAP*1..4]->(r1:RESOURCE {DATA:type}))
WITH pd2, p1, r1, type
MATCH p2=((:PD {ID: pd2})-[:HOLD|MAP*1..4]->(r1))
RETURN p1, p2
Calculating Metrics#
Identify the IDs of the PDs you wish to compare.
Add an entry to the configurations array in metrics.py:
{'file': '<processed_csv_filename>.csv', 'pd1': '<first PD ID>', 'pd2': '<second PD ID>'}
Run python metrics.py <idx>, replacing <idx> with the index of the desired configuration in the configurations array.
The metrics script connects to Neo4j, so it is essential that the corresponding file is also imported into your Neo4j instance.
The script will output the RSI and FR values for the chosen PDs, something like this:
Calculating metrics for 'kvstore_007.csv' (PD_6.0,PD_7.0)
RSI VMR: 0.0
RSI MO: 0.0
RSI VCPU: 0.0
RSI PCPU: 1.0
RSI FILE: 1.0
RSI BLOCK: 1.0
FR: 1
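Under the hood, an RSI-style number boils down to counting resources of a type reachable by both PDs versus those reachable by each. The following is an illustrative sketch using the Python neo4j driver; the query shape mirrors the "Visualize RSI" query above, but the counts shown are not necessarily the exact formula metrics.py implements.
# Illustrative only: count FILE resources reachable by both PDs.
# Not the formula from metrics.py; credentials are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))
query = """
MATCH (:PD {ID: $pd1})-[:HOLD|MAP*1..4]->(r1:RESOURCE {DATA: $type})
WITH collect(DISTINCT r1) AS res1
MATCH (:PD {ID: $pd2})-[:HOLD|MAP*1..4]->(r2:RESOURCE {DATA: $type})
WITH res1, collect(DISTINCT r2) AS res2
RETURN size([r IN res1 WHERE r IN res2]) AS shared, size(res1) AS n1, size(res2) AS n2
"""
with driver.session() as session:
    rec = session.run(query, pd1="PD_6.0", pd2="PD_7.0", type="FILE").single()
    print("shared:", rec["shared"], "of", rec["n1"], "and", rec["n2"])
driver.close()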
Setup on Buildroot based Qemu VM#
This is mainly to ensure that our proc-based extraction has all the needed dependencies inside a buildroot-based Linux VM, on both x86_64 and aarch64.
x86_64#
# Clone & build
git clone --branch cellulos git@github.com:sid-agrawal/buildroot.git
cd buildroot
# For x86_64:
cp osmosis_configs/qemu_x86-64_config .config
make # the first build takes a while, so speed it up with -j 12
# Copy python and pfs files
export OSMOSIS_DIR="$HOME/OSmosis" # Setup as it applies to you :)
./build-helper.sh x86_64 $OSMOSIS_DIR
make # This one should be quick.
# To use qemu+KVM with Qemu's monitor enabled
sudo ./start-qemu-kvm.sh
# Once Linux is booted, log in with username "root" and no password.
# Dump some example model state
cd /root/proc
python3 ./proc_model.py --csv hello.csv --os linux
aarch64#
# Clone & build
git clone --branch cellulos git@github.com:sid-agrawal/buildroot.git
cd buildroot
# For aarch64:
cp osmosis_configs/qemu_aarch64_config .config
make # the first build takes a while, so speed it up with -j 12
# Copy python and pfs files
export OSMOSIS_DIR="$HOME/OSmosis" # Setup as it applies to you :)
./build-helper.sh aarch64 $OSMOSIS_DIR
make # This one should be quick.
# Start Qemu
output/images/start_qemu.sh
# Once Linux is booted, log in with username "root" and no password.
# Dump some example model state
cd /root/proc
python3 ./proc_model.py --csv hello.csv --os linux
A note on the Linux kernel in this buildroot: in this version, we use the Linux kernel supplied by buildroot. We have updated it to enable writes to /dev/mem.
To trigger a rebuild of just the kernel in buildroot, say after a .config change, run make linux-rebuild.