Model State#

This section describes how to extract, view, and use the system’s model state as generated by CellulOS and /proc.

Set up the Python venv#

The scripts located in scripts/proc/ process the raw CSV, upload it to Neo4j, and calculate the RSI / FR metrics. (This assumes that Python 3.10 and virtualenv are already installed.)

cd scripts/proc
# 1. Create the virtualenv (needed once)
python -m venv venv

# 2. Activate the virtualenv
source ./venv/bin/activate

# 3. Install the requirements
pip install -r requirements.txt

CellulOS#

In CellulOS, the tests are the only way to run scenarios.

Run a scenario & Extract Model State#

  1. During a test, print the model state to the console using the pd_client_dump API call.

  2. Once the test completes, copy the printed model state to a CSV file in the same directory as the scripts.

  3. Ensure the CSV filename is prefixed with raw_. (A helper for steps 2 and 3 is sketched below.)
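
Steps 2 and 3 can be scripted if the console output is captured to a file. The helper below is only a sketch and is not part of the repo; its name (save_raw_dump.py) and its line-filtering heuristic are assumptions about what the pd_client_dump output looks like, so adapt it accordingly.

# save_raw_dump.py (hypothetical helper): keep only the CSV-looking lines
# from a captured console log and write them to a raw_-prefixed file next
# to the processing scripts.
import sys

def save_raw_dump(log_path, name):
    with open(log_path) as log, open(f"raw_{name}.csv", "w") as out:
        for line in log:
            # Heuristic: model-state rows are comma-separated; everything
            # else is ordinary test output and is dropped.
            if "," in line:
                out.write(line)

if __name__ == "__main__":
    save_raw_dump(sys.argv[1], sys.argv[2])   # e.g. python save_raw_dump.py console.log hello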

Processing Model State#

Double check if this is still needed.

Processing elevates the model state from implementation level to model level. For instance, in the implementation one PD may switch between two address spaces, but in the model state this should appear as two separate PDs. The processing currently splits PDs with access to more than one ADS or CPU (sketched below).

  1. Run python csv_processing.py. This will process all files of the form raw_<name>.csv in the current directory into <name>.csv.
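
For intuition, the split can be pictured as below. This is a simplified sketch of the idea only, not the actual csv_processing.py logic; the input shape (a list of (pd_id, ads_id) pairs) and the PD_x.y naming are illustrative assumptions.

from collections import defaultdict

def split_pd_by_ads(rows):
    """Give a PD that maps more than one ADS a distinct model-level ID
    per ADS, e.g. PD_1 -> PD_1.0 and PD_1.1."""
    ads_per_pd = defaultdict(list)
    for pd_id, ads_id in rows:
        if ads_id not in ads_per_pd[pd_id]:
            ads_per_pd[pd_id].append(ads_id)
    return {(pd_id, ads_id): f"{pd_id}.{i}"
            for pd_id, ads_list in ads_per_pd.items()
            for i, ads_id in enumerate(ads_list)}

print(split_pd_by_ads([("PD_1", "ADS_A"), ("PD_1", "ADS_B"), ("PD_2", "ADS_A")]))
# {('PD_1', 'ADS_A'): 'PD_1.0', ('PD_1', 'ADS_B'): 'PD_1.1', ('PD_2', 'ADS_A'): 'PD_2.0'}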

Examples#

Process#

Running the test and extracting the model state for a simple scenario.

cd BUILD_DIR
# Configure CMake to build the GPI* test of interest
ccmake . 

ninja && ./simulate

Parse the output and save the model state as a CSV file, then import it into Neo4j as explained below.

Virtual Machine#

Running the test and extracting the model state for a scenario with a virtual machine. This requires multiple steps, such as waiting for the VM to boot, running and extracting the model state of the hello process in the guest, and then extracting the model state of the root-task and the VM-PD on the host. Using a combination of pexpect and other scripting techniques, all of this can be done with one script.

This script assumes that

  • the OSmosis dir is ~/OSmosis

  • the build dir is ~/OSmosis/qemu-build

  • the buildroot image to run inside the VM is at: ~/buildroot/cellulos/qemu/buildroot-arm-cellulos-with-everything

The script does:

  • Ensures that the *.py scripts in the buildroot dir are the same as the ones in the OSmosis repo. This is needed because some of the same scripts are run inside the VM.

  • Ensures that the rootfs.cpio in the buildroot dir is not newer than the rootfs.cpio in the OSmosis repo.

  • Outputs the following CSV files in ./outputs/{vmm}/{datetime}:

    • host.csv: Model state as generated by CellulOS

    • guest.csv: Model state as generated by proc_model.py run inside the VM using pexpect. For now it is hard-coded to run the hello program once.

    • g2h.csv: Mappings from guest PA to host VA

# Generate CSV
sudo -E env PATH="$HOME/.local/lib:$PATH" ./vm_model.py \
       --vmm cellulos \
       --clean

# Import CSV (assumes Neo4j is set up; see section below)
sudo -E env PATH="$HOME/.local/lib:$PATH" python ./import_csv.py \
    --files  outputs/cellulos/2024_11_02_160915/g2h_file.csv \
             outputs/cellulos/2024_11_02_160915/guest.csv \
             outputs/cellulos/2024_11_02_160915/host.csv

Proc Model State#

To demonstrate the extraction of model state from an entirely different system, we can build a model state from the contents of Linux’s /proc virtual filesystem. We run some sample programs and fetch the corresponding information from /proc. Currently, this extracts the following information (a sketch of the raw data involved follows the list):

  • Virtual memory regions, their permissions, and their purpose (heap, stack, file, etc.).

  • Physical memory regions and their mappings from virtual memory.

  • Devices from which the physical memory regions originate.
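
For a feel of the raw data involved, the snippet below reads the virtual memory regions of a process straight from /proc/<pid>/maps. It is an illustration only; the real proc_model.py goes through the pfs pybind module and also consults pagemap for the virtual-to-physical mappings.

import os

def read_vmrs(pid):
    """Return (start, end, perms, purpose) tuples from /proc/<pid>/maps."""
    vmrs = []
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            perms = fields[1]
            # Purpose is [heap], [stack], a file path, or empty for anonymous memory.
            purpose = fields[5] if len(fields) > 5 else ""
            vmrs.append((start, end, perms, purpose))
    return vmrs

for vmr in read_vmrs(os.getpid()):
    print(vmr)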

Setup on Ubuntu Host#

cd ./scripts/proc

# Activate the virtualenv: 
source ./venv/bin/activate

# Build the `pfs` module: `pfs` is a C++ library, so we use a `pybind` wrapper to generate a Python module from it.
cd pfs
cmake . && make 

# This should generate a python module: `/pfs/lib/pypfs.[...].so`.
#  Copy the example files `cp pfs/out/* ../`

Run a scenario & Extract Model State#

In proc_model.py, choose the configuration of programs to run. You can select an existing configuration by setting to_run = run_configs[<idx>], where <idx> is the index of the chosen configuration. To add a new configuration and/or new programs, ensure that the programs are built by pfs/osmosis_examples, add them to the program_names and run_configs variables, and copy them to the proc directory (see the sketch below).
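
As an illustration, adding a new configuration might look like the snippet below. The exact structure of program_names and run_configs is defined in proc_model.py, so treat the shapes and program names here as assumptions and match whatever the script already uses.

# In proc_model.py (illustrative only):
program_names = ["hello", "kvstore"]        # binaries built by pfs/osmosis_examples
run_configs = [
    ["hello"],                              # config 0: run hello once
    ["hello", "kvstore"],                   # config 1 (new): run hello and kvstore
]
to_run = run_configs[1]                     # select the new configuration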

Hello Example#

# Activate the virtualenv: 
source ./venv/bin/activate
# Run: We need to include the regular `$PATH` (or `/usr/bin/`) for access to `sudo` for the namespace example.
sudo -E env PATH="./venv/bin:$PATH" python proc_model.py

The resulting model state is saved to the proc_model.csv file, which can be imported into Neo4j for visualization following the steps below.

VMM Example#

This uses the same script as the CellulOS Virtual Machine example above.

This script assumes that

  • the buildroot image to run inside the VM is at: ~/buildroot/qemu/buildroot-x86

The script does:

  • Ensures that the *.py scripts in the buildroot dir are the same as the ones in the OSmosis repo. This is needed because some of the same scripts are run inside the VM.

  • Outputs the following CSV files in ./outputs/{vmm}/{datetime}:

    • host.csv: Model state of the host, as generated by proc_model.py run on the host

    • guest.csv: Model state as generated by proc_model.py run inside the VM using pexpect. For now it is hard-coded to run the hello program once.

    • g2h.csv: Mappings from guest PA to host VA, obtained by querying the Qemu monitor via telnet (see the sketch below)
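
For reference, the guest-PA-to-host-VA translation can be obtained from the Qemu monitor's gpa2hva command. The sketch below is an illustration of that idea, not vm_model.py itself; the monitor port and the way the reply is read are assumptions made for the example.

import socket

def gpa_to_hva(gpa, host="127.0.0.1", port=55555):
    """Ask the Qemu HMP monitor (exposed over telnet) to translate a guest PA."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.recv(4096)                                   # consume the monitor banner/prompt
        s.sendall(f"gpa2hva {gpa:#x}\n".encode())
        return s.recv(4096).decode(errors="replace")   # reply contains the host VA

print(gpa_to_hva(0x40000000))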

# Generate CSV
sudo -E env PATH="$HOME/.local/lib:$PATH" ./vm_model.py \
       --vmm qemu \
       --clean

# Import CSV (assumes Neo4j is set up; see section below)
sudo -E env PATH="$HOME/.local/lib:$PATH" python ./import_csv.py \
    --files  outputs/qemu-x86/2024_11_02_160915/g2h_file.csv \
             outputs/qemu-x86/2024_11_02_160915/guest.csv \
             outputs/qemu-x86/2024_11_02_160915/host.csv

Visualizing Model State (Common)#

The neo4j_docker.sh script will spin up a docker container that runs a local Neo4j instance. NOTE: These instructions have only been tested on Linux.

Starting the Neo4j container#

We are using the Neo4j enterprise container, since we want to use Bloom for visualization.

cd scripts/proc
source ./venv/bin/activate
./neo4j_docker.sh start

Navigate to http://localhost:7474 to access the local Neo4j console. The username and password to the console will be output when the script completes. NOTE: This will overwrite any existing config.txt files in the directory that the script is run from.

To connect to Bloom#

To use Bloom, it is better to install the Neo4j Desktop app from https://neo4j.com/download/.

Additional options#

Neo4j data directory#

The script will, by default, create a neo4j directory in your home directory, to store the local instance’s data. You can change where this should be created by supplying the path as the third argument to the script: ./neo4j_docker.sh start <neo4j_dir>

Neo4j docker container name#

The script re-uses the same docker container across invocations. The default name for this container is neo4j-osm. You can change its name by providing a fourth argument to the script: ./neo4j_docker.sh start <csv_file> <neo4j_dir> <neo4j_container_name>

Stopping the Container#

Run ./neo4j_docker.sh stop. If you’ve used a custom Neo4j directory or docker container name, you must provide it as arguments: ./neo4j_docker.sh stop <neo4j_dir> <neo4j_container_name>

Cleaning Up the Container and All Local Data#

Run ./neo4j_docker.sh clean. This will delete the container and the Neo4j directory associated with it; you may be prompted for sudo permissions.

If you’ve used a custom Neo4j directory or docker container name, you must provide it as arguments: ./neo4j_docker.sh clean <neo4j_dir> <neo4j_container_name>

Importing Data to Neo4j#

To import the CSV generated by CellulOS or /proc, use the following script. This script assumes that a Neo4j docker instance with the name neo4j-osm is running on the same machine.

The script first converts the model-state CSV into the form of CSV that the neo4j-admin tool expects. We use neo4j-admin because it is faster than using LOAD CSV with a schema directly. The script interacts with the docker instance using docker exec and subprocess.run, and copies the new CSV files into the ~/neo4j/import folder, which is mounted inside the docker container.

The data is imported into a Neo4j database named test1, and the existing data in that DB is deleted.

cd scripts/proc
source ./venv/bin/activate
sudo -E env PATH="$HOME/.local/lib:$PATH" \
    python ./import_csv.py \
             --files ./file1.csv  ./file2.csv 
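
Internally, the flow is roughly the following. This is a sketch only, not the actual import_csv.py; in particular, the neo4j-admin arguments depend on the Neo4j version, and the database name is the one mentioned above.

import shutil, subprocess
from pathlib import Path

CONTAINER = "neo4j-osm"
IMPORT_DIR = Path.home() / "neo4j" / "import"   # mounted as /import inside the container

def import_csvs(node_csv, rel_csv, database="test1"):
    # 1. Copy the converted CSVs into the mounted import directory.
    for f in (node_csv, rel_csv):
        shutil.copy(f, IMPORT_DIR)
    # 2. Drive neo4j-admin inside the container via docker exec
    #    (bulk import; exact flags vary between Neo4j versions).
    subprocess.run(
        ["docker", "exec", CONTAINER,
         "neo4j-admin", "database", "import", "full", database,
         "--overwrite-destination",
         f"--nodes=/import/{Path(node_csv).name}",
         f"--relationships=/import/{Path(rel_csv).name}"],
        check=True,
    )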

Sample Queries#

  • Everything: Not recommended when the graph is large.

MATCH p=(()-[]-())
RETURN p
  • Everything But: Show everything, excluding certain resource types and/or PDs.

// Specify resource types to exclude, and PD IDs to exclude
WITH  ["VMR", "BLOCK"] AS ignore_types, ["PD_0", "PD_1"] AS ignore_pds
MATCH p=((a)-[]-(b))
WHERE ((a:PD AND NOT a.ID IN ignore_pds) OR ((a:RESOURCE OR a:RESOURCE_SPACE) AND NOT a.DATA IN ignore_types)) 
  AND ((b:PD AND NOT b.ID IN ignore_pds) OR ((b:RESOURCE OR b:RESOURCE_SPACE) AND NOT b.DATA IN ignore_types))
RETURN p
  • PDs only: Shows PD nodes and the relationships between them.

MATCH pdpaths=((:PD)-[]->(:PD))
RETURN pdpaths
  • PDs and resource spaces: Shows PD nodes, resource space nodes, and the relationships between them.

// Get PD & Resource Space relations
MATCH pd_pd_paths=((:PD)-[]->(:PD))
RETURN pd_pd_paths AS paths
UNION DISTINCT
MATCH pd_rs_paths=((:PD)-[]->(:RESOURCE_SPACE))
RETURN pd_rs_paths AS paths
UNION DISTINCT
MATCH rs_rs_paths=((:RESOURCE_SPACE)-[]->(:RESOURCE_SPACE))
RETURN rs_rs_paths AS paths
  • Files Overview: Shows PDs, resource spaces, files, and relations to files.

// Get PD & Resource Space relations
MATCH pd_pd_paths=((:PD)-[]->(:PD))
RETURN pd_pd_paths AS paths
UNION DISTINCT
MATCH pd_rs_paths=((:PD)-[]->(:RESOURCE_SPACE))
RETURN pd_rs_paths AS paths
UNION DISTINCT
MATCH rs_rs_paths=((:RESOURCE_SPACE)-[]->(:RESOURCE_SPACE))
RETURN rs_rs_paths AS paths
UNION DISTINCT

// Get 1 edge incoming to files
MATCH p1=(()-[]->(:RESOURCE {DATA: 'FILE'}))
RETURN p1 AS paths
UNION DISTINCT

// Get 2 edges outgoing from files
MATCH p2=((:RESOURCE {DATA: 'FILE'})-[*0..2]->())
RETURN p2 AS paths
UNION DISTINCT

// Get 1 edge incoming to nodes that are 0-1 edges outgoing from files
MATCH p3=((:RESOURCE {DATA: 'FILE'})-[*0..1]->()<-[]-())
RETURN p3 AS paths
  • Visualize RSI: Shows resources of a particular type shared between two PDs, at any depth.

WITH "PD_3.0" as pd1, "PD_4.0" as pd2, "FILE" as type

// Find all accessible resources of the type
MATCH p1=((:PD {ID: pd1})-[:HOLD|MAP*1..4]->(r1:RESOURCE {DATA:type}))
WITH pd2, p1, r1, type
MATCH p2=((:PD {ID: pd2})-[:HOLD|MAP*1..4]->(r1))

RETURN p1, p2

Calculating Metrics#

  1. Identify the IDs of the PDs you wish to compare.

  2. Add an entry to the configurations array in metrics.py:

{'file': '<processed_csv_filename>.csv', 'pd1': '<first PD ID>', 'pd2': '<second PD ID>'}
  3. Run python metrics.py <idx>, replacing <idx> with the index of the desired configuration in the configurations array.

    • The metrics script connects to Neo4j, so it is essential that the corresponding file is also imported into your Neo4j instance.

  4. The script will output the RSI and FR values for the chosen PDs, something like this:

Calculating metrics for 'kvstore_007.csv' (PD_6.0,PD_7.0)
RSI VMR: 0.0
RSI MO: 0.0
RSI VCPU: 0.0
RSI PCPU: 1.0
RSI FILE: 1.0
RSI BLOCK: 1.0
FR: 1

Setup on Buildroot based Qemu VM#

This is mainly to ensure that our /proc-based extraction has all the needed dependencies inside a buildroot-based Linux VM, on both x86_64 and aarch64.

x86_64#

# Clone & build
git clone --branch cellulos git@github.com:sid-agrawal/buildroot.git
cd buildroot

# For x86_64
cp osmosis_configs/qemu_x86-64_config .config
make # first build takes a while so bump up with -j 12 

# Copy python and pfs files
export OSMOSIS_DIR="$HOME/OSmosis" # Setup as it applies to you :)
./build-helper.sh x86_64 $OSMOSIS_DIR

make # This one should be quick.

# To use qemu+KVM with Qemu's monitor enabled
sudo ./start-qemu-kvm.sh

# Once Linux is booted, log in with username "root" and no password.
# Dump some example model state
cd /root/proc
python3 ./proc_model.py --csv hello.csv --os linux

aarch64#

# Clone & build
git clone --branch cellulos git@github.com:sid-agrawal/buildroot.git
cd buildroot

# For aarch64
cp osmosis_configs/qemu_aarch64_config .config
make # first build takes a while so bump up with -j 12 

# Copy python and pfs files
export OSMOSIS_DIR="$HOME/OSmosis" # Setup as it applies to you :)
./build-helper.sh aarch64 $OSMOSIS_DIR

make # This one should be quick.

# Start Qemu
output/images/start_qemu.sh

# Once Linux is booted, log in with username "root" and no password.
# Dump some example model state
cd /root/proc
python3 ./proc_model.py --csv hello.csv --os linux

A note on the Linux kernel in this buildroot.

In this version, we use the Linux kernel that is supplied by buildroot. We have updated it to enable writes to /dev/mem. To trigger a rebuild of just the kernel in buildroot, say after a .config change, run make linux-rebuild.