Launch the service on Bare Metal

This document describes the steps required to host a Silent Network node on bare-metal hardware. It is intended for DevOps or software engineers, preferably with basic knowledge of Docker and the shell.

The hardware must meet several criteria, listed below. We provide the Operator software as a Docker image.

Prerequisites

  1. The CPU must support Intel SGX (Software Guard Extensions)

    1. Example CPU: Intel(R) Xeon(R) Gold 5412U

  2. The whole platform must be up to date. Use a recent CPU model and update the BIOS and other firmware to the newest versions; otherwise the software will not launch. See Platform provisioning for more information.

  3. The host operating system: Ubuntu 22.04 (Jammy)

    1. Intel libraries in use are officially supported on Ubuntu: https://download.01.org/intel-sgx/sgx_repo/ubuntu/dists/

  4. Check which SGX features your CPU supports. The simplest way is to run sgx-detect.

    1. Example output:

➜  ~ sudo ./sgx-detect
Detecting SGX, this may take a minute...
✔  SGX instruction set
  ✔  CPU support
  ✔  CPU configuration
  ✔  Enclave attributes
  ✔  Enclave Page Cache
  SGX features
    ✔  SGX2  ✔  EXINFO  ✔  ENCLV  ✔  OVERSUB  ✔  KSS  
    Total EPC size: 92.2MiB
✔  Flexible launch control
  ✔  CPU support
  ✔  CPU configuration
  ✔  Able to launch production mode enclave
✔  SGX system software
  ✔  SGX kernel device (/dev/sgx_enclave)
  ✔  libsgx_enclave_common
  ✔  AESM service
  ✔  Able to launch enclaves
    ✔  Debug mode
    ✔  Production mode
    ✔  Production mode (Intel whitelisted)

You're all set to start running SGX programs!
  1. All items under SGX instruction set, Flexible launch control, and SGX system software should show green ticks. Among the SGX features, the important ones are SGX2 and EXINFO.

  2. The Operator software is tied to the particular CPU it first runs on. Once you run it on a machine, it must always run on that same machine. SGX sealing keys are derived from a CPU-specific secret (combined with the enclave's MRSIGNER or MRENCLAVE identity), so restarting the container on another SGX-enabled CPU produces different encryption keys. The enclave is then unable to unseal the state that was stored on the previous CPU, which leaves the software inoperable.

  3. The host machine must have a container runtime installed, such as Docker.

  4. The software uses a disk as persistent storage. The minimum required size is 64 GB. The storage should be used exclusively by this software; no other service should use it. It must be persistent, i.e., data must survive a power cycle.

  5. The storage should be backed up periodically, so that you can roll back to the last valid state after a database write failure, a disk write failure, or an unexpected software bug.

  6. Minimum RAM: 16 GB

  7. You need to provide a static IP address, or a URL, that points to the Operator software.

  8. The running service requires a high bandwidth of the external network interface.

  9. The system date and time must be valid, synchronized by NTP (Ubuntu by default has it enabled)
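Some of the prerequisites above can be checked from the shell. The following is a minimal sketch, not part of the Operator software: the helper names has_sgx_feature and mem_total_gib are hypothetical, and the feature check only looks for the tick-and-name layout shown in the sample sgx-detect output.

```shell
#!/bin/sh
# Hypothetical helpers for checking some of the prerequisites.

# Succeeds if the sgx-detect output on stdin shows a tick next to the
# given feature name. Assumes the "✔  NAME" layout of the sample output.
has_sgx_feature() {
    grep -Eq "✔[[:space:]]+$1([[:space:]]|\$)"
}

# Prints the total RAM in whole GiB, read from a meminfo-style file
# (defaults to /proc/meminfo, where MemTotal is reported in kB).
mem_total_gib() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

# Example (run manually):
#   sudo ./sgx-detect > sgx-detect.txt
#   has_sgx_feature SGX2   < sgx-detect.txt && echo "SGX2 ok"
#   has_sgx_feature EXINFO < sgx-detect.txt && echo "EXINFO ok"
#   mem_total_gib
```

Note that the kernel reports slightly less memory than is physically installed, so a machine with 16 GB of RAM may print 15 here.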

Platform provisioning

The Operator software does remote attestation when the Aggregator service connects to it. The attestation procedure involves external infrastructure (including Intel's web services). The platform on which the Operator service will be run must first be correctly configured.

For more details, refer to Intel's official documentation. Reading it is not required to complete the setup.

  1. Install PCK ID Retrieval tool and others

Add Debian repo (command for Ubuntu Jammy):

echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu jammy main' | sudo tee /etc/apt/sources.list.d/intel-sgx.list > /dev/null
wget -O - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -

Install required packages:

sudo apt update
sudo apt install sgx-pck-id-retrieval-tool sgx-aesm-service libsgx-urts libsgx-dcap-ql libsgx-dcap-default-qpl libsgx-aesm-ecdsa-plugin libsgx-aesm-quote-ex-plugin sgx-ra-service 
  1. Make changes in /opt/intel/sgx-pck-id-retrieval-tool/network_setting.conf

Change the PCCS_URL to match our caching service:

PCCS_URL=https://pccs.el.silencelaboratories.com/sgx/certification/v4/platforms

Set USE_SECURE_CERT to true:

USE_SECURE_CERT=TRUE

Uncomment user_token and set it to given value:

user_token =NBFPy5R9dJTa
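If you prefer to script these edits, the sketch below shows one way to set a key in a simple key=value file, whether or not the line is currently commented out. The helper name set_conf_value is hypothetical. The sketch writes the line as KEY=VALUE without spaces around the equals sign, and it is an assumption that the tool accepts that form.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a simple key=value config file,
# replacing the existing line even if it is commented out with '#'.
# Assumes the value contains no '|', '&' or backslash characters.
set_conf_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy back so that the original
    # file keeps its ownership and permissions.
    sed -E "s|^[#[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (run as root, since the file lives under /opt):
#   conf=/opt/intel/sgx-pck-id-retrieval-tool/network_setting.conf
#   set_conf_value "$conf" USE_SECURE_CERT TRUE
```

Review the file after running the helper, since a key that is not present at all is left unset rather than appended.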
  1. Provision this host

Run the following command to provision the host. It populates the PCCS cache database and needs to be done only once.

sudo PCKIDRetrievalTool

Re-provision after any hardware change or BIOS update on the machine.

The valid output of this command looks like this:

> sudo PCKIDRetrievalTool 

Intel(R) Software Guard Extensions PCK Cert ID Retrieval Tool Version 1.21.100.3

Warning: platform manifest is not available or current platform is not multi-package platform.
the data has been sent to cache server successfully and pckid_retrieval.csv has been generated successfully!

Reach out to us if this step causes any problems.

This command creates pckid_retrieval.csv. Please do not remove it.
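When provisioning is scripted, the result can be checked by looking for the success message from the sample output. This is a minimal sketch; the helper name pckid_upload_ok is hypothetical, and it relies only on the message text shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PCKIDRetrievalTool output on stdin
# contains the success message shown in the sample output.
pckid_upload_ok() {
    grep -q 'the data has been sent to cache server successfully'
}

# Example (run manually):
#   sudo PCKIDRetrievalTool | tee pckid.log
#   pckid_upload_ok < pckid.log || echo "provisioning did not report success"
```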

  1. Set aesmd config

Edit /etc/aesmd.conf. Uncomment the default quoting type line and set it as follows:

default quoting type = ecdsa_256
  1. Set QCNL config

The QCNL config is a JSON-like file describing the network configuration used during attestation. In particular, it contains the URL of the PCCS service and other parameters.

Download this file:

And put it under /etc/sgx_default_qcnl.conf

  1. Restart the aesmd service:

sudo systemctl restart aesmd

Check that the platform is free from known hardware vulnerabilities

Put the get_tcb_info.py script on the SGX machine next to pckid_retrieval.csv (the file generated by PCKIDRetrievalTool in the Platform provisioning section) and execute it.

The script prints JSON to the console. Check whether the tcbStatus property is set to UpToDate anywhere in that JSON. Example:

{
    "tcbInfo": {
        "id": "SGX",
        "version": 3,
        ...
        "tcbLevels": [
            {
                "tcb": {
                    "sgxtcbcomponents": [
                       ...
                    ],
                    "pcesvn": 11
                },
                "tcbDate": "2024-03-13T00:00:00Z",
                "tcbStatus": "UpToDate"
            },
        ...
}
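The check can also be done with a plain text search once the script output has been saved to a file. The sketch below assumes nothing about how the script is invoked; the helper name tcb_up_to_date and the file name tcb_info.json are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the JSON on stdin contains a tcbStatus
# property set to "UpToDate". A text match is used so that no JSON
# tooling needs to be installed on the host.
tcb_up_to_date() {
    grep -Eq '"tcbStatus"[[:space:]]*:[[:space:]]*"UpToDate"'
}

# Example (run manually, with the script output saved to tcb_info.json):
#   tcb_up_to_date < tcb_info.json && echo "UpToDate level found"
```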

Run the service

There are several services to run: sgx-secret-vault, operator-sgx, and postgres.

We provide a sample docker-compose file to launch them together.

Download the operator directory to your SGX-enabled machine:

Extract the operator directory:

tar zxf operator.tar.gz

The directory has the following structure:

.
├── config
│   ├── init-user-db.sql
│   └── resolv.conf
├── docker-compose.yaml
└── silent-network-operator.env

Set up the environment variables required to run the containers

The file silent-network-operator.env contains the environment variables used to configure the services. Most of them are predefined. Please set ORIG_ID to the name of your organization; it will be used, for example, in Grafana dashboards.

For security reasons, change DB_PASS from the default operator_password.

Apply the same change in the config/init-user-db.sql file.
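Because the password has to match in both files, it can help to change it in one step. The following is a minimal sketch: the helper name replace_default_db_pass is hypothetical, and it assumes that the literal string operator_password appears in these files only as the password value.

```shell
#!/bin/sh
# Hypothetical helper: replace the default DB password in the given files.
# Assumes "operator_password" occurs only as the password value and that
# the new password is alphanumeric (safe to use inside a sed expression).
replace_default_db_pass() {
    new_pass=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        sed "s/operator_password/${new_pass}/g" "$f" > "$tmp" && cat "$tmp" > "$f"
        status=$?
        rm -f "$tmp"
        [ "$status" -eq 0 ] || return "$status"
    done
}

# Example (run manually from the operator directory):
#   replace_default_db_pass "$(openssl rand -hex 16)" \
#       silent-network-operator.env config/init-user-db.sql
```

Do this before the first start of the containers, and keep a copy of the generated password in a safe place.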

Run docker containers

Once all environment variables are set, run the containers:

  1. Send us your GitHub username so that we can grant you access to the container registry

  2. Create a GitHub Personal Access Token (with the read:packages scope) and log Docker in to the registry

  3. From the operator directory, run the compose file:

docker compose --env-file silent-network-operator.env up -d

The startup can take a while. Eventually, the logs from the service should appear:

2024-07-24T14:10:13.220863Z  INFO el_party_svc::signer: Master VK: "XXXX"
2024-07-24T14:10:13.222779Z  INFO el_party_svc: listening on 0.0.0.0:80

You should be able to reach the service by calling a simple command:

curl --insecure https://localhost:{SN_OPERATOR_EXTERNAL_PORT}/v1/version

It should respond with details of running software.
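Since startup can take a while, a small retry loop can wait until the version endpoint answers. This is a minimal sketch; the helper name retry and the attempt count and delay in the example are hypothetical choices, not values recommended by the Operator software.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, up to ATTEMPTS
# times, sleeping DELAY seconds between attempts.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts=$1 delay=$2
    shift 2
    i=1
    while ! "$@"; do
        # Give up once the last allowed attempt has failed.
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example (run manually): poll every 10 seconds, up to 30 times.
#   retry 30 10 curl --insecure --silent --fail --output /dev/null \
#       "https://localhost:${SN_OPERATOR_EXTERNAL_PORT}/v1/version"
```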

To shut down the services, use the following command:

docker compose --env-file silent-network-operator.env down

The storage

Once you launch the services, they keep their state on the storage, in the db and sgx-secret-vault directories.

.
├── config
│   ├── init-user-db.sql
│   └── resolv.conf
├── db
├── docker-compose.yaml
├── sgx-secret-vault
│   └── seed_file
└── silent-network-operator.env
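The prerequisites ask for periodic backups of this storage. One possible approach is sketched below, assuming the services are stopped first, because copying a live Postgres data directory may not produce a consistent snapshot. The helper name backup_state and the archive naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: archive the two state directories into a
# timestamped tarball and print the path of the archive.
# Usage: backup_state OPERATOR_DIR BACKUP_DIR
backup_state() {
    src=$1 dest=$2
    archive="$dest/operator-state-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
    mkdir -p "$dest" || return 1
    tar -czf "$archive" -C "$src" db sgx-secret-vault || return 1
    printf '%s\n' "$archive"
}

# Example (run manually from the operator directory, services stopped;
# /path/to/backups is a placeholder):
#   backup_state . /path/to/backups
```

Root privileges may be needed if the db directory is owned by another user. Keep in mind that a restored backup is only usable on the same machine, since the sealed state is tied to the CPU.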

Wrapping up

Make the service externally available, then provide us with:

  1. The IP address, or URL, together with the port by which the service is accessible

  2. Response from this command (insecure because certificates are self-signed):

curl --insecure https://localhost:{SN_OPERATOR_EXTERNAL_PORT}/v1/version
  3. The city where the hosted hardware is running

Troubleshooting

Quote verify failed func_verify_quote_result: "0xE019"

If you receive the following error message during startup:

ra_tls_verify_callback: sgx_qv_verify_quote failed: 57369
2024-11-18T11:25:02.252594Z DEBUG remote_attestation::verifier: check certificate return -9984
2024-11-18T11:25:02.253033Z ERROR el_party_svc: RA-TLS certificate verification results: VerifyError {
    return_code: -9984,
    callback_results: ra_tls_verify_callback_results {
        attestation_scheme: RA_TLS_ATTESTATION_SCHEME_DCAP,
        err_loc: AT_VERIFY_EXTERNAL,
        dcap: dcap {
            func_verify_quote_result: "0xE019",
            quote_verification_result: "UNSPECIFIED",
        },
    },
}

This might happen if:

  1. The PCCS service is down. Check its accessibility with a simple curl command:

curl https://pccs.el.silencelaboratories.com/sgx/certification/v4/qe/identity

It should return a JSON response with HTTP status 200. If it doesn't, please reach out to the Silence Laboratories team.

  2. The configuration files are invalid:

Make sure the files are mounted into the container:

-v /etc/sgx_default_qcnl.conf:/etc/sgx_default_qcnl.conf \
-v ${CONFIG_MOUNT}/resolv.conf:/etc/resolv.conf \

Also make sure the files have the expected SHA-2 checksums, as mentioned earlier.

Container startup fails with the error AESM service returned error 30

The error:

error: AESM service returned error 30; this may indicate that infrastructure for the DCAP attestation requested by Gramine is missing on this machine

Make sure all packages mentioned in Platform provisioning were installed correctly.

For further debugging, run:

sudo journalctl -u aesmd.service 

PCKIDRetrievalTool error

The output of the PCKID tool:

Error: unexpected error occurred while sending data to cache server.
pckid_retrieval.csv has been generated successfully, however the data couldn't be sent to cache server!

Our PCCS server reports:

2025-04-02 11:18:23.310 [error]: Intel PCS server returns error(404).
2025-04-02 11:18:23.310 [error]: Error: No cache data for this platform.

This means the CPU is not registered. The registration service needs to be installed:

sudo apt install sgx-ra-service

Then run the PCKID tool again:

sudo PCKIDRetrievalTool

For further debugging, read the logs in /var/log/mpa_registration.log.

Other issues

In case of any problems with the service, please provide us with the logs from the container: docker logs operator
