NDTwin Simulation Platform
NDTwin Simulation Subsystem Documentation
1. Overview
The NDTwin Simulation Subsystem is a distributed, high-throughput framework designed to evaluate network “what-if” scenarios (such as energy-saving topology changes) in parallel.
It separates the control logic (App), the routing logic (Request Manager), and the computation logic (Simulation Server) into distinct, asynchronous components. This design allows the NDT to run computationally intensive simulations (like Max-Min Fairness calculations) without blocking the real-time network controller.
2. System Architecture
The system implements a three-tier architecture supported by a shared Network File System (NFS) for heavy data exchange.
A. The Energy-Saving Application (App)
- Role: The client and orchestrator.
- Paper Context: Corresponds to the “Energy-saving App” that detects low-load thresholds (e.g., link utilization < 30%) and generates candidate power-off plans.
- Implementation Details:
- Input Generation: Generates simulation cases (JSON/TXT) containing network state (Flows, Topology).
- NFS I/O: Writes inputs directly to abs_input_file_path on the shared NFS mount to avoid HTTP overhead (see the path sketch below).
- Callback Server: Runs a local HttpSession server to listen for simulation results asynchronously.
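The exact directory layout on the NFS mount is not documented here. Purely as an illustration, a helper such as abs_input_file_path might compose the path from the simulator name, version, and case ID; the /mnt/ndt_nfs mount point below is an assumption, not the actual configuration.

// Sketch only: the /mnt/ndt_nfs mount point and directory layout are assumptions.
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

fs::path abs_input_file_path(const std::string& simulator,
                             const std::string& version,
                             const std::string& case_id,
                             const std::string& input_filename)
{
    // e.g. /mnt/ndt_nfs/power_sim/1.0/case_1/input.json
    return fs::path("/mnt/ndt_nfs") / simulator / version / case_id / input_filename;
}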
B. The Request Manager
- Role: The reverse proxy / load balancer.
- Paper Context: Corresponds to the “Simulation request and reply manager.”
- Implementation Details:
- Decoupling: Decouples the App from the Sim Server. The App does not need to know the IP/Port of the physical simulation workers.
- Asynchronous Routing: Uses boost::asio::strand to handle concurrent requests without race conditions. It forwards requests to the Sim Server and routes callbacks back to the specific App instance based on app_id (see the routing sketch below).
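The routing state itself is not shown in this document. The sketch below illustrates the general pattern it describes: a strand serializes every access to an app_id-to-endpoint table, so registrations and callback lookups never race even with multiple I/O threads. The callback_table map and its contents are illustrative, not the actual Request Manager code.

// Sketch only: callback_table and the endpoint strings are hypothetical.
#include <boost/asio.hpp>
#include <iostream>
#include <string>
#include <unordered_map>

namespace net = boost::asio;

int main()
{
    net::io_context ioc;
    auto strand = net::make_strand(ioc);                          // serializes all table access
    std::unordered_map<std::string, std::string> callback_table;  // app_id -> "host:port"

    // Register an App's callback endpoint (posted through the strand, so no mutex is needed).
    net::post(strand, [&] { callback_table["101"] = "10.0.0.5:9000"; });

    // Later, route a simulation reply back to the App that owns app_id "101".
    net::post(strand, [&] {
        auto it = callback_table.find("101");
        if (it != callback_table.end())
            std::cout << "route callback to " << it->second << "\n";
    });

    ioc.run();
}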
C. The Simulation Server (Sim Server)
- Role: The compute worker.
- Paper Context: Corresponds to the “Simulation Server” that runs cases in parallel.
- Implementation Details:
- Process Isolation: Uses boost::process to spawn external binary simulators (e.g., power_sim) as child processes.
- Parallelism: Can execute multiple simulation cases simultaneously (utilizing multi-core CPUs) without blocking the main I/O thread (see the sketch below).
- Life-cycle Management: Monitors process exit codes (on_exit) and triggers the completion workflow.
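The exact worker loop is not reproduced here. Assuming each case maps to one child process sharing a single io_context, the general parallelism pattern could look like the sketch below; the true command stands in for a real simulator invocation.

// Sketch only: "true" stands in for the real simulator command line.
#include <boost/asio.hpp>
#include <boost/process.hpp>
#include <iostream>
#include <memory>
#include <vector>

namespace bp = boost::process;

int main()
{
    boost::asio::io_context ioc;
    std::vector<std::shared_ptr<bp::child>> running;

    for (int case_id = 1; case_id <= 4; ++case_id) {
        // Each case is an independent child process; on_exit fires on the io_context,
        // so the launching thread is never blocked while simulations run.
        running.push_back(std::make_shared<bp::child>(
            bp::search_path("true"), ioc,
            bp::on_exit = [case_id](int exit_code, const std::error_code&) {
                std::cout << "case " << case_id << " finished, exit " << exit_code << "\n";
            }));
    }

    ioc.run(); // returns once every child has exited
}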
3. The Simulation Workflow
The workflow minimizes latency for control messages while maximizing throughput for data transfer.
Phase 1: Preparation (Data Plane)
Before submitting a request, the App prepares the data.
- Logic: The App calculates candidate plans based on F1-F6 strategies (see Section 4).
- Data Transport: Instead of sending megabytes of flow table data via HTTP, the App writes the input file to the NFS.
// App.cpp logic
fs::path input_file_path = abs_input_file_path(simulator, version, case_id, input_filename);
std::ofstream out(input_file_path, std::ios::binary); // Write to NFS
Phase 2: Submission (Control Plane)
The App sends a lightweight JSON signal to the Request Manager.
- Payload:
{
"simulator": "power_sim",
"version": "1.0",
"app_id": "101",
"case_id": "case_1",
"input_filename": "input.json"
}
- Routing: The Request Manager identifies the target Simulation Server and forwards the request (a minimal submission sketch follows).
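The transport details of the submission are not shown above. A minimal sketch of what the POST to the Request Manager could look like with Boost.Beast is given below; the host name, port, and /simulate target are placeholders, and the call is shown synchronously for brevity even though the real components are asynchronous.

// Sketch only: host, port, and the /simulate target are placeholders.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace net   = boost::asio;
using tcp = net::ip::tcp;

int main()
{
    net::io_context ioc;
    tcp::resolver resolver(ioc);
    tcp::socket socket(ioc);
    net::connect(socket, resolver.resolve("request-manager.local", "8080"));

    http::request<http::string_body> req(http::verb::post, "/simulate", 11);
    req.set(http::field::host, "request-manager.local");
    req.set(http::field::content_type, "application/json");
    req.keep_alive(true);  // reuse the connection for batch submissions
    req.body() = R"({"simulator":"power_sim","version":"1.0","app_id":"101",)"
                 R"("case_id":"case_1","input_filename":"input.json"})";
    req.prepare_payload();  // sets Content-Length

    http::write(socket, req);

    beast::flat_buffer buffer;
    http::response<http::string_body> res;
    http::read(socket, buffer, res);
    std::cout << res.body() << "\n";
}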
Phase 3: Execution
The Simulation Server receives the instruction.
- Execution: It invokes the specific simulator binary requested (e.g., power_sim).
- Non-Blocking: The run_simulator function uses boost::process::async_pipe to ensure the server remains responsive to new requests while the simulation calculates results (a fuller sketch follows the snippet below).
// Sim_server.cpp
// Spawn the requested simulator as a child process bound to the shared io_context (ioc)
// so that its exit can be observed asynchronously without blocking the server.
auto process = std::make_shared<bp::child>(command, bp::std_out > stdout, ioc);
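The snippet above only shows the child construction. Assuming run_simulator also captures the simulator's output, the sketch below shows how boost::process::async_pipe keeps that read non-blocking; the echo command, buffer size, and surrounding structure are illustrative rather than the actual run_simulator body.

// Sketch only: "echo" stands in for the simulator binary.
#include <boost/asio.hpp>
#include <boost/process.hpp>
#include <iostream>
#include <memory>
#include <vector>

namespace bp  = boost::process;
namespace net = boost::asio;

int main()
{
    net::io_context ioc;
    auto pipe = std::make_shared<bp::async_pipe>(ioc);
    auto buf  = std::make_shared<std::vector<char>>(4096);

    // The child's stdout flows into the async pipe instead of blocking a server thread.
    auto child = std::make_shared<bp::child>(
        bp::search_path("echo"), "simulation-finished",
        bp::std_out > *pipe, ioc);

    // Read the pipe asynchronously; the io_context stays free to accept new requests.
    net::async_read(*pipe, net::buffer(*buf),
        [buf, child](const boost::system::error_code&, std::size_t n) {
            std::cout.write(buf->data(), static_cast<std::streamsize>(n));
        });

    ioc.run();
}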
Phase 4: Analysis & Callback
- Completion: When the simulator process exits, the Sim Server verifies the exit code.
- Output: The simulator binary has already written the result to the output file on NFS.
- Notification: The Sim Server sends a callback HTTP POST to the Request Manager, which routes it back to the App.
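The exact callback body is not reproduced in this document. A plausible shape, echoing the reference-by-path pattern, is sketched below; app_id, case_id, and success appear elsewhere in this document, while the output_filename field is an assumption.

// Illustrative callback payload (field names other than app_id, case_id, and success are assumed)
{
"app_id": "101",
"case_id": "case_1",
"success": true,
"output_filename": "output.json"
}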
4. Simulation Strategies (Logic)
While the C++ code provided handles the execution framework, the actual logic runs inside the power_sim binary. The NDTwin framework is designed to evaluate plans based on the six factors identified in the research:
- F1 (Switch Power): Prioritizing high-draw switches for maximum savings.
- F2 (Traffic Load): Avoiding “hotspot” switches to prevent congestion.
- F3 (Flow Count): Minimizing the number of users affected by rerouting.
- F4 (Flow Entry Limit): Ensuring the new routing plan does not exceed hardware TCAM limits (e.g., 121 entries/batch on Brocade ICX7250).
- F5 (Scope of Change): Minimizing the number of switches requiring updates.
- F6 (Throughput): Using Max-Min Fairness models to ensure elephant flows maintain QoE.
The Simulation Server passes the case_id and input paths to the binary, which calculates a score based on these factors and writes the result to NFS.
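The paper's exact scoring formula is not reproduced here. Purely to illustrate how the six factors could be folded into one comparable number, the sketch below uses a weighted sum over normalized per-factor scores; the weights, the normalization, and the FactorScores struct are all assumptions.

// Sketch only: weights, normalization, and the FactorScores struct are illustrative.
#include <array>
#include <cstddef>
#include <iostream>

struct FactorScores {
    double f1_switch_power;   // expected power saving (higher is better)
    double f2_traffic_load;   // remaining load headroom (avoids hotspots)
    double f3_flow_count;     // 1 - fraction of flows rerouted
    double f4_entry_headroom; // remaining TCAM capacity after the change
    double f5_scope;          // 1 - fraction of switches requiring updates
    double f6_throughput;     // Max-Min Fairness throughput retention
};

// Combine normalized per-factor scores (each in [0, 1]) into a single plan score.
double plan_score(const FactorScores& s)
{
    const std::array<double, 6> w = {0.25, 0.15, 0.10, 0.15, 0.10, 0.25}; // assumed weights
    const std::array<double, 6> v = {s.f1_switch_power, s.f2_traffic_load, s.f3_flow_count,
                                     s.f4_entry_headroom, s.f5_scope, s.f6_throughput};
    double score = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
        score += w[i] * v[i];
    return score;
}

int main()
{
    FactorScores plan{0.8, 0.6, 0.9, 0.7, 0.85, 0.95};
    std::cout << "plan score: " << plan_score(plan) << "\n";
}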
5. Technical Implementation Highlights
Efficient Data Handling (NFS)
The system employs a “reference-by-path” pattern.
- Problem: JSON/HTTP is inefficient for large matrices of flow data.
- Solution: The HTTP payload only carries the path to the data. The Sim Server reads the actual data from the mounted NFS directory at input_file_path (see the sketch below).
- Safety: The code includes safe_system(mount_nfs_command()) and signal handlers (SIGINT, SIGTERM) to ensure file systems are mounted and unmounted correctly.
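To make the reference-by-path pattern concrete, the sketch below shows the consuming side under the assumption of the /mnt/ndt_nfs mount point used in the earlier path sketch: the handler receives only the small string fields from the control message, rebuilds the absolute path, and streams the bulk flow data from NFS.

// Sketch only: the /mnt/ndt_nfs mount point mirrors the earlier path sketch.
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// The control message carried only these small strings, never the flow data itself.
std::string load_case_input(const std::string& simulator, const std::string& version,
                            const std::string& case_id, const std::string& input_filename)
{
    fs::path input_file_path =
        fs::path("/mnt/ndt_nfs") / simulator / version / case_id / input_filename;

    std::ifstream in(input_file_path, std::ios::binary); // heavy data comes from NFS, not HTTP
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

int main()
{
    std::string data = load_case_input("power_sim", "1.0", "case_1", "input.json");
    std::cout << "read " << data.size() << " bytes from NFS\n";
}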
Robust Asynchronous Networking
Built on Boost.Asio, the system is fully non-blocking.
- Strands: net::strand is used extensively in HttpSession to serialize handlers. This prevents race conditions in the multi-threaded environment without the performance penalty of coarse-grained mutex locking (see the sketch below).
- Keep-Alive: The HTTP implementation supports persistent connections, reducing TCP handshake overhead during batch simulations (e.g., evaluating 26 different power-off plans in rapid succession).
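The HttpSession internals are not shown in this document. The sketch below follows the common Boost.Beast pattern the text describes: each accepted connection lives on its own strand, so every completion handler of a session is serialized even when io_context::run() is called from many threads. Class and member names are illustrative, not the actual HttpSession implementation.

// Sketch only: a minimal session skeleton in the usual Beast style, not the actual HttpSession.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <cstddef>
#include <memory>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace net   = boost::asio;
using tcp = net::ip::tcp;

class Session : public std::enable_shared_from_this<Session>
{
    beast::tcp_stream stream_;   // inherits the strand executor of the accepted socket
    beast::flat_buffer buffer_;
    http::request<http::string_body> req_;

public:
    explicit Session(tcp::socket&& socket) : stream_(std::move(socket)) {}

    void run()
    {
        // Every completion handler of this session runs on the socket's strand,
        // so no two threads touch buffer_ or req_ at the same time.
        http::async_read(stream_, buffer_, req_,
            beast::bind_front_handler(&Session::on_read, shared_from_this()));
    }

private:
    void on_read(beast::error_code, std::size_t)
    {
        // Handle the request; with keep-alive, loop back into async_read so one
        // TCP connection can serve a whole batch of simulation cases.
    }
};

void accept_loop(tcp::acceptor& acceptor, net::io_context& ioc)
{
    // Accepting each connection onto its own strand is what serializes the session.
    acceptor.async_accept(net::make_strand(ioc),
        [&acceptor, &ioc](beast::error_code ec, tcp::socket socket) {
            if (!ec)
                std::make_shared<Session>(std::move(socket))->run();
            accept_loop(acceptor, ioc);
        });
}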
Error Propagation
- If a simulation binary crashes or fails (non-zero exit code), the Sim Server captures this event in bp::on_exit.
- A success: false flag is returned in the JSON callback, allowing the App to discard that plan and attempt the next best candidate (see the sketch below).
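The exact failure handling is not shown in this document. The sketch below illustrates the general shape of an on_exit handler that maps a non-zero exit code to a success: false callback body; the build_callback helper is hypothetical, and the false command stands in for a failing simulator.

// Sketch only: "false" stands in for a failing simulator; build_callback is hypothetical.
#include <boost/asio.hpp>
#include <boost/process.hpp>
#include <iostream>
#include <string>

namespace bp = boost::process;

std::string build_callback(const std::string& case_id, bool success)
{
    return std::string(R"({"case_id":")") + case_id +
           R"(","success":)" + (success ? "true" : "false") + "}";
}

int main()
{
    boost::asio::io_context ioc;

    bp::child sim(bp::search_path("false"), ioc,
        bp::on_exit = [](int exit_code, const std::error_code&) {
            // A non-zero exit code marks the plan as failed, so the App can
            // discard it and move on to the next best candidate.
            bool success = (exit_code == 0);
            std::cout << "POST callback: " << build_callback("case_1", success) << "\n";
        });

    ioc.run();
}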