Abstract
Large-scale network simulation has grown in importance due to
a rapid increase in Internet size and the availability of Internet measurement
topologies with applications to computer networks and network
security.
A key obstacle to large-scale network simulation
over PC clusters is the memory balancing problem, where a
memory-overloaded machine can slow down a distributed
simulation due to disk I/O overhead. Network partitioning
methods for parallel and distributed simulation are
insufficiently equipped to handle new challenges brought
on by memory balancing due to their focus on CPU
and communication balancing.
This dissertation studies memory balancing for large-scale
network simulation in power-law networks over PC clusters.
First, we design and
implement a measurement subsystem for dynamically tracking
memory consumption in DaSSFNet, a distributed network
simulator. Accurate monitoring of memory consumption is
difficult due to complex protocol interaction through which
message related events are created and destroyed inside and
outside a simulation kernel.
Second, we achieve efficient memory cost monitoring by
tackling the problem of estimating peak memory consumption
of a group of simulated network nodes in power-law
topologies during network partitioning.
In contrast to CPU balancing where the processing cost
of a group of nodes is proportional to their sum, in memory balancing
this closure property need not hold.
Power-law connectivity injects additional complications
due to skews in resource consumption across network
nodes.
Third, we show that
the maximum memory cost metric outperforms the total cost metric
for memory balancing under multilevel recursive
partitioning but the opposite holds
for CPU balancing. We show that the trade-off can be
overcome through joint memory-CPU balancing---in general
not feasible due to constraint conflicts---which
is enabled by network simulation having a tendency to
induce correlation between memory and CPU costs.
Fourth, we evaluate memory balancing in the presence
of virtual memory (VM) management which admits larger
problem instances to be run over limited physical
memory.
VM introduces complex memory management
dependencies that make understanding
and evaluating simulation performance difficult.
We provide a performance evaluation framework wherein
the impact of memory thrashing in distributed network simulation
is incorporated which admits quantitative
performance comparison and diagnosis.
Fifth, we show that improved memory balancing under
the maximum cost metric
in the presence of VM
manifests as faster completion
time compared to the total cost metric
despite the
CPU balancing advantage of the latter.
In the cases where the CPU balancing advantage of
the total cost metric is strong, we show that joint
memory-CPU balancing can achieve the best of both
worlds.
We carry out performance evaluation using benchmark
applications with varying traffic characteristics: BGP routing,
worm propagation under local and global scanning, and distributed
client/server system.
We use a testbed of 32 Intel x86
machines running a measurement-enhanced DaSSFNet over Linux.