Best practices

Last updated August 12, 2021

There is a difference between knowing the path and walking the path.
-- Morpheus

This page details best practices for Nextmv solvers and simulators when using apps and engines.

Solver best practices

Hop solvers approach problems differently from many earlier optimization technologies.

States and transitions

Hop solvers operate by evaluating states and projecting the impacts of choices according to transition functions. Thus it makes sense to structure our model code and any helper functions as state machines.

For example, in a Hop vehicle routing engine, the Next method decides whether a state transition can occur (e.g., does the driver have capacity?) and, if so, calls a method that generates the new state.

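    // Only consider pickups while the driver has spare capacity.
    if s.Driver.State.Capacity > 0 {
        // Generate one successor state per remaining pickup.
        for it := s.pickups.Iterator(); it.Next(); {
            next = append(next, s.Pickup(it.Value()))
        }
    }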

This makes models concise, easy to test, and easy to reason about.
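To make the state-machine idea concrete, here is a minimal, self-contained Go sketch. The names (routeState, Pickup, Next) are hypothetical and only mirror the shape of the snippet above; this is not Hop's API. Next encodes the decision (is there capacity?), and Pickup is a pure transition that returns a new state instead of modifying the current one.

```go
package main

import "fmt"

// routeState is a hypothetical routing state: a driver's remaining
// capacity, the pickups still waiting, and the stops visited so far.
type routeState struct {
	capacity int
	pickups  []string
	route    []string
}

// Pickup is the transition function. It returns a new state in which the
// driver has collected pickup i, leaving the receiver unchanged.
func (s routeState) Pickup(i int) routeState {
	remaining := make([]string, 0, len(s.pickups)-1)
	remaining = append(remaining, s.pickups[:i]...)
	remaining = append(remaining, s.pickups[i+1:]...)
	route := append(append([]string(nil), s.route...), s.pickups[i])
	return routeState{capacity: s.capacity - 1, pickups: remaining, route: route}
}

// Next decides which transitions are allowed and generates one successor
// state per feasible choice.
func (s routeState) Next() []routeState {
	var next []routeState
	if s.capacity > 0 {
		for i := range s.pickups {
			next = append(next, s.Pickup(i))
		}
	}
	return next
}

func main() {
	root := routeState{capacity: 1, pickups: []string{"A", "B"}}
	for _, child := range root.Next() {
		// Each child has used the only unit of capacity, so it has no successors.
		fmt.Println(child.route, len(child.Next()))
	}
}
```

Because each transition copies the slices it changes, sibling states never share mutable data, which keeps Next easy to unit test.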

Keeping track of decisions

The order in which we make decisions can have a major impact on the solutions we find. Some models have an obvious decision ordering. For instance, a routing model adopts a sequential ordering of decisions: the decision is always the same, "Where should we go next?"

Understanding values and bounds

Hop only requires a Value method when we want to maximize or minimize some objective. The Bounds method is optional.

However, if you can provide bounds on an objective, you should! It may greatly improve the search since Hop uses that information to prune states from its search tree. Try removing the Bounds methods from our templates and see how that changes their runtime. Pay special attention to the statistics section of the JSON output. This shows the number of states explored and deferred for later exploration.
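The sketch below illustrates why bounds help, using a generic depth-first branch-and-bound over a small, hypothetical 0/1 knapsack rather than Hop itself. Each state's upper bound is its current value plus the value of every undecided item; when that bound cannot beat the best value found so far, the whole subtree is skipped. Comparing the explored counts with and without bounds mirrors the experiment suggested above.

```go
package main

import "fmt"

// item is a hypothetical knapsack item used only for illustration.
type item struct{ weight, value int }

// state is a partial solution: items[:depth] have already been decided.
type state struct{ depth, weight, value int }

// search maximizes total value depth-first. With useBounds, it prunes any
// state whose upper bound cannot beat the incumbent. It returns the best
// value found and the number of states explored.
func search(items []item, capacity int, useBounds bool) (best, explored int) {
	var visit func(s state)
	visit = func(s state) {
		explored++
		if s.value > best {
			best = s.value // every partial state here is feasible
		}
		if s.depth == len(items) {
			return
		}
		// Upper bound: current value plus every undecided item's value.
		upper := s.value
		for _, it := range items[s.depth:] {
			upper += it.value
		}
		if useBounds && upper <= best {
			return // no completion of s can improve on the incumbent
		}
		next := items[s.depth]
		if s.weight+next.weight <= capacity {
			visit(state{s.depth + 1, s.weight + next.weight, s.value + next.value})
		}
		visit(state{s.depth + 1, s.weight, s.value})
	}
	visit(state{})
	return best, explored
}

func main() {
	items := []item{{5, 10}, {4, 40}, {6, 30}, {3, 50}}
	for _, useBounds := range []bool{false, true} {
		best, explored := search(items, 10, useBounds)
		fmt.Printf("bounds=%v best=%d explored=%d\n", useBounds, best, explored)
	}
}
```

Pruning never changes the best value found, because a subtree is only skipped when its bound proves it cannot do better; it only reduces how many states are explored.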

Enforcing assignment

Hop enforces the assignment of all locations by default. In some cases this may lead to no feasible solutions being returned. For example, if no vehicles in a fleet are compatible with a given location, then Hop will not return a solution by default (not all locations can be assigned). To relax the default behavior, you can add UnassignedPenalties as a slice of cost values (one per location).

Unassigned penalties are virtual (unitless) costs added to the value of a route. Adding them discourages, but still allows, some unassigned locations. Locations can have different unassigned penalties; for example, larger values can be set for higher-value locations to more strongly discourage leaving them unassigned. To discourage unassignment while still allowing it when necessary, use a sufficiently large penalty, where "sufficiently large" means larger than the maximum cost of servicing that location (but not so large that it causes integer overflow).
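As a sketch of choosing penalties, assuming you can estimate the maximum cost of servicing each location (the maxServiceCost input is hypothetical), the Go code below sets each penalty one margin above that cost and rejects values whose sum would overflow int. The value function shows why this works for a minimizing objective: leaving a location unassigned costs more than serving it.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// penaltiesFor builds one unassigned penalty per location, each strictly
// larger than the hypothetical maximum cost of servicing that location.
// It refuses values whose sum could overflow int, since the penalties of
// all unassigned locations are added to the route value.
func penaltiesFor(maxServiceCost []int, margin int) ([]int, error) {
	if margin <= 0 {
		return nil, errors.New("margin must be positive")
	}
	penalties := make([]int, len(maxServiceCost))
	total := 0
	for i, c := range maxServiceCost {
		if c > math.MaxInt-margin {
			return nil, errors.New("penalty would overflow int")
		}
		p := c + margin
		if total > math.MaxInt-p {
			return nil, errors.New("sum of penalties would overflow int")
		}
		total += p
		penalties[i] = p
	}
	return penalties, nil
}

// value adds the penalty of each unassigned location to the routing cost.
func value(routeCost int, unassigned []int, penalties []int) int {
	v := routeCost
	for _, loc := range unassigned {
		v += penalties[loc]
	}
	return v
}

func main() {
	// Hypothetical costs: serving location 1 adds up to 300 to the route.
	penalties, err := penaltiesFor([]int{120, 300}, 1)
	if err != nil {
		panic(err)
	}
	// Both locations served.
	served := value(800, nil, penalties)
	// Location 1 skipped: 300 cheaper to route, but penalized by 301.
	skipped := value(500, []int{1}, penalties)
	fmt.Println(penalties, served, skipped) // [121 301] 800 801
}
```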

UnassignedPenalties are optional and should only be added when the default behavior of enforced assignment is not desired.

Note that in Nextmv Cloud, an unassigned penalty can also be set as a default for all locations (in the defaults section of an input schema) or for specific locations (as detailed above). In that case, a penalty set for a specific location overrides the default value (if one is specified) for that location.
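The override rule can be sketched as follows. The representation (a pointer for the optional default and a map of location-specific values) is hypothetical and is not the Nextmv Cloud input schema; it only illustrates which value wins.

```go
package main

import "fmt"

// resolvePenalties applies the override rule to a hypothetical in-memory
// representation. Locations missing from the result have no penalty and
// must therefore be assigned.
func resolvePenalties(n int, defaultPenalty *int, specific map[int]int) map[int]int {
	resolved := make(map[int]int, n)
	for loc := 0; loc < n; loc++ {
		if p, ok := specific[loc]; ok {
			// A location-specific penalty always wins.
			resolved[loc] = p
		} else if defaultPenalty != nil {
			// Otherwise fall back to the default, if one was given.
			resolved[loc] = *defaultPenalty
		}
	}
	return resolved
}

func main() {
	def := 100
	fmt.Println(resolvePenalties(3, &def, map[int]int{1: 500})) // map[0:100 1:500 2:100]
	fmt.Println(resolvePenalties(3, nil, map[int]int{1: 500}))  // map[1:500]
}
```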

Simulator best practices

Simulation is a way of investigating and understanding the properties of complex systems that are otherwise hard to model. Below we detail some considerations when simulating with Dash, our discrete event simulator inspired by the event-oriented dynamics of many modern software systems.

Terminating simulations

By default, a Dash simulation will run until it has no more actors scheduled. There are some circumstances, such as the single-server queue example, where it makes sense to have an actor run forever. These simulations can be terminated using a duration limit for simulated time. Do this using either the DASH_SIMULATOR_LIMITS_DURATION environment variable or the -dash.simulator.limits.duration command-line flag.

Updating actor state

An actor in Dash is any type that implements a Run method. If Run returns a boolean true value, then Dash schedules it to run again in the simulation.

Actors typically maintain their own internal states using struct attributes. Thus, they are often loaded from JSON input and referred to in method receivers as pointers. This allows them to mutate their state during calls to Run and in response to events.

For example, the customer actors of the single-server queue example are unmarshaled from JSON directly into pointers by the CLI runner.

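func main() {
    cli.Run(
        func(customers []*customer, opt sim.Options) (sim.Simulator, error) {
            // Customers are unmarshaled as pointers, so they can mutate
            // their state in the simulation.
            // ... create the simulator, add each customer as an actor,
            // and return it (elided here).
        },
    )
}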

Similarly, their methods are defined with pointer receivers, so state can be updated and stored in the simulation.

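func (c *customer) Run(now time.Time) (time.Time, bool) {
    // Run changes customer state through the pointer receiver, then
    // returns when to run next and whether to run again (body elided).
}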

Randomizing data

Introducing randomness into a simulation is a good way to bound estimates of important measures, as well as to stress test your models. Dash makes it easy to set an arbitrary random seed for generating random values during a simulation via the -dash.simulator.random.seed command-line flag.

To ease the task further, bash and zsh provide a $RANDOM variable that expands to a pseudorandom integer between 0 and 32767 each time it is referenced. We can use it to introduce some randomness into a Dash simulation as follows:

./dash-sim -dash.runner.input.path input.json \
           -dash.simulator.random.seed $RANDOM

Using this method, a new random seed will be used each time the simulation is run. Note that we will still need to encode randomness into our simulation using Go's math/rand package. The random seed will be recorded in the options section of Dash's output.

Event and measure levels

Dash uses one event ledger for publishing and subscribing to actor events, and another for recording measures. For many use cases, the Publish and Subscribe methods provided by Dash's event ledger work quite well. However, when simulations produce many events or measures, the output can become too verbose.

Events and measures can also be ascribed a level, similar to the levels of many popular logging systems. To use these levels, merely substitute PublishLevel and SubscribeLevel for calls to Publish and Subscribe. The dash/sim/log package provides the following levels:

All
Trace
Debug
Info
Warn

Levels further down the list (which have higher values) are more important. As in other log-leveling systems, subscribing at a level (e.g., Info) means you receive every message that is at least as important as that level (Info and Warn).

The Publish and Subscribe ledger methods use level All.
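The filtering rule can be illustrated with a small Go sketch. The level constants below are hypothetical stand-ins that only preserve the ordering described above; they are not the definitions in dash/sim/log.

```go
package main

import "fmt"

// Level orders messages by importance: higher values are more important.
type Level int

const (
	levelAll Level = iota
	levelTrace
	levelDebug
	levelInfo
	levelWarn
)

// event is a hypothetical published message tagged with a level.
type event struct {
	level Level
	name  string
}

// received returns the names of the events a subscriber at the threshold
// level would see: every event at least as important as the threshold.
func received(events []event, threshold Level) []string {
	var names []string
	for _, e := range events {
		if e.level >= threshold {
			names = append(names, e.name)
		}
	}
	return names
}

func main() {
	events := []event{
		{levelAll, "service"},  // published with plain Publish
		{levelInfo, "arrival"}, // published with PublishLevel at Info
		{levelWarn, "backlog"},
	}
	fmt.Println(received(events, levelAll))   // [service arrival backlog]
	fmt.Println(received(events, levelTrace)) // [arrival backlog]
	fmt.Println(received(events, levelWarn))  // [backlog]
}
```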

A command-line flag and an associated environment variable let you receive only the measures or events at or above a given level. For instance, in the customer.go file of the queue example, we can change the Run method to use:

if c.arrivalTime == nil {
    c.arrivalTime = &now
    c.events.PublishLevel(log.Info, arrival(*c))
}

After recompiling the example, the command below prints no events, because arrivals are now published at Info, which is less important than Warn:

./queue -dash.runner.input.path input.json \
        -dash.runner.output.events \
        -dash.simulator.limits.duration 2h \
        -dash.runner.output.level warn | \
jq .events

Lowering the output level to trace prints only the arrival events, since the remaining events are still published at level All:

./queue -dash.runner.input.path input.json \
        -dash.runner.output.events \
        -dash.simulator.limits.duration 2h \
        -dash.runner.output.level trace | \
jq .events

The same functionality can be applied to measures using the corresponding environment variable or command-line flag.

Warmup periods

Events and measures logged at the very beginning of a simulation may not represent reality, particularly if the system has not yet reached a steady state. For example, initializing all actors to "available" at the start of a mid-day simulation may produce overly optimistic events and measures. In other words, actors may accomplish tasks faster in simulated time than they would in reality. For these scenarios, we recommend specifying a warmup duration and excluding warmup messages from the output. This allows the simulation to ramp up to a steady state without impacting the fidelity of the events and measures.
