Imagine you’re a Census field officer who knocks on non-respondents’ doors to encourage them to fill out the Census (it’s mandatory, and they could face a fine if they don’t!). You know that on average 40% of the doors you knock on will have someone answer the door. So you knock on every non-responding household’s door in your neighbourhood – some answer, and that’s great so you take them off your list and try the others again tomorrow. After a week of daily knocking, what’s the probability that a house you knock on will have someone answer you today? Is it still 40%?
Of course not. Some houses don’t have anyone there during the hours a field officer might be visiting them, and those houses will quickly become overrepresented in the sample of houses left on your list. This is an example of “diminishing returns” in action – the more effort you put into something, the less efficient each extra bit of effort gets – and is a feature of many real-world systems that one might want to simulate.
Now imagine that you’re a virtual Census officer knocking on virtual doors. The person coding up the simulation only knows that on average 40% of the doors knocked on will be answered, and has no data beyond this about what happens on each day. The naive method would be to have every knock answered independently with probability 0.4 – however, as argued above, this won’t capture the right real-world behaviour.
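To make the naive method concrete, here is a minimal sketch of it in Python. The function name, the number of houses and the seed are all my own illustrative assumptions, not taken from any real Census model. Every remaining house is knocked on once a day, and every knock is answered with the same probability.

```python
import random


def simulate_naive(n_houses=10_000, p_answer=0.4, n_days=7, seed=0):
    """Naive model: every knock is answered with the same fixed probability.

    Returns the fraction of knocked-on doors that were answered on each day.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    remaining = n_houses
    daily_rates = []
    for _ in range(n_days):
        if remaining == 0:
            break
        # Each remaining house answers independently with probability p_answer.
        answered = sum(rng.random() < p_answer for _ in range(remaining))
        daily_rates.append(answered / remaining)
        # Houses that answered come off the list.
        remaining -= answered
    return daily_rates


naive_rates = simulate_naive()
for day, rate in enumerate(naive_rates, start=1):
    print(f"day {day}: {rate:.2f} of knocks answered")
```

Because every house is identical in this model, the daily answer rate hovers around 0.4 for the whole week, however many houses have already been removed. That is exactly the behaviour the door-knocking story says we should not expect.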
Why does this matter? Well, consider the two uses for the simulation – to make decisions far in advance of the live operation, and as a benchmark to track progress against during the live operation.
- If we don’t account for diminishing returns by day, then halfway through the live operation we appear to be doing a lot better than we really are: the simulation predicts that each knock still has a 0.4 probability of being answered, even though we’ve already reached the easiest addresses. This could cause false confidence and lead to an eventual undercount, because we don’t take appropriate countermeasures when things go wrong.
- If we don’t account for diminishing returns by category (e.g. time of day a Census officer is out and about) then our upfront decisions about scheduling could be thrown off by trying to optimise a simulation that doesn’t reflect the real world.
To expand on that last point: assume we had data on answering the door split by time of day – we know that on average 30% of people answer their doors in the daytime as opposed to 50% in the evenings after work. If we naively plug this into our simulation and then use it to find the optimal schedule, it will tell us to always visit in the evenings. However, in reality some houses are better approached in the daytime, and some mix of daytime and evening will be a better approach.
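A small worked example shows how this can happen. The two groups of houses below are entirely made up for illustration (I chose them only so that their averages match the 30% daytime and 50% evening figures), and the four-visit schedules are likewise hypothetical. The sketch computes, exactly rather than by simulation, the fraction of houses contacted at least once under a given schedule.

```python
from math import prod

# Hypothetical population: two equally sized groups whose averages are
# 0.3 in the daytime and 0.5 in the evening. These numbers are assumptions.
GROUPS = [
    {"share": 0.5, "day": 0.1, "evening": 0.8},  # e.g. out at work all day
    {"share": 0.5, "day": 0.5, "evening": 0.2},  # e.g. at home in the daytime
]

# The naive view: one "average" house with no variation between houses.
AVERAGE = [{"share": 1.0, "day": 0.3, "evening": 0.5}]


def contacted_fraction(schedule, groups):
    """Fraction of houses that answer at least one visit in the schedule."""
    total = 0.0
    for group in groups:
        # Probability that a house in this group never answers.
        p_never = prod(1 - group[slot] for slot in schedule)
        total += group["share"] * (1 - p_never)
    return total


all_evening = ["evening"] * 4
mixed = ["day", "evening", "day", "evening"]

for name, schedule in [("all evening", all_evening), ("mixed", mixed)]:
    print(
        f"{name}: naive {contacted_fraction(schedule, AVERAGE):.4f}, "
        f"two groups {contacted_fraction(schedule, GROUPS):.4f}"
    )
```

With these made-up numbers the naive averages favour four evening visits (about 0.94 contacted against 0.88 for the mixed schedule), whereas the two-group population favours the mixed schedule (about 0.90 against 0.79). The exact figures mean nothing outside this toy setup; the point is only that the ranking can flip once houses are allowed to differ.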
The Advantages of Agent-Based Modelling
Agent-Based Modelling (ABM) is a type of simulation in which you model each individual agent (in this case, each Census officer and each house) rather than just tracking aggregate numbers for “doors knocked on” and “doors answered”.
This method opens the door (heh) to a solution by associating properties with each agent (such as a house’s “probability of answering the door in the daytime” and “probability of answering the door in the evening”). Values for these properties can vary among agents. Now, when the virtual officer knocks on the door, we use that individual agent’s probability rather than a universal 0.3 or 0.5. Easy-to-contact houses tend to answer early and so are removed from the pool of remaining agents first, and the system starts demonstrating the emergent behaviour of diminishing returns present in the real-world system.
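Here is a minimal sketch of that idea, simplified to a single answer probability per house and no time-of-day split. The choice of a Beta(2, 3) distribution is purely my assumption for illustration (it happens to have a mean of 0.4); which distribution a real model should use is an open question. The function name, house count and seed are likewise illustrative.

```python
import random


def simulate_agents(n_houses=10_000, n_days=7, alpha=2.0, beta=3.0, seed=0):
    """Agent-based model: each house has its own probability of answering.

    Probabilities are drawn from Beta(alpha, beta), an assumed distribution
    with mean alpha / (alpha + beta). Returns the fraction of knocked-on
    doors that were answered on each day.
    """
    rng = random.Random(seed)
    # One agent per house, represented here by its answer probability.
    houses = [rng.betavariate(alpha, beta) for _ in range(n_houses)]
    daily_rates = []
    for _ in range(n_days):
        if not houses:
            break
        # Keep only the houses that did NOT answer today's knock.
        still_waiting = [p for p in houses if rng.random() >= p]
        daily_rates.append(1 - len(still_waiting) / len(houses))
        houses = still_waiting
    return daily_rates


agent_rates = simulate_agents()
for day, rate in enumerate(agent_rates, start=1):
    print(f"day {day}: {rate:.2f} of knocks answered")
```

Nothing in the code tells the answer rate to fall, yet it does: houses with a high probability tend to leave the list early, so the remaining pool gets harder each day. Under this particular Beta(2, 3) assumption the expected answer rate on day k works out to 2 / (4 + k), so roughly 0.40 on day 1 falling to about 0.18 by day 7.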
Challenges still remain with this approach. What distribution do you draw those properties from, why that one, and how do you get the data needed to inform your choice? Daytime and evening contact rates of a house are probably correlated, but how correlated? And even if you need a really well calibrated model and have the data to throw at it, is this really a better approach than just estimating a lookup table of contact rates by day? Nevertheless, it’s a powerful tool for the ABM kit.
This post draws on my own experience working on the collection operation for Census 2021.