Why good data matters
Here’s the classic “Relative Trains” puzzle, first published in “Finnemore’s Gazette of New Mathematical Diversions” in June 1922.
Train A leaves Stourbridge heading towards London at 09:00 on Monday morning, travelling at a consistent speed of 30 mph. Train B leaves Wolverhampton heading towards London at 09:45 on Monday morning, travelling at a steady speed of 45 mph. London is 10% closer to Stourbridge than to Wolverhampton. Which arrives first?
If you answered “Train A”, you’re wrong. Train A will never get there. There are no direct trains from Stourbridge to London; you have to change in Birmingham.
If you answered “Train B”, you’re wrong. Train B doesn’t exist.
If you answered “both trains will arrive at the same time”, then you’re wrong on all levels.
The correct answer was Train C, which, as Finnemore puts it:
“[T]ravels at an infinite speed, and therefore occupies all points in the universe simultaneously”.
So, what does this have to do with data? Well, I was going to use Finnemore’s puzzle as a metaphor for what happens under the Bromford Deal, and show how the trains are like the members of the Deal household.
Finding out that Train A is headed towards London but will never get there is like finding out that a household member we’ve told to join the work club to improve their employment prospects is indeed out of work, but that their date of birth, if we had it, would reveal that they’re only six weeks old.
Finding out that Train B doesn’t exist is like finding out that the customer to whom we’re currently addressing all correspondence doesn’t exist, and only appears to exist because someone passed out on their keyboard and accidentally created a new entry in the contact database.
Train C represents all those customers who are living in our homes without our knowledge – not as squatters, but just as missing household data (for example, one or more of Train A’s parents are probably living in the household too). That would have been the metaphor, but I haven’t really got the hang of metaphors yet, so there it is. More of a simile, I suppose.
Those of you who remember welfare reform can probably see how the above instances of incomplete, incorrect and missing data might cause problems.
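For the more technically minded, here is a rough sketch in Python of the kind of check that would catch a Train A or a Train C before they cause trouble. Everything in it is an assumption made up for illustration: the `HouseholdMember` record, its field names and the working-age threshold of 16 are not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

WORKING_AGE_YEARS = 16  # assumed threshold, for illustration only


@dataclass
class HouseholdMember:
    # Hypothetical record; the field names are invented for this sketch.
    name: str
    date_of_birth: Optional[date]  # None means we never collected it
    referred_to_work_club: bool = False


def age_in_years(dob: date, today: date) -> int:
    """Whole years between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def data_problems(member: HouseholdMember, today: date) -> List[str]:
    """List the data-quality problems found on one household record."""
    problems = []
    if member.date_of_birth is None:
        problems.append("missing date of birth")
    elif member.date_of_birth > today:
        problems.append("date of birth is in the future")
    elif (member.referred_to_work_club
          and age_in_years(member.date_of_birth, today) < WORKING_AGE_YEARS):
        problems.append("referred to work club but under working age")
    return problems
```

A check like this cannot spot a Train B, because a customer who doesn’t exist looks just like one who does, but it would at least stop us sending a six-week-old to the work club.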
For example, when we were calculating the likely impact of the Bedroom Tax we needed to know the age, gender and relationship of all the trains in our properties. We didn’t have much of the data we needed, meaning that our initial estimates… actually our initial estimates were pretty good, even though I used a random number generator to come up with them. So that’s a bad example. However, due to a weird statistical fluke, the underoccupation data we got back from the first few local authorities to share their information with us showed that for those particular LAs our estimates were a long way out, and we were underestimating the problem by about 33%. So at that point we had to draw up a worst-case scenario in which our estimates for all other LAs were also out by 33%. And let me tell you, when your figures for the potential rental income risk from welfare reform are based on the results of a random number generator plus 33%, they make for pretty scary reading.
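The “plus 33%” arithmetic is simple enough to sketch in Python. The figures and local authority names below are made up purely for illustration, and the sketch assumes “underestimating by about 33%” means the real number is the estimate plus a third of itself; if it instead means the estimate was only 67% of the real number, you would divide by 0.67 and the results would be scarier still.

```python
UNDERESTIMATE_RATE = 0.33  # the "about 33%" seen in the first few LAs


def worst_case(estimate: float, rate: float = UNDERESTIMATE_RATE) -> float:
    """Scale an estimate up, assuming it is too low by `rate` of itself."""
    return estimate * (1 + rate)


# Made-up estimates of affected households per local authority.
estimates = {"LA one": 300, "LA two": 120}

# Apply the same uplift to every LA we have not yet heard back from.
adjusted = {la: round(worst_case(n)) for la, n in estimates.items()}
```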
So, what does all this mean for you? In your day-to-day job, how does it impact on the business if you don’t collect accurate data? I don’t know; that depends on who you are and what your job is. I’m pretty sure that if we were having this conversation one-to-one I’d be able to come up with some pretty good examples of the importance of good data entry that apply specifically to your role.