Making Passenger

Dave Hulbert and Andy Leon: Using data to track and predict crowding on UK buses

June 15, 2020 Passenger Season 1 Episode 8

Passenger's Engineering Director Dave Hulbert and Engineering Lead Andy Leon join Matt and Tom to explain how they have been able to deliver accurate, robust and reliable updates to help essential travellers to plan their travel around social distancing recommendations. 

Bringing together streams of crowdsourced information and automatic passenger counting from a number of different suppliers of hardware on the buses creates a qualitative picture of the live occupancy of the vehicle. Whilst this is hugely important during the pandemic, Passenger believe the technology also has longer-reaching benefits. Built at pace from their Enhanced Vehicle Information module, Passenger are rolling this out now to help inform users of the cleanliness of vehicles plus space and seating availability.


Matt :

Hello, and welcome to this week's Making Passenger podcast. I'm Matt.

Tom :

And I'm Tom.

Matt :

The requirement for physical distancing has resulted in a significant interest in providing information to customers on vehicle occupancy and cleanliness to encourage trust in the safety of public transport.

Tom :

This week, we thought it might be interesting to speak to Dave and Andy, Passenger's Engineering Director and Engineering Lead who are responsible for actually delivering these new features.

Matt :

The conversation aims to share an overview of the work in this area and some of the technical and delivery challenges of shipping features like this at pace. I hope you enjoy. Dave and Andy, thanks very much for joining us today.

Dave :

Thanks Matt!

Andy :

Thank you.

Matt :

Dave, perhaps we should give a little bit of context for the rest of the discussion. Could you perhaps give a quick overview of the feature and what it will do for users and operators as a whole?

Dave :

So we're talking about this Vehicle Asset Information and the extensions we're adding to it around the live occupancy data. What we're doing is providing live information about how full the vehicles are, based on social distancing and the reduced capacity that vehicles have. This is really important both to users and to operators, including their staff and bus drivers, so that buses don't have too many people on and people can safely travel by bus or other modes of transport as well.

Matt :

And so the information that we're getting to provide to end users - where's that coming from?

Dave :

So we have our own crowdsourced data from apps and websites coming in from end users, and that's one of the key places we're taking in data. We're also working with a number of different suppliers of hardware on the buses, such as Ticketer, to get data directly from the drivers and from the vehicles themselves about the live occupancy of the vehicle.

Matt :

Okay, so Andy, perhaps you could just talk about why aren't we just using Ticketer data? I mean, we have a number of operators using different suppliers to provide that data. So why aren't we just using the Ticketer SIRI SM solution for example?

Andy :

We do have a lot of operators on the Ticketer system. We also have a few that are using other systems, which don't necessarily have that sort of passenger counting system in place. The other side of things is that in the case of this virus that we're going through at the moment, the sort of space available on the bus is no longer quantitative - it can no longer really be described as, you know, there are 40 seats and 20 of them are taken. It depends on the makeup of the passengers on the bus. If there are households travelling together, they may be sitting very closely together. If there are lots and lots of individuals, they need to space out. So in that sense, it becomes much more qualitative, much more judgmental, or much more nuanced as to how busy a bus is, and whether there is space for you on there.

Matt :

So is that why we're choosing to display 'quiet', 'moderate' and 'busy' rather than actual number of vacant seats? Perhaps Dave, you could talk to me about that or Andy?

Andy :

Exactly that, exactly that. We feel it's easier for someone to make a human judgement as to whether social distancing is possible on a vehicle.

Matt :

Okay, that's interesting. So crowdsourcing this data means that we'll need a lot of people to contribute. So why are people going to do that? What makes us think that people are going to do that?

Dave :

We have two key things here. The first is the position that we're in as Passenger. We have lots of apps and websites, kind of covering lots of the UK, and we have a really engaged user base. Based on the trust that users have in our operators and the apps, we have good ratings in app stores and things like that, which we've seen means that users are very engaged in our apps and websites, and are sort of keen to make the most of them. Secondly, in this time that we're in now, we have more of a social responsibility, and we believe that there's going to be an increase in the number of people that are willing to contribute this kind of data. Whereas without COVID-19 it might be harder to acquire this data, in this situation there are likely to be more people that are keen to help each other out, as kind of a civic duty, if you will.

Andy :

I think that's absolutely right. There's definitely been an increase in people's sort of solidarity in the last few months.

Tom :

It's probably worth adding as well that the operators that we work with that don't already have a sort of Ticketer feed of data coming through are, you know, busy working with their providers to come up with a similar kind of solution, if you like. But it's worth adding that the crowdsourced approach, as Andy mentioned, gives those operators that don't have an actual feed a way to sort of start to communicate some of this kind of information to their passengers, to show that, you know, things are happening and the work is being done to manage this kind of, you know, new situation everyone finds themselves in. So it's a way that, you know, operators that don't have that Ticketer feed can get behind this with their marketing and encourage their passengers to help each other. So it's really something that, you know, they can actively get behind and support.

Matt :

Yeah, I think you're right there, Tom. And sort of measuring real time vehicle occupancy on buses isn't something that's been widespread in the UK ever. I'm not sure if that's something that's happened anywhere around the globe. And as a result, there isn't a great deal of knowledge in the sector about how to perhaps implement this accurately. With COVID-19, there's a demand, you know, to solve this problem, and there are a lot of people scrambling to deliver something in this area as fast as possible. How can we guarantee the accuracy of what we're showing to users? How do we feel confident in the information that we're showing? Is there anything more that we should be doing or that someone else could be doing?

Dave :

So this is a challenging question, I think. We have a few different data sources at our disposal, which is one of our advantages: by taking in both crowdsourced data and data from the ETM hardware suppliers, we have the ability to measure the accuracy of those two things against each other. We've seen accuracy statistics from different hardware suppliers of 96% or 98% accuracy around counting how many people are currently on a vehicle. But as Andy mentioned earlier, this doesn't provide that qualitative, very subjective information about how much space there actually is left on a vehicle. So, with crowdsourced data, we might not have the specific number of seats available to as high an accuracy as the counting machines on the vehicles, the automated passenger counting machines, but the accuracy of the data that's actually useful to the end user, we believe, is going to be much higher.

Matt :

Okay. So in a scenario where we have a number of different data sources, how are we working out what will get shown to the customer in the app? We're saying that we're going to show quiet, moderate and busy, but, you know, what's our trigger point for moving into those areas?

Andy :

At the moment, from the data that we receive over a vehicle monitoring feed, the operators are setting their own thresholds as to what percentage of a bus's seats being occupied constitutes a 'moderate' or a 'busy'. What would be interesting is whether the operator's idea of what constitutes busy actually matches up with customers' ideas of what a busy bus is, and who believes it's most busy.
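The operator-set thresholds Andy describes could be sketched roughly as below. This is a minimal illustration, not Passenger's implementation: the names `OccupancyThresholds` and `occupancy_label` and the default cut-off values are assumptions made for the example, and only the three labels come from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OccupancyThresholds:
    """Operator-chosen cut-offs as fractions of seats occupied (hypothetical defaults)."""
    moderate_from: float = 0.25
    busy_from: float = 0.50


def occupancy_label(passengers: int, seats: int,
                    thresholds: OccupancyThresholds = OccupancyThresholds()) -> str:
    """Map a passenger count onto the 'quiet' / 'moderate' / 'busy' labels."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    fraction = passengers / seats
    if fraction >= thresholds.busy_from:
        return "busy"
    if fraction >= thresholds.moderate_from:
        return "moderate"
    return "quiet"


# Two operators can label the same bus differently by choosing different cut-offs.
cautious = OccupancyThresholds(moderate_from=0.25, busy_from=0.50)
relaxed = OccupancyThresholds(moderate_from=0.50, busy_from=0.80)
print(occupancy_label(20, 40, cautious))  # busy
print(occupancy_label(20, 40, relaxed))   # moderate
```

Keeping the cut-offs in a separate object per operator means they can be adjusted as distancing guidance changes without touching the labelling logic.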

Matt :

Yeah, I suppose they'll draw the line somewhere different, but I guess that's going to be a little bit of trial and error, perhaps, just, you know, working with the operators and the customers and seeing where everyone's comfortable.

Dave :

I think these things change over time as well. As lockdown is gradually lifted and guidance adjusts, different countries have different social distance numbers - we have two metres in the UK, some countries in Europe have 1.5 metres. So it might be that the idea of busyness, or how much space needs to be between different people, adjusts over time as the science improves, and as we learn more about COVID-19.

Matt :

Andy - so we've been adding real time capacity information to each vehicle. Have you considered where else you might be able to show this data? For example, departure boards or any other feed?

Andy :

We can show this data essentially wherever we normally show these resources. We currently show vehicles on maps in various places, be it a timetable page or live sort of overview maps. We also show them on individual journeys, if you're looking for a particular journey from one place to the other. Anywhere we can show this, we can also surface this data. And the more places we surface this data, the more data we can actually collect from people in order to improve the accuracy, or to improve this sort of majority voting.

Matt :

Okay, so is any of this information going to be part of the Open Data Portal stuff that we already give to operators that have websites and systems with us?

Andy :

We can provide all of this data via an API, and at the moment we can consider taking this public. The usage of this data is a little bit restricted because it's very, very temporal. The value of this data decays very quickly.

Matt :

Right. Okay, so in an hour's time, it's out of date, really, it needs to be like the live vehicle locations, it needs to be real time for it to be important and useful.

Andy :

Yeah, there's not much scope for this data beyond about 10-15 minutes, really. Bus levels can change very, very quickly, especially at sort of key stops or hubs where lots of people tend to change, or to board or alight. You can go very quickly from a busy bus to an almost empty bus, and vice versa. But there's definitely value in this sort of archive of data, being able to go back and look at the patterns of where buses get busy, and that could advise journey patterns and, you know, other sort of short shuttle groups that could go on particularly busy parts of a journey.
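The "majority voting" on crowdsourced reports, together with the 10-15 minute shelf life mentioned here, might look something like the sketch below. It is an illustration under stated assumptions only: the function name `majority_label`, the 15-minute cut-off and the tie-break rule (prefer the most recently reported label) are all hypothetical choices, not details given in the conversation.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional

# Assumed cut-off, taken from the "10-15 minutes" mentioned in the discussion.
MAX_AGE = timedelta(minutes=15)


def majority_label(reports: list[tuple[datetime, str]], now: datetime,
                   max_age: timedelta = MAX_AGE) -> Optional[str]:
    """Return the most common label among reports no older than max_age.

    Each report is a (reported_at, label) pair. Stale reports are ignored.
    Returns None when no fresh report exists.
    """
    fresh = sorted(
        (r for r in reports if timedelta(0) <= now - r[0] <= max_age),
        key=lambda r: r[0],
        reverse=True,  # newest first
    )
    if not fresh:
        return None
    counts = Counter(label for _, label in fresh)
    top = max(counts.values())
    # Newest-first order means the first label with the top count
    # is the most recently reported one among any tied labels.
    for _, label in fresh:
        if counts[label] == top:
            return label
    return None


now = datetime(2020, 6, 15, 9, 0)
reports = [
    (now - timedelta(minutes=2), "busy"),
    (now - timedelta(minutes=5), "busy"),
    (now - timedelta(minutes=8), "quiet"),
    (now - timedelta(minutes=40), "quiet"),  # stale, ignored
    (now - timedelta(minutes=50), "quiet"),  # stale, ignored
]
print(majority_label(reports, now))  # busy
```

The stale reports are dropped from the live answer but would still be worth keeping in an archive for the pattern analysis Andy mentions.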

Matt :

Okay, so I mean, that leads me to my next question. Perhaps Dave, you could answer this one for me: have we given any thought to how we might handle it when the bus is too busy? What might that mean for a user, instead of just saying, "Look, it's busy and the next one's in an hour"? What does a user do?

Dave :

Yeah, so we have both the end user that's potentially getting or about to get on the bus, and we also have the operator staff. One of the things that we're planning on doing is providing some of this data to the operators directly as well, so they can see, as Andy mentioned, around planning and that kind of thing, if there are sort of hotspots around the network, and they'll be able to look into some of that data. Then, for the user who's about to get on their vehicle, we've got some things in process around alternative modes and looking at other options that they would have. So if a vehicle is almost full and there's not space to get on their vehicle and still social distance responsibly, then we're planning to provide alternative information to the user. One of the things that we're looking at is providing information about bike share schemes. We have integrations with public bike share schemes in a few of our apps and websites already. So if someone's about to get on a bus, but it's full, then we can provide guidance around where the nearest bike share bay is and direct the user to that place. We can also, when we start to collect more busyness data and more historical data, provide them information about perhaps a later bus that they could get, or a slightly different route that isn't as busy as the one they're on at the moment.

Matt :

Yeah, that's great. I think at the moment, what we've developed and pushed out in the last couple of weeks, I know that isn't the finished form. And we've talked about the real time passenger data on vehicles, but can you tell me about the plans to perhaps extend that from bus to trams and other modes as well?

Andy :

Trams are largely quite similar to buses in this sense, apart from sort of the difference in having maybe more than one carriage. It's quite similar to decks in that sense, only the carriages tend to be more accessible. So again, there's a sort of qualitative element to this, where you need to decide how busy a tram is in general. If there are many, many seats on a tram, but everyone is clustered into one carriage, then someone's idea of how busy this is is going to vary from carriage to carriage. As far as stops go, stops are a very interesting thing, because stops can't be addressed by a single operator feed for a vehicle. There are very few situations where a stop is run exclusively by an operator, or where one operator is the only one having a route through there. So in that case, crowdsourcing that information is very important. It gives the best sort of analogy of how busy the stop is going to be when you get there, and how much further you're going to have to stand down the street in order to maintain proper social distancing.

Matt :

Yeah, I can see where in the centre of town that might be more of a concern than perhaps on the outskirts of the network. But it may inform a user, potentially, if they've got a choice of stops, on which one they might want to go to. So I could see why that would be really, really helpful for the end user. So we've got an option to allow users to feed back that the wheelchair bay is in use, which is interesting, and potentially solves a big problem that has existed for a little while for people that want to know whether or not they're able to even get on the bus, because most buses only allow for one or possibly two wheelchair users. And I know that, Tom, you were quite interested in getting this feature in. Can you talk to me about why you felt like this was an important thing to make sure went out in this first release?

Tom :

Yeah, absolutely. I think it's one of those things that when you start building a feature, you look at kind of what else you can do with it. Historically, it's always been really difficult for, you know, drivers to communicate via the systems they've got whether that bay is taken up already. And I know that, you know, there has been quite a lot of conflict between mums and dads with buggies and pushchairs versus, you know, people that need to use the wheelchair bays. I think there's a real practical application to this in terms of, you know, someone who is in a wheelchair and is making their way to the stop being able to have some foresight as to whether that space is vacant or not. So, yeah, certainly from our perspective, when we're building something like this, and it starts to use, you know, the collective minds of everyone on the bus, everyone travelling, all those brains are great sources of information, and they can also look around and see whether that bay is vacant as well. So it makes perfect sense from a technology perspective. If we're doing one thing, we may as well absolutely bolt this one on as well. And actually, I think this one, you know, has a greater deal of importance in the longer term.

Matt :

Yeah, I think it's nice that we were able to put in something small like this, that potentially may have a positive effect for some of those other users. So we saw in the news this morning, or possibly last night, that wearing a mask is now a requirement of using public transport. And if you don't, then there's the potential of fines, I believe it is? How can we react to that sort of thing in our app? Is there anything that perhaps we should be doing or could be doing to work alongside any of that new governance, and if anything else happens, how fast are we able to react?

Dave :

So one of the benefits that we have is that we work really closely with the operators to provide information direct to the end users. Within apps and websites we have information around social distancing, and guidance directly from the operators around the best practices to do that. I've seen some operators have already included wearing masks in this guidance. So when someone's looking at a vehicle on the website or in the app, they'll be able to see information directly from the operator about wearing masks. This information also includes things like how often the vehicle is cleaned, and whether there are any other things that the customer needs to be aware of when boarding the vehicle.

Matt :

So, Tom, you may remember when we spoke to James Carney of Blackpool a couple of weeks ago, he mentioned that they were selling masks on their buses. So could you see perhaps an evolution of this system where it might tell you whether or not they had any masks left on the bus to sell, or something like that? Do you think that would be useful?

Tom :

I think it may well be. I think what we're talking about here is non-surgical masks. These are the kinds of masks that, you know, people can make themselves out of an old T-shirt, for example. So there might not be a requirement for the operators to sell actual masks. I think it's one of those things where it's a little bit uncertain as to what the guidance is going to be. You know, this, I believe, comes in on the 15th of June. I don't really know how they're going to enforce any of the fines. And, you know, a point Dave made earlier on in the conversation was around the fact that this stuff is changing so quickly. I think the benefit that we have with this system is that, you know, the information can be updated really easily. So, you know, if an operator is selling masks on that particular bus, for example, we wouldn't have a real-time stock of masks or anything like that, because that would be quite sort of almost an insane system to have, but certainly we could say that there is a supply of masks on that bus that could be acquired from the driver, purchased or given away or whatever the operator wanted to do. So we could communicate something, absolutely. I think what we're making sure we're doing is allowing our operators to update information for their customers as quickly as possible as these things change, but making sure that we don't build something that is very focused on COVID-19. We're really, really keen that everything we make has a longevity to it in terms of the value that it brings, you know, post this period.
And, you know, again, we talked earlier about, you know, the social distancing coming down from two metres. We know the World Health Organisation's official guidance is 1 metre, and I'm absolutely sure that it will come down at some point in the UK to around that sort of distance to enable, you know, pubs and restaurants to re-establish their businesses and get back up and running again. So we are dealing with things that are changing fairly quickly. We have to take that into consideration when we structure, you know, the systems, to make sure that we are able to continue to react quickly to these changes, to support everyone to get this information out there as quickly as possible and as accurately as possible. I think that accuracy is the most important thing, particularly in the context of how many people are on a bus and how clean the bus is, because that's the bit that gives the reassurance. That's the bit that gives someone who's perhaps been in lockdown at home, in isolation for quite some time, as many of us have been, the reassurance to get back out there in the world and know that, you know, they're going to be safe doing that. That's the key.

Matt :

Yeah, I think you're absolutely right.

Tom :

So going back to something that we talked about a little bit earlier on, I just want to sort of quiz you guys on how we come up with a really clear indication to the end user about the 'busyness' or the 'crowdedness', whatever the phrase is we want to use, of the vehicle itself. We've got lots of potential and lots of discussions happening around, you know, different data feeds coming in. We've talked about crowdsourced data - so that's information that we're getting from the passengers - and we talked about how Ticketer are providing counting numbers via their existing SIRI VM feeds. And I've also had conversations with potential people interested in the system about how we might use sort of device GPS location data. I guess the question is, with all of these different streams of data coming in, how do we then turn that into something so simple that it sort of reliably, robustly, accurately communicates the situation on the ground to the customer?

Andy :

I think we have to treat each one of those as an individual signal of the busyness of that vehicle. It's similar to how a search engine would work, in so much as there are lots of different things that they take into account, but ultimately they boil that down to a 1, 2, 3, 4 sort of ranking of the site that you're trying to search for. We need to do the same thing in that sense: take all of these different data sources, combine them together, give them different weighting depending on how accurate we think they might be and how timely they are, like when they were last reported, for example, and mix that together to provide the customer with the clearest idea of, "Okay, do I get on this bus? Or do I wait for another, or find a different way to get where I'm going?" We also need to take into account stops. Stops are a very interesting part of this, because there are always going to be particular stops along a journey where there are going to be more people either getting on or alighting from that bus. So we need to have that sort of journey information, and how a journey looks across its course. If there's a bus coming which is very, very busy, are you going to be able to get on that bus? If 20 people get off at that particular stop, then you can get on. That's fine. The fact that it's 'busy' up until that point starts to lose some relevance in that sense. So we have to sort of try and build up a much bigger picture of how these things change across the course of a journey, and across the course of a day.
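The weighting Andy outlines, where each source is trusted according to its likely accuracy and how recently it reported, could be sketched as a weighted average. This is only an illustration under assumptions: the source names, the weights in `SOURCE_WEIGHTS`, the five-minute half-life and the 0-to-1 busyness scale are all hypothetical, and the conversation does not say how Passenger actually combines its signals.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trust in each source, from 0 (ignored) to 1 (fully trusted).
SOURCE_WEIGHTS = {"apc": 1.0, "crowd": 0.8, "gps": 0.4}

# Assumed half-life: a signal's weight halves every 5 minutes.
HALF_LIFE_MINUTES = 5.0


@dataclass(frozen=True)
class Signal:
    source: str         # key into SOURCE_WEIGHTS
    busyness: float     # 0.0 = empty, 1.0 = no space left
    age_minutes: float  # time since the signal was reported


def combined_busyness(signals: list[Signal]) -> Optional[float]:
    """Weighted average of busyness; weight = source trust x time decay."""
    weighted_sum = 0.0
    total_weight = 0.0
    for s in signals:
        trust = SOURCE_WEIGHTS.get(s.source, 0.0)  # unknown sources get no weight
        decay = 0.5 ** (s.age_minutes / HALF_LIFE_MINUTES)
        weight = trust * decay
        weighted_sum += weight * s.busyness
        total_weight += weight
    if total_weight == 0:
        return None  # nothing usable to report
    return weighted_sum / total_weight


signals = [
    Signal("apc", busyness=0.3, age_minutes=1),
    Signal("crowd", busyness=0.9, age_minutes=2),
    Signal("gps", busyness=0.5, age_minutes=12),
]
score = combined_busyness(signals)
print(None if score is None else round(score, 2))
```

The resulting score could then be cut into 'quiet', 'moderate' and 'busy' bands. Accounting for expected boardings and alightings at upcoming stops, as Andy describes, would need journey data that this sketch does not model.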

Matt :

I think you're right, Andy, it's an interesting comparison to a search engine, and I think it has the capacity to quite possibly be that complicated as the systems evolve and more streams come in to enable us to feed this information back. It's been really interesting to chat to you both today. I'm sure we could chat for a lot longer on a multitude of subjects. But unfortunately, I'm going to draw a line here. So Dave and Andy, thank you very much for joining us. I'm sure we'll speak to you again at some point in the future.

Andy :

Thank you, Matt and Tom.

Tom :

Thank you.

Matt :

Next week, we'll be speaking to Dr. Ian Walker. He teaches Statistics and Traffic Psychology at the University of Bath, where he undertakes research on road safety, travel choices and habits, and energy consumption. As if that's not interesting enough, Ian's also a Guinness World Record holder for the fastest bicycle crossing of Europe. We're really looking forward to speaking with him and finding out what, if anything, can be done to break people's car habits. If you've any questions for us or for him, tweet us @makingpassenger. Until next time!