Saving Realtime Transit Data to a DataFrame
An earlier post, Getting Realtime Transit Data from the STM API using Python, looked at getting realtime transit data and displaying it in a notebook.
This post explores saving the same data to a Polars DataFrame so it's easy to analyze. I like working with Polars because it has a really intuitive API.
Python setup
Install polars into the same environment that was used in the Getting Realtime Transit Data post:
pip install polars
Notebook setup
In the first cell of the notebook, import the required libraries:
import requests
import gtfs_realtime_pb2
import polars as pl
The first two here are the same as in the Getting Realtime Transit Data post.
Getting the data
The following code comes from the earlier post. The only difference is that it's now wrapped in a function, realtime, which takes the API key as an api_key parameter instead of hardcoding it. This makes the code easier to work with in a notebook, as the function can be called multiple times to get realtime data for multiple points in time.
def realtime(api_key):
    url = "https://api.stm.info/pub/od/gtfs-rt/ic/v2/vehiclePositions"
    headers = {
        "accept": "application/x-protobuf",
        "apiKey": f"{api_key}",
    }
    response = requests.get(url, headers=headers)
    protobuf_data = response.content
    message = gtfs_realtime_pb2.FeedMessage()
    message.ParseFromString(protobuf_data)
    return message
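Since realtime is a plain function, collecting data for several points in time is just a matter of calling it in a loop. Here's a minimal sketch of that idea. The helper name collect_snapshots and its count and interval_seconds parameters are my own assumptions and don't come from the earlier post. It takes any zero-argument callable, so it can be tried without an API key.

```python
import time


def collect_snapshots(fetch, count, interval_seconds=0.0):
    """Call fetch() `count` times, pausing between calls, and return the results in order.

    `fetch` is any zero-argument callable (hypothetical helper, not part of the STM API).
    """
    snapshots = []
    for i in range(count):
        snapshots.append(fetch())
        # Pause between calls, but not after the last one
        if i < count - 1:
            time.sleep(interval_seconds)
    return snapshots
```

In a notebook this could be called as collect_snapshots(lambda: realtime("<api_key>"), count=3, interval_seconds=30). The count and interval here are arbitrary, so pick values that respect whatever usage limits apply to your API key.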
Processing to a DataFrame
The message object contains the realtime data. To pull out the fields we want, we first represent each returned entity as a dictionary and then store those dictionaries in a list:
[
    {'trip_id':'123', 'route_id':'45', 'longitude'....},
    {'trip_id':'456', 'route_id':'29', 'longitude'....},
    ...
]
Once we have this list of dictionaries, where each dictionary represents one entity, we convert it to a DataFrame.
Here's what the code looks like (added to the earlier code in the realtime function):
def realtime(api_key):
    ...
    # Create a list to store each entity in
    data = []
    # Get the timestamp from the message header
    header_timestamp = message.header.timestamp
    # Loop through the entities
    for entity in message.entity:
        # Create an empty dict to store the entity information
        entity_data = {}
        # Extract all the relevant fields and add them to the empty dict
        entity_data['header_timestamp'] = header_timestamp
        entity_data['entity_id'] = entity.id
        trip = entity.vehicle.trip
        entity_data['trip_id'] = trip.trip_id
        entity_data['start_time'] = trip.start_time
        entity_data['start_date'] = trip.start_date
        entity_data['route_id'] = trip.route_id
        position = entity.vehicle.position
        entity_data['latitude'] = position.latitude
        entity_data['longitude'] = position.longitude
        entity_data['bearing'] = position.bearing
        entity_data['speed'] = position.speed
        entity_data['current_stop_sequence'] = entity.vehicle.current_stop_sequence
        entity_data['current_status'] = entity.vehicle.current_status
        entity_data['timestamp'] = entity.vehicle.timestamp
        vehicle = entity.vehicle.vehicle
        entity_data['vehicle_id'] = vehicle.id
        entity_data['occupancy_status'] = entity.vehicle.occupancy_status
        # Add this record to the list
        data.append(entity_data)
    # Convert the list of dicts to a polars DataFrame
    df = pl.DataFrame(data)
    return df
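One optional variation is to pull the per-entity extraction into its own function, so it can be checked without calling the API. This is only a sketch. The helper name entity_to_record is hypothetical, and the stand-in entity built with types.SimpleNamespace just mimics the attribute layout used above, with made-up values. It is not a real protobuf message.

```python
from types import SimpleNamespace


def entity_to_record(entity, header_timestamp):
    """Flatten one feed entity into a dict with the same fields as the loop above."""
    vehicle_position = entity.vehicle
    trip = vehicle_position.trip
    position = vehicle_position.position
    return {
        "header_timestamp": header_timestamp,
        "entity_id": entity.id,
        "trip_id": trip.trip_id,
        "start_time": trip.start_time,
        "start_date": trip.start_date,
        "route_id": trip.route_id,
        "latitude": position.latitude,
        "longitude": position.longitude,
        "bearing": position.bearing,
        "speed": position.speed,
        "current_stop_sequence": vehicle_position.current_stop_sequence,
        "current_status": vehicle_position.current_status,
        "timestamp": vehicle_position.timestamp,
        "vehicle_id": vehicle_position.vehicle.id,
        "occupancy_status": vehicle_position.occupancy_status,
    }


# Stand-in entity with made-up values, only for trying the function out
fake_entity = SimpleNamespace(
    id="1",
    vehicle=SimpleNamespace(
        trip=SimpleNamespace(
            trip_id="123", start_time="08:00:00", start_date="20240101", route_id="45"
        ),
        position=SimpleNamespace(latitude=45.5, longitude=-73.6, bearing=90.0, speed=5.0),
        current_stop_sequence=3,
        current_status=2,
        timestamp=1700000000,
        vehicle=SimpleNamespace(id="40001"),
        occupancy_status=1,
    ),
)

record = entity_to_record(fake_entity, header_timestamp=1700000005)
```

With a helper like this, the loop in realtime could shrink to data = [entity_to_record(entity, header_timestamp) for entity in message.entity]. The explicit loop above does the same thing and is arguably easier to follow in a notebook.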
Now call the function with an API key:
df = realtime("<api_key>")
This returns a polars DataFrame object with one row per entity.

We can now start to explore the data. For example, we can use a filter to get the buses currently running on a particular route. Here, I use route 45...because it's the best in Montreal.
df.filter(pl.col("route_id") == "45")
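While exploring, keep in mind that the header_timestamp and timestamp columns hold raw integers. The GTFS Realtime specification defines these as POSIX time (seconds since January 1, 1970 UTC), so they need converting to be readable. Here's a small sketch using only the standard library. The helper name epoch_to_utc is my own, and the example value is made up rather than taken from a real feed.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert POSIX seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Made-up example value, not taken from a real feed
example = epoch_to_utc(1700000000)
print(example.isoformat())  # 2023-11-14T22:13:20+00:00
```

For a whole column, something like df.with_columns(pl.from_epoch(pl.col("timestamp"), time_unit="s")) should do the equivalent in Polars, assuming the feed follows the spec and reports seconds.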