Creating 15- and 60-minute intervals by rounding start times down with floor_date.
dat2 <- dat %>%
mutate(
interval60 = floor_date(ymd_hms(Start.date), unit = "hour"),
interval15 = floor_date(ymd_hms(Start.date), unit = "15 mins"),
week = week(interval60),
dotw = wday(interval60, label = TRUE)
)
glimpse(dat2)
## Rows: 337,704
## Columns: 15
## $ Duration <chr> "653", "297", "129", "1119", "533", "895", "489",…
## $ Start.date <chr> "2019-05-01 00:00:02", "2019-05-01 00:00:13", "20…
## $ End.date <chr> "2019-05-01 00:10:55", "2019-05-01 00:05:11", "20…
## $ Start.station.number <chr> "31127", "31641", "31255", "31655", "31203", "316…
## $ Start.station <chr> "22nd & H St NW", "2nd St & Massachusetts Ave NE"…
## $ End.station.number <chr> "31202", "31603", "31237", "31244", "31223", "315…
## $ End.station <chr> "14th & R St NW", "1st & M St NE", "25th St & Pen…
## $ Bike.number <chr> "W00233", "W00555", "W23383", "W20337", "W23707",…
## $ Member.type <chr> "Member", "Member", "Member", "Member", "Member",…
## $ start_geo <POINT [°]> POINT (-77.04886 38.89893), POINT (-77.0031…
## $ end_geo <POINT [°]> POINT (-77.03201 38.91305), POINT (-77.0054…
## $ interval60 <dttm> 2019-05-01, 2019-05-01, 2019-05-01, 2019-05-01, …
## $ interval15 <dttm> 2019-05-01 00:00:00, 2019-05-01 00:00:00, 2019-0…
## $ week <dbl> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 1…
## $ dotw <ord> Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed,…
Getting variables influencing demand from census data-
dcCensus <-
get_acs(geography = "tract",
variables = c("B01003_001", "B19013_001",
"B02001_002", "B08013_001",
"B08012_001", "B08301_001",
"B08301_010", "B01002_001"),
year = 2017,
state = "DC",
geometry = TRUE,
county = NULL, # DC is a single state-level geography; no county filter needed
output = "wide") %>%
rename(Total_Pop = B01003_001E,
Med_Inc = B19013_001E,
Med_Age = B01002_001E,
White_Pop = B02001_002E,
Travel_Time = B08013_001E,
Num_Commuters = B08012_001E,
Means_of_Transport = B08301_001E,
Total_Public_Trans = B08301_010E) %>%
select(Total_Pop, Med_Inc, White_Pop, Travel_Time,
Means_of_Transport, Total_Public_Trans,
Med_Age,
GEOID, geometry) %>%
mutate(Percent_White = White_Pop / Total_Pop,
Mean_Commute_Time = Travel_Time / Total_Public_Trans,
Percent_Taking_Public_Trans = Total_Public_Trans / Means_of_Transport)Extract Geometries-
dcTracts <-
dcCensus %>%
as.data.frame() %>%
distinct(GEOID, .keep_all = TRUE) %>%
select(GEOID, geometry) %>%
st_sf()

Adding the spatial information to our rideshare data as origin and destination data-
dat2 <- dat2 %>%
mutate(
from_longitude = st_coordinates(start_geo)[, 1],
from_latitude = st_coordinates(start_geo)[, 2],
to_longitude = st_coordinates(end_geo)[, 1],
to_latitude = st_coordinates(end_geo)[, 2]
)
dat_census <- dat2 %>%
filter(!is.na(from_longitude) &
!is.na(from_latitude) &
!is.na(to_longitude) &
!is.na(to_latitude)) %>%
st_as_sf(coords = c("from_longitude", "from_latitude"), crs = 4326) %>%
st_join(
dcTracts %>% st_transform(crs = 4326),
join = st_intersects,
left = TRUE
) %>%
rename(Origin.Tract = GEOID) %>%
mutate(
from_longitude = unlist(map(geometry, 1)),
from_latitude = unlist(map(geometry, 2))
) %>%
as.data.frame() %>%
select(-geometry) %>%
st_as_sf(coords = c("to_longitude", "to_latitude"), crs = 4326) %>%
st_join(
dcTracts %>% st_transform(crs = 4326),
join = st_intersects,
left = TRUE
) %>%
rename(Destination.Tract = GEOID) %>%
mutate(
to_longitude = unlist(map(geometry, 1)),
to_latitude = unlist(map(geometry, 2))
) %>%
as.data.frame() %>%
select(-geometry)weather.Panel <-
riem_measures(station = "DCA", date_start = "2019-05-01", date_end = "2019-05-31") %>%
dplyr::select(valid, tmpf, p01i, sknt) %>%
replace(is.na(.), 0) %>%
mutate(interval60 = ymd_h(substr(valid, 1, 13))) %>%
mutate(
week = week(interval60),
dotw = wday(interval60, label = TRUE)
) %>%
group_by(interval60) %>%
summarize(
Temperature = max(tmpf),
Precipitation = sum(p01i),
Wind_Speed = max(sknt)
) %>%
mutate(
Temperature = ifelse(Temperature == 0, 42, Temperature)
)
glimpse(weather.Panel)
## Rows: 720
## Columns: 4
## $ interval60 <dttm> 2019-05-01 00:00:00, 2019-05-01 01:00:00, 2019-05-01 02…
## $ Temperature <dbl> 75, 71, 69, 70, 67, 64, 63, 62, 61, 61, 60, 60, 60, 60, …
## $ Precipitation <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ Wind_Speed <dbl> 7, 7, 7, 7, 10, 12, 13, 8, 10, 8, 10, 10, 10, 8, 8, 6, 5…
Plotting Weather Data for Washington DC-
grid.arrange(
ggplot(weather.Panel, aes(interval60,Precipitation)) + geom_line() +
labs(title="Precipitation", x="Hour", y="Precipitation") + plotTheme,
ggplot(weather.Panel, aes(interval60,Wind_Speed)) + geom_line() +
labs(title="Wind Speed", x="Hour", y="Wind Speed") + plotTheme,
ggplot(weather.Panel, aes(interval60,Temperature)) + geom_line() +
labs(title="Temperature", x="Hour", y="Temperature") + plotTheme,
top="Weather Data - Washington DC - May, 2019")

dat_census %>%
mutate(time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
group_by(interval60, Start.station, time_of_day) %>%
tally()%>%
group_by(Start.station, time_of_day)%>%
summarize(mean_trips = mean(n))%>%
ggplot()+
geom_histogram(aes(mean_trips), binwidth = 1)+
labs(title="Mean Number of Hourly Trips Per Station. Washington, May, 2019",
x="Number of trips",
y="Frequency")+
facet_wrap(~time_of_day)+
plotTheme

The graph above shows that DC stations see higher mean hourly trips during the AM Rush and Overnight periods and lower activity during Mid-Day and PM Rush.
ggplot(dat_census %>%
group_by(interval60, Start.station) %>%
tally())+
geom_histogram(aes(n), binwidth = 5)+
labs(title="Bike share trips per hr by station. Washington, May, 2019",
x="Trip Counts",
y="Number of Stations")+
plotTheme
This graph shows that most stations have very low trip counts per hour
(roughly 0 to 2 trips). A smaller number of stations see medium activity
(5-10 trips), and very few see high activity (15+ trips, with almost
none past 30). From this we can conclude that bike share usage is very
uneven: a few popular stations (near downtown, offices, and Metro stops)
account for a large share of trips, while most stations (especially
residential or suburban ones) are quiet most of the day.
ggplot(dat_census %>% mutate(hour = hour(Start.date)))+
geom_freqpoly(aes(hour, color = dotw), binwidth = 1)+
labs(title="Bike share trips in Washington, by day of the week, May, 2019",
x="Hour",
y="Trip Counts")+
plotTheme

ggplot(dat_census %>%
mutate(hour = hour(Start.date),
weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday")))+
geom_freqpoly(aes(hour, color = weekend), binwidth = 1)+
labs(title="Bike share trips in Washington - weekend vs weekday, May, 2019",
x="Hour",
y="Trip Counts")+
plotTheme

Following up on the graph above, weekday demand is dominated by the office-hour rush, peaking around 8-9 am and 5-6 pm, while weekend demand peaks around 12-3 pm.
ggplot()+
geom_sf(data = dcTracts %>%
st_transform(crs=4326))+
geom_point(data = dat_census %>%
mutate(hour = hour(Start.date),
weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"),
time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
group_by(Start.station.number, from_latitude, from_longitude, weekend, time_of_day) %>%
tally(),
aes(x=from_longitude, y = from_latitude, color = n),
fill = "transparent", alpha = 0.4, size = 0.3)+
scale_colour_viridis(direction = -1,
discrete = FALSE, option = "D")+
ylim(min(dat_census$from_latitude), max(dat_census$from_latitude))+
xlim(min(dat_census$from_longitude), max(dat_census$from_longitude))+
facet_grid(weekend ~ time_of_day)+
labs(title="Bike share trips per hr by station. Washington, May, 2019")+
mapTheme

Confirming our previous analysis, the city center sees the most demand on weekdays, with less use on the outskirts of the city. Weekends see generally low demand across all stations.
We created a complete panel dataset where every station and every hour is included, even if no trips happened. This makes sure our model can learn from both busy and quiet times. We also added weather and census features to help improve our predictions.
study.panel <-
expand.grid(interval60=unique(dat_census$interval60),
Start.station.number = unique(dat_census$Start.station.number)) %>%
left_join(., dat_census %>%
select(Start.station.number, Start.station, Origin.Tract, from_longitude, from_latitude )%>%
distinct() %>%
group_by(Start.station.number) %>%
slice(1))
nrow(study.panel)
## [1] 346704
ride.panel <-
dat_census %>%
mutate(Trip_Counter = 1) %>%
right_join(study.panel) %>%
group_by(interval60, Start.station.number, Start.station, Origin.Tract, from_longitude, from_latitude) %>%
summarize(Trip_Count = sum(Trip_Counter, na.rm=T)) %>%
left_join(weather.Panel) %>%
ungroup() %>%
filter(!is.na(Start.station.number)) %>%
mutate(week = week(interval60),
dotw = wday(interval60, label = TRUE)) %>%
filter(!is.na(Origin.Tract))

We created lag features to capture how past trip counts affect future trips. We also added holiday information and checked how strongly the lagged values correlate with current trip counts. This helps the model predict busy and slow periods.
ride.panel <-
ride.panel %>%
arrange(Start.station.number, interval60) %>%
mutate(lagHour = dplyr::lag(Trip_Count,1),
lag2Hours = dplyr::lag(Trip_Count,2),
lag3Hours = dplyr::lag(Trip_Count,3),
lag4Hours = dplyr::lag(Trip_Count,4),
lag12Hours = dplyr::lag(Trip_Count,12),
lag1day = dplyr::lag(Trip_Count,24),
holiday = ifelse(yday(interval60) == 148,1,0)) %>% # day 148 = May 28; note Memorial Day 2019 fell on May 27 (day 147)
mutate(day = yday(interval60)) %>%
mutate(holidayLag = case_when(dplyr::lag(holiday, 1) == 1 ~ "PlusOneDay",
dplyr::lag(holiday, 2) == 1 ~ "PlusTwoDays",
dplyr::lag(holiday, 3) == 1 ~ "PlusThreeDays",
dplyr::lead(holiday, 1) == 1 ~ "MinusOneDay",
dplyr::lead(holiday, 2) == 1 ~ "MinusTwoDays",
dplyr::lead(holiday, 3) == 1 ~ "MinusThreeDays"),
holidayLag = ifelse(is.na(holidayLag), "0", holidayLag))

as.data.frame(ride.panel) %>%
group_by(interval60) %>%
summarise_at(vars(starts_with("lag"), "Trip_Count"), mean, na.rm = TRUE) %>%
gather(Variable, Value, -interval60, -Trip_Count) %>%
mutate(Variable = factor(Variable, levels=c("lagHour","lag2Hours","lag3Hours","lag4Hours",
"lag12Hours","lag1day")))%>%
group_by(Variable) %>%
summarize(correlation = round(cor(Value, Trip_Count),2))
## # A tibble: 6 × 2
## Variable correlation
## <fct> <dbl>
## 1 lagHour 0.82
## 2 lag2Hours 0.54
## 3 lag3Hours 0.31
## 4 lag4Hours 0.14
## 5 lag12Hours -0.33
## 6 lag1day 0.81
The data is split by week to create a time-based train/test split: weeks 20 and later form the training set, and the first two weeks of May form the test set.
ride.panel <- ride.panel %>%
mutate(week = week(interval60))
ride.Train <- filter(ride.panel, week >= 20)
ride.Test <- filter(ride.panel, week < 20)

reg1 <-
lm(Trip_Count ~ factor(hour(interval60)) + factor(dotw) + Temperature, data=ride.Train)
reg2 <-
lm(Trip_Count ~ Start.station.number + factor(dotw)+ Temperature, data=ride.Train)
reg3 <-
lm(Trip_Count ~ Start.station.number + factor(hour(interval60)) + factor(dotw) + Temperature + Precipitation,
data=ride.Train)
reg4 <-
lm(Trip_Count ~ Start.station.number + factor(hour(interval60)) + factor(dotw) + Temperature + Precipitation +
lagHour + lag2Hours +lag3Hours + lag12Hours + lag1day,
data=ride.Train)
reg5 <-
lm(Trip_Count ~ Start.station.number + factor(hour(interval60)) + factor(dotw) + Temperature + Precipitation +
lagHour + lag2Hours +lag3Hours +lag12Hours + lag1day + holidayLag + holiday,
data=ride.Train)

Five regression models were estimated with progressively richer feature sets: time-of-day and day-of-week effects, weather, station fixed effects, time lags, and holiday indicators. Each model builds on the previous one.
The test dataset was nested by week so that predictions could be made separately for each week. This helps in organizing and comparing model performance across different time periods.
Each model was used to predict trip counts on the test data. The actual and predicted values were compared to calculate errors like Mean Absolute Error (MAE) and standard deviation of errors for each model and week to understand which is the best model.
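The chunk below references `ride.Test.weekNest` and a `model_pred` helper that are not defined in this excerpt. A minimal sketch of how they are typically set up (an assumption, since the original setup chunk is not shown) is:

```r
# Assumed definitions: the original setup chunk is not shown in this document.
library(dplyr)
library(tidyr)

# Nest the test set by week, so each week's rows become one data frame
# stored in a list-column named "data".
ride.Test.weekNest <-
  ride.Test %>%
  nest(data = -week)

# model_pred: apply a fitted lm to one week's nested data frame and
# return the vector of predicted trip counts.
model_pred <- function(dat, fit) {
  predict(fit, newdata = dat)
}
```

With these in place, `map(.x = data, fit = regN, .f = model_pred)` generates one prediction vector per test week.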
week_predictions <-
ride.Test.weekNest %>%
mutate(ATime_FE = map(.x = data, fit = reg1, .f = model_pred),
BSpace_FE = map(.x = data, fit = reg2, .f = model_pred),
CTime_Space_FE = map(.x = data, fit = reg3, .f = model_pred),
DTime_Space_FE_timeLags = map(.x = data, fit = reg4, .f = model_pred),
ETime_Space_FE_timeLags_holidayLags = map(.x = data, fit = reg5, .f = model_pred)) %>%
gather(Regression, Prediction, -data, -week) %>%
mutate(Observed = map(data, pull, Trip_Count),
Absolute_Error = map2(Observed, Prediction, ~ abs(.x - .y)),
MAE = map_dbl(Absolute_Error, mean, na.rm = TRUE),
sd_AE = map_dbl(Absolute_Error, sd, na.rm = TRUE))
week_predictions

week_predictions %>%
dplyr::select(week, Regression, MAE) %>%
gather(Variable, MAE, -Regression, -week) %>%
ggplot(aes(week, MAE)) +
geom_bar(aes(fill = Regression), position = "dodge", stat="identity") +
scale_fill_manual(values = palette5) +
labs(title = "Mean Absolute Errors by model specification and week") +
plotTheme

Model 5 (ETime_Space_FE_timeLags_holidayLags) had the lowest Mean Absolute Error (MAE) across the test weeks. As features such as time lags and holiday indicators were added, predictions improved, showing that past trip patterns and holiday effects raise forecast accuracy.
week_predictions %>%
mutate(interval60 = map(data, pull, interval60),
Start.station.number = map(data, pull, Start.station.number)) %>%
dplyr::select(interval60, Start.station.number, Observed, Prediction, Regression) %>%
unnest() %>%
gather(Variable, Value, -Regression, -interval60, -Start.station.number) %>%
group_by(Regression, Variable, interval60) %>%
summarize(Value = sum(Value)) %>%
ggplot(aes(interval60, Value, colour=Variable)) +
geom_line(size = 1.1) +
facet_wrap(~Regression, ncol=1) +
labs(title = "Predicted/Observed bike share time series", subtitle = "Washington, DC; a test set of 2 weeks", x = "Hour", y= "Station Trips") +
plotTheme

The predicted trip counts closely follow the observed patterns across models. Model 5 aligns best with actual demand, especially during peak hours, though small gaps remain during sudden spikes such as weekends or holidays.
week_predictions %>%
mutate(interval60 = map(data, pull, interval60),
Start.station.number = map(data, pull, Start.station.number),
from_latitude = map(data, pull, from_latitude),
from_longitude = map(data, pull, from_longitude)) %>%
select(interval60, Start.station.number, from_longitude, from_latitude, Observed, Prediction, Regression) %>%
unnest() %>%
filter(Regression == "ETime_Space_FE_timeLags_holidayLags") %>%
group_by(Start.station.number, from_longitude, from_latitude) %>%
summarize(MAE = mean(abs(Observed-Prediction), na.rm = TRUE))%>%
ggplot(.)+
geom_sf(data = dcCensus, color = "grey", fill = "transparent")+
geom_point(aes(x = from_longitude, y = from_latitude, color = MAE),
fill = "transparent", alpha = 0.4)+
scale_colour_viridis(direction = -1,
discrete = FALSE, option = "D")+
ylim(min(dat_census$from_latitude), max(dat_census$from_latitude))+
xlim(min(dat_census$from_longitude), max(dat_census$from_longitude))+
labs(title="Mean Abs Error, Test Set, Model 5")+
mapTheme

Errors vary by station. Downtown and busy areas have lower prediction errors, while outlying areas show slightly higher errors, meaning the model predicts best where bike usage is regular and stable.
week_predictions %>%
mutate(interval60 = map(data, pull, interval60),
Start.station.number = map(data, pull, Start.station.number),
from_latitude = map(data, pull, from_latitude),
from_longitude = map(data, pull, from_longitude),
dotw = map(data, pull, dotw)) %>%
select(interval60, Start.station.number, from_longitude,
from_latitude, Observed, Prediction, Regression,
dotw) %>%
unnest() %>%
filter(Regression == "ETime_Space_FE_timeLags_holidayLags")%>%
mutate(weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"),
time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
ggplot()+
geom_point(aes(x= Observed, y = Prediction))+
geom_smooth(aes(x= Observed, y= Prediction), method = "lm", se = FALSE, color = "red")+
geom_abline(slope = 1, intercept = 0)+
facet_grid(time_of_day~weekend)+
labs(title="Observed vs Predicted",
x="Observed trips",
y="Predicted trips")+
plotTheme

The model predicts weekday trips better than weekend trips, especially during the AM and PM rush. Weekend predictions show more scatter, with larger errors during mid-day and overnight periods. Overall, the model tends to underpredict when observed trip counts are very high.
week_predictions %>%
mutate(interval60 = map(data, pull, interval60),
Start.station.number = map(data, pull, Start.station.number),
from_latitude = map(data, pull, from_latitude),
from_longitude = map(data, pull, from_longitude),
dotw = map(data, pull, dotw) ) %>%
select(interval60, Start.station.number, from_longitude,
from_latitude, Observed, Prediction, Regression,
dotw) %>%
unnest() %>%
filter(Regression == "ETime_Space_FE_timeLags_holidayLags")%>%
mutate(weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"),
time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush")) %>%
group_by(Start.station.number, weekend, time_of_day, from_longitude, from_latitude) %>%
summarize(MAE = mean(abs(Observed-Prediction), na.rm = TRUE))%>%
ggplot(.)+
geom_sf(data = dcCensus, color = "grey", fill = "transparent")+
geom_point(aes(x = from_longitude, y = from_latitude, color = MAE),
fill = "transparent", size = 0.5, alpha = 0.4)+
scale_colour_viridis(direction = -1,
discrete = FALSE, option = "D")+
ylim(min(dat_census$from_latitude), max(dat_census$from_latitude))+
xlim(min(dat_census$from_longitude), max(dat_census$from_longitude))+
facet_grid(weekend~time_of_day)+
labs(title="Mean Absolute Errors, Test Set")+
mapTheme

Prediction errors (MAE) are generally low across the city, mostly between 1 and 4 trips. Errors are slightly higher during mid-day and overnight, especially on weekends. The model performs best during weekday AM and PM rush hours, when trip patterns are most regular.
week_predictions %>%
mutate(interval60 = map(data, pull, interval60),
Start.station.number = map(data, pull, Start.station.number),
from_latitude = map(data, pull, from_latitude),
from_longitude = map(data, pull, from_longitude),
dotw = map(data, pull, dotw),
Percent_Taking_Public_Trans = map(data, pull, Percent_Taking_Public_Trans),
Med_Inc = map(data, pull, Med_Inc),
Percent_White = map(data, pull, Percent_White)) %>%
select(interval60, Start.station.number, from_longitude,
from_latitude, Observed, Prediction, Regression,
dotw, Percent_Taking_Public_Trans, Med_Inc, Percent_White) %>%
unnest() %>%
filter(Regression == "ETime_Space_FE_timeLags_holidayLags")%>%
mutate(weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"),
time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush")) %>%
filter(time_of_day == "AM Rush") %>%
group_by(Start.station.number, Percent_Taking_Public_Trans, Med_Inc, Percent_White) %>%
summarize(MAE = mean(abs(Observed-Prediction), na.rm = TRUE))%>%
gather(-Start.station.number, -MAE, key = "variable", value = "value")%>%
ggplot(.)+
#geom_sf(data = dcCensus, color = "grey", fill = "transparent")+
geom_point(aes(x = value, y = MAE), alpha = 0.4)+
geom_smooth(aes(x = value, y = MAE), method = "lm", se= FALSE)+
facet_wrap(~variable, scales = "free")+
labs(title="Errors as a function of socio-economic variables",
y="Mean Absolute Error (Trips)")+
plotTheme

Prediction errors are slightly higher in areas with higher median incomes and a higher percentage of White residents, and change little with the percentage of residents taking public transit. Overall, the model performs comparably across socio-economic groups, though small differences remain.
The model’s predictions can directly support a smarter bike rebalancing strategy across Washington, DC. Because the model forecasts demand 2 to 3 hours ahead, trucks can be sent to stations that are likely to run low on bikes or docking spaces before the problem actually happens. During weekday AM and PM rush hours, where the model predicts most accurately, truck routes can be scheduled confidently to refill or clear key commuter stations. On weekends and mid-day periods, where predictions are less stable, additional rider incentives (such as discounts or extra ride time) could help naturally move bikes without relying only on trucks.
The spatial error maps show that downtown stations are easier to predict, meaning rebalancing trucks should prioritize residential or edge areas that have slightly higher errors.
Socio-economic patterns also suggest that wealthier neighborhoods and areas with a higher percentage of White residents may have slightly higher uncertainty, so flexible truck routing or mobile response teams could be helpful there.
Overall, the model allows for an efficient rebalancing plan, minimizing empty docks and stranded bikes across both busy and quiet parts of the city.
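As an illustrative sketch (not part of the original analysis), the model-5 forecasts could feed a simple dispatch rule: flag stations whose peak predicted hourly demand approaches dock capacity. The `station_capacity` value and the 0.8 threshold below are hypothetical.

```r
# Hypothetical rebalancing trigger built on the model-5 predictions.
station_capacity <- 15 # assumed dock count; real capacities vary by station

rebalance_flags <-
  week_predictions %>%
  filter(Regression == "ETime_Space_FE_timeLags_holidayLags") %>%
  mutate(interval60 = map(data, pull, interval60),
         Start.station.number = map(data, pull, Start.station.number)) %>%
  select(interval60, Start.station.number, Prediction) %>%
  unnest() %>%
  group_by(Start.station.number) %>%
  summarize(peak_predicted = max(Prediction, na.rm = TRUE)) %>%
  mutate(needs_truck = peak_predicted > 0.8 * station_capacity)
```

Stations with `needs_truck == TRUE` would be candidates for proactive refills within the 2-3 hour window the forecasts cover.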
This modeling approach successfully forecasts short-term bike share demand and supports smarter rebalancing decisions. The model captures daily patterns and rush hour peaks well, helping operators plan ahead and avoid service gaps. While it performed strongly during predictable periods, unexpected events like holidays and weekend travel patterns still presented challenges. Errors were slightly higher at less busy stations and in wealthier areas, suggesting the need for more localized adjustments. Going forward, improvements could include adding real-time station availability data, accounting for special city events, and testing more flexible machine learning models to better handle unpredictable demand shifts.