@carlhiggs 7.5 hours is indeed too much. What were the computing specifications? Specifically, the number of cores and RAM?

A likely contributor to the slower run times was the implementation of a coefficient lookup instead of hard-coded coefficients for the active mode weights. This probably explains why the health indicator analysis was particularly slow for the walk mode. Using the new test region I was able to prototype and test the back-tracking (#168; needed because the recently started exposure analysis was proceeding like the one a month ago, with 1 of 7 days not processed for health indicators after 4-5 days; too slow). I've restarted the main analysis using the updated code following local tests, and I'll continue to use the Brunswick test region for the cycling scenario.
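If the lookup is indeed the culprit, the obvious mitigation is to resolve the coefficients once at start-up and cache them keyed by mode, rather than consulting the lookup inside the per-person loop. A minimal sketch of that idea (the Mode enum, ActiveModeWeights class and rawLookup map below are illustrative placeholders, not the actual JIBE classes):

```java
import java.util.EnumMap;
import java.util.Map;

public class ActiveModeWeights {

    // Hypothetical mode enum, for illustration only.
    public enum Mode { WALK, BICYCLE }

    // Coefficients resolved once at start-up instead of per person/trip.
    private final Map<Mode, Double> coefficients = new EnumMap<>(Mode.class);

    public ActiveModeWeights(Map<String, Double> rawLookup) {
        // rawLookup stands in for whatever CSV/properties-backed table the
        // coefficient lookup reads; the keys and defaults here are assumptions.
        coefficients.put(Mode.WALK, rawLookup.getOrDefault("walk", 0.0));
        coefficients.put(Mode.BICYCLE, rawLookup.getOrDefault("bicycle", 0.0));
    }

    // Constant-time retrieval inside the hot loop of the exposure analysis.
    public double coefficient(Mode mode) {
        return coefficients.get(mode);
    }
}
```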

The MITO and SILO workflows are computationally intensive, at least when applied at city scale: for Manchester (2,827,276 persons and 8,966 output area zones), and perhaps more so for Melbourne (4,174,056 persons and 10,289 SA1 zones).
To speed up the process of development, I previously created sample datasets of 100 randomly selected households (for Melbourne, that came to 231 persons). While that had some advantages for speeding things up, running some processes, in particular the current accident and exposure model, still proved incredibly time intensive even for the small test population: about 7.5 hours.
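For reference, the household sub-sampling itself is conceptually just a seeded shuffle and truncation; a minimal sketch (H is a stand-in for whatever household record the synthetic population uses, not the actual SILO class):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class HouseholdSampler {

    // Draw a fixed-size random sample of households (e.g. 100) from the full
    // synthetic population; the fixed seed keeps the sample reproducible.
    public static <H> List<H> sample(List<H> households, int sampleSize, long seed) {
        List<H> shuffled = new ArrayList<>(households);
        Collections.shuffle(shuffled, new Random(seed));
        return shuffled.subList(0, Math.min(sampleSize, shuffled.size()));
    }
}
```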
I tested a different approach: instead of taking a small population sample for the full study region, take the full synthetic population for a smaller sub-region, for example the suburb of Brunswick in Melbourne. This corresponded to 23,469 persons (much larger than my previous toy populations for test purposes) and 56 zones. I found that processing the same RunHealthExposureOffline code took 5 minutes (SiloMEL took 11 minutes). That was using a 10% MATSim sample, but it suggests a 100% MATSim population would also be feasible. This will greatly speed up prototyping and debugging.

To do this, I derived Brunswick-specific excerpts of all the Melbourne input and microdata. For the synthetic population, I re-allocated persons with jobs and schools outside Brunswick to randomly selected jobs and schools within Brunswick. All amenities, buildings and omx files were restricted to Brunswick, and I re-constructed the network for a 1 km buffer around Brunswick. The network reconstruction was a bit complicated; I created a Java class in the Melbourne SILO use case utils module to handle this. In short, the purpose of this test isn't to represent the 'real' Brunswick, it's to create a micro dataset that is otherwise 'realistic' but much more performant. The overall data size is also much smaller: for Melbourne, ~300 MB compared to ~30 GB.
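To make the re-allocation step concrete, here is a rough sketch of the idea for jobs (schools are handled analogously). Person, Job and the zone accessors are placeholder interfaces for illustration only, not the actual SILO API, and the real code differs in detail:

```java
import java.util.List;
import java.util.Random;
import java.util.Set;

public class SubRegionReallocator {

    // Placeholder types so the sketch stands alone; not the actual SILO classes.
    interface Job { int getZoneId(); }
    interface Person { Job getJob(); void setJob(Job job); }

    private final Random random = new Random(20241001L); // fixed seed keeps the derived test data reproducible

    // Re-assign any person whose job lies outside the sub-region to a randomly
    // selected job inside it. Realism of the match is deliberately sacrificed
    // for a small, fast-running test dataset.
    public void reallocateJobs(List<Person> persons, List<Job> jobsInRegion, Set<Integer> regionZoneIds) {
        for (Person person : persons) {
            Job currentJob = person.getJob();
            if (currentJob != null && !regionZoneIds.contains(currentJob.getZoneId())) {
                Job replacement = jobsInRegion.get(random.nextInt(jobsInRegion.size()));
                person.setJob(replacement);
            }
        }
    }
}
```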
Not sure if this kind of example region would also be useful for your Manchester test purposes @usr110 @ismailsaadi @TabeaSonnenschein @berdikhanova @BelenZapata85 @JDWoodcock.
The approach was somewhat ad hoc (some modifications were manual, others coded), as it was only for test purposes, and I wasn't sure whether the time taken (some hours) would pay off, but I believe it did.
I've copied the current inputs and outputs (from running the health exposure offline runner after yesterday's air pollution update) to a "melbourne - brunswick test area" subfolder in the JIBE working group drive.
Here's an example summary of exposures for Brunswick. You can see we haven't modelled income for the synthetic population (that's why it's constant; it's something for future work). I noticed pm25 and no2 exposures reduced a little after the most recent formula update, which makes sense. I also noticed the upper end of the noise exposure estimates is very high, and these have been biased upwards with much larger values since changes earlier in October. I'm not sure if that is from changes related to the accident model branch that I've incorporated, or local to our Melbourne use case. This is something I can now more easily explore with the faster-running test region!
I'd love to know the kinds of values you are getting on your side for the noise exposures, and exposures in general.