Errors while parsing GTFS on tdata branch. Difference between branches? #203

Open
mr-tm opened this issue Feb 21, 2023 · 8 comments

mr-tm commented Feb 21, 2023

Hi!
First of all, thank you for creating this library; from the description it truly seems to be one of a kind! I couldn't believe it's possible to compress all of the Netherlands' transit info from a 246 MB GTFS zip file to just 17 MB!

I'm getting an error while parsing the GTFS file (http://gtfs.ovapi.nl/gtfs-nl.zip) on the tdata branches (tdata4, tdata-cherokee):

saving transfer stops (footpaths)
at position 9177772 in output [8.75 MB]
saving transfer times (footpaths)
at position 9263792 in output [8.83 MB]
Traceback (most recent call last):
  File "gtfs2rrrr.py", line 154, in <module>
    main()
  File "gtfs2rrrr.py", line 151, in main
    exporter.timetable4.export(tdata)
  File "C:\Projects\rrrr\rrrr-tdata4\rrtimetable\rrtimetable\exporter\timetable4.py", line 662, in export
    export_transfers(tdata,index,out)
  File "C:\Projects\rrrr\rrrr-tdata4\rrtimetable\rrtimetable\exporter\timetable4.py", line 328, in export_transfers
    writeshort(out,(int(transfer_time) >> 2))
TypeError: int() argument must be a string or a number, not 'NoneType'
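
The failing line packs a transfer time into a 16-bit field after shifting it right by two bits, i.e. in units of 4 seconds, and the error means that one transfer had no time at all. That is plausible for raw GTFS, where `min_transfer_time` in transfers.txt is an optional field. The exporter itself is Python; the sketch below only illustrates the same encoding with an explicit fallback, written in C++ to match the other snippets in this thread. The function name and the 120-second default are assumptions for illustration, not values taken from rrrr.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical fallback for transfers without a time (an assumption, not an rrrr constant).
constexpr int kDefaultTransferSeconds = 120;

// Encode a transfer time in 4-second units, mirroring `int(transfer_time) >> 2`
// from the traceback, but tolerating a missing value instead of failing.
inline std::uint16_t encode_transfer_time(std::optional<int> seconds) {
    int value = seconds.value_or(kDefaultTransferSeconds);
    if (value < 0) {
        value = 0;  // negative durations make no sense; treat them as zero
    }
    int units = value >> 2;  // 4-second resolution
    if (units > 0xFFFF) {
        units = 0xFFFF;  // clamp to what fits in an unsigned 16-bit field
    }
    return static_cast<std::uint16_t>(units);
}
```

Whether a default, a skipped transfer, or a hard error is the right behaviour for a missing time is a design decision for the exporter; the sketch only shows the default variant.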

I also noticed that processing with gtfs2rrrr.py takes much longer (up to the point of the error) than on master. Before the error hit, gtfs2rrrr.py was also using about 11 GB of RAM.

On master it parsed just fine (gtfsdb.py -> transfers.py -> timetable.py).

Which leads to the question - why are there so many different branches and which one would be the best for mobile deployment?

Thanks!

Contributor

skinkie commented Feb 21, 2023

@mr-tm there are still many branches because we had a lot of experiments, including some that would change all data structures to check performance (column stores, different types of memory management). I would say try tdata-cherokee.

We have mostly generated the timetable via a script that directly accesses a database (and already has timedemandtypes). I am not surprised that it eats so much memory on raw GTFS files, and to be fair, I don't think GTFS is the best input for achieving good routing, mainly because some preprocessing of stops and their relations to each other is required (for example, the transfer times between stops within a station).

With respect to mobile deployment, ten years ago we did it with JNI ;) If you have more questions feel free to ask them.

Author

mr-tm commented May 10, 2023

@skinkie
Thank you for your insight! Glad to hear it worked with JNI!
I managed to compile tdata-cherokee and test the router through ./cli on macOS. So far, so good.
I was wondering whether there is an option to specify how many itineraries I'd like to receive? For example, get the 4 best itineraries.

Also, is it possible to get this library working with multiple timetable.dat files? For example, I have one timetable.dat for city transport and another timetable.dat for intercity buses, and I'd like to use both files to create an itinerary.

Contributor

skinkie commented May 10, 2023

> @skinkie Thank you for your insight! Glad to hear it worked with JNI! I managed to compile tdata-cherokee and test the router through ./cli on macOS. So far, so good. I was wondering whether there is an option to specify how many itineraries I'd like to receive? For example, get the 4 best itineraries.

The algorithm (RAPTOR) lets you retrieve the fastest arrival times for a given number N of transfers. Hence the only way to receive multiple itineraries in one go is when allowing more transfers actually yields a faster option.
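
A minimal sketch of that selection rule, assuming a depart-after search in which round k holds the best arrival time at the target using k transfers. The names `Option`, `kUnreached` and `pareto_options` are hypothetical and do not come from the rrrr sources; the point is only that an extra itinerary appears when an additional transfer strictly improves the arrival time.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Marker for "target not reached in this round" (illustrative, not rrrr's constant).
constexpr std::uint32_t kUnreached = std::numeric_limits<std::uint32_t>::max();

struct Option {
    int transfers;          // number of transfers used
    std::uint32_t arrival;  // arrival time at the target, in seconds
};

// best_arrival[k] is the earliest arrival at the target found in round k.
// An option is kept only if it arrives strictly earlier than every option
// that needs fewer transfers.
inline std::vector<Option> pareto_options(const std::vector<std::uint32_t>& best_arrival) {
    std::vector<Option> options;
    std::uint32_t best_so_far = kUnreached;
    for (std::size_t k = 0; k < best_arrival.size(); ++k) {
        if (best_arrival[k] < best_so_far) {
            best_so_far = best_arrival[k];
            options.push_back(Option{static_cast<int>(k), best_so_far});
        }
    }
    return options;
}
```

With per-round arrivals of unreached, 3600, 3600 and 3300 seconds, this yields two options: one transfer arriving at 3600 and three transfers arriving at 3300. The two-transfer round adds nothing because it is not faster.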

I am currently working on rRAPTOR (range raptor) which allows at virtually no cost to get a range of itineraries in a certain time range. That will be available in this branch.

> Also, is it possible to get this library working with multiple timetable.dat files? For example, I have one timetable.dat for city transport and another timetable.dat for intercity buses, and I'd like to use both files to create an itinerary.

No, that will not work, by design. It is easier to merge the input (for example GTFS, see the "OneBusAway transformer"). Theoretically you can merge timetable.dat files too, but that would need some serious management.

Author

mr-tm commented May 19, 2023

@skinkie

> I am currently working on rRAPTOR (range raptor) which allows at virtually no cost to get a range of itineraries in a certain time range. That will be available in this branch.

Can't wait! :D

I'm currently trying to get this working on Android. Sometimes it works fine, but sometimes I get an art_sigsegv_fault error somewhere in the plan_render_otp method. I can't work out why it fails there. I tried debugging, and it fails at random points in that function (for the same parameters).

I studied cli.c to mimic its function calls; maybe I have missed something?

This is for opening timetable4.dat

OPResult openTimetable(string &path)
{
  LOGI("[openTimetable] Opening timetable: %s", path.c_str());
  /* initialise the structs so we can always trust NULL values */
  memset (&tdata,    0, sizeof(tdata_t));
  memset (&router,   0, sizeof(router_t));

  if (!tdata_load(&tdata, const_cast<char*>(path.c_str()))) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not load tdata!"
    };
  }

  if (! tdata_hashgrid_setup(&tdata)) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not setup hashgrid!"
    };
  }
  if (!router_setup(&router, &tdata)) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not setup router!"
    };
  }

  return OPResult{
    .type = Ok,
    .errorMessage = "Router initialized!"
  };
}

This is for getting the plan (with MMAP, but DYNAMIC seems to fail too)

#define OUTPUT_LEN 32000
OPResult getPlan(const double fromLat, const double fromLong, const double toLat, const double toLong, time_t time, bool arriveBy, string &resultOtpJson){
  router_request_initialize(&req);
  plan_init(&plan);
  router_request_from_epoch(&req, &tdata, time);
  req.arrive_by = arriveBy;
  if (req.arrive_by) {
    req.time_cutoff = 0;
  } else {
    req.time_cutoff = UNREACHED;
  }
  //std::time_t ms = std::time(nullptr);
  req.from_latlon.lat = (float)fromLat;
  req.from_latlon.lon = (float)fromLong;

  req.to_latlon.lat = (float)toLat;
  req.to_latlon.lon = (float)toLong;

  req.intermediatestops = true;
  LOGI("[getPlan] Navigating from: (%f, %f) to (%f, %f). arriveBy(%s), time(%ld)", req.from_latlon.lat, req.from_latlon.lon, req.to_latlon.lat, req.to_latlon.lon, req.arrive_by ? "true" : "false", time);

  if (req.time_rounded && ! (req.arrive_by)) {
    req.time++;
  }
  req.time_rounded = false;

  if (!router_route_full_reversal(&router, &req, &plan)) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not navigate!"
    };
  }

  char result_buf[OUTPUT_LEN];
  plan.req = req;
  plan_render_otp(&plan, &tdata, result_buf, OUTPUT_LEN);
  resultOtpJson = string(result_buf);
  return OPResult{
    .type = Ok
  };
}

And this is for closing the timetable

void closeTimetable(){
  LOGI("[closeTimetable] Closing timetable...");
  /* Deallocate the scratchspace of the router */
  router_teardown(&router);

  /* Deallocate the hashgrid coordinates */
  tdata_hashgrid_teardown(&tdata);

  /* Unmap the memory and/or deallocate the memory on the heap */
  tdata_close(&tdata);
}

image
The last crash happened here. The stop_index value was 65535, which is out of bounds for the stop_point_coords array, so the coords were random values.
image

This is the request:

from_stop_area:  NONE [65535]
from_stop_point:  NONE [65535]
from_latlon:  56.988089,23.879145
to_stop_area:    NONE [65535]
to_stop_point:    NONE [65535]
to_latlon:  56.957226,23.627903
date:  2023-05-20
time:  21:32:28 [40987]
speed: 1.500000 m/sec
arrive-by: true
max xfers: 1
max time:  20:25:56
mode: transit

I also noticed that the OTP JSON seems to be faulty for coordinate-to-coordinate requests when it does not crash.
E.g.:

		"from": {
			"name": "NONE",
			"stopId": {
				"agencyId": "NL",
				"id": "Z:218"
			},
			"stopCode": null,
			"platformCode": null,
			"lat": 0,
			"lon": 0,
			"wheelchairBoarding": null,
			"visualAccessible": null,
			"arrival": null,
			"departure": null
		},
		"to": {
			"name": "NONE",
			"stopId": {
				"agencyId": "NL",
				"id": "Z:218"
			},
			"stopCode": null,
			"platformCode": null,
			"lat": 0,
			"lon": 0,
			"wheelchairBoarding": null,
			"visualAccessible": null,
			"arrival": null,
			"departure": null
		},

Sometimes lat, lon are random values.

Author

mr-tm commented May 19, 2023

Full OTP JSON example: full_otp.zip

Contributor

skinkie commented May 19, 2023

I don't know if I am able to 'support' this kind of debugging request, but I'll see what I can do.

Author

mr-tm commented May 19, 2023

@skinkie No problem. I think the issue here is that NONE stops are not handled correctly in plan_render_otp.c. I added a quick and dirty fix to some methods so that they don't access memory they shouldn't (when the stop is 'NONE'), and it looks like the crash no longer happens. :)
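
The actual patch is not shown in this thread. As a rough sketch of the kind of guard described, assuming the NONE sentinel is the 65535 seen in the request dump and the coordinates live in a plain array indexed by stop: `kNoneIndex`, `LatLon` and `coords_for_stop` are hypothetical names for illustration, not rrrr API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Sentinel shown in the request dump as "NONE [65535]" (assumed meaning: no stop selected).
constexpr std::uint16_t kNoneIndex = 65535;

struct LatLon {
    float lat;
    float lon;
};

// Hypothetical helper: look up a stop's coordinates only when the index is valid.
// Returns std::nullopt for the NONE sentinel or any out-of-range index, so the
// caller never reads past the end of the coordinate array.
inline std::optional<LatLon> coords_for_stop(const std::vector<LatLon>& stop_point_coords,
                                             std::uint16_t stop_index) {
    if (stop_index == kNoneIndex ||
        static_cast<std::size_t>(stop_index) >= stop_point_coords.size()) {
        return std::nullopt;
    }
    return stop_point_coords[stop_index];
}
```

When no stop is available, a renderer could fall back to the coordinates from the request (from_latlon / to_latlon) rather than emitting 0 or uninitialised values; whether that matches the intended OTP output is an open question.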

Contributor

skinkie commented May 19, 2023

@mr-tm I am accepting pull requests if you have some fixes ;)
