Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Widgets created by GtkBuilder are sometimes unref'd/destroyed too soon #203

Open
finitemonkey opened this issue May 12, 2022 · 26 comments
Open

Comments

@finitemonkey
Copy link

Hi! Thanks for these excellent bindings, they are just what I needed and I have been having great fun using them!

When compiling with gc:arc or gc:orc and using widgets that are created by GtkBuilder, the underlying (impl) objects are freed too soon in some situations.

A minimal example;

import gintro/[gobject,gio,gtk4]

proc buildSomeUI():Box =
  let ui = """
    <interface>
      <object class="GtkBox" id="examplewidget">
        <property name="valign">center</property>
        <property name="halign">center</property>
        <property name="orientation">vertical</property>
        <child>
          <object class="GtkButton" id="examplebutton">
            <property name="label">Hello, I am a button! I like being a button.</property>
          </object>
        </child>
      </object>
    </interface>
  """
  let builder = newBuilderFromString(ui, ui.len)
  result = builder.getBox("examplewidget")

proc appActivate(app: Application) =
  let window = newApplicationWindow(app)
  window.show()
  window.setChild(buildSomeUI())

let app = newApplication("gintro.bug.example")
app.connect("activate", appActivate)
discard app.run()

Compiling the above with nim compile --gc:arc --run example.nim results in an empty window and the following complaints from GTK:

(example:839253): Gtk-CRITICAL **: 13:27:11.506: gtk_window_set_child: assertion 'child == NULL || GTK_IS_WIDGET (child)' failed

(example:839253): GLib-GObject-CRITICAL **: 13:27:11.506: g_object_set_qdata: assertion 'G_IS_OBJECT (object)' failed

Wrapping the sub-UI in a widget instantiated with newBox (or any other widget) before returning will fix the problem;

proc buildSomeUI():Box =
  ...
  let builder = newBuilderFromString(ui, ui.len)
  let wrapperBox = newBox(Orientation.vertical)
  wrapperBox.append(builder.getBox("examplewidget"))
  result = wrapperBox
...

Creating the whole sub-UI with newXXXX proc calls rather than using Builder also works, so something is obviously different between the way that Builder instantiates widgets and the way that the newXXXX procs instantiate widgets.

Looking at the code (and reading the GTK docs, which mention that gtk_builder_get_object does not increase the reference count of the object it returns), it appeared that perhaps a discard g_object_ref_sink(result.impl) in getBox(builder: Builder; name: string): Box (in gintro/gtk4.nim) might fix the problem for me, and indeed it does.

However, I'm not sure if this is the correct approach (hence this issue, rather than a PR with a fix!). Since Builder appears to be working properly in other situations, will adding an additional g_object_ref_sink in the newXXXX procs lead to memory leaks in the currently working situations because one too many references are held? Should it go somewhere else? Is a different approach required? I don't have a deep enough understanding of the bindings at this point to know (I'm working on it, though!).

@StefanSalewski
Copy link
Owner

Thanks for reporting. Indeed the use of GtkBuilder with gintro has not been tested that much. I will try to investigate this issue soon.

@StefanSalewski
Copy link
Owner

I can indeed reproduce your issue.
The good news is, that the example builder.nim with the external UI file builder.ui from https://ssalewski.de/gtkprogramming.html#_label seems to work fine with ARC. Will investigate your issue in the next few days.

@finitemonkey
Copy link
Author

Thanks, that would be great!

Yeah, I've generally found it all to work really well, with no major problems and a couple of pleasant surprises. Builder has broadly worked for me too, the only real problem with Builder so far being the above-mentioned issue of passing a Builder-built widget reference out of a disappearing scope can result in the underlying GTK object being collected.

@StefanSalewski
Copy link
Owner

OK, I have created a C example from your code, which seems to be a valid use of GtkBuilder but shows the same issue, see https://discourse.gnome.org/t/gtkbuilder-issue/9831

So my feeling is, that this is not a gintro issue that we can fix. Maybe it is a GtkBuilder issue, or your use is invalid. I hope someone of the GTK experts can comment on the issue. If you want to continue using code of this shape, you can avoid the issue, by calling GC_ref() on the builder object, that avoid that the Nim memory management system frees it. But generally, you should better only use code shapes which other GTK developers use, that may avoid problems. And for the case that you have not yet much experience with GTK at all, I would recommend you to use a different GUI toolkit. The fidget GUI seems to be popular currently.

@finitemonkey
Copy link
Author

Thanks for that - the discussion on the GNOME board is really interesting.

Just to note that the only reason for creating the main window that way in the example was for clarity - avoiding the distraction of multiple irrelevant XML fragments and builders. And the example only exists to isolate the issue with minimal extraneous noise (initially to aid me in finding the source of the problem), it all makes more sense in the context of the actual application. The same thing does happen when everything is Builder-generated too.

I did investigate other widget kits, including Fidget prior to choosing GTK. Part of the reason for choosing GTK was past experience with it and that GTK is the native toolkit in the environment I'm targeting. However, it was also, crucially, that the others appear to be missing important widgets or features that I wanted. I also wasn't 100% sold on the approach that Fidget takes (although I didn't get as far as actually trying it, due to the lack of required widgets, I only looked at their examples - so that assessment may be unfair). Perhaps it is a little bit of Stockholm syndrome, but so far I've been really happy with GTK4!

So it looks like the best temporary fix may be to keep the builder reference alive (at least until the widgets are used), either by returning it instead of the widget, or by creating one at startup and passing it in to the code that builds sub-sections of the UI.

I'd be interested to see if the underlying issue can be fixed though, because it still feels like it is something that should be possible, even if it might be a strange thing to do!

@StefanSalewski
Copy link
Owner

I'd be interested to see if the underlying issue can be fixed though, because it still feels like it is something that should be possible, even if it might be a strange thing to do!

Yes, fixing should be possible, but may involve larger changes, which then needs again carefully testing. Following Mr. Droege, we would need for all of our Nim proxy objects an additional field which is a back reference to objects that we have to keep alive. Builder functions like builder.getBox() would increase the ref count of the builder instance, and then the destructor or finalizer for GtkBox would decrease the ref count. Our current memory management works mostly fine, and has been tested by some users, we would have to be carefully that we make it not worse.

A simpler solution would be, that we just never free builder instances. That would avoid issues, but is ugly of course.

Please do not expect a fix for this summer. And note that the number of gintro users is tiny currently, very close to zero, and there is no hope that we can improve the situation. Low numbers of users means low number of testers, so we have to be very carefully with larger changes.

For me currently the Nim book has a higher priority, as that book has at least a few readers already, while gintro has zero active users. So I may try to finish the book at the end of this years -- I have to write the section about concepts still, maybe some more sections like debugging, FFI, JavaScript backend, GUI, games... And finally do a lot of proof reading. And then I would like to work again on the SDT tool and the topological PCB router -- unfortunately I had no spare time to work on all that since 2014 when I came to Nim.

For the Nim bindings, we initially intended to hire some GTK and Bindings experts, maybe from the Rust team, to completely rewrite the Nim bindings. I have offered to donate up to 1000 Euro to support that, so with some more supporters, it would be the hope to collect maybe 50k Euro to pay that devs. But with our actual user base, there is no hope for that project any more.

@finitemonkey
Copy link
Author

Thank you for your time on this, it is very much appreciated, I totally understand what you're saying (although I have a slightly different view about the low number of users).

I similarly have a lack of time, but I am interested in gaining a deeper understanding of Nim and this is both a useful learning exercise for me as well as being actually directly useful to me. So, I'll continue to investigate and see if I can make any progress - perhaps I can put together a useful PR for you in the future (I have a couple of other bug fixes in my localy-patched gintro, which I'd like to transform into proper PRs for you, when I can, too).

@StefanSalewski
Copy link
Owner

although I have a slightly different view about the low number of users

There is no evidence for any users at least. In the last five years, in which gintro is in a working stage, we may have a handful of users , maybe seven. One is the Mr AVR, one the one who created a game unlock tool, one is Mr. DemoHero, I think he was using libnice with Windows8.1. Then there was the author of the famous bacon/dingle paper, who asked for webkit support, but vanished soon. And maybe a few others. And when you look into the Gnome forum, well there are only the pure consumer types, but no one who is still creating new GTK software. I think we have to admit that the time of GTK is over, GTK4 has not really improved the situation.

When you are interested in learning Nim, I would not really recommend starting with gintro, other stuff may be more fun. The gintro generator script, called gen.nim is ugly and hard to understand. I put 1600 hours of work into it, which is more than initially intended. Gobject-introspection is really very hard and time-consuming. My feeling is, that Rust may have the best GTK support currently, so whenever someone has too much free time, maybe creating Nim bindings on the Rust ones may be a task. When I started gintro in 2015, the Rust project was not already available, so I started from scratch. No one told me that time how hard it would be. And no one told me, that it may be a better idea to parse the XML GIR files directly, instead of using gobject-introspection.

Sorry, have to do some other work now.

Best regards, Stefan Salewski

@finitemonkey
Copy link
Author

Ah, sorry, I wasn't too clear there - I didn't mean that I think there are really lots of hidden users - I meant that perhaps it doesn't matter too much if there are a low number of users. If a piece of software has only one user who finds it useful and maintains it, then sometimes that is enough (I've certainly written software in the past that only I used, because it filled some specific niche need for myself or for a company that I worked for at the time).

I also meant that sometimes it can be the case that software that has a low number of users will continue to have a low number of users if it is difficult to use or doesn't fulfill whatever needs a potential new user might have, so it's possible to then find yourself in a catch-22 - you feel that fixing isn't worthwhile because there are no users, but there are no users because it isn't fixed.

However, I don't mean that I generally disagree with your points, because I do agree and I totally understand what you're saying. I can see how the situation changes over time and perhaps these bindings are less relevant now. I personally still think that GTK is relevant - a lot of the other options seem to be at approx GTK1.2 level at the moment - promising, but lacking some of the hard-won (and labour-intensive to create) maturity of current GTK (e.g. having a full-featured table implementation, or having extensive customisation or extension abilities).

Also, as you suggest, perhaps a Rust inspired (or otherwise differently approached) re-write may be the path to take. I'm currently working on a Poppler binding, but I had been doing that by hand (with assistance from c2nim) and so it's a slightly different thing (it's also a simpler library). However it too has led me to question the approach I'm taking with it a number of times (and I've only just started doing it - getting it to render a single page in a GTK window, but nothing else at the moment). Like you, though, I probably don't have time fully understand and to completely re-write the gintro bindings, so a few fixes here and there may have to do for now!

Anyway, thank you for the time and effort you've put into these bindings, even if they are eventually replaced with something else, they are useful right here, right now and that is very valuable. Also, your discussions over this bug have been very valuable in terms of informing the future paths I explore. Thank you!

@StefanSalewski
Copy link
Owner

Sorry, I feel a bit stupid now. See my comment at https://discourse.gnome.org/t/gtkbuilder-issue/9831/7

So it is not too hard to fix. But I am not sure if I can do it this week, I have some problems with my hardware. My main box did some random reboots in the last weeks, so I tried to clean the internal fan and heat spreader. Box is a tiny Intel NUC, and after cleaning, the fan makes more noise than before, and box still reboots. So I ordered a replacement notebook, which finally arrived on friday, after two weeks. But its ENTER key works not reliable, so I have to send it back tomorrow. Really a lot of trouble currently.

@StefanSalewski
Copy link
Owner

Actually, the fully automatic generated getObject() proc contains the needed g_object_ref(), while the builder procs with well defined return data types like getBox() does not. I think that the procs like getBox() have been created years ago, and have never been updated. So fixing that should be indeed not that difficult, I may try that this evening.

salewski@nuc ~/gintrotest/tests $ grep -A22 "proc getObject\*(self: Builder;" ~/.nimble/pkgs/gintro-#head/gintro/gtk4.nim 
proc getObject*(self: Builder; name: cstring): gobject.Object =
  let gobj = gtk_builder_get_object(cast[ptr Builder00](self.impl), name)
  if gobj.isNil:
    return nil
  let qdata = g_object_get_qdata(gobj, Quark)
  if qdata != nil:
    result = cast[type(result)](qdata)
    assert(result.impl == gobj)
  else:
    fnew(result, gobject.finalizeGObject)
    result.impl = gobj
    GC_ref(result)
    discard g_object_ref_sink(result.impl)
    g_object_add_toggle_ref(result.impl, toggleNotify, addr(result[]))
    g_object_unref(result.impl)
    assert(g_object_get_qdata(result.impl, Quark) == nil)
    g_object_set_qdata(result.impl, Quark, addr(result[]))

salewski@nuc ~/gintrotest/tests $ grep -A22 "getBox" ~/.nimble/pkgs/gintro-#head/gintro/gtk4.nim 
proc getBox*(builder: Builder; name: string): Box =
  let gt = gtk_box_get_type()
  assert(gt != g_type_invalid_get_type())
  let gobj = gtk_builder_get_object(cast[ptr Builder00](builder.impl), name)
  assert(gobj != nil)
  if g_object_get_qdata(gobj, Quark) != nil:
    result = cast[type(result)](g_object_get_qdata(gobj, Quark))
    assert(result.impl == gobj)
  else:
    fnew(result, gtk4.finalizeGObject)
    result.impl = gobj
    result.ignoreFinalizer = true
    GC_ref(result) # as following g_object_unref() call will also  execute a GC_unref(result) call
    g_object_add_toggle_ref(result.impl, toggleNotify, addr(result[])) # do toggle_refs make sense at all for builder objects?
    g_object_unref(result.impl) # new for v0.8.4 and GTK4, make close main windows working
    assert(g_object_get_qdata(result.impl, Quark) == nil)
    g_object_set_qdata(result.impl, Quark, addr(result[]))
  assert(toBool(g_type_check_instance_is_a(cast[ptr TypeInstance00](result.impl), gt)))

@finitemonkey
Copy link
Author

That is great news! Happy to help if there is anything I can do (e.g. testing).

@StefanSalewski
Copy link
Owner

I have just pushed the tiny fix to GitHub. You can test with

nimble uninstall gintro
nimble install gintro@#head

Your initial example seems to work now, and the example builder.nim from https://ssalewski.de/gtkprogramming.html#_label as well. Unfortunately I have not more tests available currently. I had some fear that the provided fix would cause problem for refc GC, as with that one the destroying of objects is delayed, which can result in strange results, like main windows which do not close when we terminate the program. We had such issues some years ago, but only for GTK4, maybe it was an early GTK4 issue. But not it seems to work -- I have not tested for GTK3.

This is the actual fix:

$ diff ~/gintrotest/tests/gen.nim gen.nim 
3c3
< # v 0.9.9 2022-MAY-17
---
> # v 0.9.8 2022-APR-01
4279,4282c4279,4281
<     result.impl = gobj # not floating, with ref count == 1
<     #result.ignoreFinalizer = true
<     #discard g_object_ref(result.impl) # v0.99, try to fix issue
<     GC_ref(result) # when builder instance goes out and is destroyed, it unrefs its childs...
---
>     result.impl = gobj
>     result.ignoreFinalizer = true
>     GC_ref(result) # as following g_object_unref() call will also  execute a GC_unref(result) call
4284c4283
<     #g_object_unref(result.impl) # new for v0.8.4 and GTK4, make close main windows working
---
>     g_object_unref(result.impl) # new for v0.8.4 and GTK4, make close main windows working

Seems to make some sense.

@finitemonkey
Copy link
Author

This is great, thank you! But in that patch, the g_object_ref line is commented out?

@StefanSalewski
Copy link
Owner

Yes, that is correct. Took me a few minutes to make it working, but more than an hour to remember and understand why it may work :-) Full proc is

proc get$1*(builder: Builder; name: string): $1 =
  let gt = $2
  assert(gt != g_type_invalid_get_type())
  let gobj = gtk_builder_get_object(cast[ptr Builder00](builder.impl), name)
  assert(gobj != nil)
  if g_object_get_qdata(gobj, Quark) != nil:
    result = cast[type(result)](g_object_get_qdata(gobj, Quark))
    assert(result.impl == gobj)
  else:
    fnew(result, $3.finalizeGObject)
    result.impl = gobj # not floating, with ref count == 1
    #result.ignoreFinalizer = true
    #discard g_object_ref(result.impl) # v0.99, try to fix issue
    GC_ref(result) # when builder instance goes out and is destroyed, it unrefs its childs...
    g_object_add_toggle_ref(result.impl, toggleNotify, addr(result[])) # do toggle_refs make sense at all for builder objects?
    #g_object_unref(result.impl) # new for v0.8.4 and GTK4, make close main windows working
    assert(g_object_get_qdata(result.impl, Quark) == nil)
    g_object_set_qdata(result.impl, Quark, addr(result[]))
  assert(toBool(g_type_check_instance_is_a(cast[ptr TypeInstance00](result.impl), gt)))

""" % [i, prefix & "_" & myCamelToSnake(i) & "_get_type()", modname])

Before, result.ignoreFinalizer = true was active, and g_object_unref(result.impl) as well. Ref and unref cancel each other out. I was going to remove the GC_ref(result), but it became clear that this is important. Unfortunately I am not sure about the old comment "# new for v0.8.4 and GTK4, make close main windows working". That is why I was a bit worried, but my feeling is that it works now. You can test yourself, and let me know when there are still issues. The whole stuff is really complicated unfortunately.

@StefanSalewski
Copy link
Owner

Well, the actual increase of the ref count by one is done by g_object_add_toggle_ref().

@StefanSalewski
Copy link
Owner

And what I forget to mention: When you compile with --gc:arc you can specify -d:gintroDebug, and you get messages when objects are destroyed by destructors.

@finitemonkey
Copy link
Author

I'd like to say, "ah, I see!", but I don't quite yet! :-) Thanks for the explanation though, I'll take some time tomorrow to look at the code while reading it again and try to understand it all.

@StefanSalewski
Copy link
Owner

I just did one more test, and I am not fully happy with the result:

import gintro/[gobject,gio,gtk4]

proc buildSomeUI():Box =
  let ui = """
    <interface>
      <object class="GtkBox" id="examplewidget">
        <property name="valign">center</property>
        <property name="halign">center</property>
        <property name="orientation">vertical</property>
        <child>
          <object class="GtkButton" id="examplebutton">
            <property name="label">Hello, I am a button! I like being a button.</property>
          </object>
        </child>
      </object>
    </interface>
  """
  let builder = newBuilderFromString(ui, ui.len)
  let xxx = builder.getBox("examplewidget")
  let yyy = builder.getBox("examplewidget")
  echo "!!!", cast[int](xxx.addr), " ", cast[int](yyy.addr)
  echo ":::", cast[int](xxx.impl.addr), " ", cast[int](yyy.impl.addr)
  
  result = builder.getBox("examplewidget")

proc appActivate(app: Application) =
  let window = newApplicationWindow(app)
  window.show()
  window.setChild(buildSomeUI())
  let hhh = buildSomeUI()
  let hhh1 = buildSomeUI()


let app = newApplication("gintro.bug.example")
app.connect("activate", appActivate)
discard app.run()
$ nim c --gc:arc -d:gintroDebug t.nim
$ ./t
!!!140728991379968 140728991379976
:::140459962405072 140459962405072
destroy Builder:ObjectType 140459962405032
!!!140728991379968 140728991379976
:::140459962405104 140459962405104
destroy Builder:ObjectType 140459962405032
!!!140728991379968 140728991379976
:::140459962405136 140459962405136
destroy Builder:ObjectType 140459962405032
destroy Box:ObjectType 140459962405128
destroy Box:ObjectType 140459962405096
destroy Box:ObjectType 140459962405064
destroy Application:ObjectType 140459962404936

In the second line the impl addresses are the same, so I would think that addresses of proxy objects in the first line should be the same too? Should be not a too serious issue for you, but I think we should try to investigate that. I have to do some other work now.

@finitemonkey
Copy link
Author

I have to do some other work now

No worries - I'll try to have a look and see if I can make some progress. I know what it's like when everything just keeps sucking time, especially when you have hardware problems that have to be resolved first - software isn't easy with uncooperative hardware.

In the second line the impl addresses are the same, so I would think that addresses of proxy objects in the first line should be the same too?

Would some sort of memoization of the proc work? Then a proxy object is only created the first time, the already existing one is returned in subsequent calls.

The assumption being that the same parameters always refer to the same widget, but is it possible to load an additional builder ui fragment into an already existing GtkBuilder, overloading/changing already existing objects in it? My gut feeling is that no, that won't be possible (e.g. because I already know that GtkBuilder refuses to parse your ui fragment if you have duplicate ids, even if they are empty), however I haven't actually tested that particular situation.

Should be not a too serious issue for you

That's correct - in my application, I will never request the same object twice from the Builder, but I can imagine somebody else wanting to do that then finding the underlying object goes missing from their second reference as soon as the first goes out of scope.

@finitemonkey
Copy link
Author

Actually... thinking about it, memoization would mean the the underlying widget is never freed, wouldn't it? Because the memoization machinery (however implemented) would need to hold a reference to it in case anybody ever called the function again.

@finitemonkey
Copy link
Author

In fact, does it matter that the references are different? Provided the ref counts of the underlying objects are correctly incremented and decremented when they are created and destroyed, shouldn't everything work correctly in every situation?

@StefanSalewski
Copy link
Owner

Sorry, I was not clear enough. The g_object_get_qdata() with g_object_set_qdata() should do this, that is give us the Nim Proxy object back from the GTK instance. Was coded and tested in 2017, but never tested again, and maybe never tested with GtkBuilder. Currently I can not see a reason why GtkBuilder is special, I think testing was done by adding a Button to a container widget, and then calling getChild() to get it again. Was working that time. I will investigate in the next days.

@StefanSalewski
Copy link
Owner

At least it seems to be not fully broken:

##  https://gitlab.gnome.org/GNOME/gtk/-/blob/master/examples/hello-world.c

import gintro/[gtk4, gobject, gio]

proc destroyWindow(b: Button; w: gtk4.ApplicationWindow) =
  gtk4.destroy(w)

proc printHello(widget: Button) =
  echo("Hello World")

proc activate(app: gtk4.Application) =
  let window = newApplicationWindow(app)
  window.title = "Window"
  window.defaultSize = (20, 20)
  let box = newBox(Orientation.horizontal, 0)
  echo "xxx", cast[int](box), " ", cast[int](box.impl)
  window.setChild(box)
  let child = window.getChild
  echo "yyy", cast[int](child), " ", cast[int](child.impl)
  let button = newButton("Hello World")
  button.connect("clicked", printHello)
  button.connect("clicked", destroyWindow, window)
  box.append(button)
  window.show

proc main =
  let app = newApplication("org.gtk.example", {})
  app.connect("activate", activate)
  let status = app.run
  quit(status)

main()
$ nim c --gc:arc -d:gintroDebug t.nim

xxx139690396237992 93887417815504
yyy139690396237992 93887417815504
destroy Box:ObjectType 139690396237992
destroy Button:ObjectType 139690396238152

@StefanSalewski
Copy link
Owner

Well that was easy, stupid error this morning in my testing code. Box instance as well as impl are already references or pointers, so of course we do not have to apply addr(). Correct test code is

  echo "!!!", cast[int](xxx), " ", cast[int](yyy)
  echo ":::", cast[int](xxx.impl), " ", cast[int](yyy.impl)

And with that all is fine. But good that we tested this point again.

@finitemonkey
Copy link
Author

Thank you for all your work on this, I've tested this fix and it is working for me.

I have also pulled the git repo and will attempt to apply my other fixes to the appropriate place there (at the moment they are just patches to the generated files) and feed back when I have done that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants