You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This basically boils down to iterator invalidation. If we have two references to a vector,
a GcVec::push on one may mutate the vector's length while the user coerced the other reference to a slice (or iterator).
To put it another way, it's impossible for a garbage collected vector/list to implement all of the following properties:
Coerce the vector into a slice that directly points to the underlying memory
Allow multiple references to the vector (implement Copy)
The ability to mutate the length and potentially re-allocate
Keep capacity immutable, and have the area [len, capacity) uninitialized (as opposed to initialized with null)
This is a problem common to all garbage-collected languages, not just zerogc (as I describe below).
What Java/Python do
Most languages sidestep the problem by forbidding option 1 and refusing to give direct pointer-access to garbage-collected arrays.
Usually, forbidding option 1 is not a problem, since arrays in Java and lists in Python are builtin into the language and replace most of the users of pointers.
There are some exceptions though. Java's JNI allows raw access to an arrays's memory with GetPrimitiveArrayCritical. However, there is no need to worry about mutating length because java arrays have a fixed size (although it does temporarily prevent garbage collection). Python's native CAPI exposes access to a list's raw memory with PyListObject->ob_items, although native code must be careful of any user code changing .
This problem can also come up in unexpected places. For example, the C implementation of Python's list.sort coerces the list into a raw pointer, then sorts the memory directly using the pointers. The original code continued to use the original pointer even if a user comparison, leading to bugs. The current implementation temporarily sets the length to zero and carefully guards against any mutations.
What to do
If we took the Java/Python approach of refusing to give out raw pointers, it would also prevent access as a slice.
In idiomatic rust, this is unrealistic, since slices are so pervasive.
As a temporary solution, I ended up forbidding option 2.
However, doesn't work well if you're trying to use GcVec as the underlying implementation of Java arrays or Python lists. These languages allow users to create multiple references to an array or a list. In our current system, this would require double-indirection of Gc<GcVec<T>>.
SIDENOTE: Right now GcVec
Instead, I propose a new system of four types and two traits (which will incidentally solve the write-barrier problem):
GcRawVec - Allows multiple references, but has unsafe slice or pointer access
This will even allow mutable slices, as long as the user correctly triggers write barriers.
GcVec - Effectively a RefCell<GcRawVec<T>>. Includes safe slice access.
GcVecBuffer - A GcRawVec that is !Copy, but gains safe slice access.
trait GcVecAccess - Abstracts over different forms of GcVec. Calls to push require a GcContext. Requires the user to pass a context
trait GcArrayAccess - Allows read/write access to arrays, without mutating their length.
Implemented for GcArray in addition to GcVec
TODO: Better naming for GcVec and friends?
The text was updated successfully, but these errors were encountered:
Merges #29Closes#16
Right now, access to arrays doesn't trigger write barriers. I need to implement that before I can work on generational
or incremental collection......
The vector API is a little rough around the edges
because `GcVec` doesn't implement `Copy` as all the other `Gc` types do.
See #30 for more details
This basically boils down to iterator invalidation. If we have two references to a vector,
a
GcVec::push
on one may mutate the vector's length while the user coerced the other reference to a slice (or iterator).To put it another way, it's impossible for a garbage collected vector/list to implement all of the following properties:
Copy
)[len, capacity)
uninitialized (as opposed to initialized withnull
)This is a problem common to all garbage-collected languages, not just zerogc (as I describe below).
What Java/Python do
Most languages sidestep the problem by forbidding option 1 and refusing to give direct pointer-access to garbage-collected arrays.
Usually, forbidding option 1 is not a problem, since arrays in Java and lists in Python are builtin into the language and replace most of the users of pointers.
There are some exceptions though. Java's JNI allows raw access to an arrays's memory with
GetPrimitiveArrayCritical
. However, there is no need to worry about mutating length because java arrays have a fixed size (although it does temporarily prevent garbage collection). Python's native CAPI exposes access to a list's raw memory withPyListObject->ob_items
, although native code must be careful of any user code changing .This problem can also come up in unexpected places. For example, the C implementation of Python's
list.sort
coerces the list into a raw pointer, then sorts the memory directly using the pointers. The original code continued to use the original pointer even if a user comparison, leading to bugs. The current implementation temporarily sets the length to zero and carefully guards against any mutations.What to do
If we took the Java/Python approach of refusing to give out raw pointers, it would also prevent access as a slice.
In idiomatic rust, this is unrealistic, since slices are so pervasive.
As a temporary solution, I ended up forbidding option 2.
However, doesn't work well if you're trying to use
GcVec
as the underlying implementation of Java arrays or Python lists. These languages allow users to create multiple references to an array or a list. In our current system, this would require double-indirection ofGc<GcVec<T>>
.SIDENOTE: Right now
GcVec
Instead, I propose a new system of four types and two traits (which will incidentally solve the write-barrier problem):
GcRawVec
- Allows multiple references, but has unsafe slice or pointer accessGcVec
- Effectively aRefCell<GcRawVec<T>>
. Includes safe slice access.GcVecBuffer
- AGcRawVec
that is!Copy
, but gains safe slice access.trait GcVecAccess
- Abstracts over different forms ofGcVec
. Calls topush
require aGcContext.
Requires the user to pass a contexttrait GcArrayAccess
- Allows read/write access to arrays, without mutating their length.GcArray
in addition toGcVec
TODO: Better naming for
GcVec
and friends?The text was updated successfully, but these errors were encountered: