
How to mark double-ended iterator #5

Open
hax opened this issue Sep 18, 2020 · 25 comments

@hax
Member

hax commented Sep 18, 2020

We need to mark whether an iterator is a double-ended iterator, so that let [..., last] = singleEndedIterator could throw (or fall back to normal iteration; that choice should be a separate issue).

As in the current README, Symbol.deIterator is introduced (following Symbol.asyncIterator). For built-in iterators/generators we could just spec it, but userland generators do not have it by default; programmers need to add it manually:

function *g() { ... }
g.prototype[Symbol.deIterator] = g.prototype[Symbol.iterator]

But this suggests that Symbol.deIterator may be unnecessary, because in almost every case a double-ended iterator would have the same value for [Symbol.deIterator] and [Symbol.iterator]. (A double-ended iterator can always be used as a single-ended iterator.)

Another way is to let g have DoubleEndedIteratorPrototype in its prototype chain, but that also requires a DoubleEndedGeneratorFunction, which does not seem like a good idea.

So it seems the simplest solution is just a boolean property on the iterator:

g.prototype.back = true
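
For illustration, a consumer (for example the [..., last] destructuring step) could then check that flag before pulling values from the back. A minimal sketch, assuming the boolean marker above; assertDoubleEnded is a made-up helper name:

function assertDoubleEnded(iter) {
  // sketch only: how a consumer might honor the proposed boolean marker
  if (iter.back !== true) throw new TypeError('iterator is not double-ended')
  return iter
}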

In either case, such boilerplate could be handled by a decorator in the future:

@Iterator.doubleEnded function *g() { ... }

Or we could introduce a new syntax for it:

function *g*() { ... } // stars at both ends for double-ended :-)
@Jack-Works
Member

If so, how do we mark a higher-order iterator?

@hax
Member Author

hax commented Sep 18, 2020

A higher-order iterator needs to set its back property accordingly. For example, Iterator.prototype.map() should use this.back as the back value of the returned iterator.

It seems we can't achieve that using a generator function directly; we need to wrap it.

Object.assign(IteratorProto, {
	map(mapper) {
		const result = this._map(mapper)
		// propagate the double-ended marker to the derived iterator
		result.back = this.back
		return result
	},
	*_map(mapper) {
		for (;;) {
			// function.sent (proposal): forwards the argument of the caller's next() to the source
			const {done, value} = this.next(function.sent)
			if (done) return
			yield mapper(value)
		}
	},
})

Again, decorators may help us.

@bakkot

bakkot commented Aug 31, 2021

An alternative: rather than overloading .next('back'), just use .previous. You can then test if a given iterator is double-ended by checking if it has a method named .previous.

This makes it a little harder to use with generators, but I don't think it really makes sense to try to make generators double-ended: the values yielded by generators follow the control flow of the body of the function, and you can't usually move to an earlier point in the control flow in JavaScript.
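
For illustration, detection under that alternative could be as simple as a method check; a minimal sketch, where previous is the hypothetical method described above, not an existing API:

// sketch: treat an iterator as double-ended iff it exposes a previous() method
function isDoubleEnded(iterator) {
  return typeof iterator.previous === 'function'
}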

@conartist6

conartist6 commented Aug 31, 2021

My problem with that as a library maintainer is that if I want to write a map implementation over iterators that supports everything the language supports, I need to provide a next() and a previous() method, at which point it will no longer be clear reflectively whether the iterator (particularly its source) really supports double ended iteration.

In addition, I'd have to create an iterator to have any idea if double ended iteration is supported, and creating an iterator is allowed to have side effects, making it difficult to impossible to write good defensive code around this feature.

@conartist6

Disposable resources have even more problems than that. As mentioned above, creating an iterator is allowed to create a disposable resource, which can then be disposed of in one of two ways: either abruptly, when iterator.return() is called (which should be fine), OR the iterator is responsible for disposing of any underlying resources when iteration completes normally, i.e. by returning { done: true }. I foresee problems with normal completion, because the result of calling [Symbol.iterator] is really two iterators, so the only way normal completion can happen is if both iterators are done, which is unlikely in the extreme.

@conartist6

conartist6 commented Sep 1, 2021

Also as regards the idea of quietly failing over requests for reverse iteration with forwards iteration, please oh please no. The longer I do this the less I know why anyone would think the following behavior is desirable in a program: "What you asked for wasn't possible, so I quietly did something else. Oh and by the way you won't even be able to tell if I did what you asked for or not!". As a programmer and as a user, no!! If I asked to do something that can't be done, the correct response is an error.

@hax
Member Author

hax commented Jan 22, 2022

and you can't usually move to an earlier point in the control flow in JavaScript.

@bakkot You may have misunderstood what double-ended means. It doesn't mean moving back to an earlier point (that would be bidirectional). Double-ended means you can consume the sequence from both ends; it's closer to the concept of a deque (double-ended queue). A deiter can be seen as a deque that only supports removing items: next() takes from the front end, next('back') from the back end.

So in many cases (for example, Array), writing a double-ended generator is as easy as writing a normal one.

@hax
Member Author

hax commented Jan 22, 2022

@conartist6 First, please note my previous comment: a deiter is not bidirectional, so it will never be the case that "calling [Symbol.iterator] is really two iterators". With the current design you can just use a generator (with the function.sent proposal, which lets you read the argument of the first next(param) call) to write a deiter, so there is no need to provide both a next() and a previous() method. The disposal logic is also the same as for a normal iterator. It doesn't have the complexity you're worried about 😅.

Also as regards the idea of quietly failing over requests for reverse iteration with forwards iteration, please oh please no.

Yes, I agree that would be very bad; that is exactly why we have this issue: "How to mark double-ended iterator".

@bakkot

bakkot commented Jan 22, 2022

@hax the problem is that the way that generators represent sequences is with control flow. So you write yield 1; yield 2; yield 3; to represent the sequence "1, 2, 3". If you want to be able to consume the 3 before consuming the 1, you'd need to be able to evaluate the yield 3 before you evaluate the yield 1. That's not how control flow normally works.

So unless I'm missing something, it still seems to me that generators are not a natural fit for double-ended iterators.

@hax
Member Author

hax commented Jan 23, 2022

@bakkot Yeah, the point is the control flow. The control flow works in a particular way to represent the logic of how to generate the next value. So yes, if the iterator/generator is intended to be double-ended, the developer of course needs to handle the parameter that says which end (front or back) the next value is requested from, and yield the next value accordingly. And next(arg) already has the arg, which the generator can read, just like a generator can produce values based on the arguments and receiver (thisArgument) of the generator function, or even other captured variables in the context. So there is nothing new in the mechanism of generators.

@hax
Member Author

hax commented Jan 23, 2022

@bakkot
For example, this is the current logic of Array.prototype.values if written as a generator:

Array.prototype.values = function *values() {
  let next = 0
  while (next < this.length) {
    yield this[next++]
  }
}

This is the double-ended version:

Array.prototype.values = function *values() {
  let next = 0, nextBack = this.length
  while (next < nextBack) {
    // function.sent (proposal): the argument of the next() call that resumed us
    if (function.sent === 'back') yield this[--nextBack]
    else yield this[next++]
  }
}
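
For illustration, here is how a consumer (or the [first, ..., last] destructuring) might drive that double-ended values(); a sketch that assumes the function.sent proposal and the next('back') convention used in this thread:

const it = [1, 2, 3, 4].values() // the double-ended version above
it.next()        // { value: 1, done: false }  from the front
it.next('back')  // { value: 4, done: false }  from the back
it.next('back')  // { value: 3, done: false }
it.next()        // { value: 2, done: false }
it.next()        // { value: undefined, done: true }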

@hax
Member Author

hax commented Jan 23, 2022

function *give123() { yield 1; yield 2; yield 3; } is obviously not intended to be double-ended, but it's easy to upgrade many generators to double-ended, because many of the abstractions behind generators are double-ended.

// a special case which is already double-ended
function *repeat(x) {
  for(;;) yield x
}
// not double-ended
function *range(start, end) {
  for (let v = start; v < end; ++v) {
    yield v
  }
}

// double-ended version
function *range(start, end) {
  while (start < end) {
    if (function.sent === 'back') yield --end
    else yield start++
  }
}
// string code points iterator
function *codepoints(s) {
  for (let i = 0; i < s.length; ++i) {
    const v = s.codePointAt(i)
    yield v
    if (v > 0xffff) ++i
  }
}

// double-ended version of string code points iterator
// a little bit tricky, but not so hard (not tested; hope I wrote it correctly :-)
function *codepoints(s) {
  let start = 0, end = s.length
  while (start < end) {
    if (function.sent === 'back') {
      let v = s.codePointAt(--end)
      // landed on a low surrogate: step back once more to read the full code point
      if (v >= 0xdc00 && v < 0xe000) v = s.codePointAt(--end)
      yield v
    } else {
      const v = s.codePointAt(start)
      yield v
      start += v > 0xffff ? 2 : 1
    }
  }
}

@conartist6

As a developer, why would I want to have to write the body of my codepoints generator twice? Again, I'd just want to write string.reversed().codepoints().

My experience with code suggests that asking developers to write everything twice to prepare for a use case they don't have yet will lead to a lot of very poor quality code.

@hax
Member Author

hax commented Jan 23, 2022

@conartist6

why would I want to have to write the body of my codepoints generator twice? Again I'd just want to write string.reversed().codepoints().

Actually, with a deiter you don't write it twice: you only write codepoints() once and it works for normal iteration, reverse iteration, and also for [firstCodePoint, ..., lastCodePoint] = s.codepoints().

On the other hand, x.reversed() requires you to write two generators for x, and still doesn't work for [first, ..., last] = x.

Also, string.reversed().codepoints() is wrong unless string.reversed() gives you a new reversed string [1]. You can't have a codepoints() method on an iterator.

[1] It's very unlikely JS will ever have such an API, because generally speaking reversing a string doesn't make sense.

@conartist6

What would a map implementation look like?

A current forward iterator can use this implementation:

function* map(source, mapper) {
  for (const value of source) yield mapper(value);
}

How would you express a map generator over a deiter?

@conartist6

You could say that map just always works on forwards iterators, but then the map implementation would need to be:

function* map(source, mapper) {
  for (const value of source) {
    if (function.sent === 'back') throw new IterationError('unsupported');
    yield mapper(value);
  }
}

All the existing copies of the map function implemented in the naive way would cause silent failures.

@hax
Member Author

hax commented Jan 23, 2022

What would a map implementation look like?

You don't need to throw an error, because it just works (if you keep passing the parameter through). See #1 (comment).
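
A minimal sketch of what "just works" means here, written against the map(source, mapper) signature above; it assumes the function.sent proposal and the next('back') convention, and is not the proposal's normative helper implementation:

function* map(source, mapper) {
  const it = source[Symbol.iterator]()
  for (;;) {
    // forward the caller's argument ('back' or undefined) to the source,
    // so a double-ended source stays double-ended through map()
    const { done, value } = it.next(function.sent)
    if (done) return
    yield mapper(value)
  }
}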

All the existing copies of the map function implemented in the naive way would cause silent failures.

This is why this issue ("How to mark double-ended iterator") is open: we need a way to make it work that is both backward and forward compatible. It's hard but possible.

@hax
Member Author

hax commented Jan 24, 2022

As the table in #1 (comment) shows, it's clear that we need to mark the "type" of iterators/generators:

  • double-ended (can deal with both next() and next('back'))
  • front-only (all current iterators/generators, do not recognize the param)
  • back-only (only accept next('back'))
  • theoretically there could also be a none-ended kind, on which next() can no longer be invoked, only return().

Solution 1: Introduce new protocols ("interfaces" in current spec terms) and well-known symbols

Currently, we have the Iterable protocol, which has the @@iterator method,
and the AsyncIterable protocol, which has the @@asyncIterator method.

In this solution, we add a DoubleEndedIterable protocol, which requires a @@doubleEndedIterator method,
and a ReverseIterable protocol with a @@reverseIterator method (as in the reverse iterator proposal),
plus an AsyncDoubleEndedIterable protocol with @@asyncDoubleEndedIterator
and an AsyncReverseIterable protocol with @@asyncReverseIterator.

Note that the ReverseIterable protocol doesn't extend the Iterable protocol; they are two distinct protocols. The DoubleEndedIterable protocol extends both.

It would be better to have first-class protocols which could provide default methods based on other methods.

protocol Iterable {
  [Symbol.iterator]
}
protocol ReverseIterable {
  [Symbol.reverseIterator]
}
protocol DoubleEndedIterable extends Iterable, ReverseIterable {
  [Symbol.doubleEndedIterator]
  get [Symbol.iterator]() { return this[Symbol.doubleEndedIterator] }
  [Symbol.reverseIterator]() { return this[Symbol.doubleEndedIterator]().reversed() }
}

class UserlandDeIterable implements DoubleEndedIterable {
  [Symbol.doubleEndedIterator]() {
    return {
      __proto__: %DoubleEndedIteratorPrototype%,
      next(dir) { ... }
    }
  }
  // no need to implement the [Symbol.iterator] and [Symbol.reverseIterator] methods,
  // the default methods from the DoubleEndedIterable protocol are used
}
// current %IteratorPrototype%, for reference
%IteratorPrototype% = {
  [@@iterator]() { return this },
  map() {},
  filter() {},
  ...
}

%DoubleEndedIteratorPrototype% = {
  __proto__: %IteratorPrototype%,
  [@@doubleEndedIterator]() { return this },
  [@@reverseIterator]() { return this.reversed() },
  reversed() {},
  takeLast(n) {},
  ...
}

%ArrayIteratorPrototype% and all double-ended iterable built-ins need to be updated to use %DoubleEndedIteratorPrototype% as the value of their [[Prototype]] slot, which would be a tiny breaking change.

There are two possible algorithms for destructuring.

A.

let [x, ...] = a  // $it = a[@@iterator](); $it.next(); $it.return()
let [..., y] = a  // $it = a[@@reverseIterator](); $it.next(); $it.return()
let [x, ..., y] = a  // $it = a[@@doubleEndedIterator](); $it.next(); $it.next('back'); $it.return()

Or, it could be:
B.

let [x, ...] = a  // $it = (a[@@doubleEndedIterator] ?? a[@@iterator])()
let [..., y] = a  // if (a[@@doubleEndedIterator]) { ... } else { $it = a[@@reverseIterator](); $it.next(); $it.return() }
let [x, ..., y] = a  // $it = a[@@doubleEndedIterator]()

Prefer B (@@doubleEndedIterator takes precedence) because it allows people to just implement [@@doubleEndedIterator] without the first-class protocol proposal.
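
A rough sketch of how option B's [x, ..., y] case could desugar, assuming the well-known symbols above; the helper name and error message are illustrative, not spec text:

// illustrative desugaring of: let [x, ..., y] = a
function getDoubleEndedIterator(a) {
  const method = a[Symbol.doubleEndedIterator] // assumed well-known symbol
  if (typeof method !== 'function') throw new TypeError('value is not double-ended iterable')
  return method.call(a)
}
const $it = getDoubleEndedIterator(a)
const x = $it.next().value        // front end
const y = $it.next('back').value  // back end
$it.return?.()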

Iterator helpers need to be updated to:

  1. support fetching the arg of the first next(arg) invocation (as in the function.sent proposal)
  2. the implementation of the helpers needs to be upgraded according to the table. %IteratorHelperPrototype% needs to be split into 3 (or 4) prototype objects: for double-ended, back-only, and front-only iterators (and possibly dead iterators).

The remaining question is how a generator expresses that it is a doubleEndedIterator; the double-star syntax could be sufficient.


Solution 2 (to be continued...)

@conartist6

I don't think noneEndedIterator sounds like a good idea. How were you thinking that might happen?

You'd also need doubleEndedAsyncIterator and reverseAsyncIterator, for a total of 6 iteration protocols.

The thing I worry about is if there's ever any other way the iteration protocol needs to be changed (and personally I have another change in mind already) then you'd end up with 9 protocols, and then some unforeseen change might make it 18, and eventually you have a real mess.

@conartist6

The additional protocols I am imagining all stem from the idea of reducing overhead where it is not needed, in order to enable processing iterables of characters. Characters are obviously very small and documents made of characters may be very large, so loops which work with characters must be efficient. Yet for await..of does two awaits (one of the most expensive operations in the language) even for data that is available synchronously (e.g. because character data is loaded in chunks, packets, etc. to avoid overhead). Another pain point is that all the protocols allocate step ({done, value}) objects, despite the fact that this is unneeded overhead unless you ever plan to return {done: true, value: 'somevalue'}, which very few people ever do.

@hax
Member Author

hax commented Jan 27, 2022

@conartist6

I don't think noneEndedIterator sounds like a good idea. How were you thinking that might happen?

No, it's not that a noneEndedIterator really exists; we don't need that. I only used none-ended temporarily to denote the dead state of an iterator. It's not an idea, but a consequence of some higher-order operations.

For example:

let x = [backOnly, frontOnly]
let iter = x.values().flatMap(x => x)
// neither next() nor next('back') can work, so any next() call just throws.

The thing I worry about is if there's ever any other way the iteration protocol needs to be changed (and personally I have another change in mind already) then you'd end up with 9 protocols, and then some unforeseen change might make it 18, and eventually you have a real mess.

I can't say that there will never be a new iteration protocol (though as I understand it, that's not likely to happen; if double-ended iteration can't advance, I can't imagine any other proposal meeting the bar). But I am sure a syncAndAsyncIterator has no chance of being accepted by the committee. Mixing sync and async has been forbidden since the first day promises were introduced; even the Node.js callback convention rejects it and calls it Zalgo.

@hax
Member Author

hax commented Jan 27, 2022

The additional protocols I am imagining all stem from the idea of reducing overhead where it is not needed in order to enable processing iterables of characters. Characters are obviously very small and documents made of characters may be very large, so loops which work with characters must be efficient.

In my opinion, one specific use case cannot justify adding something to the language, unless that use case is very strong.

For your specific case, iterating over huge numbers of characters, IMO it can't be solved by adding any new iteration protocol.

You should first figure out what task you want to perform on long text. Because text is complex, many tasks need high-level APIs like grapheme cluster, word, or sentence segmentation... those are heavy tasks by nature, and the iteration itself does not cost much.

On the other hand, if you just want to do byte-level or Unicode code point level operations, the essential issue with processing strings in JS is the early mistake of using UTF-16 encoding. So what you may need is to operate on a buffer, or a buffer-based CodePointIterator.

Another pain point is that all the protocols allocate step ({done, value}) objects despite the fact that this is also unneeded overhead unless you ever plan to return {done: true, value: 'somevalue'}, which very few people ever do.

We can't change {done, value} now; that decision was made during ES6 design, and because Python and old Firefox used a StopIteration exception to denote the end of iteration, it's clear this was an intentional design. I can't say whether it's good or bad, but we can't change any of its semantics (like {done: true, value: 'somevalue'}) now. We can only hope engines are smart enough to optimize away the allocation of the {done, value} objects. If it were redesigned today, maybe it could be a Record #{done, value}, which would obviously be much easier to optimize.

@conartist6

It's an interesting case with flatMap, but really you're just describing any time there's a backwards/forwards mismatch, right? In flatMap, though, you might not throw until you've already emitted half the results, which is interesting. But I guess warning you about that error in advance is what type systems are for.

@conartist6

As for sync and async iteration, I do not think it angers Zalgo. In fact that article talks about exactly what I'm discussing. It has its own subheading: "If the result is usually available right now, and performance matters a lot:". It says:

Check if the result is available [synchronously].
If it is, return it.
If it is not, return an error code, set a flag, etc.
Provide some Zalgo-safe mechanism for handling the error case and awaiting future availability.

Well, there you go. This is talking about callbacks, but when you're working with promises it's even simpler. Either return the value, or return a Promise so that the caller knows that it must wait for the value.

As he notes in the next section:

Note that this applies to not just callbacks, but also to Promises and any other control-flow abstractions. If your API is frequently returning a Promise for a value that you know right now, then either you are leaving performance on the table, or your Promise implementation releases [Zalgo].
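
A sketch of that idea (my reading of it, not an existing proposal or API): a step method that returns the result object directly when the value is available synchronously, and a Promise only when it is not, so the consumer only awaits on the slow path. The class and its refill() method are made up for illustration:

class ChunkedSource {
  constructor(chunks) { this.buffer = [...chunks] }
  next() {
    if (this.buffer.length > 0) {
      // fast path: value is already here, return the step object synchronously
      return { done: false, value: this.buffer.shift() }
    }
    // slow path: return a Promise so the caller knows it must wait
    return this.refill().then(() =>
      this.buffer.length > 0
        ? { done: false, value: this.buffer.shift() }
        : { done: true, value: undefined })
  }
  refill() { return Promise.resolve() } // stand-in for real async I/O
}

async function drain(source) {
  for (;;) {
    let step = source.next()
    if (step instanceof Promise) step = await step // only await when necessary
    if (step.done) return
    console.log(step.value)
  }
}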

@conartist6

I also don't buy your argument that I need to know one specific operation to optimize. Indeed optimizing one specific operation is already possible. What we're discussing is creating a way to define operations once and have them work efficiently and correctly for any input that implements iteration protocols.

Right now someone working with chunked data would have two choices:

  • They could use async iteration and have many powerful methods available, but chaining many small async transforms (e.g. map(x => x.foo)) together starts to have costs dominated by the overhead of the for await..of loop itself. This problem has already been reported by the first users attempting this kind of async work.
  • They could use nested for loops (sync inside async), which eliminates the unnecessary overhead of extra awaits but causes a bunch of new corner cases and eliminates the possibility of using the grammar defined for iterators.
