Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asynchronous multi-source dynamic completion #17

Open
karthink opened this issue Apr 20, 2024 · 8 comments
Open

asynchronous multi-source dynamic completion #17

karthink opened this issue Apr 20, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@karthink
Copy link

karthink commented Apr 20, 2024

@armindarvish A proof of concept, following our reddit discussion.:

  • Fully asynchronous multi-source dynamic completion,
  • combining local and remote sources,
  • with some extra gptel spice: shows a < 10 words summary of the answer in the minibuffer, for a token cost of < 30 (so that's basically free)
consult-web-full-async-demo-2.mp4

The code is here. Some notes:

  • The function cw--multi-search is clumsy because we don't have consult-multi's abstractions to rely on. If minad is okay adding some support for async consult sources it will look much more straightforward. Right now it has one code block for each source.

  • The rest of the code is very straightforward, except for this control flow that took a while for me to wrap my head around:

(consult-read
 (consult--async-split
  (consult--async-throttle
   (cw--multi-search
    (consult--async-refresh-timer
     (consult--async-sink))))))

Once I finally understood how the consult-async system works (much thanks to minad) this was fairly straightforward.

  • I ended up writing new formatting functions for brave and gptel but you can just plug in your versions instead. I reused consult-web-elfeed with no changes.

  • The code uses plz, but this is unnecessary. You can easily replace it with url-retrieve (the asynchronous version) or request. All we need is a callback function from these fetchers.

  • Even though it's "asynchronous", Emacs does hitch when starting 3-4 processes at once, mostly from garbage collection. But the live updates are still a big improvement in my estimate.

From our reddit discussion:

(I said:) I'll whip up a proof-of-concept two-source async command with consult--read when I get the time. (It won't address the second issue -- the differing field formatting for different sources. I'm not sure about how to handle that.)

Addressing the field formatting turned out not to be a problem.

Also there is going to be timing issues here that needs to be handled gracefully because there is going to be different latencies as minad said in a comment above.

As you can see, this wasn't an issue when the Elfeed results updated before the others did.

The gptel candidate in consult-web-* commands is just a placeholder (with no request) and therefore available immediately,

Side note: gptel can integrate much more deeply with other Emacs tools using gptel-request, which is a lower-level interface than gptel-send to all LLMs it supports. gptel-send and the transient menu options all use gptel-request under the hood.

but getting many hits from an API will be available much later and consult--read has to append the results in a reasonable time-interval and etc.

I don't see the issue here. The responses come in when they come in. The issue of grouping together disparate local and remote sources (like buffers/multi-line and web searches) is not a technical one.

From what I remember, request sends the response to a buffer and runs the parser in that buffer and saves the parsed response in a global variable, but passing this global variable (or a function that acts on it) to consult--read or consult-dynamic-collection did not work, probably because I could not figure out if there is a way to signal consult to refresh candidates on demand!

The other issue that I still don't understand how to handle at least with request.el is that the response from brave-url and bing-url cannot at the same time be saved in the same global variable, and because for different sources we may need different fields (for example pubmed v.s. google),

I don't think request uses a single global variable? Or at least I don't remember. In any case, both url-retrieve and plz don't have this issue, every request is independent.

when I send the candidates to consult, consult needs to know the source and how to build the preview and return actions, etc. for that candidate, I need to track the source, and if I am trying to update the results dynamically I have to do the clearing and updating the global variables myself rather than using the consult-dynamic-collection

There isn't much difference between the current implementation and this proof of concept -- I reused your :on-preview, :on-callback and :state functions, slightly modified :lookup, and wrote a :group function (that you didn't need because consult's source mechanism handles this for you).

That would mean reinventing the wheel on consult, request and possibly both to get the general functionality I wanted for multiple sources

There is some wheel-reinvention going on here, mostly to paper over the unavailability of async+multiple sources. See cw-actions.

This is where I stopped last time and ended up wrapping the request function in a let-binding block, which then creates an issue with global v.s. local variables, and therefore had to make it synchronous otherwise things get lost in the local variables

No global variables are used in this proof of concept, everything is folded into the closures used by the consult-async functions. It probably generates more garbage than is ideal but that's okay, we generally have < 50 items in the results.

@armindarvish armindarvish added the enhancement New feature or request label Apr 20, 2024
@armindarvish armindarvish self-assigned this Apr 20, 2024
@karthink
Copy link
Author

minad's comment clued me in on implementing consult--multi with async, which should make this process much simpler. I'll try to improve the proof of concept to be more modular using minad's idea.

@armindarvish
Copy link
Owner

Adding the discussion on the consult for reference here as well: minad/consult#1007 (comment)

I think we can get this to work on the consult-web side pretty quickly thanks to @karthink's prrof of concept alread. We can then wait for consult to incorporate multi-source async as a more general solution. Once that is in place, this package can possibly be a lot smaller and cleaner!

@armindarvish
Copy link
Owner

armindarvish commented May 2, 2024

@karthink I am trying to get your consult-web-mini to work with request instead of plz, but I cannot get the async request to work. Any chance you can make an example for brave using request or url-retrieve instead of plz?

@karthink
Copy link
Author

karthink commented May 2, 2024 via email

@armindarvish
Copy link
Owner

armindarvish commented May 2, 2024

@karthink Ooooo Nice! I can follow the async merge results (at least to some extent), but I cannot write elisp as good (concise) as you do :D! This is awesome!

By the way, I have managed to add some extra stuff to what you have to get dynamic grouping to work (which also requires passing arguments from the minibuffer). Take a look below and see if you have some suggestion:

(defun cw--group-function (sources cand transform)
  "Group candidates by GROUP-BY keyword.

This is passed as GROUP to `consult--read' on candidates and is used to define the grouping for CAND. "
  (if transform (substring cand)
    (let* ((group-by (or consult-web--override-group-by consult-web-group-by))
           (group-by (if (not (keywordp group-by)) (intern (concat ":" (format "%s" group-by))) group-by)))
      (cond
       ((equal group-by :domain)
        (if-let* ((url (get-text-property 0 :url cand))
                  (urlobj (if url (url-generic-parse-url url) nil))
                  (domain (if (url-p urlobj) (url-domain urlobj))))
            domain
          nil))
       ((member group-by '(:nil :none :no :not))
        nil)
       (group-by
        (if-let ((group (get-text-property 0 group-by cand)))
            group
          "N/A"))
       (t
        (if-let* ((source (plist-get (consult--multi-source sources cand) :name)))
            source
          nil)))
      )))

(defun cw--multi-group (sources cand transform)
  "Return title of candidate CAND or TRANSFORM the candidate given SOURCES."
  (if transform cand
    (let* ((fun (and (plist-member (consult--multi-source sources cand) :group)
                     (plist-get (consult--multi-source sources cand) :group))))
      (cond
       ((functionp fun)
        (funcall fun sources cand transform))
       ((stringp fun)
        fun)
       ((eq fun 'nil)
        nil)
       (t
        (plist-get (consult--multi-source sources cand) :name))))))
        

I then replase consult--multi-group with cw--multi-group in cw--multi:

(defun cw--multi (sources &rest options)
  (let* ((sources (consult--multi-enabled-sources sources))
         (selected
          (apply #'consult--read
                 (consult--async-split
                  (consult--async-throttle
                   (cw--multi-async
                    (consult--async-refresh-timer
                     (consult--async-sink))
                    sources)))
                 (append
                  options
                  (list
                   :sort        t
                   :history     'cw--search-history
                   :initial     (consult--async-split-initial nil)
                   :category    'multi-category
                   :predicate   (apply-partially #'consult--multi-predicate sources)
                   :annotate    (apply-partially #'cw--annotate sources)
                   :group       (apply-partially #'cw--multi-group sources)
                   :lookup      (apply-partially #'consult--multi-lookup sources)
                   :preview-key (consult--multi-preview-key sources)
                   :narrow      (consult--multi-narrow sources)
                   :state       (consult--multi-state sources))))))
    (if (plist-member (cdr selected) :match)
        (when-let (fun (plist-get (cdr selected) :new))
          (funcall fun (car selected))
          (plist-put (cdr selected) :match 'new))
      (when-let (fun (plist-get (cdr selected) :action))
        (funcall fun (car selected)))
      (setq selected `(,(car selected) :match t ,@(cdr selected))))
    selected))

note that I also add the property :source to each candidate for this purpose so (i can use -- --group source. For example for brave I add :source "Brave" to each candidate:

(defun cw--brave-request (query callback)
  (pcase-let* ((`(,input . ,args) (cw--split-command query)))
    (apply
     #'plz 'get (apply #'cw-brave-url-string input (car args))
     (cw-brave-query-args
         (lambda (attrs)
           (when-let* ((raw-results (map-nested-elt attrs '(:web :results)))
                       (annotated-results
                        (mapcar
                         (lambda (item)
                           (let* ((title (map-elt item :title))
                                  (search-url (apply #'cw-brave-url-string input (car args)))
                                  (url (map-elt item :url))
                                  (urlobj (and url (url-generic-parse-url url)))
                                  (domain (and (url-p urlobj) (url-domain urlobj)))
                                  (domain (and (stringp domain)
                                               (propertize domain 'face 'font-lock-variable-name-face)))
                                  (path (and (url-p urlobj) (url-filename urlobj)))
                                  (path (and (stringp path)
                                             (propertize path 'face 'font-lock-warning-face)))
                                  (decorated (concat title "\t"
                                                     (propertize " " 'display '(space :align-to center))
                                                     domain path
                                                     )))
                             (propertize decorated
                                         :source "Brave"
                                         :title title
                                         :url url
                                         :search-url search-url
                                         :query query)))
                         raw-results)))
             (funcall callback annotated-results)))))))

For passing the argument to cw--multi I have been experimenting with doing it inside the source request funcitons for now. For example for brave it looks like:

(defun cw--split-command (query &rest args)
  (pcase-let* ((`(,input . ,opts) (consult--command-split query))
               (remaining-opts (list))
               (args (or args (list)))
               )
    (if opts
        (progn
      (cl-loop for opt in opts
               do
               (pcase-let* ((`(,key . ,val) (consult-web--extract-opt-pair opt opts (list "--group" ":group"))))
                 (when key
                   (setq args (append args (list key val)))
                   (setq remaining-opts (cl-delete-duplicates (append remaining-opts (list opt (format "%s" val))))))
                 ))

      (setq opts (seq-difference opts remaining-opts))

      (when (member "-n" opts)
        (setq args (append args `(:count ,(cadr (member "-n" opts))))))

      (when (member "-p" opts)
        (setq args (append args `(:page ,(cadr (member "-p" opts))))))

      (if (or (member "-g" opts) (member ":group" opts) (member "--group" opts))
          (cond
           ((member "-g" opts)
            (setq consult-web--override-group-by (cadr (member "-g" opts))))
           ((member "--group" opts)
            (setq consult-web--override-group-by (cadr (member "--group" opts))))
           ((member ":group" opts)
            (setq consult-web--override-group-by (cadr (member ":group" opts)))))
        (setq consult-web--override-group-by nil)
        ))
      (setq consult-web--override-group-by nil))
    (list input args)
    ))

(defun cw--brave-request (query callback)
  (pcase-let* ((`(,input . ,args) (cw--split-command query)))
    (apply
     #'plz 'get (apply #'cw-brave-url-string input (car args))
     (cw-brave-query-args
         (lambda (attrs)
           (when-let* ((raw-results (map-nested-elt attrs '(:web :results)))
                       (annotated-results
                        (mapcar
                         (lambda (item)
                           (let* ((title (map-elt item :title))
                                  (search-url (apply #'cw-brave-url-string input (car args)))
                                  (url (map-elt item :url))
                                  (urlobj (and url (url-generic-parse-url url)))
                                  (domain (and (url-p urlobj) (url-domain urlobj)))
                                  (domain (and (stringp domain)
                                               (propertize domain 'face 'font-lock-variable-name-face)))
                                  (path (and (url-p urlobj) (url-filename urlobj)))
                                  (path (and (stringp path)
                                             (propertize path 'face 'font-lock-warning-face)))
                                  (decorated (concat title "\t"
                                                     (propertize " " 'display '(space :align-to center))
                                                     domain path
                                                     )))
                             (propertize decorated
                                         :source "Brave"
                                         :title title
                                         :url url
                                         :search-url search-url
                                         :query query)))
                         raw-results)))
             (funcall callback annotated-results)))))))

But, I think eventually I should move the (pcase-let* (((,input . ,args) (cw--split-command query)))to the control flow inside thecw--multiandcw--multi-async` so it is run once for all the sources instead of for each source, but haven't looked at that yet.

here is a screenshot:

cw-multi

another thing I may have to redo is the consult--multi-lookup. I don't like retrieving the propertized text from candidate. I'd rather strip the proprties or even get some property from the candidate (like :title) instead of the whole candidate. But haven't looked at that yet, either.

@armindarvish
Copy link
Owner

Update:
I have reimplemented the async process from @karthink prototype in the async branch now. There are still some rough edges I need to think about, but it's already almost ready to merge.

This is a complete redesign of the package and prioritizing the async/dynamic over the synchronous candidate collection.

I have also:

  • improved the url-retrieve backend with support for request and plz as well.
  • I added support for browser-hist.el

I am going to test this for a while, add support for some more sources and work on updating the documentation, before I merge it to the develop/main.

@karthink
Copy link
Author

karthink commented May 12, 2024

This is a complete redesign of the package and prioritizing the async/dynamic over the synchronous candidate collection.

That sounds great! Looking forward to it so I can switch back to consult-web.

Sorry for the lack of feedback to your previous comment and the discussion on the Consult repo -- having a busy couple of weeks here and I haven't been able to focus on the example code you've provided. Will catch up eventually.

@armindarvish
Copy link
Owner

Update:

I have added support for async commands (like grep and notmuch). Need more testing of edge cases, but overall everything is operational.

The only annoying part that is left, is the ordering of the candidates and how they get appended to the list when they arrive. Right now, some new items get appended in the middle of the list, which can be a bit confusing. I Have not been able to get consult to insert newly arriving async candidates at the end of the list.

Ideally I would want new groups to be added at the end of the current groups and new candidates to be added to the end of the group list, but don't know how to achieve this yet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants