Multimodal support with Phi 3 Vision + Transformers #1020

Status: Open — wants to merge 318 commits into base: main

Commits (318)
ae1baa6
int -> bytes
hudson-ai Jun 17, 2024
077e4b9
Explicitly pass kwargs to EngineCallResponse
hudson-ai Jun 18, 2024
ede3333
matched
hudson-ai Jun 18, 2024
ed4b220
Add some comments
hudson-ai Jun 18, 2024
d26536c
Merge branch 'main' into lazy_grammars
hudson-ai Jun 19, 2024
6f98db5
done is callable now
hudson-ai Jun 19, 2024
7848c09
LLGUIDANCE_LOG_LEVEL
hudson-ai Jun 19, 2024
2351793
Prelim greedy json
hudson-ai Jun 18, 2024
c8f89fc
Move temperature to get_logits
hudson-ai Jun 18, 2024
ecb902c
captures already decoded
hudson-ai Jun 18, 2024
46c1cfb
More helpful exceptions
hudson-ai Jun 19, 2024
1fd2445
Consume bytes in init
hudson-ai Jun 19, 2024
791ce57
valid_next_bytes
hudson-ai Jun 19, 2024
6293930
next_byte_mask
hudson-ai Jun 19, 2024
ffd51b3
adapt parser tests
hudson-ai Jun 19, 2024
97a91fc
Fix ParserExceptions
hudson-ai Jun 19, 2024
161cd9f
Serialize ByteRange as if they are wrapped in GenCommitPoint
hudson-ai Jun 19, 2024
eadd79e
Typo
hudson-ai Jun 19, 2024
126aaa1
Epsilon for repr of grammars with null
hudson-ai Jun 21, 2024
55b3079
Byte(b".") -> b"."
hudson-ai Jun 21, 2024
3531c43
black
hudson-ai Jun 21, 2024
c59d1a2
more inclusive number schema
hudson-ai Jun 22, 2024
36301e8
Byte(b".") -> b"."
hudson-ai Jun 22, 2024
bfe67c2
no more Byte/ByteRange in tests
hudson-ai Jun 22, 2024
0ebb59c
use eos token as stop token
hudson-ai Jun 24, 2024
4423906
fix LLGUIDANCE_LOG_LEVEL
mmoskal Jun 24, 2024
ee133e7
make string lexemes contextual
hudson-ai Jun 24, 2024
7f35ed0
cache json definitions
hudson-ai Jun 24, 2024
60855b6
Pass max_tokens to json
hudson-ai Jun 24, 2024
a6e7f33
<Bandaid> inject BOS token after process_prompt
hudson-ai Jun 25, 2024
d40c5ae
refactor sample_with_temperature to take mask
hudson-ai Jun 26, 2024
acddbc6
Make mock produce valid tokenizations
hudson-ai Jun 26, 2024
a4ec9a4
Temporary fix/hack for whitespace validation in tests
hudson-ai Jun 26, 2024
217f8ee
Azure guidance temperature tests
hudson-ai Jun 26, 2024
7580624
xfails
hudson-ai Jun 27, 2024
36a4c7e
fix recursion check in commit_point()
mmoskal Jun 27, 2024
e677c14
mark flaky tests as xpass
mmoskal Jun 27, 2024
063a011
Make mock always sample greedily
hudson-ai Jun 27, 2024
85908b0
further fix for recursive regex
mmoskal Jun 27, 2024
5d966e0
Merge branch 'lazy_grammars' of https://github.com/hudson-ai/guidance…
mmoskal Jun 27, 2024
a8469fc
remove some xfails (failed due to mock not forcing next token in byte…
hudson-ai Jun 27, 2024
815ed6b
Using commit point as a regex (ish)
hudson-ai Jun 27, 2024
e1645c1
remove debug out
mmoskal Jun 28, 2024
3dd9019
add tests for LLParser
mmoskal Jun 28, 2024
a031f20
add docs
mmoskal Jun 28, 2024
34aa88b
add nimble fighter test
mmoskal Jun 28, 2024
6b82dbf
more comments
mmoskal Jun 28, 2024
a88456f
more dolphin tests
mmoskal Jun 28, 2024
2e40d43
test ll backtrack
mmoskal Jun 28, 2024
4b8e423
more backtrack testing
mmoskal Jun 28, 2024
3aca8e2
pop tokens test
mmoskal Jun 28, 2024
20db158
add nullable lexeme test
mmoskal Jun 28, 2024
f8ef812
add ll_max_tokens test; fix test names
mmoskal Jun 28, 2024
feb6712
Remove 12 xfails
hudson-ai Jun 29, 2024
4acc833
check gen() with different max_tokens
mmoskal Jun 29, 2024
ac05ebe
force parse from matched to done when getting captures
hudson-ai Jul 1, 2024
959f2b9
remove more xfails
hudson-ai Jul 1, 2024
6372a15
number as its own lexeme
hudson-ai Jul 1, 2024
0a5a5d1
disambiguate json number lexemes
hudson-ai Jul 2, 2024
b545d04
fix test
hudson-ai Jul 2, 2024
28b2309
Revert "disambiguate json number lexemes"
hudson-ai Jul 2, 2024
1afc9bb
remove more xfails
hudson-ai Jul 2, 2024
9ef9b07
fix https://github.com/hudson-ai/guidance/issues/18 (added test)
mmoskal Jul 2, 2024
87c80b8
a few tests for grammar.match
hudson-ai Jul 3, 2024
b00a695
greedy tests for ambiguous ending lexemes
hudson-ai Jul 3, 2024
cf56a9e
nullable final lexeme test
hudson-ai Jul 3, 2024
606b208
Delete code in model
hudson-ai Jul 3, 2024
546a013
Remove all non-gen_mode branches in _gen.py
hudson-ai Jul 3, 2024
1b26e13
Fix non-recursive selects inside commit points
hudson-ai Jul 3, 2024
28171f0
serialize substring as regex
hudson-ai Jul 3, 2024
34fd004
add test for https://github.com/hudson-ai/guidance/issues/20
mmoskal Jul 3, 2024
c4c4872
send EOS at end of Mock in substring tests
hudson-ai Jul 3, 2024
e4f37de
refresh xfails
hudson-ai Jul 3, 2024
25f9caf
add test for https://github.com/hudson-ai/guidance/issues/19
mmoskal Jul 3, 2024
43a105f
Revert "Revert "disambiguate json number lexemes""
hudson-ai Jul 3, 2024
bb9e742
remove more xfails
hudson-ai Jul 3, 2024
6b30a97
whitespace test misspec
hudson-ai Jul 3, 2024
84c1445
whitespace test misspec (remove xfail)
hudson-ai Jul 3, 2024
6f81987
add nice man tests
mmoskal Jul 3, 2024
f76b82f
Add failing variant of 'nice man' test
hudson-ai Jul 3, 2024
b60641b
Make test less flaky
hudson-ai Jul 3, 2024
733da2c
Add explicit xfails for commit points
hudson-ai Jul 3, 2024
85f80b6
Change test to be less dependent on idiosyncratic Mock behavior
hudson-ai Jul 3, 2024
86cd01b
Remove test case that was completely dependent on idiosyncratic Mock …
hudson-ai Jul 3, 2024
142dfa9
Remove xfail from file
hudson-ai Jul 3, 2024
cb216f9
compact flag to json generation
hudson-ai Jul 5, 2024
68f256b
add negative tests for compact/whitespace-flexible modes
hudson-ai Jul 5, 2024
b7d892a
string never returns singleton bytes
hudson-ai Jul 5, 2024
724cd28
compact/flexible pydantic negative cases
hudson-ai Jul 5, 2024
9dc7b86
consolidate generate_and_check implementations a bit
hudson-ai Jul 5, 2024
61b857c
make tests pass with latest phi3
mmoskal Jul 9, 2024
925e0f5
further test fixes
mmoskal Jul 9, 2024
23433ee
Slight refactor of Engine __call__, removing 'next'
hudson-ai Jul 9, 2024
9187500
Add extra layer of indirection -- 'get_next_token'
hudson-ai Jul 9, 2024
8bf616c
Start refactoring get_logits w/o forced_butes or current_temp
hudson-ai Jul 9, 2024
c724445
drop unused attr
hudson-ai Jul 9, 2024
578fb2c
annotations
hudson-ai Jul 9, 2024
4bc1e3f
rough changes to GrammarlessEngine to get it working
hudson-ai Jul 9, 2024
d59cfd5
speed things up by sharing a trie
hudson-ai Jul 9, 2024
d66648d
remove forced_bytes code entirely (trie does the trick)
hudson-ai Jul 9, 2024
722d682
mark _stop_ tests as xfail for now with az guidance
mmoskal Jul 10, 2024
b05fd96
fix log messages
hudson-ai Jul 10, 2024
c15877c
refactor get_logits function signature
hudson-ai Jul 10, 2024
22e5704
pass lazy field in Gen
mmoskal Jul 10, 2024
3b2e762
mark failing tool test as xfail
mmoskal Jul 10, 2024
b0972ca
use cpp ByteTrie
hudson-ai Jul 10, 2024
e8b57f0
remove old skip.txt
hudson-ai Jul 10, 2024
e33588e
Merge branch 'main' into lazy_grammars
hudson-ai Jul 10, 2024
3703396
llguidance post_process
hudson-ai Jul 10, 2024
653bf78
dedent ll_fighter
hudson-ai Jul 10, 2024
28c786d
Revert most GrammarlessEngine changes while still making it work
hudson-ai Jul 10, 2024
1a46468
adjust to new mid_process() api
mmoskal Jul 10, 2024
e122cd1
add test for stop token
mmoskal Jul 10, 2024
fbb6293
also check for lone stop tokens
mmoskal Jul 10, 2024
7ec0168
No need to prime generator internally
hudson-ai Jul 11, 2024
4798474
narrow types
hudson-ai Jul 11, 2024
2c35ad9
don't print logs, other than warnings from server llguidance
mmoskal Jul 11, 2024
857f5e3
the stop tests now pass (though they don't hide the stop from the model)
mmoskal Jul 11, 2024
c848b22
llguidance schemas
hudson-ai Jul 11, 2024
c2d3afb
more exceptions
hudson-ai Jul 12, 2024
9408a98
move schemas to _schema.py
hudson-ai Jul 12, 2024
d6bcb3e
some unused imports
hudson-ai Jul 12, 2024
51814ec
black
hudson-ai Jul 12, 2024
d84292a
ByteTrie to get next token in Mock
hudson-ai Jul 16, 2024
281cab5
Remove common base Parser class of LLParser and ByteParser
hudson-ai Jul 16, 2024
64e9574
typing
hudson-ai Jul 16, 2024
198da5b
remove some unused code from gen
hudson-ai Jul 16, 2024
a32a9e3
add llguidance dependency
hudson-ai Jul 16, 2024
c58c275
mypy
hudson-ai Jul 16, 2024
df2a3d9
make smoke test more exact
hudson-ai Jul 16, 2024
a389885
Restore commit_point failure in tool_call test
hudson-ai Jul 16, 2024
cb61d81
Merge branch 'main' into lazy_grammars
hudson-ai Jul 16, 2024
37cf618
Allow engine to take serialized or unserialized grammars
hudson-ai Jul 16, 2024
77c5dbd
Terminate byte patterns (mock semantics changed)
hudson-ai Jul 16, 2024
181ab71
Higher max_tokens (mock tokenizations changed)
hudson-ai Jul 16, 2024
f6bd641
metrics (maybe slightly fudged) for azure guidance
hudson-ai Jul 16, 2024
9268784
EngineCallResponse protobuf->pydantic
hudson-ai Jul 16, 2024
174c481
Handle stripping __LIST_APPEND: in validator
hudson-ai Jul 16, 2024
215bd30
Move GuidanceEngineMetrics to schemas
hudson-ai Jul 16, 2024
fe8df69
remove protobuf code
hudson-ai Jul 16, 2024
8a6b637
remove protobuf dep
hudson-ai Jul 16, 2024
3b31b2c
remove protobuf tests
hudson-ai Jul 16, 2024
fae0e9c
fix test getattr
hudson-ai Jul 16, 2024
8ef75f8
remove grammar from parser
hudson-ai Jul 16, 2024
ea5d4e9
move google.generativeai import up to module level and add ignore[imp…
hudson-ai Jul 17, 2024
dcc86f9
simplify Engine.__call__ since we already have an EngineCallResponse
hudson-ai Jul 17, 2024
7064c68
Add some exception types
hudson-ai Jul 17, 2024
101eeae
Make sure _parse generator actually returns
hudson-ai Jul 18, 2024
fd2ca41
Basic stop-reason exception handling
hudson-ai Jul 19, 2024
53d3df6
no temperature or captures in regex serialized grammars
hudson-ai Jul 19, 2024
53171e1
more type:ignore for google...
hudson-ai Jul 19, 2024
0d0e249
Merge branch 'main' into lazy_grammars
hudson-ai Jul 19, 2024
1b9774e
mypy
hudson-ai Jul 19, 2024
96ec5ce
llguidance now on pypi
hudson-ai Jul 19, 2024
8f9bede
Drop lazy/greedy NestedGrammar distinction, rename to Subgrammar
hudson-ai Jul 19, 2024
9eec277
GenLexeme -> Lexeme
hudson-ai Jul 19, 2024
3461a2b
Make subgrammar and lexeme more private for now
hudson-ai Jul 19, 2024
e7abb2b
remove regex implementation in favor of one based on gen
hudson-ai Jul 19, 2024
312262d
Adjust regex tests for unicode
hudson-ai Jul 20, 2024
1b3b360
black
hudson-ai Jul 20, 2024
772ecdc
Merge branch 'main' into lazy_grammars
hudson-ai Jul 20, 2024
ba3f7f9
allow for "Authorization" header in az guidance
mmoskal Jul 21, 2024
3bda4d7
make it user LLGUIDANCE_LOG_LEVEL
mmoskal Jul 21, 2024
9118bb3
temporarily narrow exception handler to bet better error in CI
hudson-ai Jul 22, 2024
e9048c3
Revert "temporarily narrow exception handler to bet better error in CI"
hudson-ai Jul 22, 2024
f9b4195
add protobuf and sentencepiece as temporary test dependencies until i…
hudson-ai Jul 22, 2024
44fed2f
Merge branch 'lazy_grammars' of https://github.com/hudson-ai/guidance…
mmoskal Jul 22, 2024
aad4b5a
\d -> [0-9] to prevent max_tokens from cutting off non-unicode digits
hudson-ai Jul 22, 2024
1914c0b
DIVIDE by temperature...
hudson-ai Jul 22, 2024
acce0ef
hard-code unicode start bytes
hudson-ai Jul 22, 2024
c5b6997
compress representations a bit
hudson-ai Jul 22, 2024
65adc62
Remove commit_point attr
hudson-ai Jul 23, 2024
99d2406
Remove hidden attr
hudson-ai Jul 23, 2024
24c9820
Remove nullable attr
hudson-ai Jul 23, 2024
cdf18e8
GenCommitPoint -> RegularGrammar
hudson-ai Jul 23, 2024
0e78ae9
Placeholder init
hudson-ai Jul 23, 2024
9235709
LLParser -> TokenParser
hudson-ai Jul 23, 2024
b896ca0
Make gen match multiline by default
hudson-ai Jul 23, 2024
36fca3c
Restore old Mock code but flesh out its tokenizer and make sure it pr…
hudson-ai Jul 23, 2024
b9ac1e9
Restore original behavior in which "illegal" tokens terminate grammar…
hudson-ai Jul 23, 2024
4ee9cff
Only make illegal tokens EOS at end of grammar
hudson-ai Jul 23, 2024
df1146e
matched -> is_accepting
hudson-ai Jul 23, 2024
e51ae59
commit_point -> as_regular_grammar
hudson-ai Jul 23, 2024
024bd33
Temporarily(fingers crossed) deprecate tool calling
hudson-ai Jul 23, 2024
7afa96a
allow http urls
mmoskal Jul 24, 2024
9fd36e1
Merge branch 'main' into lazy_grammars
hudson-ai Jul 24, 2024
8475d88
Merge branch 'main' into lazy_grammars
hudson-ai Jul 25, 2024
7e72483
Initial attempts
hudson-ai Jul 24, 2024
f77f733
subgrammar tool call impl
hudson-ai Jul 25, 2024
017313b
back to tool_args
hudson-ai Jul 26, 2024
95314e1
Infer tool call prefix
hudson-ai Jul 26, 2024
2edb750
Remove xfail
hudson-ai Jul 26, 2024
9699fa1
test tools actually called
hudson-ai Jul 26, 2024
fa5d578
More leading text in tool call
hudson-ai Jul 26, 2024
125028f
Be less clever
hudson-ai Jul 26, 2024
f5b5cfc
clean up gen captures a bit
hudson-ai Jul 26, 2024
30163a2
Add no-tool option back in
hudson-ai Jul 26, 2024
b2a249a
temperature when tool calling
hudson-ai Jul 26, 2024
616b366
Merge branch 'main' into lazy_grammars
hudson-ai Jul 29, 2024
7eda809
Move GenData to _schema.py, keep mask as bytes
hudson-ai Jul 29, 2024
bf03d78
Merge branch 'main' into lazy_grammars
hudson-ai Jul 29, 2024
6bd9fb3
Remove special case for allowing EOS in grammarless (logic now in all…
hudson-ai Jul 29, 2024
aea428a
fix types
hudson-ai Jul 29, 2024
824f836
directly return eos_token_id rather than putting eos_token in dqueue …
hudson-ai Jul 29, 2024
012ac7c
revert substring test to restore 'failure' case
hudson-ai Jul 29, 2024
b85f74d
Merge branch 'main' into lazy_grammars
hudson-ai Jul 29, 2024
7027d77
Fix tool-call tests in light of fixed token_count underflow
hudson-ai Jul 30, 2024
c3f128f
Test multiple tool calls
hudson-ai Jul 30, 2024
da2cf45
Simplify test
hudson-ai Jul 30, 2024
278cc42
Some ideas to implement prompt parts and multimodal components
nking-1 Jul 24, 2024
be35d3b
sketching out how we can process multimodal data down to get_logits
nking-1 Jul 26, 2024
6619093
Eliminating some things from grammarless (prototype)
nking-1 Jul 30, 2024
e0b010e
More work toward reworking grammarless
nking-1 Jul 31, 2024
29a1ba6
Saving draft of how multimodal might look for openai
nking-1 Aug 1, 2024
f922ccc
rename tests so pytest -k works
mmoskal Aug 2, 2024
5fa82db
fix duplicate warning printing and add request data logging (at 4-5 l…
mmoskal Aug 2, 2024
2772297
Rework the structure and strategy for storing prompt with multimodal …
nking-1 Aug 3, 2024
22a2827
Saving phi3vision dev notebook
nking-1 Aug 7, 2024
3982ccb
Revert some previous changes to tokenization in preparation for next …
nking-1 Aug 7, 2024
7fbd550
Refactor token parser to be more flexible in initialization
nking-1 Aug 9, 2024
0dd84b7
Refactor parser to give more control of usage when needed
nking-1 Aug 9, 2024
ec6f43f
Phi 3 vision with transformers- draft
nking-1 Aug 9, 2024
5755165
Undo grammarless changes
nking-1 Aug 10, 2024
d5e0ac8
Rename and export phi 3 vision model
nking-1 Aug 10, 2024
04fcd9f
Fix phi3 vision fixture import errors
nking-1 Aug 12, 2024
f61d620
Merge branch 'main' into lazy_grammars
hudson-ai Aug 12, 2024
22c493b
Attempting a fix for phi 3 vision tokenization
nking-1 Aug 13, 2024
30a8559
Add phi 3 vision chat template
nking-1 Aug 14, 2024
3520114
add xfail for now (covered by llguidance issue #7)
hudson-ai Aug 14, 2024
e6260c2
add more explicit test for list append with no explicit stop (xfailed…
hudson-ai Aug 14, 2024
d046ef5
add back image pattern
nking-1 Aug 14, 2024
178cda5
save phi3vision dev notebook
nking-1 Aug 14, 2024
a72352b
Merge remote-tracking branch 'hudson/lazy_grammars' into multimodal_2
nking-1 Aug 14, 2024
2a0f3f7
image loading and passing to model
nking-1 Aug 22, 2024
9384b25
save phi-3 vision notebook
nking-1 Aug 22, 2024
c7b89fc
Merge remote-tracking branch 'upstream/main' into multimodal_2
nking-1 Aug 22, 2024
d474e6a
Fix tokenizer issue with phi3vision (hack, probably needs review)
nking-1 Aug 26, 2024
38cecb1
phi 3 vision chat template
nking-1 Sep 5, 2024
b4a2947
Merge branch 'main' into multimodal_2
nking-1 Sep 7, 2024
d7c5c10
dev notebooks for llguidance prompt processing
nking-1 Sep 8, 2024
cc8ac87
experimental phi 3 vision testing scripts
nking-1 Sep 10, 2024
73fa881
constraints tests for guidance + img
nking-1 Sep 10, 2024
761326b
Refactoring and cleanup of transformers & phi3v code
nking-1 Sep 10, 2024
a135311
Merge branch 'main' into multimodal_2
nking-1 Sep 10, 2024
160a449
KV caching for phi 3 vision
nking-1 Sep 10, 2024
2b7410b
Code cleanup - remove dev code
nking-1 Sep 11, 2024
4b46880
Small fixes to parameter types and logic
nking-1 Sep 12, 2024
105d648
Minor code cleanup
nking-1 Sep 26, 2024
af9d11b
Merge remote-tracking branch 'upstream/main' into phi3vision
nking-1 Oct 7, 2024
ee91785
parser PR feedback
nking-1 Oct 10, 2024
Files changed
106 changes: 63 additions & 43 deletions guidance/_parser.py
@@ -1,6 +1,7 @@
 import json
+import logging
 import os
-from typing import Any, Generator, Optional, Tuple, Union
+from typing import Any, Generator, Optional, Sequence, Tuple, Union
 
 import llguidance  # type: ignore[import-untyped]
 import numpy as np
@@ -12,6 +13,9 @@
 from .models._tokenizer import Tokenizer
 
 
+logger = logging.getLogger(__name__)
+
+
 class TokenParserException(Exception):
     pass
 
@@ -30,29 +34,11 @@ class TokenParser:
 
     def __init__(
         self,
-        grammar: Union[GrammarFunction, str],
-        tokenizer: Tokenizer,
-        prompt: bytes = b"",
-        ensure_bos_token: bool = True,
+        ll_interpreter: llguidance.LLInterpreter,
+        prompt_tokens: list[int]
     ):
-        if isinstance(grammar, GrammarFunction):
-            # we can't have a terminal as the root
-            if isinstance(grammar, Terminal):
-                grammar = Join([grammar])
-            serialized_grammar = json.dumps(grammar.ll_serialize())
-        else:
-            serialized_grammar = grammar
-
-        self.tokenizer = tokenizer
-        self.ll_tokenizer = llguidance.LLTokenizer(
-            llguidance.TokenizerWrapper(tokenizer)
-        )
-        self.ll_interpreter = llguidance.LLInterpreter(
-            self.ll_tokenizer,
-            serialized_grammar,
-            log_level=int(os.environ.get("LLGUIDANCE_LOG_LEVEL", "1")),
-        )
-        self._generator = self._parse(prompt, ensure_bos_token)
+        self.ll_interpreter = ll_interpreter
+        self._generator = self._parse(prompt_tokens)
         self._done = False
 
     def is_accepting(self) -> bool:
@@ -70,28 +56,10 @@ def advance(
             self._done = True
             return None, e.value
 
-    def _process_prompt(self, prompt: bytes, ensure_bos_token: bool) -> list[int]:
-        prompt_tokens = self.ll_interpreter.process_prompt(
-            self.tokenizer.encode(prompt)
-        )
-        if (
-            ensure_bos_token
-            and self.tokenizer.bos_token is not None
-            and prompt_tokens[:1] != [self.tokenizer.bos_token_id]
-        ):
-            # add the beginning of sequence token if needed
-            prompt_tokens = [self.tokenizer.bos_token_id] + prompt_tokens
-
-        return self.tokenizer.recode(prompt_tokens)
-
-
     def _parse(
         self,
-        prompt: bytes,
-        ensure_bos_token: bool,
+        tokens: list[int],
     ) -> Generator[Tuple[Optional[GenData], EngineCallResponse], Optional[int], EngineCallResponse]:
-        tokens = self._process_prompt(prompt=prompt, ensure_bos_token=ensure_bos_token)
 
         while True:
             mask, resp = self.ll_interpreter.mid_process()
             r = LLInterpreterResponse.model_validate_json(resp)
@@ -133,6 +101,57 @@ def _parse(
         return response
 
 
+def process_prompt(prompt_tokens: Sequence[int], ll_interpreter: llguidance.LLInterpreter, bos_token_id: Optional[int]=None) -> list[int]:
+    # Allows ll_interpreter to make adjustments to prompt tokens, such as token healing
+    processed_tokens = ll_interpreter.process_prompt(prompt_tokens)
+    if (
+        bos_token_id is not None
+        and prompt_tokens[:1] != [bos_token_id]
+    ):
+        # add the beginning of sequence token if needed
+        processed_tokens = [bos_token_id] + processed_tokens
+
A collaborator left a review comment on the BOS-handling lines above:
These tokens will likely need to be recoded before being sent to the LLM for logits (I think only in the case that we just added a BOS token).

You could probably just throw that line in create_token_parser after calling process_prompt..? Just since you'll have access to a tokenizer there.

See tests/model_integration/test_model.py::test_associativity

+    return processed_tokens
+
+
+def serialize_grammar(grammar: Union[GrammarFunction, str]) -> str:
+    if isinstance(grammar, GrammarFunction):
+        # we can't have a terminal as the root
+        if isinstance(grammar, Terminal):
+            grammar = Join([grammar])
+        return json.dumps(grammar.ll_serialize())
+    else:
+        return grammar
+
+
+def create_token_parser(
+    grammar: Union[GrammarFunction, str],
+    tokenizer: Tokenizer,
+    prompt: bytes = b"",
+    ensure_bos_token: bool = True,
+    trace: bool = False
+) -> TokenParser:
+    serialized_grammar = serialize_grammar(grammar)
+    ll_tokenizer = llguidance.LLTokenizer(
+        llguidance.TokenizerWrapper(tokenizer)
+    )
+    ll_interpreter = llguidance.LLInterpreter(
+        ll_tokenizer,
+        serialized_grammar,
+        log_level=2 if trace else int(os.environ.get("LLGUIDANCE_LOG_LEVEL", "1")),
+    )
+    if ensure_bos_token:
+        if tokenizer.bos_token_id is None:
+            logger.warning("Tokenizer does not have a BOS token, but ensure_bos_token is True")
+        bos_token_id = tokenizer.bos_token_id
+    else:
+        bos_token_id = None
+    prompt_tokens = tokenizer.encode(prompt)
+    processed_tokens = process_prompt(prompt_tokens, ll_interpreter, bos_token_id)
+    processed_tokens = tokenizer.recode(processed_tokens)
+    return TokenParser(ll_interpreter, processed_tokens)
+
+
 class ByteParserException(Exception):
     def __init__(self, *args, **kwargs):
         self.current_byte = kwargs.pop("current_byte", None)
@@ -149,7 +168,7 @@ def __init__(
         ensure_bos_token: bool = True,
     ):
         self.tokenizer = ByteTokenizer()
-        self.token_parser = TokenParser(grammar, self.tokenizer, prompt, ensure_bos_token)
+        self.token_parser = create_token_parser(grammar, self.tokenizer, prompt, ensure_bos_token)
         self.bytes = b""
        self.gen_data: Optional[GenData] = None
         self.pos = 0
@@ -289,3 +308,4 @@ def _update_capture(self, response: EngineCallResponse):
                 pass
             self._variables[k] = v
             self._variables_log_probs[k] = response.capture_group_log_probs[k]
+
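
Taken together, this refactor splits the old monolithic TokenParser.__init__ into three composable pieces: serialize_grammar, process_prompt, and the create_token_parser factory. A minimal sketch of the new construction path follows; the tokenizer value is a placeholder and not part of the diff:

```python
from guidance import select
from guidance._parser import create_token_parser

# `my_tokenizer` stands in for any concrete guidance Tokenizer
# (e.g. a TransformersTokenizer); it is not defined in this sketch.
parser = create_token_parser(
    grammar=select(["cat", "car"]),  # a GrammarFunction; a pre-serialized JSON string also works
    tokenizer=my_tokenizer,
    prompt=b"I bought a ",
    ensure_bos_token=True,  # prepend tokenizer.bos_token_id if the prompt lacks it
)
# create_token_parser serializes the grammar, builds the LLInterpreter,
# runs process_prompt and tokenizer.recode, then hands both to TokenParser.
```
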
18 changes: 18 additions & 0 deletions guidance/chat.py
@@ -214,6 +214,9 @@ def get_role_end(self, role_name=None):
 phi3_medium_template = "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + '\n' + message['content'] + '<|end|>' + '\n' + '<|assistant|>' + '\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + '\n'}}{% endif %}{% endfor %}"
 
 
+# https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/blob/main/tokenizer_config.json#L397
+phi3_vision_template = "{% for message in messages %}{{'<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|>\n' -}}{% endif %}"
+
 # Although the templates are different, the roles are the same between medium and small (for now)
 class Phi3SmallMediumChatTemplate(ChatTemplate):
     # available_roles = ["user", "assistant"]
@@ -230,9 +233,24 @@ def get_role_start(self, role_name):
     def get_role_end(self, role_name=None):
         return "<|end|>\n"
 
+class Phi3VisionChatTemplate(ChatTemplate):
+    template_str = phi3_vision_template
+
+    def get_role_start(self, role_name):
+        if role_name == "user":
+            return "<|user|>\n"
+        elif role_name == "assistant":
+            return "<|assistant|>\n"
+        else:
+            raise UnsupportedRoleException(role_name, self)
+
+    def get_role_end(self, role_name=None):
+        return "<|end|>\n"
+
 CHAT_TEMPLATE_CACHE[phi3_small_template] = Phi3SmallMediumChatTemplate
 CHAT_TEMPLATE_CACHE[phi3_medium_template] = Phi3SmallMediumChatTemplate
+CHAT_TEMPLATE_CACHE[phi3_vision_template] = Phi3VisionChatTemplate
 
 
 # --------------------------------------------------
 # @@@@ Mistral-7B-Instruct-v0.2 @@@@
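
To see what the new template produces, here is a small sketch, assuming ChatTemplate subclasses can be instantiated with no arguments as elsewhere in guidance/chat.py:

```python
from guidance.chat import Phi3VisionChatTemplate

template = Phi3VisionChatTemplate()
print(template.get_role_start("user"))       # "<|user|>\n"
print(template.get_role_start("assistant"))  # "<|assistant|>\n"
print(template.get_role_end())               # "<|end|>\n"
# Any other role name (e.g. "tool") raises UnsupportedRoleException.
```
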
8 changes: 3 additions & 5 deletions guidance/library/_image.py
@@ -4,6 +4,8 @@
 import typing
 import urllib
 
+from guidance.models._model import Modality
+
 from .._guidance import guidance
 
 
@@ -29,9 +31,5 @@ def image(lm, src: typing.Union[str, pathlib.Path, bytes], allow_local: bool = T
     else:
         raise Exception(f"Unable to load image bytes from {src}!")
 
-    bytes_id = str(id(bytes_data))
-
-    # set the image bytes
-    lm = lm.set(bytes_id, bytes_data)
-    lm += f"<|_image:{bytes_id}|>"
+    lm = lm.append_multimodal(bytes_data, Modality.IMAGE)
     return lm
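
The user-facing image() call is unchanged; only the plumbing differs, with the raw bytes now flowing through append_multimodal down to the engine. A hedged end-to-end sketch, assuming the package-root exports for gen/image/user/assistant and using the checkpoint id cited in the chat-template comment above (the local file path is illustrative):

```python
from guidance import assistant, gen, image, user
from guidance.models import TransformersPhi3Vision  # exported by this PR

lm = TransformersPhi3Vision("microsoft/Phi-3-vision-128k-instruct")
with user():
    lm += "Describe this picture.\n" + image("photo.jpg")  # path, URL, or raw bytes
with assistant():
    lm += gen(name="description", max_tokens=100)
print(lm["description"])
```
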
1 change: 1 addition & 0 deletions guidance/models/__init__.py
@@ -2,6 +2,7 @@
 
 # local models
 from .transformers._transformers import Transformers, TransformersTokenizer
+from .transformers._transformers_phi3v import TransformersPhi3Vision
 from .llama_cpp import LlamaCpp
 from ._mock import Mock, MockChat
 
2 changes: 1 addition & 1 deletion guidance/models/_grammarless.py
@@ -258,7 +258,7 @@ def _reset_shared_data(self, new_data: bytes, temperature: float):
         self._last_stream_start = self._data
 
     def get_next_token(
-        self, token_ids: list[int], mask: Optional[bytes], temperature: float) -> int:
+        self, prompt: str, token_ids: list[int], mask: Optional[bytes], temperature: float, media: Optional[dict]=None) -> int:
 
         logger.debug(
             f"Start Grammarless.get_next_token({token_ids=}, {mask=}, {temperature=})"
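
This get_next_token signature change is what lets multimodal data reach the engine: callers now pass the raw prompt and an optional media mapping alongside the token ids. A sketch of what an engine subclass must now implement; the class name and the media handling comment are illustrative, not part of the diff:

```python
from typing import Optional

from guidance.models._grammarless import Grammarless


class MyEngine(Grammarless):  # hypothetical subclass, for illustration only
    def get_next_token(
        self,
        prompt: str,
        token_ids: list[int],
        mask: Optional[bytes],
        temperature: float,
        media: Optional[dict] = None,
    ) -> int:
        # `media` carries multimodal payloads (e.g. image bytes) when present;
        # token-only engines can simply pass it through, as Mock does below.
        return super().get_next_token(prompt, token_ids, mask, temperature, media)
```
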
4 changes: 2 additions & 2 deletions guidance/models/_mock.py
@@ -80,9 +80,9 @@ def __init__(self, tokenizer, byte_patterns, compute_log_probs, force):
         # seed the random number generator
         self._rand_generator = np.random.default_rng(seed=42)
 
-    def get_next_token(self, token_ids: list[int], mask: Optional[bytes], temperature: float) -> int:
+    def get_next_token(self, prompt: bytes, token_ids: list[int], mask: Optional[bytes], temperature: float, media: Optional[dict]=None) -> int:
         self.called_temperatures.append(temperature)
-        return super().get_next_token(token_ids, mask, temperature)
+        return super().get_next_token(prompt, token_ids, mask, temperature, media)
 
     def get_logits(self, token_ids: list[int]) -> np.ndarray:
         """Pretends to compute the logits for the given token state."""