Provide more restrictive defaults for white space patterns in JSON #839

Closed
rlouf opened this issue Apr 25, 2024 · 0 comments · Fixed by #916

rlouf commented Apr 25, 2024

Repeated newlines are a common problem when generating JSON, so many users end up setting the `whitespace_pattern` argument themselves. We should probably set a more restrictive default that limits the number of consecutive whitespace characters and newlines.

rlouf pushed a commit that referenced this issue May 24, 2024
Fixes #839 #908 #690 #450

## Problem

A major problem, especially with smaller language models, is
repetition.

For example, suppose a model is generating JSON and must emit 12
space tokens of indentation. Often the model will assign a high
probability to a 13th space token, then to a 14th, and end up in an
infinite loop of generating spaces.
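
The loop can be reproduced with a toy stand-in for a model. This is only an illustrative sketch: `toy_next_token` and `greedy_decode` are hypothetical names, and the "model" is a hard-coded rule rather than a real language model.

```python
# Toy stand-in for a language model (hypothetical, for illustration only):
# once the output ends with a space, another space is always the top choice.
def toy_next_token(generated: str) -> str:
    return " " if generated.endswith(" ") else "}"


def greedy_decode(prefix: str, max_tokens: int) -> str:
    """Append the top-ranked token until the object closes or the budget runs out."""
    out = prefix
    for _ in range(max_tokens):
        token = toy_next_token(out)
        out += token
        if token == "}":
            break
    return out


# The prefix already ends with one space, so the toy model never closes the object.
result = greedy_decode('{"a": 1 ', max_tokens=50)
trailing_spaces = len(result) - len(result.rstrip(" "))
```

With nothing constraining whitespace, decoding only stops because the token budget is exhausted, and the object is never closed.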

This problem in NLG has been known for half a decade, but there are
only mitigations (mirostat, repetition penalty, using hundreds of
billions of weights, etc.) and no absolute solutions, except for
**structured generation**.

## Solution

For structured JSON generation, we set a sane default whitespace pattern
of `r"[ ]?"`. This removes all newlines and indentation. It disallows
any syntactic whitespace beyond a single space separator.

Users can still set the `whitespace_pattern=` argument if they want
different behavior.
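
The effect of the restrictive pattern can be checked with the standard `re` module alone. The sketch below is not the library's schema-to-regex code: `flat_object_regex` is a hypothetical helper that only handles a one-key object with an integer value, and the permissive pattern `[\n\t ]*` is chosen for illustration.

```python
import re


def flat_object_regex(whitespace_pattern: str) -> str:
    """Hypothetical helper: regex for a one-key JSON object like {"key": 1}."""
    ws = whitespace_pattern
    return r"\{" + ws + r'"[a-z]+"' + ws + ":" + ws + r"(0|[1-9][0-9]*)" + ws + r"\}"


# An unbounded whitespace pattern (illustrative) versus the new default.
permissive = re.compile(flat_object_regex(r"[\n\t ]*"))
restrictive = re.compile(flat_object_regex(r"[ ]?"))

compact = '{"a": 1}'                  # single space separator
indented = '{\n    "a": 1\n}'         # newlines and indentation
runaway = '{"a":' + " " * 40 + "1}"   # a long run of spaces

samples = (compact, indented, runaway)
permissive_results = [bool(permissive.fullmatch(s)) for s in samples]
restrictive_results = [bool(restrictive.fullmatch(s)) for s in samples]
```

Under these assumptions the permissive pattern accepts all three strings, while `[ ]?` accepts only the compact form. During constrained generation this means a second consecutive space is never an allowed token, so the whitespace loop cannot start.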