For FeedGen's core purpose of applying LLM prompts at scale, BigQuery's ML.GENERATE_TEXT() function is an obvious option: provided that the source feed is available in BigQuery, the generation of titles and descriptions can be done entirely there. This guide describes how to do this, with a focus on the one-time generation of titles and descriptions for a given set of products; a minimal sketch of the approach follows the list below. It does not cover:
- how to facilitate recurring processing of newly added products,
- how to extract product attributes (like FeedGen does),
- how to use the Product Studio API from BigQuery, or
- how to build a graphical user interface around this.
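As a starting point, the following sketch shows what such a one-time generation could look like. It assumes a hypothetical dataset, table, and connection (my_dataset.products with columns product_id and description, and the Vertex AI connection us.my_vertex_connection); adapt these names to your project.

```sql
-- Minimal sketch; dataset, table, and connection names are hypothetical.
-- 1. Create a remote model pointing at a Gemini endpoint (done once).
CREATE OR REPLACE MODEL `my_dataset.gemini_flash`
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (ENDPOINT = 'gemini-1.5-flash');

-- 2. Generate a title for each product directly in BigQuery.
SELECT
  product_id,
  ml_generate_text_llm_result AS generated_title
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.gemini_flash`,
  (
    SELECT
      product_id,
      CONCAT('Write a concise product title for: ', description) AS prompt
    FROM `my_dataset.products`
  ),
  STRUCT(
    0.2 AS temperature,
    128 AS max_output_tokens,
    TRUE AS flatten_json_output  -- exposes the result as ml_generate_text_llm_result
  )
);
```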
The following factors determine the throughput you can expect:
- Prompting frequency
This limit defaults to 60 requests per minute for Gemini 1.5 Pro and 200 for Flash. To increase it, both of the following need to be changed:
  - General Vertex AI limits (how to change)
  - BQ-specific limits (for ML.GENERATE_TEXT, request changes at [email protected])
- Response latency / Records processed in parallel
Each prompt may take several seconds to process, so handling prompts sequentially would push throughput drastically below the above limits. Hence, by default, 3 records are processed in parallel for Pro and 5 for Flash. Increases can be requested at [email protected].
- Queries processed in parallel
This limit is 5 for both Pro and Flash. In practice, the fifth concurrent execution may fail, but 4 seem safe to use.
- Prompts needed per product
Titles and descriptions are generated separately, so we need 2 prompts per product.
- Products processed per prompt
This can be freely set, as long as the model's limits are not exceeded: for Flash, these are 1M tokens for the input and 8k tokens for the output (Pro: 2M / 8k). A sketch of how to group several products into one prompt follows this list.
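The grouping of several products into one prompt can be done with standard SQL. The following sketch, using the same hypothetical my_dataset.products table as above, bundles 10 products per prompt:

```sql
-- Bundle 10 products into one prompt each; table and column names are hypothetical.
SELECT
  CONCAT(
    'Write one product title per line for the following products:\n',
    STRING_AGG(FORMAT('%d. %s', seq, description), '\n' ORDER BY seq)
  ) AS prompt
FROM (
  SELECT
    description,
    ROW_NUMBER() OVER (ORDER BY product_id) AS seq
  FROM `my_dataset.products`
)
GROUP BY DIV(seq - 1, 10);  -- one group, and thus one prompt, per 10 products
```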
Assuming that the first three factors let us max out the 200 requests per minute and that we process 10 products per title/description prompt, we could process 200×10÷2 = 1000 products per minute. In practice, throughput will be much lower than that unless the number of records processed in parallel is increased.
Each query processes a limited number of input records, 600 by default (see the LIMIT 600 in here). While there is also a limit on the number of rows per query, that limit is over 20k, compared with the roughly 60 rows resulting from the default 600 input records per query when grouping 10 products per prompt. The main reason to keep this figure low is that progress is then saved more frequently.
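One way to save progress per batch is to write results into a results table and to select only products that do not yet have a result, so that each query run picks up the next 600 unprocessed records. A sketch under the same hypothetical schema as above (one product per prompt, for brevity):

```sql
-- Process the next batch of 600 products that have no generated title yet;
-- table, column, and model names are hypothetical.
INSERT INTO `my_dataset.generated_titles` (product_id, generated_title)
SELECT
  product_id,
  ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.gemini_flash`,
  (
    SELECT
      p.product_id,
      CONCAT('Write a concise product title for: ', p.description) AS prompt
    FROM `my_dataset.products` AS p
    WHERE NOT EXISTS (  -- skip products already processed in earlier runs
      SELECT 1
      FROM `my_dataset.generated_titles` AS g
      WHERE g.product_id = p.product_id
    )
    LIMIT 600
  ),
  STRUCT(0.2 AS temperature, 128 AS max_output_tokens, TRUE AS flatten_json_output)
);
```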
The Vertex AI costs of this approach should be lower than those of FeedGen, as titles and descriptions are processed in batches, so the static parts of the prompts are evaluated less often. Most importantly, one needs to choose between the Flash and Pro models, whose prices differ by a factor of 10 (as of July 2024). The additional BigQuery costs should be low at the scale of typical product feeds.
For more information, see Google Cloud's pricing calculator.