Take full page screenshots #143
Added the fullPage flag to take full-page screenshots.
Updated the UI accordingly to show the screenshots properly instead of scaling them down.
kamtschatka committed May 10, 2024
1 parent cbc8dde commit d7833f9
Showing 4 changed files with 9 additions and 3 deletions.
6 changes: 4 additions & 2 deletions apps/web/components/dashboard/preview/LinkContentSection.tsx
@@ -16,10 +16,12 @@ function ScreenshotSection({ link }: { link: ZBookmarkedLink }) {
   return (
     <div className="relative h-full min-w-full">
       <Image
-        fill={true}
         alt="screenshot"
         src={`/api/assets/${link.screenshotAssetId}`}
-        className="object-contain"
+        width={0}
+        height={0}
+        sizes="100vw"
+        style={{ width: "100%", height: "auto" }}
       />
     </div>
   );
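
To see why this UI change matters for tall full-page screenshots, the sizing arithmetic can be sketched in plain TypeScript. This is only a model of the two CSS behaviours involved (`object-contain` inside a fixed box versus `width: 100%; height: auto`); the helper names, the `Size` type, and the sample dimensions are hypothetical and do not appear in the commit.

```typescript
// Hypothetical helpers (not part of the commit): they model the CSS sizing only.
interface Size {
  width: number;
  height: number;
}

// Models `object-fit: contain` in a fixed box: scale by the smaller ratio
// so the whole image fits inside the box.
function fitContain(image: Size, box: Size): Size {
  const scale = Math.min(box.width / image.width, box.height / image.height);
  return { width: image.width * scale, height: image.height * scale };
}

// Models `width: 100%; height: auto`: the width matches the container and
// the height follows the image's aspect ratio.
function fitWidth(image: Size, containerWidth: number): Size {
  const scale = containerWidth / image.width;
  return { width: containerWidth, height: image.height * scale };
}

// Assumed sample values: a tall full-page screenshot and a preview box.
const fullPageShot: Size = { width: 1280, height: 6400 };
const previewBox: Size = { width: 640, height: 400 };

const contained = fitContain(fullPageShot, previewBox);
const fullWidth = fitWidth(fullPageShot, previewBox.width);

console.log(contained); // { width: 80, height: 400 }
console.log(fullWidth); // { width: 640, height: 3200 }
```

Under these assumed numbers, the contained image shrinks to a narrow strip, while the full-width variant stays readable and simply grows taller than its container.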
3 changes: 2 additions & 1 deletion apps/workers/crawlerWorker.ts
@@ -231,10 +231,11 @@ async function crawlPage(jobId: string, url: string) {
         // If you change this, you need to change the asset type in the store function.
         type: "png",
         encoding: "binary",
+        fullPage: serverConfig.crawler.fullPageScreenshot,
       }),
     ]);
     logger.info(
-      `[Crawler][${jobId}] Finished capturing page content and a screenshot.`,
+      `[Crawler][${jobId}] Finished capturing page content and a screenshot. FullPageScreenshot: ${serverConfig.crawler.fullPageScreenshot}`,
     );
     return { htmlContent, screenshot, url: page.url() };
   } finally {
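
The way the new flag flows from configuration into the screenshot call and the log line can be sketched in isolation. The block below is a minimal stand-alone illustration, not the worker's actual code: `ScreenshotOptions`, `buildScreenshotOptions`, and `finishedMessage` are hypothetical names, and only the three option keys and the log text are taken from the diff.

```typescript
// Hypothetical shape: only `type`, `encoding`, and `fullPage` appear in the diff.
interface ScreenshotOptions {
  type: "png";
  encoding: "binary";
  fullPage: boolean;
}

// Builds the options object from the config flag (assumed helper).
function buildScreenshotOptions(fullPageScreenshot: boolean): ScreenshotOptions {
  return {
    // Per the original comment, the asset type in the store function
    // has to change if this type changes.
    type: "png",
    encoding: "binary",
    fullPage: fullPageScreenshot,
  };
}

// Mirrors the log line added in the commit (assumed helper).
function finishedMessage(jobId: string, fullPageScreenshot: boolean): string {
  return `[Crawler][${jobId}] Finished capturing page content and a screenshot. FullPageScreenshot: ${fullPageScreenshot}`;
}

const options = buildScreenshotOptions(true);
const message = finishedMessage("42", false);
console.log(options.fullPage); // true
console.log(message);
```

Keeping the flag as a plain boolean means the same value can be passed to the screenshot call and interpolated into the log without conversion.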
1 change: 1 addition & 0 deletions docs/docs/03-configuration.md
@@ -42,5 +42,6 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin
 | CRAWLER_NUM_WORKERS | No | 1 | Number of allowed concurrent crawling jobs. By default, we're only doing one crawling request at a time to avoid consuming a lot of resources. |
 | CRAWLER_DOWNLOAD_BANNER_IMAGE | No | true | Whether to cache the banner image used in the cards locally or fetch it each time directly from the website. Caching it consumes more storage space, but is more resilient against link rot and rate limits from websites. |
 | CRAWLER_STORE_SCREENSHOT | No | true | Whether to store a screenshot from the crawled website or not. Screenshots act as a fallback for when we fail to extract an image from a website. You can also view the stored screenshots for any link. |
+| CRAWLER_FULL_PAGE_SCREENSHOT | No | false | Whether to store a screenshot of the full page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, the screenshot will only include the visible part of the page |
 | CRAWLER_JOB_TIMEOUT_SEC | No | 60 | How long to wait for the crawler job to finish before timing out. If you have a slow internet connection or a low powered device, you might want to bump this up a bit |
 | CRAWLER_NAVIGATE_TIMEOUT_SEC | No | 30 | How long to spend navigating to the page (along with its redirects). Increase this if you have a slow internet connection |
2 changes: 2 additions & 0 deletions packages/shared/config.ts
@@ -25,6 +25,7 @@ const allEnv = z.object({
   CRAWLER_NUM_WORKERS: z.coerce.number().default(1),
   CRAWLER_DOWNLOAD_BANNER_IMAGE: stringBool("true"),
   CRAWLER_STORE_SCREENSHOT: stringBool("true"),
+  CRAWLER_FULL_PAGE_SCREENSHOT: stringBool("false"),
   MEILI_ADDR: z.string().optional(),
   MEILI_MASTER_KEY: z.string().default(""),
   LOG_LEVEL: z.string().default("debug"),
@@ -66,6 +67,7 @@ const serverConfigSchema = allEnv.transform((val) => {
       navigateTimeoutSec: val.CRAWLER_NAVIGATE_TIMEOUT_SEC,
       downloadBannerImage: val.CRAWLER_DOWNLOAD_BANNER_IMAGE,
       storeScreenshot: val.CRAWLER_STORE_SCREENSHOT,
+      fullPageScreenshot: val.CRAWLER_FULL_PAGE_SCREENSHOT,
     },
     meilisearch: val.MEILI_ADDR
       ? {
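
The body of `stringBool` is not shown in this diff, so the following is only a dependency-free sketch of the behaviour the configuration change appears to rely on: an environment string is mapped to a boolean, with `"false"` as the default for the new variable. `parseStringBool` and `readCrawlerConfig` are hypothetical names, and treating only the exact string `"true"` as true is an assumption.

```typescript
// Hypothetical stand-in for the repository's `stringBool` helper;
// its real implementation is not part of the diff.
function parseStringBool(
  raw: string | undefined,
  defaultValue: "true" | "false",
): boolean {
  // Assumption: only the exact string "true" enables the option.
  return (raw ?? defaultValue) === "true";
}

// Maps environment variables to the camelCase crawler config keys
// used in the diff (assumed helper).
function readCrawlerConfig(env: Record<string, string | undefined>) {
  return {
    storeScreenshot: parseStringBool(env["CRAWLER_STORE_SCREENSHOT"], "true"),
    fullPageScreenshot: parseStringBool(
      env["CRAWLER_FULL_PAGE_SCREENSHOT"],
      "false",
    ),
  };
}

const defaults = readCrawlerConfig({});
const enabled = readCrawlerConfig({ CRAWLER_FULL_PAGE_SCREENSHOT: "true" });

console.log(defaults.fullPageScreenshot); // false
console.log(enabled.fullPageScreenshot); // true
```

With this shape, leaving `CRAWLER_FULL_PAGE_SCREENSHOT` unset keeps the previous viewport-only behaviour, which matches the documented default.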