
OCI GenAI Automatic Content Extractor & Summarizer


Introduction

This project retrieves the day's 25 trending GitHub projects (from here), reads their README.md files, and summarizes them in a format that is ready for social media.

Companies can use this in their content generation pipeline strategies, or individuals can use it when trying to grow their social media following with organic and up-to-date content!

These are the LLM hyperparameters used for generating the summary:

  • Prompt: "You are an expert AI researcher. Generate an abstractive summary of the given Markdown contents. Share an interesting insight to captivate attention. Here are the contents: "
  • Maximum number of tokens to generate: 550
  • Temperature (creativity of the response): 1.0 (100%). The lower the temperature, the more consistent and less imaginative the generations are.
  • Frequency penalty (penalizes repeated text): 0.0 (0%)
  • Top p (nucleus sampling: the model samples only from the most likely tokens whose cumulative probability adds up to p): 0.75 (75%)
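
For reference, here is a minimal sketch, assuming the OCI Python SDK (the oci package), of how these hyperparameters might be passed to the Generative AI inference API. The endpoint, readme_text, and compartment_id values are placeholders, and the actual code in summarize_llm.py may be organized differently:

import oci
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference.models import ChatDetails, CohereChatRequest, OnDemandServingMode

PROMPT = ("You are an expert AI researcher. Generate an abstractive summary of the given Markdown "
          "contents. Share an interesting insight to captivate attention. Here are the contents: ")

readme_text = "..."  # the Markdown contents of one trending repository's README
compartment_id = "ocid1.compartment.oc1..ocid"  # same value you set in config.yaml below

config = oci.config.from_file("~/.oci/config", "DEFAULT")
client = GenerativeAiInferenceClient(
    config,
    # Region-specific endpoint; replace with the endpoint of your subscribed region.
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

chat_request = CohereChatRequest(
    message=PROMPT + readme_text,
    max_tokens=550,         # maximum number of tokens to generate
    temperature=1.0,        # creativity of the response
    frequency_penalty=0.0,  # penalize repeated text
    top_p=0.75,             # nucleus sampling threshold
)

chat_details = ChatDetails(
    compartment_id=compartment_id,
    serving_mode=OnDemandServingMode(model_id="cohere.command-r-16k"),
    chat_request=chat_request,
)

response = client.chat(chat_details)
print(response.data.chat_response.text)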

You can easily switch between the LLMs offered through the OCI Generative AI service simply by modifying the model_id variable in summarize_llm.py (see the one-line sketch after this list). Here are the currently supported models, including the newest Llama 3.1 models:

  • cohere.command-r-16k: A versatile model for general language tasks like text generation, summarization, and translation, with a context size of 16K tokens. Ideal for building conversational AI with a good balance of performance and cost-effectiveness.
  • cohere.command-r-plus: An enhanced version with more sophisticated understanding and deeper language capabilities. Best for complex tasks requiring nuanced responses and higher processing capacity.
  • meta.llama-3.1-70b-instruct: A 70B parameter model with 128K token context length and multilingual support.
  • meta.llama-3.1-405b-instruct: The largest publicly available LLM (405B parameters) with exceptional capabilities in reasoning, synthetic data generation, and tool use. Best for enterprise applications requiring maximum performance.
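
For example, pointing the summarizer at Llama 3.1 70B would be a one-line change to that variable in summarize_llm.py (a sketch; the rest of the file stays the same):

model_id = "meta.llama-3.1-70b-instruct"  # e.g. instead of "cohere.command-r-16k"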

Check out the demo here

Getting Started

0. Prerequisites and setup

Follow the OCI SDK configuration documentation to generate a config file and a key pair in your ~/.oci directory.

After completion, you should have the following two things in your ~/.oci directory:

  • A config file
  • A key pair named oci_api_key.pem and oci_api_key_public.pem

Now make sure the key_file entry in the config file points to the private key, e.g. key_file=/YOUR_OCI_CONFIG_DIR/oci_api_key.pem (see the sample profile below).
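
For reference, a typical ~/.oci/config profile looks roughly like this (all values are placeholders generated during the steps above):

[DEFAULT]
user=ocid1.user.oc1..example
fingerprint=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
tenancy=ocid1.tenancy.oc1..example
region=us-chicago-1
key_file=/YOUR_OCI_CONFIG_DIR/oci_api_key.pem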

Then, we're going to create a new file called config.yaml with the structure below. It lets you authenticate to OCI and call the OCI GenAI summarization model, which summarizes the content of each project's README file:

compartment_id: "ocid1.compartment.oc1..ocid"
config_profile: "profile_name_in_your_oci_config"

Note: you can find your OCI configuration in ~/.oci/config. Make sure you have previously installed the OCI SDK on your computer.
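
As a minimal sketch, assuming the key names shown above, the code could load these settings and the matching OCI profile like this (the project's actual loading logic may differ):

import oci
import yaml

# Read the project-level settings (compartment OCID and OCI profile name).
with open("config.yaml") as f:
    settings = yaml.safe_load(f)

# Load the profile named in config.yaml from ~/.oci/config and sanity-check it.
oci_config = oci.config.from_file("~/.oci/config", settings["config_profile"])
oci.config.validate_config(oci_config)

compartment_id = settings["compartment_id"]  # passed to the GenAI chat call shown earlier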

Finally, we install Python dependencies:

pip install -r requirements.txt

1. Automatically running everything

You can run the bash script to generate all outputs in the output/ dir:

chmod a+x run.sh # if you don't have exec permissions initially for the .sh file
./run.sh

2. (Optional) Running each component step-by-step

scrapy runspider trending_spider.py # gets the trending repositories
scrapy runspider info_spider.py # then, for each trending repository, extracts its info
python main.py # processes their README.md files and runs the summarizer on top of them

Appendix: Getting Started with LinkedIn Poster

Note that the resources described in this appendix are unofficial: LinkedIn does not offer an official API for publishing, so the process has to be emulated with a 3-legged access token obtained through their Developer Portal. This part is experimental, and you should probably look for a more robust way to automate publishing to social media. Still, if you're interested in how this 3-legged access token can be used, here are the steps:

  1. Create or use an existing developer application from the LinkedIn Developer Portal
  2. Request access to the Sign In With LinkedIn API product. This is a self-serve product that will be provisioned immediately to your application.
  3. Generate a 3-legged access token using the Developer Portal token generator tool, selecting the r_liteprofile scope.
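
As a rough, hypothetical illustration (not the project's actual poster code), the resulting token can be exercised against LinkedIn's REST API; with the r_liteprofile scope it can at least read your own profile, whose id is the member URN a post payload would later reference:

import requests

ACCESS_TOKEN = "YOUR_3_LEGGED_ACCESS_TOKEN"  # from the Developer Portal token generator

# r_liteprofile only allows reading the authenticated member's own profile.
response = requests.get(
    "https://api.linkedin.com/v2/me",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()
print("Member id:", response.json()["id"])

Actually publishing a post requires additional permissions (such as the w_member_social scope), which is part of why this appendix remains experimental.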

Physical Architecture

(Architecture diagram)

Notes/Issues

None at this moment.

URLs

Contributing

This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open source community.

License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See LICENSE for more details.

ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.
