**403 Error** Can't access my .py files in my Repo? #121229
Replies: 3 comments
-
Hi @liot-io, here's the content of your file:

    import os
    import html2text
    from requests_html import HTMLSession
    from urllib.parse import urlparse


    def extract_filename_from_url(url: str) -> str:
        """Extract the filename from the URL."""
        parsed_url = urlparse(url)
        filename = parsed_url.netloc  # Extract domain name from URL
        return filename + ".md"  # Add .md extension


    def download_and_save_in_markdown(url: str, dir_path: str) -> None:
        """Download the HTML content from the web page and save it as a markdown file."""
        # Extract a filename from the URL
        filename = extract_filename_from_url(url)
        print(f"Downloading {url} into {filename}...")
        session = HTMLSession()
        response = session.get(url, timeout=30)
        # Check if the content type is HTML
        content_type = response.headers.get('content-type', '')
        if 'text/html' not in content_type:
            print(f"Skipping {url} as it is not an HTML page")
            return
        # Render the page, which will execute JavaScript
        response.html.render(timeout=60)  # Increased timeout to 60 seconds
        # Convert the rendered HTML content to markdown
        h = html2text.HTML2Text()
        markdown_content = h.handle(response.html.raw_html.decode("utf-8"))
        # Write the markdown content to a file
        filename = os.path.join(dir_path, filename)
        if not os.path.exists(filename):
            with open(filename, "w", encoding="utf-8") as f:
                f.write(markdown_content)


    def download_target_page(url: str) -> None:
        """Download the HTML content from the target page and save it as a markdown file."""
        # Create the content directory if it doesn't exist
        base_dir = os.path.dirname(os.path.abspath(__file__))
        dir_path = os.path.join(base_dir, "content")
        os.makedirs(dir_path, exist_ok=True)
        # Download and save the target page
        download_and_save_in_markdown(url, dir_path)
        print("Target page has been successfully downloaded!")


    # Define the target page
    TARGET_PAGES = [
        "https://Example.dk/",
    ]

    if __name__ == "__main__":
        for target_page in TARGET_PAGES:
            download_target_page(target_page)
-
Hey there, that sounds frustrating! It's never fun to hit a roadblock when you're in the middle of a project. Let's try to troubleshoot this together.

First off, have you checked your GitHub permissions and made sure you're logged in with the right account? Sometimes, things get a bit wonky with permissions, so it's worth a double-check. If everything seems fine on your end, it might be worth reaching out to GitHub support. They're usually pretty helpful and might be able to shed some light on what's going on.

In the meantime, if you have local copies of those files, you should still be able to access them. If not, don't worry just yet. We'll figure this out together!
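If the web view keeps failing, one thing that might be worth a try is fetching the file through its raw URL instead of the rendered page. Below is a minimal sketch of turning a `github.com/.../blob/...` link into the matching `raw.githubusercontent.com` link. The helper name `blob_to_raw_url` is made up for illustration, and the sketch assumes a public repo and a branch name without slashes (such as `main`). If the 403 comes from a network or company policy, the raw host may well be blocked too.

```python
from urllib.parse import urlparse


def blob_to_raw_url(blob_url: str) -> str:
    """Convert a github.com 'blob' page URL into a raw.githubusercontent.com URL.

    Assumption: the branch name contains no slashes (e.g. "main").
    """
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Expected path layout: owner / repo / "blob" / branch / path to file...
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {blob_url}")
    owner, repo, _, branch, *file_path = parts
    return "https://raw.githubusercontent.com/" + "/".join(
        [owner, repo, branch, *file_path]
    )


if __name__ == "__main__":
    url = "https://github.com/liot-io/AIOpenDK/blob/main/projects/scrapers/web_scraper.py"
    print(blob_to_raw_url(url))
```

Opening the printed link in the browser (or from another network) should show the plain file text, which you can then save locally.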
-
Hi! I can access everything from my phone. It seems to be a problem with the SSL certificate and company policies, as I'm on my work computer. Thank you for the quick reply, and thank you @davevad93!
-
Hi.
Suddenly I cannot access some of the scripts in my repo.
I can still open some of them, but others return an error with the following text:
Error loading page
An unexpected 403 error occured. Try reloading the page.
I have been working on some custom projects that I would really hate to lose :S Is there some way to rescue the content of the files I cannot open?
Repo = https://github.com/liot-io/AIOpenDK/blob/main/projects/scrapers/web_scraper.py