LeetCode-AC-Code-Crawler

Origin

It is time consuming to copy and paste hundreds of problems that I solved on LeetCode, so I decided to write a crawler instead.

At the begining, I found a useful dependency in python called BeautifulSoup, but it seems that it is not able to deal with dynamic website, which means that if some elements shown in a website is controlled by javascript. After searching for a while, I found selenium, which automatically controll website like a human, but it seems that it is unefficient to use selenium to parse data in html, so I combine BeautifulSoup and selenium to finish this project.

Install Dependencies

# python3 setup.py install

Note: python2 is now not supported, unless there is a need. Feel free to let me know.

Install ChromeDriver

This project use Chromedriver in selenium, before running this project, you have to add the downloaded executable file to PATH enviroment variable, or if your python version is greater than 3.5, than you can enable chromedriver auto install by setting true in ["Chromedriver"]["AutoInstall"] in conf.json.

Note: Chromedriver-autoinstaller is a open source project, it is indeed convenient, but there are still some limitations, and if you are having issue using it, maybe consider manually download Chromedriver by yourself.

Setting

You have to modify variables in conf.json.

{
    "Username": "YOUR_USERNAME",
    "Password": "YOUR_PASSWORD",
    "OutputDir": "THE_ABSOLUTE_PATH_THAT_YOU_WANT_TO_SAVE_FILES",
    "Chromedriver": {
        "AutoInstall": false,
        "Path": "THE_ABSOLUTE_PATH_OF_CHROMEDRIVER"
    },
    "Headless": true,
    "Overwrite": false
}

Execute

After executing the project, each LeetCode problem that you've solved will have a folder containing a file named "Solution" with the language you used as its suffix.

I tried this project on Windows10 & Ubuntu, and it works fine.

Notice

If you have multiple sessions in your account, switch to the session that you want to crawl ac codes before execute.
If you set the Headless to true, and keep seeing Loading took too much time! message, maybe it's the Google reCAPTCHA causing the problem, this software is not able to handle this problem, you need to set the Headless to false, run the software again and then manually pass the Google reCAPTCHA testing.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
LeetCode_AC_Code_Crawler.py		LeetCode_AC_Code_Crawler.py
README.md		README.md
conf.json		conf.json
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LeetCode-AC-Code-Crawler

Origin

Install Dependencies

Install ChromeDriver

Setting

Execute

Notice

About

Releases

Packages

Languages

AristoChen/LeetCode-AC-Code-Crawler

Folders and files

Latest commit

History

Repository files navigation

LeetCode-AC-Code-Crawler

Origin

Install Dependencies

Install ChromeDriver

Setting

Execute

Notice

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages