Crawl and download meta information and documents on technical standards and contributions
You can install the development version from GitHub with:
pip install git+https://github.com/lorenzbr/pystandards.git
Crawling meta information on ITU-T recommendations requires Google Chrome and the matching chromedriver.exe (see here), so please make sure both are installed.
- Crawl meta information on IEEE contributions (see here)
  - Find the name of a standard (std_name) by clicking on the standard of interest. The standard name is the part of the URL between the host and /documents: https://mentor.ieee.org/ [standard name] /documents (e.g., 802.11, 802.16, ...)
  - Specify the range of pages to crawl for meta information (start_page and end_page)
- Download IEEE contribution documents (see here)
  - A data frame that contains the meta information on IEEE contributions, i.e., it has at least the three columns dl_link, file and doc_type
  - A path where the documents are saved
- Crawl meta information on ITU-T recommendations/standards (see here)
  - Specify the recommendation series (e.g., A, G, H, ...)
  - Provide the path and file name of the Chrome driver
- Download ITU-T recommendation/standard documents (see here)
  - A data frame that contains the meta information on ITU-T standards, i.e., it has at least the two columns download_link_recommendation and citation
  - A path where the documents are saved
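
The rule for reading std_name off a mentor.ieee.org URL can be sketched in a few lines. This is only an illustration: std_name_from_url is a hypothetical helper and not part of pystandards, and it assumes the URL has exactly the form https://mentor.ieee.org/ [standard name] /documents described in the list.

```python
from urllib.parse import urlparse


def std_name_from_url(url: str) -> str:
    """Return the path segment in front of 'documents' in a mentor.ieee.org URL.

    Hypothetical helper for illustration; not part of pystandards.
    """
    parsed = urlparse(url)
    if parsed.netloc != "mentor.ieee.org":
        raise ValueError(f"not a mentor.ieee.org URL: {url!r}")
    # Split the path into its non-empty segments, e.g. ["802.11", "documents"]
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2 or segments[1] != "documents":
        raise ValueError(f"expected /<standard name>/documents in: {url!r}")
    return segments[0]


print(std_name_from_url("https://mentor.ieee.org/802.11/documents"))
```

The returned string can then be passed as std_name when crawling meta information.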
To parse standard documents and for related functions (e.g., accessing ETSI standard documents), see here.
# Crawl meta information and download IEEE contributions
from pystandards.itut_standards import itut_standards
from pystandards.ieee_contributions import ieee_contributions
ieee_contr = ieee_contributions(verbose=True)
# Name of the WiFi standard
std_name = "802.11"
# Get meta information
df_output = ieee_contr.get_meta(std_name, start_page=1, end_page=3)
# Download three contribution documents
df_download = df_output[0:3]
ieee_contr.download_contributions(df_download, path="")
# Crawl meta information and download ITU-T recommendations
itut_std = itut_standards(verbose=True)
# Recommendation series to crawl (e.g., A, G, H, ...)
series = ['A']
# Specify the file of the Chrome driver (required for the use of Selenium)
driver_file = "chromedriver.exe"
# Get meta information
df_output = itut_std.get_meta(series, driver_file)
# Download three standard documents as PDFs
df_download = df_output[0:3]
itut_std.download_standards(df_download, path="")
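
Since both download steps expect a data frame with certain columns (dl_link, file and doc_type for IEEE contributions; download_link_recommendation and citation for ITU-T standards), it may help to check for them before starting a download. The following is a minimal sketch under that assumption: missing_columns and the two constants are hypothetical names, not part of pystandards, and the function only looks at column names, so it accepts any iterable of strings.

```python
# Required columns as stated in this README (assumption: no others are needed)
IEEE_REQUIRED = ("dl_link", "file", "doc_type")
ITUT_REQUIRED = ("download_link_recommendation", "citation")


def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`.

    Hypothetical helper for illustration; not part of pystandards.
    """
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a plain list of column names
print(missing_columns(["dl_link", "file", "title"], IEEE_REQUIRED))
```

With a data frame at hand, the column names can be passed in directly, for example missing_columns(df_download.columns, IEEE_REQUIRED); an empty list means all required columns are present.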
This repository is licensed under the MIT license.
See here for further information.