[WIP] 2023 updates #76

Open
wants to merge 30 commits into
base: master
30 commits
179a962
Update requirements
simonwoerpel Nov 2, 2023
86c5052
Ignore scraper data directories
simonwoerpel Nov 2, 2023
fd28600
Update scraper user agent
simonwoerpel Nov 2, 2023
6f3d0ca
BE: 2023 tweaks
simonwoerpel Nov 2, 2023
8e32816
ee spider: fix year parameter to take effect
Tilana Nov 3, 2023
104c68d
es_scraper: add 2022 url and apply minor fixes
Tilana Nov 3, 2023
ddf652b
fr_scraper: adjust for 2022 data
Tilana Nov 3, 2023
2eb0b7b
gb_scraper: adjust to 2022
Tilana Nov 3, 2023
8eaccee
LU: scrapy working 2023
simonwoerpel Nov 5, 2023
497d221
AT: scrapy working 2023
simonwoerpel Nov 5, 2023
a1b0ebc
DE: Add scrapy scraper
simonwoerpel Nov 5, 2023
a90d424
LV: add scrapy scraper
simonwoerpel Nov 5, 2023
e04b65b
SK: Fix scrapy scraper for 2022
simonwoerpel Nov 5, 2023
ec23cd1
HU: Update converter for 2023
simonwoerpel Nov 5, 2023
e9634bf
EU: Cool URIs always change
simonwoerpel Nov 5, 2023
045ecad
here as well -.-
simonwoerpel Nov 5, 2023
580e1fa
Merge pull request #1 from investigativedata/dev
simonwoerpel Nov 7, 2023
6840e9b
BE: slugify
simonwoerpel Nov 7, 2023
d15ec4c
MT: add a bad notebook
simonwoerpel Nov 9, 2023
e1088e7
LT: add some code
simonwoerpel Nov 9, 2023
2ba064a
CY: adjust data source to 2022 / flag old scraper
Tilana Nov 9, 2023
e7017d3
Improve folder structure
jfilter Nov 22, 2024
fb13a5e
Improve README
jfilter Nov 22, 2024
27419ca
Update requirements to get it to run again
jfilter Nov 22, 2024
2639dc1
Add basic cli structure
jfilter Dec 8, 2024
6a4d5b8
Add file download
jfilter Dec 9, 2024
e5c892c
Add file processing
jfilter Dec 10, 2024
12117e2
Add description for hard-to-automate exports
jfilter Dec 12, 2024
b7ac44f
Add new RO scraper
jfilter Dec 12, 2024
0e71615
Add option to scrape sequentially
jfilter Dec 13, 2024
17 changes: 17 additions & 0 deletions .editorconfig
@@ -0,0 +1,17 @@
# EditorConfig is awesome: http://EditorConfig.org

# top-most EditorConfig file
root = true

# Space indentation (4 spaces)
[*]
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true

# The indent size used in the `package.json` file cannot be changed
# https://github.com/npm/npm/pull/3180#issuecomment-16336516
[{.travis.yml,npm-shrinkwrap.json,package.json}]
indent_style = space
indent_size = 4
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
data
.idea/
*.pyc
*.swp
@@ -11,4 +12,5 @@
*.jpg
*.xlsx
.ipynb_checkpoints
lv/__pycache__/
__pycache__
cache
21 changes: 5 additions & 16 deletions README.md
@@ -1,19 +1,8 @@
-FarmSubsidy.org Scrapers
-========================
+# FarmSubsidy.org Scrapers

-[FarmSubsidy](http://farmsubsidy.openspending.org/) is a website that collects the payment data of the Common Agriculture Policy (CAP) which represents about a third of the EU budget. It was run by a group of journalists and activists for the past years. In 2013 the [OpenSpending project](http://openspending.org/) of the [Open Knowledge Foundation](http://okfn.org/) took over responsibility of the website.
+[FarmSubsidy](https://farmsubsidy.org) is a platform that collects payment data related to the EU’s Common Agricultural Policy (CAP), which accounts for approximately one-third of the EU budget. This repository focuses on the initial data collection phase, often using web scraping. However, many EU member states now offer bulk data downloads, reducing the need for scraping.

-The FarmSubsidy data is mostly scraped from member state websites. The old scrapers were working well, but were running in costly and proprietary software. This year we need Free and Open Source scrapers and this repository will collect these scrapers and coordinate the effort.
+## Resources

-Please have a look at the [member state scraper issues](https://github.com/openspending/farmsubsidy-scrapers/issues?labels=memberstate&page=1&state=open). If you can help provide a scraper that would be awesome.
-
-
-Developer Documentation
------------------------
-
-Developer documentation for both website and scrapers can be found at http://farmsubsidy.readthedocs.org.
-
-[Member states data sites](http://ec.europa.eu/agriculture/cap-funding/beneficiaries/shared/index_en.htm)
-
-
-[Financial Reports](http://ec.europa.eu/agriculture/cap-funding/financial-reports/index_en.htm)
+- **[Member States Data Sites](https://agriculture.ec.europa.eu/common-agricultural-policy/financing-cap/beneficiaries_en):** Links to member states’ CAP payment data portals.
+- **[Financial Reports](http://ec.europa.eu/agriculture/cap-funding/financial-reports/index_en.htm):** Summary reports on CAP funding and expenditures.
6 changes: 0 additions & 6 deletions bg/README.md

This file was deleted.
