Updates to the 2025 Accessibility SQL - For Government Sites #3848

Open
mgifford opened this issue Nov 10, 2024 · 4 comments

Comments

@mgifford
Contributor

mgifford commented Nov 10, 2024

I don't want to open a PR yet, but I do want to capture the requested changes.

https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2024/accessibility/lighthouse_score_by_government.sql
https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2024/accessibility/lighthouse_score_by_government_with_urls.sql

I was also interested in whether these two queries could be consolidated, since most of their logic is the same.

Some UK government institutions to add:

Some US state URLs don't use .gov, for example state.mi.us, state.in.us, and state.mn.us.

The .gov scan should be tightened, as it currently catches some mi.gov.br sites. I wasn't able to write better SQL than this; it is still mostly accurate, but it includes some unrelated sites.
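One way to avoid catching hosts like mi.gov.br is to match against the host name only and anchor `.gov` at its end. The sketch below uses Python's `re` purely to illustrate the idea; the function name and the `state.xx.us` rule (two-letter state code) are assumptions, not part of the existing queries.

```python
import re
from urllib.parse import urlparse

# Assumption: a host ending in ".gov" is a US government site, and
# "state.xx.us" (two-letter state code) covers the older state domains.
US_GOV_HOST = re.compile(r"(?:^|\.)gov$")
US_STATE_HOST = re.compile(r"(?:^|\.)state\.[a-z]{2}\.us$")


def is_us_government(url: str) -> bool:
    """Return True when the URL's host looks like a US government domain."""
    host = (urlparse(url).hostname or "").lower()
    return bool(US_GOV_HOST.search(host) or US_STATE_HOST.search(host))


# Sample URLs for illustration only.
for url in [
    "https://www.usa.gov/",
    "https://www.mi.gov.br/",
    "https://www.state.mn.us/",
]:
    print(url, is_us_government(url))
```

In BigQuery, the same idea might be expressed by matching against `NET.HOST(page)` rather than the full URL; that is a suggestion, not something the current queries are known to do.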

Looking at the official French list of domains, we could probably extend it to include:
\b(?:www\.)?grand[\w-]+\.(fr|alsace|bzh)\b
\b(?:www\.)?(?:region[\w-]+|[\w-]+region)\.(fr|alsace|bzh)\b
\b(?:www\.)?departement[\w-]+\.(fr|alsace|bzh)\b
\b(?:www\.)?cc-[\w-]+\.(fr|alsace|bzh)\b
\b(?:www\.)?[\w-]+\.agglo\.(fr|alsace|bzh)\b
\b(?:www\.)?commune(?:-[\w]+)*\.(fr|alsace|bzh)\b
\b(?:www\.)?mairie(?:-[\w]+)*\.(fr|alsace|bzh)\b
\b(?:www\.)?[\w-]+\.gouv\.(fr|alsace|bzh|app)\b
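Before adding patterns like these to the query, it may help to try them against a few host names. The sketch below combines a subset of the French patterns into one alternation, assuming each dot is meant literally (written as `\.`). It uses Python's `re` for convenience; BigQuery's `REGEXP_CONTAINS` uses RE2, which also supports `\b`, `\w`, and non-capturing groups. The helper name and the sample hosts are hypothetical.

```python
import re

# Assumption: every "." in the proposed patterns is a literal dot, and each
# pattern is applied to a bare host name rather than a full URL.
FRENCH_PATTERNS = [
    r"\b(?:www\.)?grand[\w-]+\.(fr|alsace|bzh)\b",
    r"\b(?:www\.)?(?:region[\w-]+|[\w-]+region)\.(fr|alsace|bzh)\b",
    r"\b(?:www\.)?cc-[\w-]+\.(fr|alsace|bzh)\b",
    r"\b(?:www\.)?mairie(?:-[\w]+)*\.(fr|alsace|bzh)\b",
    r"\b(?:www\.)?[\w-]+\.gouv\.(fr|alsace|bzh|app)\b",
]

# Wrap each pattern in a non-capturing group so the alternation is unambiguous.
FRENCH_PUBLIC = re.compile("|".join(f"(?:{p})" for p in FRENCH_PATTERNS))


def is_french_public_host(host: str) -> bool:
    """Return True when the host matches any of the candidate patterns."""
    return FRENCH_PUBLIC.search(host.lower()) is not None


# Hypothetical hosts, chosen only to exercise the patterns.
for host in ["www.economie.gouv.fr", "mairie-exemple.fr", "www.example.fr"]:
    print(host, is_french_public_host(host))
```

Note that prefix-based patterns such as `grand[\w-]+` will also match unrelated commercial domains that happen to start with the same word, so the results would still need a manual check.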

USA

I will add more below.

@mgifford mgifford changed the title from "Updates to the 2025 Accessibility SQL" to "Updates to the 2025 Accessibility SQL - For Government Sites" on Nov 10, 2024
@mgifford
Contributor Author

New Brunswick's government uses gnb.ca

It's in here:
WHEN REGEXP_CONTAINS(page, r'\.(gc\.ca|canada\.ca|alberta\.ca|gov\.ab\.ca|gov\.bc\.ca|manitoba\.ca|gov\.mb\.ca|gnb\.ca|gov\.nb\.ca|gov\.nl\.ca|novascotia\.ca|gov\.ns\.ca|ontario\.ca|gov\.on\.ca|gov\.pe\.ca|quebec\.ca|gouv\.qc\.ca|revenuquebec\.ca|saskatchewan\.ca|gov\.sk\.ca|gov\.nt\.ca|gov\.nu\.ca|yukon\.ca|gov\.yk\.ca)/') THEN 'Canada'

So it should be downloaded, but not in:

      '|\\.gc\\.ca/'  -- Canada and provinces
      '|\\.canada\\.ca/'
      '|\\.alberta\\.ca/'
      '|\\.gov\\.ab\\.ca/'
      '|\\.gov\\.bc\\.ca/'
      '|\\.manitoba\\.ca/'
      '|\\.gov\\.mb\\.ca/'
      '|\\.gnb\\.ca/'
      '|\\.gov\\.nb\\.ca/'
      '|\\.gov\\.nl\\.ca/'
      '|\\.novascotia\\.ca/'
      '|\\.gov\\.ns\\.ca/'
      '|\\.ontario\\.ca/'
      '|\\.gov\\.on\\.ca/'
      '|\\.gov\\.pe\\.ca/'
      '|\\.quebec\\.ca/'
      '|\\.gouv\\.qc\\.ca/'
      '|\\.revenuquebec\\.ca/'
      '|\\.saskatchewan\\.ca/'
      '|\\.gov\\.sk\\.ca/'
      '|\\.gov\\.nt\\.ca/'
      '|\\.gov\\.nu\\.ca/'
      '|\\.yukon\\.ca/'
      '|\\.gov\\.yk\\.ca/'

gnb.ca isn't included.
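One thing that may be worth checking (an assumption, not a confirmed cause) is how the pattern treats different URL shapes. Every entry in the alternation requires a literal `.` before the domain and a `/` directly after it, so a bare `gnb.ca` URL or a URL without a trailing slash would not match. The sketch below uses a shortened copy of the alternation and hypothetical sample URLs to show that behavior in Python's `re`.

```python
import re

# Shortened copy of the Canadian alternation: each entry needs a literal "."
# before the domain and a "/" immediately after it.
CANADA = re.compile(r"\.(gc\.ca|canada\.ca|gnb\.ca|gov\.nb\.ca)/")


def matches_canada(page: str) -> bool:
    """Return True when the page URL matches the shortened Canadian pattern."""
    return CANADA.search(page) is not None


# Hypothetical sample URLs, used only to show how the pattern behaves.
samples = [
    "https://www2.gnb.ca/",  # subdomain followed by "/": contains ".gnb.ca/"
    "https://gnb.ca/",       # bare domain: no "." in front of "gnb.ca"
    "https://www2.gnb.ca",   # no trailing "/"
]
for page in samples:
    print(page, matches_canada(page))
```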

@mgifford
Contributor Author

I've also missed Liechtenstein: https://www.llv.li

There may also be regions here that would be good to check on:

https://digital-strategy.ec.europa.eu/en/policies/web-accessibility-monitoring

@mgifford
Contributor Author

Luxembourg has a list of domains from:
https://data.public.lu/en/datasets/inventaire-des-sites-publics/

\b(?:https?://)?(?:www\.)?(?:[a-zA-Z0-9-]+\.)?(?:public|gov|etat|data|service|security|mfi|lux)(?:\.public|\.gov|\.etat)?\.lu\b
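A quick way to sanity-check the Luxembourg pattern is to run it over a few URLs. This sketch assumes the dots are meant literally (written as `\.`) and uses Python's `re`; the helper name and the non-matching sample URL are hypothetical.

```python
import re

# Assumption: each "." in the proposed pattern is a literal dot.
LUXEMBOURG = re.compile(
    r"\b(?:https?://)?(?:www\.)?(?:[a-zA-Z0-9-]+\.)?"
    r"(?:public|gov|etat|data|service|security|mfi|lux)"
    r"(?:\.public|\.gov|\.etat)?\.lu\b"
)


def is_luxembourg_public(url: str) -> bool:
    """Return True when the URL matches the candidate Luxembourg pattern."""
    return LUXEMBOURG.search(url) is not None


# Sample URLs for illustration only.
for url in [
    "https://guichet.public.lu/",
    "https://data.public.lu/",
    "https://www.example.lu/",
]:
    print(url, is_luxembourg_public(url))
```

Because the pattern depends on a fixed set of keywords, any public site whose name contains none of them would be missed, so the official inventory remains the better source of truth.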

The US has a federal listing here:
https://github.com/cisagov/dotgov-data

France has this list:
https://gitlab.adullact.net/dinum/noms-de-domaine-organismes-secteur-public

The UK has:
https://www.gov.uk/government/publications/list-of-most-used-websites

Similar aggregations could be made to pull out more domains.
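As a rough sketch of what such an aggregation might look like, the helper below turns a plain list of domains into a single host-matching regex. The function name and the short domain list are hypothetical; a real run would load the domains from one of the published lists above.

```python
import re


def build_domain_pattern(domains):
    """Build one regex matching any listed domain or a subdomain of it."""
    # Normalize, de-duplicate, and sort so the output is stable.
    cleaned = sorted({d.strip().lower().lstrip(".") for d in domains if d.strip()})
    alternation = "|".join(re.escape(d) for d in cleaned)
    # (?:^|\.) stops "notgnb.ca" from matching "gnb.ca"; $ anchors the host end.
    return re.compile(rf"(?:^|\.)(?:{alternation})$")


# Hypothetical input: a handful of domains mentioned in this thread.
pattern = build_domain_pattern(["llv.li", "gnb.ca", "gov.uk"])

for host in ["www.llv.li", "www2.gnb.ca", "notgnb.ca", "gov.uk.example.com"]:
    print(host, pattern.search(host) is not None)
```

The pattern is meant to be applied to host names, not full URLs, which keeps lookalike hosts such as `gov.uk.example.com` out of the results.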
