-
-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding Luxembourg administrative sanctions #849
Conversation
There are some sanctions where the title doesn't mention the company, such as "Sanction administrative prononcée à l’encontre d’un réviseur d’entreprises agréé" (Administrative sanction pronounced against an approved company auditor). We are currently ignoring those. There also appears to be a bug where the lookup doesn't recognize 2 titles:
Thus they are raising warnings even though they are already in the YAML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Some potential improvements
subtitle = ( | ||
response.find('.//*[@class="single-news__subtitle"]').text_content().strip() | ||
) | ||
res = context.lookup("subtitle_to_names", subtitle) | ||
if res: | ||
names = cast("List[str]", res.names) | ||
else: | ||
context.log.warning("Can't find the name of the company", text=subtitle) | ||
names = [subtitle] | ||
|
||
# If the subtitle doesn't contain any names | ||
if not names or len(names) == 0: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of maintaining all items in the yaml, I think we can come up with an algorithm that's fairly safe against accidental nonsense names:
- strip
r"(Sanction|amende) administrative prononcée à l’encontre d[eu] (gestionnaire de fonds d’investissement)?"
from the start - if the rest starts with upper case, that looks like a name, so use it
- else if there's a match in lookups, use the values from that
- else use the rest as a name and warn to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more rules to the REGEX and I was able to get more names, but there was still some that required lookups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah totally fine - it's just good to automatically handle the common cases so we don't literally have to add each and every record.
I realise now, the instances where they don't mention a name, we'll have to manually pull the name from PDF. How about the following logic to do a second lookup:
if not res.names:
pdf_res = lookups(detail_url)
if pdf_res is None:
#warn
names = names_res.names
else:
names = res.names
for name in names:
...
the fist lookup handles if we can't parse a name from the subtitle
the second lookup happens if the first lookup gave us an empty list, then we use the detail url to look up a name someone found in the PDF and added - for cases like these https://github.com/opensanctions/opensanctions/pull/849/files#diff-31dfffcd8054c60130382e3389c526fa63b3e350581059737aaf4f8ab5c12834R55
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to assume that we can give the name AIFM Law
for Sanction administrative prononcée à l’encontre d’un gestionnaire de fonds d’investissement
- it's totally conceivable that they'd use the same string again for a different sanction, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked all links and could only find 3 with a company name in the PDF. From what I understood, AIFM Law
is not a name of a company, it's the Alternative Investment Fund Managers Law.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol shows how well I read that. sorry. But this, looks good!
datasets/lu/administrative_sanctions/lu_administrative_sanctions.yaml
Outdated
Show resolved
Hide resolved
datasets/lu/administrative_sanctions/lu_administrative_sanctions.yaml
Outdated
Show resolved
Hide resolved
datasets/lu/administrative_sanctions/lu_administrative_sanctions.yaml
Outdated
Show resolved
Hide resolved
Thanks! looking good! Just a couple of changes then we're there |
Fixes opensanctions/crawler-planning#157