Update ad filter

Recent changes to how ads are presented in search results caused
Whoogle to display ads for certain searches. In particular, ads recently
started appearing grouped into a single div, rather than one ad per div.
The grouped div is labeled "ads" (instead of just "ad"), which threw off
the existing ad filter. The ad keyword blacklist has been updated
accordingly, and the filter now compares only the alphabetic characters
of each label against it.
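A minimal sketch of the before/after matching. It assumes trimmed-down keyword lists, hypothetical helper names (`old_is_ad`, `new_is_ad`), and made-up sample labels. The old check needed the whole label to equal a keyword, so "Ads" and a label with trailing punctuation slipped through. The new check keeps only alphabetic characters before comparing, and the list now includes 'ads'.

```python
# Trimmed keyword lists for illustration only; the real BLACKLIST is longer.
OLD_BLACKLIST = ['ad', 'anuncio', 'Anzeige']
NEW_BLACKLIST = ['ad', 'ads', 'anuncio', 'Anzeige']


def old_is_ad(label: str) -> bool:
    # Hypothetical stand-in for the pre-change check: the whole label
    # must equal a keyword (case-insensitive).
    return label.upper() in (value.upper() for value in OLD_BLACKLIST)


def new_is_ad(label: str) -> bool:
    # Hypothetical stand-in for the post-change check: drop every
    # non-alphabetic character, then do the same case-insensitive match.
    label_str = ''.join(filter(str.isalpha, label))
    return label_str.upper() in (value.upper() for value in NEW_BLACKLIST)


for label in ['Ad', 'Ads', 'Ad ·', 'Sponsored']:
    print(f'{label!r}: old={old_is_ad(label)} new={new_is_ad(label)}')
```

Running it prints old=False new=True for 'Ads' and 'Ad ·'. 'Sponsored' stays unmatched because it is in neither list.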

This appears to have affected only English-language searches, and only
a few very specific queries.
branch main
author Ben Busby 2022-02-25 23:02:58 -07:00
parent 5069838e69
commit b28fa86e33
GPG Key ID: B9B7231E01D924A1
1 changed file with 7 additions and 5 deletions


@@ -18,10 +18,11 @@ BLANK_B64 = ('data:image/png;base64,'
 # Ad keywords
 BLACKLIST = [
-    'ad', 'anuncio', 'annuncio', 'annonce', 'Anzeige', '广告', '廣告', 'Reklama',
-    'Реклама', 'Anunț', '광고', 'annons', 'Annonse', 'Iklan', '広告', 'Augl.',
-    'Mainos', 'Advertentie', 'إعلان', 'Գովազդ', 'विज्ञापन', 'Reklam', 'آگهی',
-    'Reklāma', 'Reklaam', 'Διαφήμιση', 'מודעה', 'Hirdetés', 'Anúncio'
+    'ad', 'ads', 'anuncio', 'annuncio', 'annonce', 'Anzeige', '广告', '廣告',
+    'Reklama', 'Реклама', 'Anunț', '광고', 'annons', 'Annonse', 'Iklan',
+    '広告', 'Augl.', 'Mainos', 'Advertentie', 'إعلان', 'Գովազդ', 'विज्ञापन',
+    'Reklam', 'آگهی', 'Reklāma', 'Reklaam', 'Διαφήμιση', 'מודעה', 'Hirdetés',
+    'Anúncio'
 ]
 SITE_ALTS = {
@@ -89,7 +90,8 @@ def has_ad_content(element: str) -> bool:
         bool: True/False for the element containing an ad
     """
-    return (element.upper() in (value.upper() for value in BLACKLIST)
+    element_str = ''.join(filter(str.isalpha, element))
+    return (element_str.upper() in (value.upper() for value in BLACKLIST)
             or '' in element)
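To exercise the patched keyword check on its own, here is a simplified re-implementation of the keyword half of `has_ad_content`. It assumes results arrive as hypothetical (label, body) pairs rather than the parsed HTML elements Whoogle actually walks. `drop_ads` and the sample results are illustrative only, and the keyword list is a subset of the updated BLACKLIST. The diff's second `or ... in element` clause is omitted.

```python
from typing import List, Tuple

# Subset of the updated BLACKLIST, for illustration only.
BLACKLIST = ['ad', 'ads', 'anuncio', 'Anzeige', 'Reklama', '广告']


def has_ad_content(element: str) -> bool:
    # Keyword part of the patched check: keep only alphabetic characters
    # of the label, then compare case-insensitively against the list.
    element_str = ''.join(filter(str.isalpha, element))
    return element_str.upper() in (value.upper() for value in BLACKLIST)


def drop_ads(results: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Hypothetical helper: keep only results whose label is not an ad label.
    return [(label, body) for label, body in results
            if not has_ad_content(label)]


results = [
    ('Ads', 'three grouped sponsored links'),
    ('Anzeige:', 'German ad'),
    ('广告', 'Chinese ad'),
    ('Wikipedia', 'organic result'),
]
print(drop_ads(results))
```

Only ('Wikipedia', 'organic result') survives. "Ads" now matches directly, and the colon after "Anzeige" is stripped before the comparison. `str.isalpha` treats CJK characters as alphabetic, so '广告' passes through the filter unchanged and still matches.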