There are certain links (such as the age verification link mentioned in
issue #1083) that should trigger removal of the entire container div on
the results page, rather than just hiding the link itself.
This introduces a new `unsupported_g_divs` list that holds links that
will trigger a removal of the result div on the result page.
Fixes #1083
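A minimal sketch of the idea, using the standard library's ElementTree on well-formed markup rather than whatever parser the project itself uses. The entries in `unsupported_g_divs` and the helper name `remove_unsupported_divs` are placeholders chosen for illustration; the real list contents are not given above.

```python
import xml.etree.ElementTree as ET

# Hypothetical entries; the actual links in the list are not stated above.
unsupported_g_divs = ['google.com/preferences', 'support.google.com/websearch']


def remove_unsupported_divs(root: ET.Element) -> None:
    """Remove every top-level result div that contains an unsupported link."""
    for result_div in list(root.findall('div')):
        hrefs = [a.get('href', '') for a in result_div.iter('a')]
        if any(bad in href for href in hrefs for bad in unsupported_g_divs):
            # Drop the whole container, not just the offending link
            root.remove(result_div)


page = ET.fromstring(
    '<body>'
    '<div><a href="https://example.com/">Regular result</a></div>'
    '<div><p><a href="https://www.google.com/preferences?hl=en">Verify</a></p></div>'
    '</body>'
)
remove_unsupported_divs(page)
remaining = [a.get('href') for a in page.iter('a')]
```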
Scroller results (like the "latest from ___" or "top stories" results)
shouldn't have a site icon associated with them. This extracts the class
that those types of results have and skips over the process of inserting
an icon.
Audio controls are now always shown by default (mostly found in searches
that contain word pronunciation guides).
Site icons were moved to the left side of the results.
This appends an icon element to each search result, using the result
domain's "/favicon.ico" path.
Note that some sites do not have a standard /favicon.ico, but have a
unique path to a specifically sized favicon instead. Worse still, some
sites use JavaScript to load their favicon, which makes the icon even
more difficult for Whoogle to locate.
For now this approach is fine, but can be expanded upon in the future
if desired.
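As a rough illustration of the "/favicon.ico" approach, the icon URL can be derived from the result link with `urllib.parse`. The function name `favicon_url` is an assumption made for this sketch, and it deliberately ignores the non-standard favicon locations mentioned above.

```python
from urllib.parse import urlparse


def favicon_url(result_url: str) -> str:
    """Build the conventional favicon location for a result's domain."""
    parsed = urlparse(result_url)
    # Keep only the scheme and host, then append the standard icon path
    return f'{parsed.scheme}://{parsed.netloc}/favicon.ico'
```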
Fix the exception `AttributeError: 'Filter' object has no attribute 'block_url'`
introduced in commit [1].
`self.block_title` and `self.block_url` were members of the Filter
object [2], but no longer exist after commit [1].
This bug can be reproduced by setting WHOOGLE_CONFIG_BLOCK_URL to a
non-empty string.
[1] 10a15e06e1
[2] 284a8102c8
Navigating between pages of results now includes the user's preferences
string, which allows them to retain their config for a particular
instance between result pages.
Fixes #960
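One way to picture this change is a helper that appends the preferences string to each pagination link. This is only a sketch: the parameter name `preferences` and the function `add_preferences` are assumptions, not taken from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_preferences(link: str, preferences: str) -> str:
    """Return the link with the user's preferences string attached."""
    parsed = urlparse(link)
    # Drop any stale value so the parameter appears exactly once
    params = [(k, v) for k, v in parse_qsl(parsed.query) if k != 'preferences']
    params.append(('preferences', preferences))
    return urlunparse(parsed._replace(query=urlencode(params)))
```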
Medium redirects needed further cleanup to account for instances where a
link contains a subdomain that would not make sense in a Farside
redirect link.
Fixes #947
Replacing result links and text when site alts are enabled is now part
of its own function, and handles replacement of link location and link
description separately.
Fixes #880
Results for queries performed in a language other than the configured
one contain a result div that prompts the user to configure their
language preferences using Google's preferences page.
Since we want all language configuration to occur on Whoogle only, we
can safely remove this result div.
Fixes #444
Fixes #386
The majority of image links and links that are not handled by Whoogle were
not opening in new tabs. This allows links that are not related to the
application to open in new tabs.
Google updated their styling of the result page, which broke some
components of Whoogle's result page styling (namely the result div
backgrounds for dark mode).
The GClasses class has been updated to keep track of the new class
names and roll them back to values that work for Whoogle. A function
was added that loops through the new class names and replaces them with
their older counterparts.
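The rollback can be sketched as a plain mapping from new class names to old ones. The class names below are invented placeholders, and `replace_css_classes` is a name assumed for this example; the real values change whenever Google updates its markup.

```python
# Hypothetical class names, used only to illustrate the mapping.
UPDATED_CLASSES = {
    'NewResultCls': 'OldResultCls',
    'NewFooterCls': 'OldFooterCls',
}


def replace_css_classes(html: str) -> str:
    """Swap each new class name for the older one Whoogle's styles expect."""
    for new_name, old_name in UPDATED_CLASSES.items():
        html = html.replace(new_name, old_name)
    return html
```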
A user reported a bug where searches with a leading slash (in this case:
"/e/OS apps") were interpreted as a Google-specific link when clicking
the next page of results.
This was due to the behavior that Google's search results exhibit, where
internal links for pages like support.google.com are delivered with
params like "?q=/support" rather than a direct link. This fixes that
scenario by checking the "q" param value against the user's original
query to ensure they don't match before assuming that the result is
intended as a redirect.
Fixes #776
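The check described above might look roughly like the following. The function name `is_internal_redirect` and the exact matching rule are assumptions for illustration; the text only states that the "q" value is compared against the user's original query.

```python
from urllib.parse import parse_qs, urlparse


def is_internal_redirect(href: str, user_query: str) -> bool:
    """Return True if the link's "q" value looks like a Google-internal path."""
    q_values = parse_qs(urlparse(href).query).get('q', [])
    if not q_values:
        return False
    q = q_values[0]
    # A leading slash alone is not enough: the user's own query may start
    # with one, so only treat it as a redirect when the two differ.
    return q.startswith('/') and q != user_query
```

With this rule, a next-page link for the query "/e/OS apps" is left alone, while a link carrying "?q=/support" is still treated as a redirect.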
It appears that the handling of result links beginning with '/url' was
mistakenly committed with a less efficient filtering process in place of
the previous one. With the way the code is structured, this less
effective '/url' link filter took precedence over the previous link
filter, and also caused users with the "open link in new tab" config
enabled to no longer have access to that feature.
Fixes #769
The leading slash was previously removed without noticing it was part of a
string replacement in #734. This caused the href of "View Image" to contain
a leading "/", which is wrong.
Links in the Whoogle footer that route to Google pages by default were
previously being removed, but this caused results that also routed to
similar pages to no longer be accessible. This was due to the removal of
the '/url' endpoint that Google uses for each result.
To fix this, the result link is now parsed so that the domain of the
result can be checked against the disallowed G page list. Since results
are delivered in a "/url?q=<domain>" format -- even for pages to
Google's own products -- and the footer links are formatted as
"<product>.google.com", footer links are removed and result links are
parsed correctly.
Fixes #747
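A small sketch of the distinction drawn above: footer links point directly at a Google domain, while result links wrap their target in "/url?q=...". The list contents and both function names are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical subset of the disallowed G page list.
disallowed_g_pages = ['support.google.com', 'accounts.google.com']


def is_footer_g_link(href: str) -> bool:
    """Footer links point straight at a Google domain, so the netloc matches."""
    return urlparse(href).netloc in disallowed_g_pages


def resolve_result_link(href: str) -> str:
    """Unwrap a '/url?q=<target>' result link into its real destination."""
    if href.startswith('/url?'):
        targets = parse_qs(urlparse(href).query).get('q', [])
        if targets:
            return targets[0]
    return href
```

Because a "/url?q=..." link has no domain of its own, it never matches the disallowed list, so a result pointing at a Google product survives while the equivalent footer link is removed.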
If a trailing slash is defined here, it causes the Whoogle instance to
redirect these element requests back to the home page, causing unwanted
behavior.
* Relativization of search results
* Fix JavaScript error when opening images
* Replace single-letter logo and remove sign-in link
* Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection
The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app
redirects, such as the `/session` redirect performed on the first visit
to the Whoogle home page.
Co-authored-by: Ben Busby <contact@benbusby.com>
* Expand `/window` endpoint to behave like a proxy
The `/window` endpoint was previously used as a type of proxy, but only
for removing JavaScript from the result page. This expands the existing
functionality to allow users to proxy search result pages (with or without
JavaScript) through their Whoogle instance.
* Implement filtering of remote content from css
* Condense NoJS feature into Anonymous View
Enabling NoJS now removes JavaScript from the Anonymous View, rather
than creating a separate option.
* Exclude 'data:' urls from filter, add translations
The 'data:' url must be allowed in results to view certain elements on
the page, such as stars for review based results.
Add translations for the remaining languages.
* Add cssutils to requirements
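To illustrate the `WHOOGLE_URL_PREFIX` item in the list above, here is one possible way an internal path such as `/session` could be combined with the prefix. The helper `with_prefix` and its normalization rules are assumptions; the text does not describe how the app applies the variable internally.

```python
import os


def with_prefix(path: str, prefix=None) -> str:
    """Prepend the configured URL prefix to an internal app path."""
    if prefix is None:
        # Fall back to the environment variable named in the list above
        prefix = os.environ.get('WHOOGLE_URL_PREFIX', '')
    prefix = prefix.strip('/')
    path = path.lstrip('/')
    return f'/{prefix}/{path}' if prefix else f'/{path}'
```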
If the alt for a particular service is blank, the original source is
used instead.
Example:
1. Site alts enabled in config
2. User wants wikipedia links, not wikiless
3. WHOOGLE_ALT_WIKI set to ""
4. All available alt links redirected to farside, except wikipedia
Fixes #704
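The numbered example above can be sketched as follows. The `SITE_ALTS` table, the `WHOOGLE_ALT_TW` entry, the farside paths, and the function `apply_site_alt` are illustrative assumptions; only `WHOOGLE_ALT_WIKI` and the blank-value rule come from the text.

```python
from urllib.parse import urlparse

# Hypothetical mapping: original domain -> (env var, default alternative)
SITE_ALTS = {
    'wikipedia.org': ('WHOOGLE_ALT_WIKI', 'farside.link/wikiless'),
    'twitter.com': ('WHOOGLE_ALT_TW', 'farside.link/nitter'),
}


def apply_site_alt(link: str, env: dict) -> str:
    """Redirect a link to its alternative unless the alt is set to blank."""
    parsed = urlparse(link)
    for original, (var, default) in SITE_ALTS.items():
        if parsed.netloc == original or parsed.netloc.endswith('.' + original):
            alt = env.get(var, default)
            if not alt:
                # Blank alt: keep the original source
                return link
            return f'{parsed.scheme}://{alt}{parsed.path}'
    return link
```

Passing `{'WHOOGLE_ALT_WIKI': ''}` as `env` leaves Wikipedia links untouched while other services still use their alternatives.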
Links that were directed at G domains were previously removed
universally, when really they only needed to be removed from the footer
to reduce possible confusion caused by mixed Whoogle and G links.
Fixes #656
Introduces a header for switching between result types (e.g. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.
Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.
Also updated a few styles to revert the latest styling changes being
applied by Google.
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
Google's latest formatting changes broke the modifications made when enabling
`WHOOGLE_MINIMAL`. This updates the result filtering to work with the new
changes.
Fixes #634
POST requests were previously hardcoded for all requests that didn't use
the header template (which currently is only the image tab).
Also refactored how the Filter class works. It now requires a valid
Config model to be provided, which is then set up as a class var that
the filtering functions can use as needed, rather than setting specific
values from the config as individual values (which was confusing and
sloppy).
Fixes #561
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.
Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file. New users would then
inherit these settings when visiting the app for the first time, as
would existing users once their current session cookie expired (which
was after 30 days by default, I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which led to some issues
with out-of-control session file creation by Flask.
Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.
Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE: this means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
management to be a lot more reliable overall for users regardless of their
cookie preference.
Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.
Sessions are also now (semi)permanent and have a lifetime of 1 year.
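The cookie check described above boils down to comparing two ids. The following is a framework-free sketch of that decision only; the function `resolve_session` and its return shape are assumptions, and the Flask redirect, Fernet key handling, and session clean-up are left out.

```python
from typing import Optional


def resolve_session(url_session_id: str,
                    cookie_session_id: Optional[str]) -> dict:
    """Decide whether the user has cookies enabled.

    url_session_id is the id carried in the session/<session id> URL;
    cookie_session_id is whatever the follow-up request's session holds
    (None when no cookie came back).
    """
    if cookie_session_id is not None and cookie_session_id == url_session_id:
        # Ids match: the cookie survived the redirect
        return {'cookies_disabled': False, 'session_id': cookie_session_id}
    # No match: skip session setup and use the fallback key and default config
    return {'cookies_disabled': True, 'session_id': None}
```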
Activating minimal mode should also remove all collapsed sections, if
any are found.
WHOOGLE_MINIMAL is now documented in the readme and app.json (for Heroku).
The levelup.gitconnected.com site is a Medium site that can also be
replaced with scribe.rip whenever privacy respecting site alternatives
are enabled in the config.
Also modified how link descriptions are updated when that config is
enabled (before it was missing replacements on quite a few
descriptions).
Previously if a result element marked for collapsing didn't have a valid
"parent" element, the collapsing was skipped altogether. This loops
through child elements until a valid parent is found (or if one isn't
found, the element will not be collapsed).
Sections such as "People also asked" and "related searches" typically
take up a lot of room on the results page, and don't always have the
most useful information. This checks for result elements with more than
7 child divs, extracts the section title, and wraps all elements in a
"details" element that can be expanded/collapsed by the user.
Note that this functionality existed previously (albeit not implemented
as well), but due to changes in how Google returns searches (switching
from using <h2> elements for section headers to <span> or <div>
elements), the approach to collapsing these sections needed to be
updated.
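A rough sketch of the collapsing step, written against the standard library's ElementTree with well-formed markup instead of the project's own parser. The function name, the fallback summary text, and the assumption that the section title sits in a direct child `<span>` are all illustrative.

```python
import xml.etree.ElementTree as ET

MIN_CHILD_DIVS = 7  # threshold from the text: "more than 7 child divs"


def collapse_sections(root: ET.Element) -> None:
    """Wrap large result sections in an expandable <details> element."""
    for index, section in enumerate(list(root)):
        if len(section.findall('div')) <= MIN_CHILD_DIVS:
            continue
        heading = section.find('span')
        details = ET.Element('details')
        summary = ET.SubElement(details, 'summary')
        if heading is not None:
            # Reuse the section title as the clickable summary
            summary.text = heading.text
            section.remove(heading)
        else:
            summary.text = 'Show section'  # placeholder title
        # Swap the section for the wrapper at the same position
        root.remove(section)
        details.append(section)
        root.insert(index, details)
```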
Since Google picks the default interface language using IP geolocation,
the default language is now set to English. Still not sure if this is the
best solution, but at least temporarily it should clear up some confusion
for users with instances deployed in countries outside of their own.
Also performed some minor cleanup:
- Updated name of strip_blocked_sites to clean_query
- Added clean_query to list of jinja template functions
- Ensured site block list doesn't contain duplicate filters
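The clean-up items above could be pictured like this. Both function bodies are assumptions: the text names `clean_query` and says the block list is de-duplicated, but does not show how, and `build_block_filters` is a name invented for this sketch.

```python
def build_block_filters(block_list: str) -> str:
    """Turn a comma-separated block list into unique '-site:' filters."""
    sites = [site.strip() for site in block_list.split(',') if site.strip()]
    # dict.fromkeys drops duplicates while preserving the original order
    unique_sites = list(dict.fromkeys(sites))
    return ' '.join(f'-site:{site}' for site in unique_sites)


def clean_query(query: str) -> str:
    """Strip the appended '-site:' filters so only the user's text remains."""
    if '-site:' not in query:
        return query
    return query[:query.find('-site:')].strip()
```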