Add a function to check if target_word contains CJK characters
If a search term contains Chinese, Japanese, or Korean characters,
the term is bolded in search results regardless of whitespace.
CJK characters: Chinese, Japanese (hiragana, katakana, kanji),
and Korean (hangul syllables, hangul jamo)
Co-authored-by: Ben Busby <contact@benbusby.com>
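A minimal sketch of such a check, for illustration only. The function
name `has_cjk` and the exact Unicode ranges are assumptions, not
necessarily what this commit uses:

```python
import re

# Unicode ranges assumed for illustration; the real check may differ.
CJK_PATTERN = re.compile(
    '['
    '\u4e00-\u9fff'  # CJK Unified Ideographs (Chinese, Japanese kanji)
    '\u3040-\u309f'  # Hiragana
    '\u30a0-\u30ff'  # Katakana
    '\uac00-\ud7af'  # Hangul syllables
    '\u1100-\u11ff'  # Hangul jamo
    ']'
)


def has_cjk(target_word: str) -> bool:
    """Return True if target_word contains at least one CJK character."""
    return bool(CJK_PATTERN.search(target_word))
```

A caller could skip whitespace or word-boundary matching whenever
`has_cjk(term)` is true, since CJK text is usually not space-delimited.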
Parent sites using a 'www' subdomain or something similar were not
redirecting properly. This updates the hostname check to only validate
against the primary domain, except for Wikipedia since the subdomain is
used for interface translation in that case.
Fixes #901
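The hostname handling described above could look roughly like this. The
helper name `primary_domain` is hypothetical, and taking the last two
labels is a simplification that would mishandle multi-part TLDs such as
`co.uk`:

```python
from urllib.parse import urlparse


def primary_domain(link: str) -> str:
    """Return the hostname to validate a result link against."""
    hostname = urlparse(link).hostname or ''
    if hostname.endswith('wikipedia.org'):
        # Keep the subdomain: Wikipedia uses it for the interface language
        return hostname
    # Drop subdomains such as 'www' and keep only the last two labels
    return '.'.join(hostname.split('.')[-2:])
```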
Replacing result links and text when site alts are enabled is now part
of its own function, and handles replacement of link location and link
description separately.
Fixes #880
Recent changes to Google Search introduced ads labeled with the keyword
"sponsored". This update filters those ads out of search results.
Fixes #871
Adds support for encoding (and optionally encrypting) user config values as
a single string that can be passed to any endpoint with the "preferences" url
param.
Co-authored-by: Ben Busby <contact@benbusby.com>
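As a rough illustration of the idea (not the project's actual encoding),
config values can be serialized, compressed, and made URL-safe using only
the standard library. The function names are hypothetical, and the
optional encryption step is omitted here:

```python
import base64
import json
import zlib


def encode_preferences(config: dict) -> str:
    """Pack a config dict into a single URL-safe string."""
    raw = json.dumps(config, separators=(',', ':')).encode('utf-8')
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode('ascii')


def decode_preferences(value: str) -> dict:
    """Reverse encode_preferences()."""
    raw = zlib.decompress(base64.urlsafe_b64decode(value.encode('ascii')))
    return json.loads(raw.decode('utf-8'))
```

The resulting string could then be passed as `?preferences=<value>` on
any endpoint.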
Farside can now redirect Quora links to Quetre instances and IMDb links
to libremdb instances. This updates Whoogle to perform link replacements
for both services when site alts are configured.
For users running local instances of service alternatives such as
invidious, the alt replacement procedure broke if the scheme of the
original service (almost always https) didn't match the scheme of their
defined local service (likely http).
This adds a small check to see if the alt has a defined scheme, and if
so, removes the original scheme for that result.
Fixes #806
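A sketch of the scheme check, assuming a hypothetical helper `apply_alt`
that receives the result link, the original host, and the configured alt:

```python
def apply_alt(link: str, original_host: str, alt: str) -> str:
    """Swap original_host for alt, honoring a scheme defined on alt."""
    if '://' in alt:
        # The alt carries its own scheme (e.g. http://localhost:3000),
        # so drop the original scheme before substituting.
        link = link.split('://', 1)[-1]
    return link.replace(original_host, alt, 1)
```

With an alt of `http://localhost:3000`, an `https://www.youtube.com/...`
result becomes `http://localhost:3000/...` rather than a link with two
schemes.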
Wikipedia -> Wikiless redirects always result in an English language
result, even if the Wikipedia result would've been in a non-English
language. This is due to Wikipedia using language specific subdomains
(e.g. de.wikipedia.org, en.wikipedia.org, etc) whereas Wikiless uses a
"lang" url param.
This has been fixed by inspecting the subdomain of the wikipedia link
and passing that value to Wikiless as the lang param if it's determined
to be a language specific value (currently just looking for a 2-char
subdomain).
See #805
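The subdomain inspection can be sketched as follows. The function name
and the `wikiless.example` host are placeholders, and any query string on
the original link is ignored for brevity:

```python
from urllib.parse import urlparse


def wikiless_link(link: str, alt_host: str = 'wikiless.example') -> str:
    """Rewrite a Wikipedia link for Wikiless, preserving the language."""
    parsed = urlparse(link)
    hostname = parsed.hostname or ''
    subdomain = hostname.split('.')[0]
    new_link = f'{parsed.scheme}://{alt_host}{parsed.path}'
    # Treat a 2-char subdomain (de, en, fr, ...) as a language code
    if hostname.endswith('.wikipedia.org') and len(subdomain) == 2:
        new_link += f'?lang={subdomain}'
    return new_link
```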
The "anon-view" translation key is the correct one to use for accessing
anonymous view within the search results. "config-anon-view" is only for
the configuration menu on the home page.
* Relativization of search results
* Fix JavaScript error when opening images
* Replace single-letter logo and remove sign-in link
* Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection
The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app
redirects, such as the `/session` redirect performed on the first visit
to the Whoogle home page.
Co-authored-by: Ben Busby <contact@benbusby.com>
* Expand `/window` endpoint to behave like a proxy
The `/window` endpoint was previously used as a type of proxy, but only
for removing JavaScript from the result page. This expands the existing
functionality to allow users to proxy search result pages (with or without
JavaScript) through their Whoogle instance.
* Implement filtering of remote content from css
* Condense NoJS feature into Anonymous View
Enabling NoJS now removes JavaScript from the Anonymous View, rather
than creating a separate option.
* Exclude 'data:' urls from filter, add translations
The 'data:' url must be allowed in results to view certain elements on
the page, such as stars for review based results.
Add translations for the remaining languages.
* Add cssutils to requirements
If the alt for a particular service is blank, the original source is
used instead.
Example:
1. Site alts enabled in config
2. User wants wikipedia links, not wikiless
3. WHOOGLE_ALT_WIKI set to ""
4. All available alt links redirected to farside, except wikipedia
Fixes #704
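The blank-alt behavior from the example above might be sketched like
this. The helper names and the Farside default values are assumptions
made for illustration:

```python
import os


def resolve_alt(env_var: str, default_alt: str, environ=os.environ) -> str:
    """Return the configured alt; an empty string means 'no alt'."""
    return environ.get(env_var, default_alt)


def replace_host(link: str, original_host: str, alt: str) -> str:
    """Replace the host only when an alt is actually set."""
    if not alt:
        # Blank alt: keep the original source untouched
        return link
    return link.replace(original_host, alt, 1)
```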
Wikipedia, imgur, and translate alternatives were all still using
hardcoded URLs when replaced with their respective alternative frontend.
This updates them to use farside instead.
Recent changes to ads in search results caused Whoogle to display ads
for certain searches. In particular, ads recently started appearing
grouped into one div, as opposed to a singular ad per div. This was
accompanied by the div label "ads" (instead of just "ad"), which threw
off the existing ad filter. The ad keyword blacklist has been updated
accordingly, and has been enhanced to only check against alpha chars for
each label.
This only seems to have affected English language searches, and only for
very specific searches.
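A minimal sketch of the alpha-only label check. The keyword set and
function name here are illustrative, not the project's full blacklist:

```python
# Illustrative subset; the real blacklist covers more keywords/languages.
AD_KEYWORDS = {'ad', 'ads'}


def is_ad_label(label: str) -> bool:
    """Compare only the alphabetic characters of a div label."""
    alpha_only = ''.join(ch for ch in label if ch.isalpha()).lower()
    return alpha_only in AD_KEYWORDS
```

Stripping non-alpha characters means labels such as "Ads·" or "Ad 1"
still match, while unrelated words like "Adventure" do not.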
Currency amounts returned by Google seem to randomly include unicode
chars ('\xa0' noted in #642) which broke the currency calculator
included in the project. This ensures that only strings that can be
converted to float are ever used in the conversion.
Fixes #642
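One way to express that guard, with hypothetical helper names:

```python
def is_float(value: str) -> bool:
    """Return True if value can be converted to a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def extract_amounts(parts: list) -> list:
    """Keep only the pieces of a currency string that parse as floats."""
    return [float(p) for p in parts if is_float(p)]
```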
Removes dependency on class names for creating the "my ip" info card in
the results list for searches pertaining to the user's public IP.
Adds test to prevent this from happening again.
Note to anyone reading this and looking to contribute: please avoid
using hardcoded class names at all costs. This approach of
creating/removing content just results in issues if/when Google decides
to introduce/remove class names from the result page.
Fixes #657
Introduces a header for switching between result types (e.g. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.
Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.
Also updated a few styles to revert the latest styling changes being
applied by Google.
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
* Integrate Farside into Whoogle
When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.
For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.
* Expand conversion of config<->url params
Config settings can now be translated to and from URL params using a
predetermined set of "safe" keys (i.e. config settings that easily
translate to URL params).
* Allow jumping instances via Farside when ratelimited
Closes #554
Closes #559
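The config<->URL param conversion described above can be sketched as
follows. `SAFE_KEYS` here is a hypothetical subset; the actual list of
"safe" keys lives in the project's config handling:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical subset of config keys that map cleanly to URL params.
SAFE_KEYS = ('lang_interface', 'country', 'theme', 'safe')


def config_to_params(config: dict) -> str:
    """Serialize only the safe, non-empty config values."""
    return urlencode({k: config[k] for k in SAFE_KEYS if config.get(k)})


def params_to_config(query: str, config: dict) -> dict:
    """Apply safe URL params on top of an existing config."""
    updated = dict(config)
    for key, value in parse_qsl(query):
        if key in SAFE_KEYS:
            updated[key] = value
    return updated
```

Restricting both directions to the same key list keeps arbitrary query
params from overwriting unrelated settings.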
This implements a method for converting between various currencies. When a user
searches "<currency A> to <currency B>" (including when prefixed by a specific
amount), they are now presented with a table for quickly converting between the
two. This makes use of the currency ratio returned as the first "card" in
currency related searches, and the table is inserted into this same card.
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.
Previously, when a user with cookies disabled updated their config,
this modified the app's default config file. New users then inherited
those settings on their first visit, as did existing users whose
session cookie had expired (after 30 days by default, I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which led to some issues
with out of control session file creation by Flask.
Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.
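The id comparison above can be modeled as a small pure function. This is
a framework-free sketch with hypothetical names, not the actual Flask
route; the value given to the `cookies_disabled` param is also an
assumption:

```python
def resolve_session(url_session_id: str, cookie_session_id,
                    original_url: str) -> str:
    """Decide where to send the user after the session/<id> check."""
    if url_session_id == cookie_session_id:
        # The cookie survived the redirect: cookies are enabled
        return original_url
    # Ids differ (or no cookie came back): flag the request so that no
    # new session is created for this user
    separator = '&' if '?' in original_url else '?'
    return f'{original_url}{separator}cookies_disabled=1'
```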
Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.
Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.
Sessions are also now (semi)permanent and have a lifetime of 1 year.
This modifies the search result page by bolding all appearances
of any word in the original query. If portions of the query are in
quotes (e.g. "ice cream"), only exact matches of the sequence of
words will be made bold.
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
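A simplified sketch of the bolding logic, operating on plain text rather
than parsed HTML. The function name and the use of `<b>` tags are
assumptions for illustration:

```python
import re


def bold_query_terms(text: str, query: str) -> str:
    """Wrap query words (or quoted phrases) found in text with <b>."""
    # Quoted portions are matched as whole phrases
    phrases = re.findall(r'"([^"]+)"', query)
    # Everything outside of quotes is matched word by word
    words = re.sub(r'"[^"]+"', ' ', query).split()
    terms = phrases + words
    if not terms:
        return text
    pattern = re.compile(
        r'\b(' + '|'.join(re.escape(t) for t in terms) + r')\b',
        re.IGNORECASE)
    # Single pass, so inserted tags are never re-matched
    return pattern.sub(r'<b>\1</b>', text)
```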
The levelup.gitconnected.com site is a Medium site that can also be
replaced with scribe.rip whenever privacy respecting site alternatives
are enabled in the config.
Also modified how link descriptions are updated when that config is
enabled (before it was missing replacements on quite a few
descriptions).
This introduces a new UI element for displaying the client IP
address when a search for "my ip" is used.
Note that this does not show the IP address seen by Google
if Whoogle is deployed remotely. It uses `request.remote_addr`
to display the client IP address in the UI, not the actual address
of the server (which is what Google sees in requests sent from
remote Whoogle instances).
scribe.rip is a privacy respecting front end for medium.com. This
feature allows medium.com results to be replaced with scribe.rip links,
and works for both regular medium.com domains as well as user specific
subdomains (e.g. user.medium.com).
[scribe.rip website](https://scribe.rip)
[scribe.rip source code](https://git.sr.ht/~edwardloveall/scribe)
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
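The medium.com replacement could be sketched like this. The helper name
is hypothetical, and the sketch assumes scribe.rip accepts the same path
as the original link once the host is swapped:

```python
from urllib.parse import urlparse


def to_scribe(link: str) -> str:
    """Rewrite medium.com links (including user subdomains)."""
    parsed = urlparse(link)
    host = parsed.hostname or ''
    if host == 'medium.com' or host.endswith('.medium.com'):
        return f'https://scribe.rip{parsed.path}'
    # Not a Medium link: leave it alone
    return link
```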