Flask's `request.url` uses `http` as the protocol, which breaks
instances that enforce `https`, since the session redirect relies on
`request.url` for the follow-through URL.
This introduces a new method for determining the correct URL to use for
these redirects by automatically replacing the protocol with `https` if
the `HTTPS_ONLY` env var is set for that instance.
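A minimal sketch of that protocol swap (the helper name and env var handling here are illustrative, not the exact implementation):

```python
import os

from flask import request


def get_session_redirect_url() -> str:
    # Hypothetical helper: upgrade the follow-through URL to https when
    # the instance enforces https via the HTTPS_ONLY env var
    url = request.url
    if os.getenv('HTTPS_ONLY'):
        url = url.replace('http://', 'https://', 1)
    return url
```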
Fixes #538
Fixes #545
This introduces a new approach to handling user sessions, which should
allow users to set config settings more reliably on public instances.
Previously, when a user with cookies disabled updated their config, the
change was written to the app's default config file. As a result, new
users inherited these settings when visiting the app for the first time,
and existing users inherited them once their current session cookie
expired (after 30 days by default, I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which led to issues with
out-of-control session file creation by Flask.
Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session ID is matched against the one found in the URL. If
the IDs match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.
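A rough sketch of that flow, assuming a bare Flask app; the endpoint path, session key name, and query param handling are illustrative rather than Whoogle's exact code:

```python
import uuid

from flask import Flask, redirect, request, session, url_for

app = Flask(__name__)
app.secret_key = 'replace-me'


@app.route('/')
def index():
    if 'cookies_disabled' in request.args:
        # Don't bother setting up a session; use the app's fallback
        # Fernet key and default config instead
        return 'using fallback config'
    if 'uuid' not in session:
        session['uuid'] = str(uuid.uuid4())
        # Bounce through the session endpoint to see if the cookie sticks
        return redirect(url_for('check_session', session_id=session['uuid']))
    return 'session active'


@app.route('/session/<session_id>')
def check_session(session_id):
    if session.get('uuid') == session_id:
        # The ID round-tripped through the cookie -> cookies are enabled
        return redirect(url_for('index'))
    # No matching ID -> cookies disabled; tag the original request
    return redirect(url_for('index', cookies_disabled=1))
```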
Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE: this means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since it will allow
session management to be much more reliable overall for users regardless
of their cookie preference.
Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.
Sessions are also now (semi)permanent and have a lifetime of 1 year.
Validation of the Tor connection occasionally fails with a
ConnectionError from requests, which was previously uncaught. This is
now handled appropriately (error message shown and connection dropped).
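A short sketch of catching the previously uncaught error (the check URL and function name here are placeholders):

```python
import requests


def validate_tor_connection(session: requests.Session) -> bool:
    try:
        session.get('https://check.torproject.org/', timeout=10)
        return True
    except requests.exceptions.ConnectionError:
        # Surface an error message to the user and drop the Tor connection
        return False
```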
Fixes #532
Using `format` for formatting bang queries caused a KeyError for some
searches, such as !hd (HUDOC). In that example, the URL returned in the
bangs json was `http://...#{%22fulltext%22:[%22{}%22]...`, where
standard formatting would not work due to the misidentification of
"fulltext" as a formatting key.
The logic has been updated to just replace the first occurrence of "{}"
in the URL returned by the bangs dict.
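A sketch of the replacement approach (the function name is illustrative):

```python
def resolve_bang(bang_url: str, query: str) -> str:
    # str.format() would treat other "{...}" spans in the URL (such as the
    # HUDOC "{%22fulltext%22...}" fragment) as format keys and raise
    # KeyError, so only the first "{}" placeholder is substituted
    return bang_url.replace('{}', query, 1)
```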
Fixes #513
DDG-style bang searches can now have the bang (!) at the end of
the search (e.g. "bologna w!" will now redirect to Wikipedia just like
"bologna !w" would).
This modifies the search result page by bolding all appearances
of any word in the original query. If portions of the query are in
quotes (e.g. "ice cream"), only exact matches of that sequence of
words will be made bold.
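A hedged sketch of the bolding logic; the regex details here are assumptions for illustration rather than the exact implementation:

```python
import re


def bold_query_terms(text: str, query: str) -> str:
    # Quoted phrases are bolded only as exact sequences; remaining words
    # are bolded individually
    phrases = re.findall(r'"([^"]+)"', query)
    words = re.sub(r'"[^"]+"', '', query).split()
    for term in phrases + words:
        text = re.sub(f'({re.escape(term)})', r'<b>\1</b>', text,
                      flags=re.IGNORECASE)
    return text
```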
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
The levelup.gitconnected.com site is a Medium site that can also be
replaced with scribe.rip whenever privacy-respecting site alternatives
are enabled in the config.
Also modified how link descriptions are updated when that config is
enabled (previously, replacements were missed on quite a few
descriptions).
This introduces a new UI element for displaying the client IP
address when a search for "my ip" is used.
Note that this does not show the IP address seen by Google
if Whoogle is deployed remotely. It uses `request.remote_addr`
to display the client IP address in the UI, not the actual address
of the server (which is what Google sees in requests sent from
remote Whoogle instances).
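A minimal sketch of the lookup (the query matching shown here is illustrative):

```python
from flask import request


def client_ip_for(query: str):
    # Returns the address the client used to reach this instance,
    # not the address Google sees from a remote deployment
    if query.strip().lower() == 'my ip':
        return request.remote_addr
    return None
```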
scribe.rip is a privacy-respecting front end for medium.com. This
feature allows medium.com results to be replaced with scribe.rip links,
and works for both regular medium.com domains as well as user-specific
subdomains (e.g. user.medium.com).
[scribe.rip website](https://scribe.rip)
[scribe.rip source code](https://git.sr.ht/~edwardloveall/scribe)
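A sketch of the domain substitution; the regex and any path/username handling are assumptions here, not the actual implementation:

```python
import re

MEDIUM_DOMAIN = re.compile(r'(?:[a-z0-9-]+\.)?medium\.com')


def replace_medium(link: str) -> str:
    # Collapses both medium.com and user subdomains to scribe.rip;
    # the real implementation's path handling may differ
    return MEDIUM_DOMAIN.sub('scribe.rip', link)
```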
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
On app init, short hashes are generated from file checksums to use for
cache busting. These hashes are added into the full file name and used
to symlink to the actual file contents. These symlinks are loaded in the
jinja templates for each page, and can tell the browser to load a new
file if the hash changes.
This is only in place for css and js files, but can be extended in the
future for other file types if needed.
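A hedged sketch of the hash-and-symlink step at init (the hash length and naming scheme are assumptions):

```python
import hashlib
import os


def add_cache_busted_name(path: str) -> str:
    # A short hash of the file contents becomes part of the symlink name,
    # so templates can reference e.g. 'styles.<hash>.css'
    with open(path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()[:8]
    root, ext = os.path.splitext(path)
    hashed = f'{root}.{digest}{ext}'
    if not os.path.islink(hashed):
        os.symlink(os.path.basename(path), hashed)
    return os.path.basename(hashed)
```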
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.
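A sketch of such a helper (the name is illustrative):

```python
import re


def strip_blocked_sites(query: str) -> str:
    # Remove any "-site:..." filters so they never show up in the UI
    return re.sub(r'\s*-site:\S+', '', query).strip()
```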
Bumps version to 0.5.1 for releasing the bugfix
Fixes #329
* add view image option
* prevent whoogle links from opening in a new tab.
* remove view image template on mobile requests
* change loop values to be more robust to the number of images
* Update app/templates/imageresults.html
* fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width."
* Update app/templates/imageresults.html
* remove hardcoded string from template
* Add view image config var to app.json
* Add view image config var to whoogle.env
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <benbusby@protonmail.com>
* Block websites in search results via user config
Adds a new config field "Block" to specify a comma-separated list of
websites to block in search results (see the sketch after this list).
This applies to all searches.
* Add test for blocking sites from search results
* Document WHOOGLE_CONFIG_BLOCK usage
* Strip '-site:' filters from query in header template
The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc.). This prevents the filter from appearing in
all categories except "images", since the image category uses a separate
header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
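As referenced above, a hedged sketch of how the comma-separated block list could translate into '-site:' query filters (the env var parsing shown here is illustrative):

```python
import os


def apply_block_filter(query: str) -> str:
    blocked = os.getenv('WHOOGLE_CONFIG_BLOCK', '')
    for site in filter(None, blocked.split(',')):
        # Each blocked site is appended as a negative site filter
        query += f' -site:{site.strip()}'
    return query
```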
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which led to keys being regenerated too soon and breaking page
navigation. Until that can be addressed, the single key per session
approach should work a lot better.
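A minimal sketch of the single-key approach, assuming the cryptography package's Fernet API:

```python
from cryptography.fernet import Fernet

# One key per session, used for both text and element URLs
session_key = Fernet.generate_key()
cipher = Fernet(session_key)

token = cipher.encrypt(b'https://example.com/element')
original = cipher.decrypt(token)
```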
Fixes #250
Fixes #90
The previous implementation of the is_heroku check in
search.needs_https() only matched URLs ending in '.herokuapp.com',
and skipped upgrading to HTTPS for other endpoints.
This allows the user to enable their preferred settings in a variety of
ways, depending on their deployment preference. Values added to
whoogle.env can be enabled using WHOOGLE_DOTENV=1, in which case all
values in the env var file will overwrite defaults or user provided
settings.
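A sketch of that override behavior (the parsing shown here is illustrative, not the actual loader):

```python
import os

if os.getenv('WHOOGLE_DOTENV'):
    with open('whoogle.env') as env_file:
        for line in env_file:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                key, _, value = line.partition('=')
                # Values from whoogle.env overwrite defaults and any
                # user provided settings
                os.environ[key] = value
```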
Co-authored-by: Ben Busby <benbusby@protonmail.com>
Eventually this should be part of a separate mypy ci build, but right
now it's just a general guideline. Future commits and PRs should be
validated for static typing wherever possible.
For reference, the testing commands used for this commit were:
mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/
mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.
Function and class documentation for the utils has been updated as well,
as part of the effort to improve overall documentation for the project.
Bang operator can now be placed anywhere in the query, to allow for peak
efficiency in stream-of-consciousness querying (e.g. `big !reddit
chungus` will search reddit for "big chungus").
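A sketch of locating the bang anywhere in the query (names are illustrative):

```python
def split_bang(query: str):
    # Returns (bang, remaining search terms) if a "!bang" token appears
    # anywhere in the query, else (None, original query)
    terms = query.split()
    for term in terms:
        if term.startswith('!') and len(term) > 1:
            rest = ' '.join(t for t in terms if t != term)
            return term[1:], rest
    return None, query
```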
Fixes #196
* Adds the ability to redirect reddit.com to libredd.it using the existing
"site alts" config setting.
This adds the WHOOGLE_ALT_RD environment variable for optionally
redirecting reddit links to libreddit
(https://github.com/spikecodes/libreddit).
* Include libreddit in home page site alt note
Heroku instances were using the base http URL when formatting the
opensearch.xml template. This adds a new routing utility, "needs_https",
which can be used to determine whether the URL in question needs
upgrading.
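A hedged sketch of what such a check might look like (the actual utility may consider other signals):

```python
from urllib.parse import urlparse


def needs_https(url: str) -> bool:
    # Heroku apps terminate https at the proxy but report an http URL
    parsed = urlparse(url)
    return parsed.scheme == 'http' and parsed.netloc.endswith('.herokuapp.com')
```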
The lxml dependency in the project was fairly unnecessary, and made the
initial build time for the project considerably slower. This replaces
all instances of lxml with either the default html parser (for bs4
constructors) or the built in xml.etree package (for search suggestion
parsing).
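An illustrative sketch of suggestion parsing with the standard library (the element and attribute names here are assumptions):

```python
from xml.etree import ElementTree as ET


def parse_suggestions(xml_response: str) -> list:
    root = ET.fromstring(xml_response)
    # Collect each suggestion entry from the response
    return [el.get('data') for el in root.iter('suggestion')]
```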
The BeautifulSoup constructor in gen_nojs needed to explicitly set
features='lxml' to silence a warning from the library.
Also temporarily disabled the site alts test since the results are too
unreliable. This should be moved to a unit test instead.
* Add ability to configure site alts w/ env vars
Site alternatives (e.g. twitter.com -> nitter.net) can now be configured
using environment variables:
WHOOGLE_ALT_TW='nitter.net' # twitter alt
WHOOGLE_ALT_YT='invidio.us' # youtube alt
WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt
Updated testing to confirm results have been modified.
* Add site alt vars to docker settings and readme
The invidious instance has been updated to invidious.snopyta.org, since
this instance is more reliable and has more users according to
instances.invidio.us
All site alternative redirects now redirect without the 'www' subdomain,
since most of the alternative sites don't have this subdomain set up.
Dark mode, country, interface language, and search language configs
can now be set in the search query by appending each option as a
url parameter.
Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry'
Ex: /search?q=%s&dark=1&lang_search=lang_en...
These config settings persist across page navigation and switching
result type, but will be reset if the main search bar is used.
See #144
* Add tor and http/socks proxy support
Allows users to enable/disable tor from the config menu, which will
forward all requests through Tor.
Also adds support for configuring an alternative proxy via environment
variables. Setting the following variables will forward requests
through the proxy (a sketch of how they combine follows these notes):
- WHOOGLE_PROXY_USER (optional)
- WHOOGLE_PROXY_PASS (optional)
- WHOOGLE_PROXY_TYPE (required)
- Can be "http", "socks4", or "socks5"
- WHOOGLE_PROXY_LOC (required)
- Format: "<ip address>:<port>"
See #30
* Refactor acquire_tor_conn -> acquire_tor_identity
Also updated travis CI to set up tor
* Add check for Tor socket on init, improve Tor error handling
Initializing the app sends a heartbeat request to Tor to check for
availability, and updates the home page config options accordingly. This
heartbeat is sent on every request, to ensure Tor support can be
reconfigured without restarting the entire app.
If Tor support is enabled, and a subsequent request fails, then a new
TorError exception is raised, and the Tor feature is disabled until a
valid connection is restored.
The max number of attempts has been updated to 10, since 5 seemed a bit
too low given how quickly the attempts go by.
* Change send_tor_signal arg type, update function doc
send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also
added the doc string for the "disable" attribute in TorError.
* Fix tor identity logic in Request.send
* Update proxy init, change proxyloc var name
Proxy is now only initialized if both type and location are specified,
as neither has a default fallback and both are required. I suppose the
type could fall back to http, but it seems safer this way.
Also refactored proxyurl -> proxyloc for the runtime args in order to
match the Dockerfile args.
* Add tor/proxy support for Docker builds, fix opensearch/init
The Dockerfile is now updated to include support for Tor configuration,
with a working torrc file included in the repo.
An issue with opensearch was fixed as well, which was uncovered during
testing and was simple enough to fix here. Likewise, DDG bang gen was
updated to only ever happen if the file didn't exist previously, as
testing with the file being regenerated every time was tedious.
* Add missing "@" for socks proxy requests
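As referenced above, a sketch of how the WHOOGLE_PROXY_* variables could combine into a requests-style proxy config (the variable handling is illustrative):

```python
import os


def build_proxies() -> dict:
    proxy_type = os.getenv('WHOOGLE_PROXY_TYPE')
    proxy_loc = os.getenv('WHOOGLE_PROXY_LOC')
    if not proxy_type or not proxy_loc:
        # Both type and location are required; there is no default fallback
        return {}
    user = os.getenv('WHOOGLE_PROXY_USER', '')
    password = os.getenv('WHOOGLE_PROXY_PASS', '')
    auth = f'{user}:{password}@' if user and password else ''
    proxy = f'{proxy_type}://{auth}{proxy_loc}'
    return {'http': proxy, 'https': proxy}
```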
Initialization of the app now includes generation of a ddg-bang json
file, which is used for all bang style searches afterwards.
Also added search suggestion handling for bang json lookup. Queries
beginning with "!" now reference the bang json file to pull all keys
that match.
Updated test suite to include basic tests for bang functionality.
Updated gitignore to exclude bang subdir.
Improves clarity of the meaning behind the "Country" filter -- Google
seemingly uses this value to only return results that are hosted in a
particular country, as evidenced in the search differences highlighted
in #123. It now mentions that the results are filtered by website
hosting location.
Also, now that invidio.us is shut down, the fallback URL (invidiou.site)
is now used instead.