Commit Graph

69 Commits (92e8ede24e9277a5440d403f75877209f1269884)

Author SHA1 Message Date
Ben Busby b39ba0533a
Suppress spurious warnings from bs4
More MarkupResemblesLocatorWarning warnings have been appearing. This
seems to be caused by parsing HTML content that contains a URL.

This new change suppresses the warning at the root level of the app
before any content has been parsed, so this error shouldn't appear
again.

Fixes #968
2023-03-22 12:29:05 -06:00
Ben Busby fdc63b862e
Autoload `whoogle.env` if it exists
The whoogle.env file previously needed to be created and enabled using
the WHOOGLE_DOTENV var. This removes the second step and loads the env
file if it's found during app init.

The Dockerfile has also been updated to copy in whoogle.env if it
exists.

Fixes #909
2023-01-04 10:35:42 -07:00
fiestasiesta 7041b43db9
Add time constraint to search options (#888)
Introduces the ability to refine searches by time period:
- Past hour
- Past 24 hours
- Past week
- Past month
- Past year

Co-authored-by: Ben Busby <contact@benbusby.com>
2022-12-21 13:24:27 -07:00
Ben Busby 0310f0f542
Use app init enc key by default for all queries
This can be updated later to allow users with cookies enabled to use a
key that is unique to their session (if they want, not mandatory), but
for now it makes more sense to just use a single key for all queries
from all users. This should eliminate a lot of issues that users have
reported where they are unable to decrypt queries or page elements due
to an expired/renewed session key.
2022-12-05 12:14:14 -07:00
Anna 08b16f5a0c
Switch to PEP517 standard for builds (#887)
* Sync setup.cfg with requirements.txt

* Include tests in PyPI tarballs

And exclude them from setuptools

* Set version number only once

Switch to PEP517 standard (pyproject.toml) for builds
2022-11-25 16:24:38 -07:00
Ben Busby d099b46336
Bump version to 0.8.0 2022-11-23 11:43:42 -07:00
João c42640e21c
Use `read_config_bool` for vars in app init (#848) 2022-09-20 11:11:27 -06:00
Ben Busby 32ad39d0e1
Refactor session behavior, remove `Flask-Session` dep
Sessions are no longer validated using the "/session/..." route. This
created a lot of problems due to buggy/unexpected behavior coming from
the Flask-Session dependency, which is (more or less) no longer
maintained.

Sessions are also no longer strictly server-side-only. The majority of
information that was being stored in user sessions was aesthetic only,
aside from the session specific key used to encrypt URLs. This key is
still unique per user, but is not (or shouldn't be) in anyone's threat
model to keep absolutely 100% private from everyone. Especially paranoid
users of Whoogle can easily modify the code to use a randomly generated
encryption key that is reset on session invalidation (and set
invalidation time to a short enough period for their liking).

Ultimately, this should result in much more stable sessions per client.
There shouldn't be decryption issues with element URLs or queries
during result page navigation.
2022-08-29 13:36:40 -06:00
Ben Busby cb5557cc2e
Check file sizes in session dir before validation
For pip installed instances of Whoogle, there seems to be an issue where
files other than sessions are being stored in the same directory as the
sessions. From a brief investigation, this does not seem to be caused by
Whoogle, since Flask-Session objects are the only files stored in that
directory. It could be an issue with the library that is being used for
sessions, however.

Regardless, the app shouldn't crash when trying to validate and remove
invalid sessions, so a file size limit of 4KB was imposed during
validation. Any file found in the session directory that exceeds this
size limit will be ignored.

Fixes #777
Fixes #793
2022-06-16 11:50:13 -06:00
Ben Busby d512745767
Bump version to 0.7.4 2022-06-13 10:37:28 -06:00
Ben Busby 47df4da4b5
Bump version to 0.7.3 2022-06-03 14:33:53 -06:00
Ben Busby f5d599e7d2
Use `lax` for session `SameSite` value (not `strict`)
SESSION_COOKIE_SAMESITE must be set to 'lax' to allow the user's
previous session to persist when accessing the instance from an external
link. Setting this value to 'strict' causes Whoogle to revalidate a new
session, and fail, resulting in cookies being disabled.

This could be re-evaluated if Whoogle ever switches to client side
configuration instead.

Fixes #749
2022-05-10 17:40:58 -06:00
Ben Busby 8a0b872337
Bump version to 0.7.2 2022-04-26 16:49:30 -06:00
Warren Spits d62ceb8423
Add proxyfix to honor `X-Forwarded-Proto` header (#731)
Fixes #730
2022-04-22 11:07:36 -06:00
Ben Busby 0048c2f9aa
Update remaining alternative frontends to use Farside
Wikipedia, imgur, and translate alternatives were all still using
hardcoded URLs when replaced with their respective alternative frontend.
This updates them to use farside instead.
2022-03-21 10:08:52 -06:00
Ben Busby 23402e27e1
Check for updates using 24 hour time delta
Rather than only checking for an available update on app init, the check
for updates now performs the check once every 24 hours on the first
request sent after that period.

This also now catches the requests.exceptions.ConnectionError that is
thrown if the app is initialized without an active internet connection.

Fixes #649
2022-02-14 12:19:02 -07:00
Joao A. Candido Ramos 11099f7b1d
Use consistent header for all result types (#535)
Introduces a header for switching between result types (i.e. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.

Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.

Also updated a few styles to revert the latest styling changes being
applied by Google.

Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
2022-02-07 10:47:25 -07:00
Ben Busby 33f56bb0cb
Read `WHOOGLE_CONFIG_DISABLE` var as bool in app init
Fixes #636, which pointed out that the var was being interpreted as
"active" (config hidden) regardless of the value that was set.
2022-02-01 15:29:22 -07:00
Ben Busby 1af4566991
Bump version to 0.7.1 2022-01-26 10:41:41 -07:00
Ben Busby 72e5a227c8
Move bangs init to bg thread
Initializing the DDG bangs when running whoogle for the first time
creates an indeterminate amount of delay before the app becomes usable,
which makes usability tests (particularly w/ Docker) unreliable. This
moves the bang json init to a background thread and writes a temporary
empty dict to the bangs json file until the full bangs json can be used.
2022-01-25 12:28:06 -07:00
Ben Busby d02a7d90b9
Use UTF-8 encoding when loading json files
Fixes #581
2021-12-21 14:11:55 -07:00
Ben Busby 3d8da1db58
Bump version to 0.7.0 2021-12-08 17:57:22 -07:00
Ben Busby de28e06d8f
Improve cookie security when `HTTPS_ONLY` is set
Adds the "Secure" flag and "__Secure-" prefix if the `HTTPS_ONLY`
environment variable is enabled.

Fixes #539
2021-11-20 16:34:37 -07:00
Ben Busby e06ff85579
Improve public instance session management (#480)
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.

Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file, which would in turn
cause new users to inherit these settings when visiting the app for the
first time and cause users to inherit these settings when their current
session cookie expired (which was after 30 days by default I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which lead to some issues
with out of control session file creation by Flask.

Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.

Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.

Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.

Sessions are also now (semi)permanent and have a lifetime of 1 year.
2021-11-17 19:35:30 -07:00
Vansh Comar 3784d897d9
Add "update available" indicator to footer (#517)
This checks the latest released version of Whoogle against
the current app version, and shows an "update available"
message if the current version num < latest release num.

Closes #305
2021-11-02 10:35:40 -06:00
Ben Busby 334aabacb7
Bump version to 0.6.0 2021-10-11 17:44:57 -06:00
Ben Busby 9f84a8ad83
Remove form action from csp
Restricting form-action to 'self' in the content security policy
prevented Chrome (and likely other browsers) from using !bangs on the
home page.

Fixes #408
2021-08-31 07:57:50 -06:00
Ben Busby ad2b2554c1
Use UTF-8 encoding when loading languages json
Fixes #371
2021-08-30 17:23:19 -06:00
Ben Busby 13202cc6b1
Ensure existence of static build dir 2021-07-02 16:21:38 -04:00
Ben Busby 68fdd55482
Use cache busting for css/js files
On app init, short hashes are generated from file checksums to use for
cache busting. These hashes are added into the full file name and used
to symlink to the actual file contents. These symlinks are loaded in the
jinja templates for each page, and can tell the browser to load a new
file if the hash changes.

This is only in place for css and js files, but can be extended in the
future for other file types if needed.
2021-06-30 19:00:01 -04:00
Ben Busby c41e0fc239
Allow theme to mirror user system settings
Introduces a new config element and environment variable
(WHOOGLE_CONFIG_THEME) for setting the theme of the app. Rather than
just having either light or dark, this allows a user to have their
instance use their current system light/dark preference to determine the
theme to use.

As a result, the dark mode setting (and WHOOGLE_CONFIG_DARK) have been
deprecated, but will still work as expected until a system theme has
been chosen.
2021-06-28 10:26:51 -04:00
Ben Busby bcb1d8ecc9
Add lingva translation support in search (#360)
* Add support for Lingva translations in results

Searches that contain the word "translate" and are normal search queries
(i.e. not news/images/video/etc) now create an iframe to a Lingva url to
translate the user's search using their configured search language.

The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or
will fall back to the official Lingva instance url (lingva.ml).

For more info, visit https://github.com/TheDavidDelta/lingva-translate

* Add basic test for lingva results

* Allow user specified lingva instances through csp frame-src

* Fix pep8 issue
2021-06-15 10:14:42 -04:00
Ben Busby 904091f440
Bump version to 0.5.4 2021-06-06 13:45:03 -04:00
Ben Busby a64a86efb6
Bump version to 0.5.3 2021-06-04 11:31:03 -04:00
Ben Busby 614dceeb70
Add fallback interface/search lang + cleanup
Since the interface language defaults to IP geolocation by google, the
default language is now set to english. Still not sure if this is the
best solution, but at least temporarily should clear up some confusion
for users with instances deployed in countries outside of their own.

Also performed some minor cleanup:
  - Updated name of strip_blocked_sites to clean_query
  - Added clean_query to list of jinja template functions
  - Ensured site block list doesn't contain duplicate filters
2021-06-04 11:09:30 -04:00
Ben Busby cbe32a081e
Hotfix: extract only 'q' element from query string
Occasionally the search results will contain links with arguments such
as 'dq', which was being erroneously used in attempts to extract the 'q'
element from query strings. This enforces that only links with '?q=' or
'&q=' (elements with a standalone 'q' arg) will have the element
extracted.

I also refactored the naming of this element once extracted to be just
'q'. Although this seems counterintuitive, it makes a little more sense
since this element is the one we're extracting. It's a vague url arg
name, but it is what it is.

Bump version to 0.5.2 for hotfix release
2021-05-29 12:22:37 -04:00
Ben Busby 43faaee77f
Hotfix: remove site filter for maps links
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.

Bumps version to 0.5.1 for releasing the bugfix

Fixes #329
2021-05-27 12:01:57 -04:00
Ben Busby 4649d96dda
Support basic localization (#325)
* Replace hardcoded strings using translation json file

This introduces a new "translations.json" file under app/static/settings
that is loaded on app init and uses the user config value for interface
language to determine the appropriate strings to use in Whoogle-specific
elements of the UI (primarily only on the home page).

* Verify interface lang can be used for localization

Check the configured interface language against the available
localization dict before attempting to use, otherwise fall back to
english.

Also expanded language names in the languages json file.

* Add test for validating translation language keys

Also adds Spanish translation to json (the only non-English language I
can add and reasonably validate on my own).

* Validate all translations against original keyset, update readme

Readme has been updated to include basic contributing guidelines for
both code and translations.
2021-05-24 17:03:02 -04:00
Ben Busby fcfa3783e3
Bump version to 0.5.0 2021-05-21 10:50:07 -04:00
Ben Busby a7bf9728e3
Allow 'data:' for img src in app CSP
Disallowing base64 images in the app resulted in broken image
placeholders for things like pronunciation guides, business reviews,
etc.
2021-05-05 12:51:11 -04:00
Angel Mario d6d7110e22
Add option to disable changing config from client (#295)
* Add option to disable changing of configuration

Introduces a test to ensure the correct response code is found when
attempting to update the config when disabled, and ensure default config
is unchanged when posting a new config dict.

Attempting to update the config using the API when disabled now returns
a 403 code + redirect.

Co-authored-by: Ben Busby <benbusby@protonmail.com>
2021-04-27 10:36:03 -04:00
Ben Busby ed32fb927c
Disable logging from imported modules
The logging from imported modules (stem, in particular) has caused quite
a few users to assume there are errors where there aren't any. The logs
from stem also aren't helpful, as everything in the library works as
expected despite the implication from the logs that it is not working.
2021-04-09 09:26:16 -04:00
Ben Busby a321d55f13
Hotfix: Send generic "Mozilla" in user agent
Randomizing the "Mozilla" portion of the user agent changed the
character encoding to GB2312. Setting it to plain "Mozilla" enforces
UTF-8 encoding.

Bump to version 0.4.1 for release of bug fix

Fixes #267
2021-04-08 09:43:41 -04:00
Ben Busby 30be540b97 Bump version to 0.4.0 2021-04-05 11:00:56 -04:00
Ben Busby df0b7afa50 Switch to single Fernet key per session
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.

Fixes #250

Fixes #90
2021-04-05 11:00:56 -04:00
Shimul 8a10efaa01 Allow setting environment variables in whoogle.env (#237)
This allows the user to enable their preferred settings in a variety of
ways, depending on their deployment preference. Values added to
whoogle.env can be enabled using WHOOGLE_DOTENV=1, in which case all
values in the env var file will overwrite defaults or user provided
settings.

Co-authored-by: Ben Busby <benbusby@protonmail.com>
2021-04-05 11:00:56 -04:00
Ben Busby 62a9b9e949 Allow user-defined CSS/theming (#227)
* Add custom CSS field to config

This allows users to set/customize an instance's theme and appearance to
their liking. The config CSS field is prepopulated with all default CSS
variable values to allow quick editing.

Note that this can be somewhat of a "footgun" if someone updates the
CSS to hide all fields/search/etc. Should probably add some sort of
bandaid "admin" feature for public instances to employ until the whole
cookie/session issue is investigated further.

* Symlink all app static files to test dir

* Refactor app/misc/*.json -> app/static/settings/*.json

The country/language json files are used for user config settings, so
the "misc" name didn't really make sense. Also moved these to the static
folder to make testing easier.

* Fix light theme variables in dark theme css

* Minor style tweaking
2021-04-05 11:00:56 -04:00
Shimul 337d0ebe37 Handle manifest-src in CSP (#231) 2021-04-05 11:00:56 -04:00
Ben Busby f8dfc78539 Improve naming of *_utils files, update fn/class doc
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.

Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
2021-04-05 11:00:56 -04:00
Ben Busby dcb80ac250 Send CSP header in all responses
Introduces a new content security policy header for responses to all
requests to reduce the possibility of ip leaks to outside connections.
By default blocks all inline scripts, and only allows content loaded
from Whoogle.

Refactors a few small inline scripting cases in the project to their own
individual scripts.
2021-04-05 11:00:56 -04:00