Commit Graph

72 Commits (337d0ebe376bc1f2e592b2cecfff9f9d6d9da246)

Author SHA1 Message Date
Ben Busby f8dfc78539 Improve naming of *_utils files, update fn/class doc
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.

Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
2021-04-05 11:00:56 -04:00
Ben Busby dcb80ac250 Send CSP header in all responses
Introduces a new content security policy header for responses to all
requests to reduce the possibility of ip leaks to outside connections.
By default blocks all inline scripts, and only allows content loaded
from Whoogle.

Refactors a few small inline scripting cases in the project to their own
individual scripts.
2021-04-05 11:00:56 -04:00
Ben Busby d146016860 Remove auth req for accessing opensearch
Requiring authentication for accessing the opensearch template prevents
the browser from accessing the file when adding as a default search
engine. This removes the authentication requirement from the opensearch
route, which should never provide any sensitive information anyways.
2021-04-05 11:00:56 -04:00
Ben Busby 329c38efb0
Hotfix: Enforce https in heroku opensearch template
Heroku instances were using the base http url when formatting the
opensearch.xml template. This adds a new routing utility, "needs_https",
which can be used for determining if the url in question needs
upgrading.
2021-01-23 14:50:30 -05:00
Ben Busby 6e7ec9918a
Move language/country settings to app config
Moves the language and country dicts from the config model to json files
that are loaded during app init and stored in the app config dict. This
substantially improves the readability of the config model and allows
for much more sensible loading of the language/country options.
2020-12-17 16:42:05 -05:00
Ben Busby 375f4ee9fd
PEP-8: Fix formatting issues, add CI workflow (#161)
Enforces PEP-8 formatting for all python code

Adds a github action build for checking pep8 formatting using pycodestyle
2020-12-17 16:06:47 -05:00
Ben Busby 44a5da1895
Fix heroku https upgrade, add funding options
Heroku app instances have been notoriously bad at having the instance
automatically upgraded to https. This adds a step in the before request
decorator to always upgrade heroku apps, since they're always deployed
with the certificate, but never configured to upgrade automatically.

Fixes #153
2020-12-05 15:53:42 -05:00
Ben Busby a519de90af
Enforce GET-only in opensearch for Chrome
The resolution for enabling full support for search + suggestions in
Chrome is to remove the "method" tag altogether for any Chrome based
browser. Any inclusion of this tag seems to break the search suggestion
feature, and makes the user add the search engine manually.
2020-11-18 10:31:19 -05:00
Ben Busby 72cbc342af Add ability to set temp config in search query
Dark mode, country, interface language, and search language configs
can now be set in the search query by appending each option as a
url parameter.

Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry'

Ex: /search?q=%s&dark=1&lang_search=lang_en...

These config settings persist across page navigation and switching
result type, but will be reset if the main search bar is used.

See #144
2020-11-11 00:40:49 -05:00
Ben Busby 933ce7e068 Handle FF sending bad search suggestion param
Occasionally, Firefox will send the search suggestion
string to the server without a mimetype, resulting in the suggestion
only appearing in Flask's `request.data` field. This field is typically
not used for parsing arguments, as the documentation states:

Contains the incoming request data as string in case it came with a
mimetype Flask does not handle.

This fix captures the bytes object sent to the server and parses it into
a normal query to be used in forming suggestions.
2020-10-28 23:02:41 -04:00
Ben Busby 0ef098069e
Add tor and http/socks proxy support (#137)
* Add tor and http/socks proxy support

Allows users to enable/disable tor from the config menu, which will
forward all requests through Tor.

Also adds support for setting environment variables for alternative
proxy support. Setting the following variables will forward requests
through the proxy:
    - WHOOGLE_PROXY_USER (optional)
    - WHOOGLE_PROXY_PASS (optional)
    - WHOOGLE_PROXY_TYPE (required)
      - Can be "http", "socks4", or "socks5"
    - WHOOGLE_PROXY_LOC  (required)
      - Format: "<ip address>:<port>"

See #30

* Refactor acquire_tor_conn -> acquire_tor_identity

Also updated travis CI to set up tor

* Add check for Tor socket on init, improve Tor error handling

Initializing the app sends a heartbeat request to Tor to check for
availability, and updates the home page config options accordingly. This
heartbeat is sent on every request, to ensure Tor support can be
reconfigured without restarting the entire app.

If Tor support is enabled, and a subsequent request fails, then a new
TorError exception is raised, and the Tor feature is disabled until a
valid connection is restored.

The max attempts has been updated to 10, since 5 seemed a bit too low
for how quickly the attempts go by.

* Change send_tor_signal arg type, update function doc

send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also
added the doc string for the "disable" attribute in TorError.

* Fix tor identity logic in Request.send

* Update proxy init, change proxyloc var name

Proxy is now only initialized if both type and location are specified,
as neither have a default fallback and both are required. I suppose the
type could fall back to http, but seems safer this way.

Also refactored proxyurl -> proxyloc for the runtime args in order to
match the Dockerfile args.

* Add tor/proxy support for Docker builds, fix opensearch/init

The Dockerfile is now updated to include support for Tor configuration,
with a working torrc file included in the repo.

An issue with opensearch was fixed as well, which was uncovered during
testing and was simple enough to fix here. Likewise, DDG bang gen was
updated to only ever happen if the file didn't exist previously, as
testing with the file being regenerated every time was tedious.

* Add missing "@" for socks proxy requests
2020-10-28 20:47:42 -04:00
Ben Busby ae05e8ff8b Finished basic implementation of DDG bang feature
Initialization of the app now includes generation of a ddg-bang json
file, which is used for all bang style searches afterwards.

Also added search suggestion handling for bang json lookup. Queries
beginning with "!" now reference the bang json file to pull all keys
that match.

Updated test suite to include basic tests for bang functionality.

Updated gitignore to exclude bang subdir.
2020-10-10 15:55:14 -04:00
Ben Busby 2126742b76
Merge branch 'develop' into develop 2020-10-07 18:38:36 -04:00
Ben Busby e471b012a0 Updated opensearch template
Reconfigured template to only use method parameter if set to search via
POST request (which is the default).

Apparently Chrome/Chromium based browsers don't like non-GET request
searches, and specifying a method caused Chrome to reject the template
altogether.
2020-08-15 14:03:26 -06:00
Ben Busby 0c0a01b83f Minor opensearch route and description updates
Bumped version to 0.2.1 for next release

Updated image in opensearch template to use base64 image

Updated opensearch route to serve file as attachment
2020-08-15 13:02:17 -06:00
Ben Busby 975ece8cd0
Privacy respecting alternatives in results view (#106)
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.

Verbatim search and option to ignore search autocorrect are now supported as well.

Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.

This doesn't include support for Google Maps -> Open Street Maps, that
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
2020-07-26 11:53:59 -06:00
Marvin Borner 5575bcd0af
Merge branch 'develop' into develop 2020-06-28 11:11:53 +02:00
Joao A. Candido Ramos bf4bf1ff2c
Split interface and results language config (#89)
Adding support to choose separately the language of search and the one for the interface (allowing a default givent by google).

Co-authored-by: Joao <ramos.joao@protonmail.com>
2020-06-27 14:23:17 -06:00
Marvin Borner dd9d87d25b
Added ddg-style !bang-operators
This is a proof of concept! The code works, but uses hardcoded operators
and may be placed in the wrong file/class.
The best-case scenario would be the possibility to use the 13.000+ ddg
operators, but I don't know if that's possible without having to
redirect to duckduckgo first.
2020-06-26 00:26:02 +02:00
Ben Busby 5f8309d2f0 Added footer to results page 2020-06-11 13:25:23 -06:00
Ben Busby f86a44b637 Removed no-cache enforcement, minor styling/formatting improvements 2020-06-11 12:14:57 -06:00
Ben Busby 4324fcd8f8 Added better multilingual support, updated filter
Results page now includes method for switching to "All Languages" from
whichever language is specified as the primary in the config (see #74).

Also removes the non-Whoogle links from the page footer, leaving only
the page navigation controls

Added support for the date range filter on the results page, though I'd
still recommend using the ":past <unit>" query instead.
2020-06-07 14:06:49 -06:00
Ben Busby 32e837a5e0 Refactored whoogle session mgmt
Now allows a fallback "default" session to be used if a user's browser
is blocking cookies
2020-06-05 15:24:44 -06:00
Ben Busby 64af72abb5 Moved custom conf files to their own directory 2020-06-02 14:38:29 -06:00
Ben Busby b6fb4723f9
Project refactor (#85)
* Major refactor of requests and session management

- Switches from pycurl to requests library
  - Allows for less janky decoding, especially with non-latin character
  sets
- Adds session level management of user configs
  - Allows for each session to set its own config (people are probably
  going to complain about this, though not sure if it'll be the same
  number of people who are upset that their friends/family have to share
  their config)
- Updates key gen/regen to more aggressively swap out keys after each
request

* Added ability to save/load configs by name

- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs

* Result formatting and removal of unused elements

- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings

* Minor change to button label

* Fixed issue with "de-pickling" of flask session

Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').

* Switched to pickling saved configs

* Updated ad/sponsored content filter and conf naming

Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files

Sponsored content now removed by basic string matching of span content

* Version bump to 0.2.0

* Fixed request.send return style
2020-06-02 12:54:47 -06:00
Ben Busby cb18bc6ccc Updated autocomplete styling
Added dark theme specific stylesheet to use if dark mode is active
2020-05-26 10:58:37 -06:00
Ben Busby 98d639883c Fixing styling/url/safe mode inconsistencies 2020-05-26 10:39:19 -06:00
Ben Busby 9212f9921a Fixed #76
Added enter key submit on results page

Added results type carryover for subsequent searches on results page

Removed redundant header on image search results
2020-05-25 10:53:15 -06:00
Ben Busby d1f38cf924 Fixed styling of footer in dark mode 2020-05-25 10:33:24 -06:00
Ben Busby 21012f5265
Feature: autocomplete/search suggestions (#72)
Basic autocomplete/search suggestion functionality added

* Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions

* Adds new autoscript.js file for handling queries on the main page and results view

* Updated requests class to include autocomplete method

* Updated opensearch template to handle search suggestions

* Added header template to allow for autocomplete on results view

* Updated readme to mention autocomplete feature
2020-05-24 14:03:11 -06:00
Ben Busby 09c53b52af
Feature: country and safe search config options (#71)
* Added country and safe search config options

* Updated handling of parser error in results test

* Improved handling of default country

* Added 1px empty gif fallback as a replacement for images that fail to load
2020-05-23 14:27:23 -06:00
Ben Busby c51f186419 Added version footer, minor PEP 8 refactoring 2020-05-20 11:02:30 -06:00
Ben Busby 38b7b19e2a
Added basic authentication (#51)
Username/password can be set either as Dockerfile build arguments or
passed into the run script as "--userpass <username:password>"
2020-05-18 10:30:32 -06:00
Paul Rothrock 0e39b8f97b
Added "I'm feeling lucky" function (#46)
* Putting '! ' at the beginning of the query now redirects to the first search result

Signed-off-by: Paul Rothrock <paul@movetoiceland.com>

* Moved get_first_url outside of filter class

Signed-off-by: Paul Rothrock <paul@movetoiceland.com>
2020-05-18 10:28:23 -06:00
Ben Busby a4382d59f6
Updated redirect code used in https redirects
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections

301 redirections do not keep the request method intact, and can occasionally be changed from POST to GET

308 redirections always keep the request method, which is necessary for all POST search requests
2020-05-16 09:31:07 -06:00
Ben Busby b4165f9957 Minor improvement to https enforcement 2020-05-15 16:29:22 -06:00
Ben Busby 1ed6178e9a
Feature: https only -- adds option to enforce https on running instances (#48)
* Adding HTTPS enforcement

Command line runs of Whoogle Search through pip/pipx/etc will need the
`--https-only` flag appended to the run command.

Docker runs require the `use_https` build arg applied.

* Update README.md

Moved https-only note to top of docker run command, updated pip runner help output

* Dockerfile: removed HTTPS enforcement, updated PORT setting

Dockerfile no longer enforces an HTTPS connection, but still allows for
setting via a build arg. The Flask server port is now configurable as a
build arg as well, by setting a port number to "whoogle_port"

* Fixed incorrect port assignment
2020-05-15 15:44:50 -06:00
Ben Busby 87f0a8d496
Added volume mounted config to Dockerfile (#39) 2020-05-13 18:27:04 -06:00
Ben Busby f4bd3df2bb
Added option to search only via GET request (#36)
This addresses #18, which brought up the issue of searching with Whoogle
with the search instance set to always use a specific container in
Firefox Container Tabs.

Could also be useful if you want to share your search results or
something, I guess. Though nobody likes when people do that.
2020-05-13 00:19:51 -06:00
Ben Busby a11ceb0a57
Feature: language config (#27)
* Added language configuration support

Main page now has a dropdown for selecting preferred language of
results.

Refactored config to be its own model with language constants.

* Added more language support

Interface language is now updated using the "hl" arg

Fixed chinese traditional and simplified values

Updated decoding of characters to gb2312

* Updated to use conditional decoding dependent on language

* Updated filter to not rely on valid config to work properly
2020-05-12 17:15:53 -06:00
Jake Howard f700ed88e7
Swap out Flask's default web server for Waitress (#32)
* Ignore venv when building docker file

* Remove reference to 8888 port

It wasn't really used anywhere, and setting it to 5000 everywhere removes ambiguity, and makes things easier to track and reason about

* Use waitress rather than Flask's built in web server

It's not production grade

* Actually add waitress to requirements

Woops!
2020-05-12 17:14:55 -06:00
Ben Busby 7ccad2799e Added config option to address instance behind reverse proxy
Config options now allow setting a "root url", which defaults to the
request url root. Saving a new url in this field will allow for proper
redirects and usage of the opensearch element.

Also provides a possible solution for #17, where the default flask redirect method redirects to
http instead of https.
2020-05-10 13:27:02 -06:00
Ben Busby 130ac4532e Refactored handling of user config
Now implemented as a flask global variable reads from the same json file
as before, but doesn't crash if it does not find an existing file.

Removed user config creation from run script
2020-05-06 18:39:12 -06:00
Ben Busby d316fd77c6 Updated setup and routes for pipx compatibility 2020-05-06 18:13:02 -06:00
Ben Busby d01f56ea03 Removed referrer from links, refacored routes
Added <meta name="referrer" content="no-referrer"> to all whoogle
templates

Refactored search route to use conditionally use either request.args or
request.form, depending on rest call (get vs post respectively)
2020-05-05 18:28:43 -06:00
Ben Busby 708769f682 Minor styling refactor, updated app name 2020-05-04 18:00:43 -06:00
Ben Busby 3e404cb524 Restructured valid params checking, added empty query redirect 2020-04-29 18:53:58 -06:00
Ben Busby 0a3da5cea4 Updated js controller and config api route
Controller was refactored to be a bit less monolithic.

Config route was updated to accept an html form data POST rather than
just a json object.
2020-04-28 20:50:12 -06:00
Ben Busby 1cbe394e6f Updated tests, fixed a few bugs
Added opensearch routes test and individual tests for searching via GET
and POST separately.

Fixed incorrect assignment in gen_query.
2020-04-28 18:59:33 -06:00
Ben Busby 0c0ebb8917 Added POST search, encrypted query strings, refactoring
The implementation of POST search support comes with a few benefits. The
most apparent is the avoidance of search queries appearing in web server
logs -- instead of the prior GET approach (i.e.
/search?q=my+search+query), using POST requests with the query stored in
the request body creates logs that simply appear as "/search".

Since a lot of relative links are generated in the results page, I came
up with a way to generate a unique key at run time that is used to
encrypt any query strings before sending to the user. This benefits both
regular text queries as well as fetching of image links and means that
web logs will only show an encrypted string where a link or query
string might slip through.

Unfortunately, GET search requests still need to be supported, as it
doesn't seem that Firefox (on iOS) supports loading search engines by
their opensearch.xml file, but instead relies on manual entry of a
search query string. Once this is updated, I'll probably remove GET
request search support.
2020-04-28 18:19:34 -06:00