Previously the logic for testing site blocking was essentially "assert
blocked_site not part of result_site". This caused test failures, since
site blocking does not extend to subdomains for the blocked site. The
reversed logic makes more sense with what the test was trying to
accomplish.
* Integrate Farside into Whoogle
When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.
For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.
* Expand conversion of config<->url params
Config settings can now be translated to and from URL params using a
predetermined set of "safe" keys (i.e. config settings that easily
translate to URL params).
* Allow jumping instances via Farside when ratelimited
When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.
For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.
Closes#554Closes#559
Previously had hardcoded POST requests for all requests that didn't use
the header template (which currently is only the image tab).
Also refactored how the Filter class works. It now requires a valid
Config model to be provided, which is then set up as a class var that
the filtering functions can use as needed, rather than setting specific
values from the config as individual values (which was confusing and
sloppy).
Fixes#561
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.
Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file, which would in turn
cause new users to inherit these settings when visiting the app for the
first time and cause users to inherit these settings when their current
session cookie expired (which was after 30 days by default I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which lead to some issues
with out of control session file creation by Flask.
Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.
Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.
Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.
Sessions are also now (semi)permanent and have a lifetime of 1 year.
DDG style bang searches can now have the bang (!) at the end of
the search (i.e. "bologna w!" will now redirect to wikipedia just like
"bologna !w" would)
* Add support for Lingva translations in results
Searches that contain the word "translate" and are normal search queries
(i.e. not news/images/video/etc) now create an iframe to a Lingva url to
translate the user's search using their configured search language.
The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or
will fall back to the official Lingva instance url (lingva.ml).
For more info, visit https://github.com/TheDavidDelta/lingva-translate
* Add basic test for lingva results
* Allow user specified lingva instances through csp frame-src
* Fix pep8 issue
* Replace hardcoded strings using translation json file
This introduces a new "translations.json" file under app/static/settings
that is loaded on app init and uses the user config value for interface
language to determine the appropriate strings to use in Whoogle-specific
elements of the UI (primarily only on the home page).
* Verify interface lang can be used for localization
Check the configured interface language against the available
localization dict before attempting to use, otherwise fall back to
english.
Also expanded language names in the languages json file.
* Add test for validating translation language keys
Also adds Spanish translation to json (the only non-English language I
can add and reasonably validate on my own).
* Validate all translations against original keyset, update readme
Readme has been updated to include basic contributing guidelines for
both code and translations.
* Block websites in search results via user config
Adds a new config field "Block" to specify a comma separated list of
websites to block in search results. This is applied for all searches.
* Add test for blocking sites from search results
* Document WHOOGLE_CONFIG_BLOCK usage
* Strip '-site:' filters from query in header template
The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc). This prevents the filter from appearing in
all except "images", since the image category uses a separate header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
* Add option to disable changing of configuration
Introduces a test to ensure the correct response code is found when
attempting to update the config when disabled, and ensure default config
is unchanged when posting a new config dict.
Attempting to update the config using the API when disabled now returns
a 403 code + redirect.
Co-authored-by: Ben Busby <benbusby@protonmail.com>
Config boolean environment variables need to be cast to ints, since
they are set or unset using 0 and 1. Previously they were interpreted as
(pseudocode) read_var(name, default=False), which meant that setting
CONFIG_VAR=0 would enable that variable since Python reads environment
variables as strings, and '0' is truthy. This updates the previous logic
to (still pseudocode) int(read_var(name, default='0')).
Fixes#279
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.
Fixes#250Fixes#90
* Add custom CSS field to config
This allows users to set/customize an instance's theme and appearance to
their liking. The config CSS field is prepopulated with all default CSS
variable values to allow quick editing.
Note that this can be somewhat of a "footgun" if someone updates the
CSS to hide all fields/search/etc. Should probably add some sort of
bandaid "admin" feature for public instances to employ until the whole
cookie/session issue is investigated further.
* Symlink all app static files to test dir
* Refactor app/misc/*.json -> app/static/settings/*.json
The country/language json files are used for user config settings, so
the "misc" name didn't really make sense. Also moved these to the static
folder to make testing easier.
* Fix light theme variables in dark theme css
* Minor style tweaking
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.
Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
Pip installs of whoogle search were missing access to the misc/ folder,
which previously contained the language and country json files. These
have been moved to app/misc, and the previous root level misc/ was
renamed to config/ (since it now only contains the tor config files).
Bump to 0.3.1.
Moves the language and country dicts from the config model to json files
that are loaded during app init and stored in the app config dict. This
substantially improves the readability of the config model and allows
for much more sensible loading of the language/country options.
The BeautifulSoup constructur in gen_nojs needed to explicitly set
features='lxml' to silence a warning from the library.
Also temporarily disabled the site alts test since the results are too
unreliable. This should be moved to a unit test instead.
* Add ability to configure site alts w/ env vars
Site alternatives (i.e. twitter.com -> nitter.net) can now be configured
using environment variables:
WHOOGLE_ALT_TW='nitter.net' # twitter alt
WHOOGLE_ALT_YT='invidio.us' # youtube alt
WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt
Updated testing to confirm results have been modified.
* Add site alt vars to docker settings and readme
Dark mode, country, interface language, and search language configs
can now be set in the search query by appending each option as a
url parameter.
Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry'
Ex: /search?q=%s&dark=1&lang_search=lang_en...
These config settings persist across page navigation and switching
result type, but will be reset if the main search bar is used.
See #144
Initialization of the app now includes generation of a ddg-bang json
file, which is used for all bang style searches afterwards.
Also added search suggestion handling for bang json lookup. Queries
beginning with "!" now reference the bang json file to pull all keys
that match.
Updated test suite to include basic tests for bang functionality.
Updated gitignore to exclude bang subdir.
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.
Verbatim search and option to ignore search autocorrect are now supported as well.
Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.
This doesn't include support for Google Maps -> Open Street Maps, that
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
Updated to ensure a child span element is available before running a
test to verify the correct time range for the result. Need to come up
with a better way of ensuring uniform results across multiple tests,
since otherwise periodic changes in the returned results can cause tests
to fail.
Adding support to choose separately the language of search and the one for the interface (allowing a default givent by google).
Co-authored-by: Joao <ramos.joao@protonmail.com>
* Major refactor of requests and session management
- Switches from pycurl to requests library
- Allows for less janky decoding, especially with non-latin character
sets
- Adds session level management of user configs
- Allows for each session to set its own config (people are probably
going to complain about this, though not sure if it'll be the same
number of people who are upset that their friends/family have to share
their config)
- Updates key gen/regen to more aggressively swap out keys after each
request
* Added ability to save/load configs by name
- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs
* Result formatting and removal of unused elements
- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings
* Minor change to button label
* Fixed issue with "de-pickling" of flask session
Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').
* Switched to pickling saved configs
* Updated ad/sponsored content filter and conf naming
Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files
Sponsored content now removed by basic string matching of span content
* Version bump to 0.2.0
* Fixed request.send return style
Basic autocomplete/search suggestion functionality added
* Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions
* Adds new autoscript.js file for handling queries on the main page and results view
* Updated requests class to include autocomplete method
* Updated opensearch template to handle search suggestions
* Added header template to allow for autocomplete on results view
* Updated readme to mention autocomplete feature
* Added country and safe search config options
* Updated handling of parser error in results test
* Improved handling of default country
* Added 1px empty gif fallback as a replacement for images that fail to load
* Putting '! ' at the beginning of the query now redirects to the first search result
Signed-off-by: Paul Rothrock <paul@movetoiceland.com>
* Moved get_first_url outside of filter class
Signed-off-by: Paul Rothrock <paul@movetoiceland.com>
For datetime spans in time-filtered search results, anything less than 7
characters or more than 15 can be guaranteed to not be properly
formatted dates (either "mm dd yyyy" or "xx days/months/weeks ago")
Was previously checking for non-inclusive max number of days (i.e.
filtering by past month would return a failed test if the result was
from exactly 31 days ago)