whoogle-search

Commit Graph

Author	SHA1	Message	Date
Ben Busby	0310f0f542	Use app init enc key by default for all queries This can be updated later to allow users with cookies enabled to use a key that is unique to their session (if they want, not mandatory), but for now it makes more sense to just use a single key for all queries from all users. This should eliminate a lot of issues that users have reported where they are unable to decrypt queries or page elements due to an expired/renewed session key.	2022-12-05 12:14:14 -07:00
Joao A. Candido Ramos	d05ec08abf	Remove wildcard imports (#791)	2022-06-24 10:51:15 -06:00
Ben Busby	ef98d85dc5	Ensure searches with a leading slash are treated as queries A user reported a bug where searches with a leading slash (in this case: "/e/OS apps") were interpreted as a Google-specific link when clicking the next page of results. This was due to the behavior that Google's search results exhibit, where internal links for pages like support.google.com are delivered with params like "?q=/support" rather than a direct link. This fixes that scenario by checking the "q" param value against the user's original query to ensure they don't match before assuming that the result is intended as a redirect. Fixes #776	2022-06-03 14:03:57 -06:00
Ben Busby	0048c2f9aa	Update remaining alternative frontends to use Farside Wikipedia, imgur, and translate alternatives were all still using hardcoded URLs when replaced with their respective alternative frontend. This updates them to use farside instead.	2022-03-21 10:08:52 -06:00
Ben Busby	d33e8241dc	Fix "my ip" search regression Removes dependency on class names for creating the "my ip" info card in the results list for searches pertaining to the user's public IP. Adds test to prevent this from happening again. Note to anyone reading this and looking to contribute: please avoid using hardcoded class names at all costs. This approach of creating/removing content just results in issues if/when Google decides to introduce/remove class names from the result page. Fixes #657	2022-02-14 11:40:11 -07:00
Ben Busby	119437a07c	Fix test for blocking site from results Previously the logic for testing site blocking was essentially "assert blocked_site not part of result_site". This caused test failures, since site blocking does not extend to subdomains for the blocked site. The reversed logic makes more sense with what the test was trying to accomplish.	2021-12-19 11:22:47 -07:00
Ben Busby	634d179568	Use farside.link for frontend alternatives in results (#560) * Integrate Farside into Whoogle When instances are ratelimited (when a captcha is returned instead of the user's search results) the user can now hop to a new instance via Farside, a new backend service that redirects users to working instances of a particular frontend. In this case, it presents a user with a Farside link to a new Whoogle (or Searx) instance instead, so that the user can resume their search. For the generated Farside->Whoogle link, the generated link includes the user's current Whoogle configuration settings as URL params, to ensure a more seamless transition between instances. This doesn't translate to the Farside->Searx link, but potentially could with some changes. * Expand conversion of config<->url params Config settings can now be translated to and from URL params using a predetermined set of "safe" keys (i.e. config settings that easily translate to URL params). * Allow jumping instances via Farside when ratelimited When instances are ratelimited (when a captcha is returned instead of the user's search results) the user can now hop to a new instance via Farside, a new backend service that redirects users to working instances of a particular frontend. In this case, it presents a user with a Farside link to a new Whoogle (or Searx) instance instead, so that the user can resume their search. For the generated Farside->Whoogle link, the generated link includes the user's current Whoogle configuration settings as URL params, to ensure a more seamless transition between instances. This doesn't translate to the Farside->Searx link, but potentially could with some changes. Closes #554 Closes #559	2021-12-08 17:27:33 -07:00
Ben Busby	10a15e06e1	Fix incorrect request type for image searches Previously had hardcoded POST requests for all requests that didn't use the header template (which currently is only the image tab). Also refactored how the Filter class works. It now requires a valid Config model to be provided, which is then set up as a class var that the filtering functions can use as needed, rather than setting specific values from the config as individual values (which was confusing and sloppy). Fixes #561	2021-12-06 21:39:50 -07:00
Ben Busby	e06ff85579	Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which led to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.	2021-11-17 19:35:30 -07:00
Ben Busby	bcb1d8ecc9	Add lingva translation support in search (#360) * Add support for Lingva translations in results Searches that contain the word "translate" and are normal search queries (i.e. not news/images/video/etc) now create an iframe to a Lingva url to translate the user's search using their configured search language. The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or will fall back to the official Lingva instance url (lingva.ml). For more info, visit https://github.com/TheDavidDelta/lingva-translate * Add basic test for lingva results * Allow user specified lingva instances through csp frame-src * Fix pep8 issue	2021-06-15 10:14:42 -04:00
Ben Busby	c8da53d4b0	Block websites from search results via user config (#304) * Block websites in search results via user config Adds a new config field "Block" to specify a comma separated list of websites to block in search results. This is applied for all searches. * Add test for blocking sites from search results * Document WHOOGLE_CONFIG_BLOCK usage * Strip '-site:' filters from query in header template The 'behind the scenes' site filter applied for blocked sites was appearing in the query field when navigating between search categories (all -> images -> news, etc). This prevents the filter from appearing in all except "images", since the image category uses a separate header. This should eventually be addressed when the image page can begin using the standard whoogle header, but until then, the filter will still appear for image searches.	2021-05-07 11:45:53 -04:00
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	2021-04-05 11:00:56 -04:00
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	2020-12-17 16:06:47 -05:00
Ben Busby	5b5c2588ed	Fix nojs lxml constructor The BeautifulSoup constructor in gen_nojs needed to explicitly set features='lxml' to silence a warning from the library. Also temporarily disabled the site alts test since the results are too unreliable. This should be moved to a unit test instead.	2020-12-11 19:21:32 -05:00
Ben Busby	6c429e6dd1	Allow setting site alts using environment vars (#155) * Add ability to configure site alts w/ env vars Site alternatives (i.e. twitter.com -> nitter.net) can now be configured using environment variables: WHOOGLE_ALT_TW='nitter.net' # twitter alt WHOOGLE_ALT_YT='invidio.us' # youtube alt WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt Updated testing to confirm results have been modified. * Add site alt vars to docker settings and readme	2020-12-05 17:01:21 -05:00
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	2020-07-26 11:53:59 -06:00
Ben Busby	6ef7ab663a	Small update to results time period test Updated to ensure a child span element is available before running a test to verify the correct time range for the result. Need to come up with a better way of ensuring uniform results across multiple tests, since otherwise periodic changes in the returned results can cause tests to fail.	2020-06-28 10:52:53 -06:00
Ben Busby	b6fb4723f9	Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	2020-06-02 12:54:47 -06:00
Ben Busby	09c53b52af	Feature: country and safe search config options (#71) * Added country and safe search config options * Updated handling of parser error in results test * Improved handling of default country * Added 1px empty gif fallback as a replacement for images that fail to load	2020-05-23 14:27:23 -06:00
Ben Busby	b15368ac28	Updated recent results test w/ +5 day tolerance	2020-05-20 11:07:01 -06:00
Ben Busby	1cbe394e6f	Updated tests, fixed a few bugs Added opensearch routes test and individual tests for searching via GET and POST separately. Fixed incorrect assignment in gen_query.	2020-04-28 18:59:33 -06:00
Ben Busby	dd077954bf	Fixed search results test For datetime spans in time-filtered search results, anything less than 7 characters or more than 15 can be guaranteed to not be properly formatted dates (either "mm dd yyyy" or "xx days/months/weeks ago")	2020-04-26 18:11:02 -06:00
Ben Busby	9c7b4c1444	Fixed bad test assertion Was previously checking for non-inclusive max number of days (i.e. filtering by past month would return a failed test if the result was from exactly 31 days ago)	2020-04-24 18:10:57 -06:00
Ben Busby	024552f2df	Minor refactor of filter class, updated tests, fixed html/css, added ua to config	2020-04-16 10:01:02 -06:00
Ben Busby	0d17cc8ef6	Modified result length test	2020-04-15 17:54:38 -06:00
Ben Busby	ab59682cad	Small var fix in testing search results	2020-04-15 17:48:49 -06:00
Ben Busby	b5b6e64177	Added testing and ci build, refactored filter class, refactored project structure	2020-04-15 17:41:53 -06:00

28 Commits (baa8bd0eb41f19c9470fb54df1397236456e72be)
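
Illustrative sketch for commit ef98d85dc5 (leading-slash queries). This is not Whoogle's actual code; the function name is_internal_redirect and the example hrefs are hypothetical, chosen only to show the check the commit message describes: a "q" value that looks like a path is treated as a Google-internal link only when it differs from the user's original query.

```python
from urllib.parse import urlparse, parse_qs


def is_internal_redirect(href: str, original_query: str) -> bool:
    """Guess whether a result link is a Google-internal redirect.

    Hypothetical helper: a link is considered a redirect when its "q"
    param starts with a slash (e.g. "?q=/support") AND that value is not
    simply the query the user typed (e.g. "/e/OS apps").
    """
    params = parse_qs(urlparse(href).query)
    q_value = params.get("q", [""])[0]
    return q_value.startswith("/") and q_value != original_query


# An internal link such as "?q=/support" is flagged as a redirect.
print(is_internal_redirect("/url?q=/support", "whoogle"))
# A next-page link carrying the user's own leading-slash query is not.
print(is_internal_redirect("/search?q=%2Fe%2FOS+apps&start=10", "/e/OS apps"))
```

Comparing against the original query keeps the heuristic narrow: only the user's exact query is exempted, so other slash-prefixed "q" values are still handled as redirects.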