whoogle-search

Commit Graph

Author	SHA1	Message	Date
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documention for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	fdd4ee590f	Hotfix: Set EU consent cookie to pending for all requests See discussion on #243	2021-04-02 12:32:59 -04:00
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	2020-12-29 18:43:42 -05:00
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161 ) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	2020-12-17 16:06:47 -05:00
Ben Busby	0ef098069e	Add tor and http/socks proxy support (#137 ) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. * Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests	2020-10-28 20:47:42 -04:00
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106 ) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	2020-07-26 11:53:59 -06:00
Joao A. Candido Ramos	bf4bf1ff2c	Split interface and results language config (#89 ) Adding support to choose separately the language of search and the one for the interface (allowing a default givent by google). Co-authored-by: Joao <ramos.joao@protonmail.com>	2020-06-27 14:23:17 -06:00
Ben Busby	f86a44b637	Removed no-cache enforcement, minor styling/formatting improvements	2020-06-11 12:14:57 -06:00
Ben Busby	4324fcd8f8	Added better multilingual support, updated filter Results page now includes method for switching to "All Languages" from whichever language is specified as the primary in the config (see #74). Also removes the non-Whoogle links from the page footer, leaving only the page navigation controls Added support for the date range filter on the results page, though I'd still recommend using the ":past <unit>" query instead.	2020-06-07 14:06:49 -06:00
Ben Busby	b6fb4723f9	Project refactor (#85 ) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	2020-06-02 12:54:47 -06:00
Ben Busby	21012f5265	Feature: autocomplete/search suggestions (#72 ) Basic autocomplete/search suggestion functionality added * Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions * Adds new autoscript.js file for handling queries on the main page and results view * Updated requests class to include autocomplete method * Updated opensearch template to handle search suggestions * Added header template to allow for autocomplete on results view * Updated readme to mention autocomplete feature	2020-05-24 14:03:11 -06:00
Ben Busby	09c53b52af	Feature: country and safe search config options (#71 ) * Added country and safe search config options * Updated handling of parser error in results test * Improved handling of default country * Added 1px empty gif fallback as a replacement for images that fail to load	2020-05-23 14:27:23 -06:00
Ben Busby	a11ceb0a57	Feature: language config (#27 ) * Added language configuration support Main page now has a dropdown for selecting preferred language of results. Refactored config to be its own model with language constants. * Added more language support Interface language is now updated using the "hl" arg Fixed chinese traditional and simplified values Updated decoding of characters to gb2312 * Updated to use conditional decoding dependent on language * Updated filter to not rely on valid config to work properly	2020-05-12 17:15:53 -06:00
Ben Busby	445019d204	Fixed RAM usage bug Pushing straight to master since this is an extremely simple fix, with a pretty large performance benefit. The Phyme library used for generating a User Agent rhyme was consuming an absolute unit of memory. Now that it's removed, it's using about 10x less memory, at the cost of User Agents being not as funny anymore.	2020-05-12 00:45:56 -06:00
Ben Busby	0300eab6df	Updated formatting and setup instructions Switched encoding from utf-8 to unicode-escape in an effort to support multiple languages besides English. Updated image results page formatting to fix bad image links (added TODO for adding full res image link for each image result). Updated README to include libcurl and libssl install instructions for manual setup.	2020-05-03 19:32:47 -06:00
Ben Busby	3e404cb524	Restructured valid params checking, added empty query redirect	2020-04-29 18:53:58 -06:00
Ben Busby	1cbe394e6f	Updated tests, fixed a few bugs Added opensearch routes test and individual tests for searching via GET and POST separately. Fixed incorrect assignment in gen_query.	2020-04-28 18:59:33 -06:00
Ben Busby	0c0ebb8917	Added POST search, encrypted query strings, refactoring The implementation of POST search support comes with a few benefits. The most apparent is the avoidance of search queries appearing in web server logs -- instead of the prior GET approach (i.e. /search?q=my+search+query), using POST requests with the query stored in the request body creates logs that simply appear as "/search". Since a lot of relative links are generated in the results page, I came up with a way to generate a unique key at run time that is used to encrypt any query strings before sending to the user. This benefits both regular text queries as well as fetching of image links and means that web logs will only show an encrypted string where a link or query string might slip through. Unfortunately, GET search requests still need to be supported, as it doesn't seem that Firefox (on iOS) supports loading search engines by their opensearch.xml file, but instead relies on manual entry of a search query string. Once this is updated, I'll probably remove GET request search support.	2020-04-28 18:19:34 -06:00
Ben Busby	4180aedd87	Added image proxying, refactored filter class Images were previously directly fetched from google search results, which was a potential privacy hazard. All image sources are now modified to be passed through shoogle's routing first, which will then fetch raw image data and pass it through to the user. Filter class was refactored to split the primary clean method into smaller, more manageable submethods.	2020-04-27 20:21:36 -06:00
Ben Busby	a7005c012e	Refactoring of user requests and routing Curl requests and user agent related functionality was moved to its own request class. Routes was refactored to only include strictly routing related functionality. Filter class was cleaned up (had routing/request related logic in here, which didn't make sense)	2020-04-23 20:59:43 -06:00

20 Commits (892b646a4e17a122f1dd299b549fa1f7a3569bdf)