whoogle-search

Commit Graph

Author	SHA1	Message	Date
Ben Busby	9317d9217f	Support proxying results through Whoogle (aka "anonymous view") (#682) * Expand `/window` endpoint to behave like a proxy The `/window` endpoint was previously used as a type of proxy, but only for removing Javascript from the result page. This expands the existing functionality to allow users to proxy search result pages (with or without Javascript) through their Whoogle instance. * Implement filtering of remote content from css * Condense NoJS feature into Anonymous View Enabling NoJS now removes Javascript from the Anonymous View, rather than creating a separate option. * Exclude 'data:' urls from filter, add translations The 'data:' url must be allowed in results to view certain elements on the page, such as stars for review based results. Add translations for the remaining languages. * Add cssutils to requirements	2022-04-13 11:29:07 -06:00
Ben Busby	797372ecaa	Ignore blank alts if site alt config is enabled If the alt for a particular service is blank, the original source is used instead. Example: 1. Site alts enabled in config 2. User wants wikipedia links, not wikiless 3. WHOOGLE_ALT_WIKI set to "" 4. All available alt links redirected to farside, except wikipedia Fixes #704	2022-03-30 14:46:33 -06:00
Ben Busby	f7e3650728	Only remove G links in footer Links that were directed at G domains were previously removed universally, when really they only needed to be removed from the footer to reduce possible confusion caused by mixed Whoogle and G links. Fixes #656	2022-03-01 12:48:33 -07:00
DUO Labs	b2c048af92	Fix `collapse_sections` for `MINIMAL_MODE` (#654)	2022-02-11 14:44:08 -07:00
DUO Labs	7c5094d37b	Check for soup body in `remove_site_blocks` (#651) Fixes error with `remove_site_blocks` in the Images tab	2022-02-11 14:42:11 -07:00
DUO Labs	502067addc	Clean "Show more results" of all site blocks (#646)	2022-02-08 10:57:00 -07:00
Joao A. Candido Ramos	11099f7b1d	Use consistent header for all result types (#535) Introduces a header for switching between result types (i.e. "All", "News", etc) that is consistent between the different result types. Previously, image results had a tab header that was formatted in a drastically different manner, which was jarring when switching from a different result page to the Images page. Created a G class enum to reference class names returned in search results. As noted in the class doc, this should only be used/updated as a last resort, as class names change frequently. For some instances, such as replacing the tbm tab, it's a lot easier to just replace by header name than attempting to replace it based on how the element is structured. Also updated a few styles to revert the latest styling changes being applied by Google. Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <contact@benbusby.com>	2022-02-07 10:47:25 -07:00
DUO Labs	500942cb99	Update minimal mode for new Google formatting (#637) Google's latest formatting changes broke the modifications made when enabling `WHOOGLE_MINIMAL`. This updates the result filtering to work with the new changes. Fixes #634	2022-02-02 12:57:05 -07:00
Nitish Yadav	fc50359752	Improve formatting of collapsible infobox (#612)	2022-01-18 13:47:35 -07:00
Ben Busby	10a15e06e1	Fix incorrect request type for image searches Previously had hardcoded POST requests for all requests that didn't use the header template (which currently is only the image tab). Also refactored how the Filter class works. It now requires a valid Config model to be provided, which is then set up as a class var that the filtering functions can use as needed, rather than setting specific values from the config as individual values (which was confusing and sloppy). Fixes #561	2021-12-06 21:39:50 -07:00
Ben Busby	e06ff85579	Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which led to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.	2021-11-17 19:35:30 -07:00
Fabian Schilling	9ad1d60a47	Improve URL parsing for full size images (#521) Skip URLs that are not two-element lists Fixes #520	2021-11-02 16:22:24 -06:00
Ben Busby	90441b2668	Add WHOOGLE_MINIMAL to docs, tweak min mode logic Activating minimal mode should also remove all collapsed sections, if any are found. WHOOGLE_MINIMAL now documented in readme and app.json (for heroku).	2021-10-26 10:38:20 -06:00
DUO Labs	543f2b2a01	Add a "minimal mode" for condensing results (#485) If WHOOGLE_MINIMAL is set, all non-link results are removed from the view.	2021-10-26 10:35:12 -06:00
Ben Busby	8f70236403	Update domains used for scribe.rip replacements The levelup.gitconnected.com site is a Medium site that can also be replaced with scribe.rip whenever privacy respecting site alternatives are enabled in the config. Also modified how link descriptions are updated when that config is enabled (before it was missing replacements on quite a few descriptions).	2021-10-23 23:23:37 -06:00
Yadomin	284a8102c8	Block by result title or url using regex (#473) Allows blocking search results using a regex filter for either result title or result url	2021-10-20 20:01:04 -06:00
Laurent le Beau-Martin	1a3790c7b1	Only open external links in a new tab (#380)	2021-08-24 09:06:41 -06:00
Ben Busby	38c38a772f	Find valid parent element when collapsing result content Previously if a result element marked for collapsing didn't have a valid "parent" element, the collapsing was skipped altogether. This loops through child elements until a valid parent is found (or if one isn't found, the element will not be collapsed).	2021-07-04 15:20:19 -04:00
Ben Busby	afd01820bb	Collapse long result sections into details/summary elements Sections such as "People also asked" and "related searches" typically take up a lot of room on the results page, and don't always have the most useful information. This checks for result elements with more than 7 child divs, extracts the section title, and wraps all elements in a "details" element that can be expanded/collapsed by the user. Note that this functionality existed previously (albeit not implemented as well), but due to changes in how Google returns searches (switching from using <h2> elements for section headers to <span> or <div> elements), the approach to collapsing these sections needed to be updated.	2021-06-23 18:59:57 -04:00
Ben Busby	d894bd347d	Handle error when parsing image result url	2021-06-16 10:40:18 -04:00
Ben Busby	614dceeb70	Add fallback interface/search lang + cleanup Since the interface language defaults to IP geolocation by google, the default language is now set to english. Still not sure if this is the best solution, but at least temporarily should clear up some confusion for users with instances deployed in countries outside of their own. Also performed some minor cleanup: - Updated name of strip_blocked_sites to clean_query - Added clean_query to list of jinja template functions - Ensured site block list doesn't contain duplicate filters	2021-06-04 11:09:30 -04:00
Ben Busby	cbe32a081e	Hotfix: extract only 'q' element from query string Occasionally the search results will contain links with arguments such as 'dq', which was being erroneously used in attempts to extract the 'q' element from query strings. This enforces that only links with '?q=' or '&q=' (elements with a standalone 'q' arg) will have the element extracted. I also refactored the naming of this element once extracted to be just 'q'. Although this seems counterintuitive, it makes a little more sense since this element is the one we're extracting. It's a vague url arg name, but it is what it is. Bump version to 0.5.2 for hotfix release	2021-05-29 12:22:37 -04:00
Ben Busby	43faaee77f	Hotfix: remove site filter for maps links The new site filter breaks links to Maps results, so filter.py needed to be updated to handle these links as a unique case. A new method was introduced to easily remove any "-site:..." filters from the query, which is now also used to format queries in the header template rather than manually removing the blocked site list within the template itself. Bumps version to 0.5.1 for releasing the bugfix Fixes #329	2021-05-27 12:01:57 -04:00
Joao A. Candido Ramos	448efb8f2a	Add "view image" functionality (#268) * add view image option * prevent whoogle links from opening in a new tab. * remove view image template on mobile requests * change loop values to be more robust to the number of images * Update app/templates/imageresults.html * fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width." * Update app/templates/imageresults.html * remove hardcoded string from template * Add view image config var to app.json * Add view image config var to whoogle.env Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-05-21 11:19:45 -04:00
Ben Busby	1030118d0b	Expand custom css theming support Also adds new default dark theme designed by @gripped.	2021-04-09 11:00:02 -04:00
Ben Busby	0b9600b564	Expand custom css variables and functionality Squashed commit of the following: commit 37e22d2945b077a94d9997d064f4355ff8819bae Author: Ben Busby <benbusby@protonmail.com> Date: Mon Apr 5 10:27:05 2021 -0400 Pass user config to logo template commit 2406fee05c3e221112fbe802fbf2ecca1df99127 Author: Ben Busby <benbusby@protonmail.com> Date: Mon Apr 5 10:24:54 2021 -0400 Fix incorrect contrast text in dark theme commit 91dd677e22c2e99819123154e03e9f519f95a9bd Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 17:21:38 2021 -0400 Remove inline onclicks, fix svg sizing commit 91bbf9c0fae36febd6a6a0d8e6a560babe8622d5 Merge: 72637df b1227bd Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 15:35:37 2021 -0400 Merge remote-tracking branch 'origin/develop' into custom-css-tweaks commit 72637df213f4b9e83e4b58fe76973de02f63ec8e Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 11:38:38 2021 -0400 Use svg logo w/ custom styling on results pages commit 666a7ceac4a6e4d3fe1975dcee91e6094b66149e Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 11:10:37 2021 -0400 Split whoogle-accent into whoogle-element-bg and whoogle-logo See discussion on #247	2021-04-05 11:00:56 -04:00
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	2021-04-05 11:00:56 -04:00
Ben Busby	8ad8e66d37	Improve static typing throughout repo Eventually this should be part of a separate mypy ci build, but right now it's just a general guideline. Future commits and PRs should be validated for static typing wherever possible. For reference, the testing commands used for this commit were: mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/ mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/	2021-04-05 11:00:56 -04:00
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	64567a63ea	Ensure G logo doesn't appear in mobile img results Adds a separate check to remove all images sourced from www.gstatic.com, which is where the mobile logo in particular is coming from.	2021-04-05 11:00:56 -04:00
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	2020-12-29 18:43:42 -05:00
Ben Busby	6e7ec9918a	Move language/country settings to app config Moves the language and country dicts from the config model to json files that are loaded during app init and stored in the app config dict. This substantially improves the readability of the config model and allows for much more sensible loading of the language/country options.	2020-12-17 16:42:05 -05:00
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	2020-12-17 16:06:47 -05:00
Ben Busby	b695179c79	Add ability to collapse "people also ask" This adds a step in the filter process to wrap the "people also ask" section in a <details> element, which automatically collapses the contents of the section. Clicking/tapping the details element expands the view as normal. See #113	2020-12-15 11:09:48 -05:00
Ben Busby	e6db3112f7	Fix pagination bug for pages > 3 The pagination footer on the results page after page 2 has three actions (beginning, next, previous). The footer filter was updated to remove items with more than three actions to fix this. See #131	2020-12-07 20:38:57 -05:00
Ben Busby	72cbc342af	Add ability to set temp config in search query Dark mode, country, interface language, and search language configs can now be set in the search query by appending each option as a url parameter. Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry' Ex: /search?q=%s&dark=1&lang_search=lang_en... These config settings persist across page navigation and switching result type, but will be reset if the main search bar is used. See #144	2020-11-11 00:40:49 -05:00
bugbounce	1148a7fb8d	Use relative links instead of absolute (#139) * Use relative links instead of absolute This allows for hosting under a subpath. For example if you want to host whoogle at example.com/whoogle, it should work better with a reverse proxy. * Use relative link for opensearch.xml	2020-10-29 11:09:31 -04:00
Ben Busby	f3bb1e22b4	Fix improper header styling, remove shopping tab links The header template was using Google's classes for the "Whoogle" logo, which meant keeping up with their list of colors used in the logo. The template was updated to only ever use the Whoogle logo color. Accordingly, the logo specific styling in filter.py was removed, since it is no longer needed. Also removes all links to the shopping tab, as it seems that the majority of the links to items are Google specific links (usually google.com/aclk links without any discernible param for determining the true location for the link). The shopping page should be addressed separately with unique filtering/formatting. Further tracking of this task will be followed in #136.	2020-10-25 13:52:30 -04:00
Ben Busby	9afe5f81bd	Updated dark theme (#121) * Implemented new dark theme Now uses a dedicated css file for all dark theme color changes, rather than replacing color codes directly. Color theme is from discussion in #60. * Minor link color update	2020-09-14 15:29:58 -04:00
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	2020-07-26 11:53:59 -06:00
Ben Busby	f7380ae15d	Improving ad filtering for non-English languages	2020-06-11 13:21:40 -06:00
Ben Busby	4324fcd8f8	Added better multilingual support, updated filter Results page now includes method for switching to "All Languages" from whichever language is specified as the primary in the config (see #74). Also removes the non-Whoogle links from the page footer, leaving only the page navigation controls Added support for the date range filter on the results page, though I'd still recommend using the ":past <unit>" query instead.	2020-06-07 14:06:49 -06:00
Ben Busby	b6fb4723f9	Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	2020-06-02 12:54:47 -06:00
Ben Busby	71ba00785f	Quick improvement to ad removal	2020-05-29 13:21:53 -06:00
Ben Busby	78939e7fb4	Reworked google url routing	2020-05-26 10:47:40 -06:00
Ben Busby	98d639883c	Fixing styling/url/safe mode inconsistencies	2020-05-26 10:39:19 -06:00
Ben Busby	21012f5265	Feature: autocomplete/search suggestions (#72) Basic autocomplete/search suggestion functionality added * Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions * Adds new autoscript.js file for handling queries on the main page and results view * Updated requests class to include autocomplete method * Updated opensearch template to handle search suggestions * Added header template to allow for autocomplete on results view * Updated readme to mention autocomplete feature	2020-05-24 14:03:11 -06:00
Ben Busby	3dbe51e9e7	Removing google's filter card from results	2020-05-24 12:53:21 -06:00
Ben Busby	c51f186419	Added version footer, minor PEP 8 refactoring	2020-05-20 11:02:30 -06:00
Paul Rothrock	0e39b8f97b	Added "I'm feeling lucky" function (#46) * Putting '! ' at the beginning of the query now redirects to the first search result Signed-off-by: Paul Rothrock <paul@movetoiceland.com> * Moved get_first_url outside of filter class Signed-off-by: Paul Rothrock <paul@movetoiceland.com>	2020-05-18 10:28:23 -06:00
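
The 'q' extraction rule described in commit cbe32a081e (only a standalone 'q' argument is used, never a look-alike such as 'dq') can be sketched roughly as below. This is an illustrative sketch, not the code from that commit: the function name `extract_q` and the sample links are hypothetical, and it assumes Python's standard `urllib.parse` is a reasonable stand-in for whatever parsing the project does.

```python
from urllib.parse import parse_qs, urlparse


def extract_q(link: str) -> str:
    """Return the value of the standalone 'q' argument, or '' if absent.

    Hypothetical helper for illustration only.
    """
    # parse_qs keys on exact parameter names, so an argument such as
    # 'dq=...' is stored under 'dq' and can never be mistaken for 'q'.
    query = urlparse(link).query
    values = parse_qs(query).get('q', [])
    return values[0] if values else ''


if __name__ == '__main__':
    # A link with a real 'q' argument, and one that only carries 'dq'.
    print(extract_q('/url?q=https://example.com&sa=U'))
    print(extract_q('/books?dq=whoogle&id=1'))
```

Matching on the parsed argument name, rather than searching the raw link text for "q=", is what avoids picking up arguments that merely end in "q".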

72 Commits (94b4eb08a2867a0c0a64187766cb327504ebfa43)