whoogle-search

Commit Graph

Author	SHA1	Message	Date
Ben Busby	9f68c843d6	Specify links that should trigger div removal from results There are certain links (such as the age verification link mentioned in issue #1083) that should trigger removal of the entire container div on the results page, rather than just hiding the link itself. This introduces a new `unsupported_g_divs` list that holds links that will trigger a removal of the result div on the result page. Fixes #1083	2023-11-01 14:30:23 -06:00
Ben Busby	81b7fd1876	Encrypt site icon requests Paths to favicons are now encrypted with the user's Fernet key, the same as any other external result page element	2023-10-11 17:18:25 -06:00
Ben Busby	a7e937f7c6	Skip scrollers when applying site icons to results Scroller results (like the "latest from ___" or "top stories" results) shouldn't have a site icon associated with them. This extracts the class that those types of results have and skips over the process of inserting an icon.	2023-10-11 15:58:52 -06:00
Ben Busby	c2873190c9	Display audio controls, refactor site icon placement Audio controls are now always shown by default (mostly found in searches that contain word pronunciation guides). Site icons were moved to the left side of the results.	2023-10-11 15:41:48 -06:00
Ben Busby	330ae964f3	Only sanitize result content on main result page The other result tabs (images/maps/videos/news) don't have text content that needs sanitizing. Fixes #1080	2023-10-11 11:09:09 -06:00
Ben Busby	4292ec7f63	Add icons for each search result This appends an icon element to each search result, using the result domain's "/favicon.ico" path. Note that some sites do not have a standard /favicon.ico, but have a unique path to a specifically sized favicon instead. Worse still, some sites use javascript to load their favicon, which would make it even more difficult for Whoogle to figure out. For now this approach is fine, but can be expanded upon in the future if desired.	2023-10-11 11:05:53 -06:00
Ben Busby	c36396e9cb	Sanitize valid html in result text content This inspects the text content of each individual result div and strips out valid 'script' or 'iframe' tags from the result. Closes #1076	2023-10-10 16:38:13 -06:00
Ahmad Alkadri	4a0089686e	Fix: `keep_blank_values = True` to handle blank `q` input (#1052 )	2023-08-21 14:53:10 -06:00
Andiru	29992985bc	Fix incorrect link replacements (#1016 ) Fix link/result description getting replaced when alternative is disabled (set to empty string) Replace medium.com links with value from constant	2023-06-26 15:47:43 -06:00
Abhishek M J	349b87ec18	Fix unsupported_g_pages in result list (#996 ) Closes #995	2023-05-01 10:23:57 -06:00
xatier	b1e468ff01	Fix bug in title/url blocking regex (#969 ) Fix the exception `AttributeError: 'Filter' object has no attribute 'block_url'` introduced in this commit [1]. `self.block_title` and `self.block_url` were members of the Filter object[2], but not anymore after commit [1]. This bug can be reproduced with setting WHOOGLE_CONFIG_BLOCK_URL to a non-empty string. [1] `10a15e06e1` [2] `284a8102c8`	2023-03-14 11:22:53 -06:00
Ben Busby	fb8a2ea325	Include prefs arg in footer navigation Navigating between pages of results now includes the user's preferences string, which allows them to retain their config for a particular instance between result pages. Fixes #960	2023-02-21 09:57:44 -07:00
Ben Busby	991fe6d910	Exclude subdomain in Medium->Scribe redirects Medium redirects needed further cleanup to account for instances where a link contains a subdomain that would not make sense in a Farside redirect link. Fixes #947	2023-02-04 16:36:16 -07:00
MoistCat	08aa1ab8f1	Handle missing result div in filter (#911) Replaced "find_all()[0]" with "find", which yields only one result. Added check to ensure result_div exists before searching for results.	2022-12-29 15:17:34 -07:00
Ben Busby	fd85f1573a	Refactor site alt link replacement Replacing result links and text when site alts are enabled is now part of its own function, and handles replacement of link location and link description separately. Fixes #880	2022-12-05 13:28:29 -07:00
João	1aad47f2af	Fix bad internal redirection for google links (#850 )	2022-09-20 11:10:27 -06:00
Ben Busby	73dd5b80b5	Remove google prefs link for mismatched language queries Queries performed in a different language than what is configured contain a result div that prompts the user to configure their language preferences using google's preferences page. Since we want all language configuration to occur on Whoogle only, we can safely remove this result div. Fixes #444 Fixes #386	2022-08-01 13:46:06 -06:00
Ben Busby	78614877f2	Fix redirect for misspelled queries starting with `/` Fixes #818	2022-08-01 12:12:55 -06:00
Joao A. Candido Ramos	0d2d5fff5d	Fixes handling of maps (#792 ) * fixes map url, e.g. when no q parameter is given * move maps_args from results to filter where it is used	2022-06-27 12:33:08 -06:00
Joao A. Candido Ramos	d05ec08abf	Remove wildcard imports (#791 )	2022-06-24 10:51:15 -06:00
Joao A. Candido Ramos	ddb8931e68	Fix image links not being opened in new tab (#790) The majority of image links and links that are not handled by Whoogle were not opening in new tabs. This allows links that are not related to the application to open in new tabs.	2022-06-24 10:50:14 -06:00
Ben Busby	65796fd1a5	Counter latest result page style changes Google updated their styling of the result page, which broke some components of Whoogle's result page styling (namely the result div backgrounds for dark mode). The GClasses class has been updated to keep track of what class names have been updated to, and roll them back to a value that works for Whoogle. A function was added that loops through new class names and replaces them with their older counterparts.	2022-06-09 16:35:02 -06:00
Ben Busby	ef98d85dc5	Ensure searches with a leading slash are treated as queries A user reported a bug where searches with a leading slash (in this case: "/e/OS apps") were interpreted as a Google-specific link when clicking the next page of results. This was due to the behavior that Google's search results exhibit, where internal links for pages like support.google.com are delivered with params like "?q=/support" rather than a direct link. This fixes that scenario by checking the "q" param value against the user's original query to ensure they don't match before assuming that the result is intended as a redirect. Fixes #776	2022-06-03 14:03:57 -06:00
Joao A. Candido Ramos	fb6627a9cc	Remove duplicated handling of /url result links (#769) It appears that result links beginning with '/url' were mistakenly committed with an inefficient filtering process in its place. With the way the code is structured, this less effective '/url' link filter took precedence over the previous link filter, and also caused users with the "open link in new tab" config enabled to no longer have access to that feature. Fixes #769	2022-05-25 11:37:34 -06:00
invis-z	9bcd9931f7	Replace leading slash for image links (#762) The leading slash was previously removed without noticing it was part of a string replacement in #734. This caused the href of "View Image" to contain a leading "/", which is wrong.	2022-05-25 11:18:17 -06:00
Ben Busby	fb600d6fc8	Improve G page distinction between footer and results Pages in the Whoogle footer that by default route to Google pages were previously being removed, but caused results that also routed to similar pages to no longer be accessible. This was due to the removal of the '/url' endpoint that Google uses for each result. To fix this, the result link is now parsed so that the domain of the result can be checked against the disallowed G page list. Since results are delivered in a "/url?q=<domain>" format -- even for pages to Google's own products -- and the footer links are formatted as "<product>.google.com", footer links are removed and result links are parsed correctly. Fixes #747	2022-05-16 09:53:48 -06:00
invis-z	b4d9f1f5e5	Remove "/" before endpoints & tags (#734 ) Removes the leading slash before imgres and other endpoints Fix #733	2022-04-27 14:25:14 -06:00
Ben Busby	a9b675cd24	Strip trailing slash on root url in filter If a trailing slash is defined here, it causes the Whoogle instance to redirect these element requests back to the home page, causing unwanted behavior.	2022-04-20 14:55:19 -06:00
gdm85	6d362ca5c7	Add support for relative search results (#715 ) * Relativization of search results * Fix JavaScript error when opening images * Replace single-letter logo and remove sign-in link * Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app redirects, such as the `/session` redirect performed on the first visit to the Whoogle home page. Co-authored-by: Ben Busby <contact@benbusby.com>	2022-04-18 15:27:45 -06:00
Ben Busby	9317d9217f	Support proxying results through Whoogle (aka "anonymous view") (#682 ) * Expand `/window` endpoint to behave like a proxy The `/window` endpoint was previously used as a type of proxy, but only for removing Javascript from the result page. This expands the existing functionality to allow users to proxy search result pages (with or without Javascript) through their Whoogle instance. * Implement filtering of remote content from css * Condense NoJS feature into Anonymous View Enabling NoJS now removes Javascript from the Anonymous View, rather than creating a separate option. * Exclude 'data:' urls from filter, add translations The 'data:' url must be allowed in results to view certain elements on the page, such as stars for review based results. Add translations for the remaining languages. * Add cssutils to requirements	2022-04-13 11:29:07 -06:00
Ben Busby	797372ecaa	Ignore blank alts if site alt config is enabled If the alt for a particular service is blank, the original source is used instead. Example: 1. Site alts enabled in config 2. User wants wikipedia links, not wikiless 3. WHOOGLE_ALT_WIKI set to "" 4. All available alt links redirected to farside, except wikipedia Fixes #704	2022-03-30 14:46:33 -06:00
Ben Busby	f7e3650728	Only remove G links in footer Links that were directed at G domains were previously removed universally, when really they only needed to be removed from the footer to reduce possible confusion caused by mixed Whoogle and G links. Fixes #656	2022-03-01 12:48:33 -07:00
DUO Labs	b2c048af92	Fix `collapse_sections` for `MINIMAL_MODE` (#654 )	2022-02-11 14:44:08 -07:00
DUO Labs	7c5094d37b	Check for soup body in `remove_site_blocks` (#651 ) Fixes error with `remove_site_blocks` in the Images tab	2022-02-11 14:42:11 -07:00
DUO Labs	502067addc	Clean "Show more results" of all site blocks (#646 )	2022-02-08 10:57:00 -07:00
Joao A. Candido Ramos	11099f7b1d	Use consistent header for all result types (#535 ) Introduces a header for switching between result types (i.e. "All", "News", etc) that is consistent between the different result types. Previously, image results had a tab header that was formatted in a drastically different manner, which was jarring when switching from a different result page to the Images page. Created a G class enum to reference class names returned in search results. As noted in the class doc, this should only be used/updated as a last resort, as class names change frequently. For some instances, such as replacing the tbm tab, it's a lot easier to just replace by header name than attempting to replace it based on how the element is structured. Also updated a few styles to revert the latest styling changes being applied by Google. Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <contact@benbusby.com>	2022-02-07 10:47:25 -07:00
DUO Labs	500942cb99	Update minimal mode for new Google formatting (#637 ) Google's latest formatting changes broke the modifications made when enabling `WHOOGLE_MINIMAL`. This updates the result filtering to work with the new changes. Fixes #634	2022-02-02 12:57:05 -07:00
Nitish Yadav	fc50359752	Improve formatting of collapsible infobox (#612 )	2022-01-18 13:47:35 -07:00
Ben Busby	10a15e06e1	Fix incorrect request type for image searches Previously had hardcoded POST requests for all requests that didn't use the header template (which currently is only the image tab). Also refactored how the Filter class works. It now requires a valid Config model to be provided, which is then set up as a class var that the filtering functions can use as needed, rather than setting specific values from the config as individual values (which was confusing and sloppy). Fixes #561	2021-12-06 21:39:50 -07:00
Ben Busby	e06ff85579	Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which led to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.	2021-11-17 19:35:30 -07:00
Fabian Schilling	9ad1d60a47	Improve URL parsing for full size images (#521 ) Skip URLs that are not two-element lists Fixes #520	2021-11-02 16:22:24 -06:00
Ben Busby	90441b2668	Add WHOOGLE_MINIMAL to docs, tweak min mode logic Activating minimal mode should also remove all collapsed sections, if any are found. WHOOGLE_MINIMAL now documented in readme and app.json (for heroku).	2021-10-26 10:38:20 -06:00
DUO Labs	543f2b2a01	Add a "minimal mode" for condensing results (#485 ) If WHOOGLE_MINIMAL is set, all non-link results are removed from the view.	2021-10-26 10:35:12 -06:00
Ben Busby	8f70236403	Update domains used for scribe.rip replacements The levelup.gitconnected.com site is a Medium site that can also be replaced with scribe.rip whenever privacy respecting site alternatives are enabled in the config. Also modified how link descriptions are updated when that config is enabled (before it was missing replacements on quite a few descriptions).	2021-10-23 23:23:37 -06:00
Yadomin	284a8102c8	Block by result title or url using regex (#473 ) Allows blocking search results using a regex filter for either result title or result url	2021-10-20 20:01:04 -06:00
Laurent le Beau-Martin	1a3790c7b1	Only open external links in a new tab (#380 )	2021-08-24 09:06:41 -06:00
Ben Busby	38c38a772f	Find valid parent element when collapsing result content Previously if a result element marked for collapsing didn't have a valid "parent" element, the collapsing was skipped altogether. This loops through child elements until a valid parent is found (or if one isn't found, the element will not be collapsed).	2021-07-04 15:20:19 -04:00
Ben Busby	afd01820bb	Collapse long result sections into details/summary elements Sections such as "People also asked" and "related searches" typically take up a lot of room on the results page, and don't always have the most useful information. This checks for result elements with more than 7 child divs, extracts the section title, and wraps all elements in a "details" element that can be expanded/collapsed by the user. Note that this functionality existed previously (albeit not implemented as well), but due to changes in how Google returns searches (switching from using <h2> elements for section headers to <span> or <div> elements), the approach to collapsing these sections needed to be updated.	2021-06-23 18:59:57 -04:00
Ben Busby	d894bd347d	Handle error when parsing image result url	2021-06-16 10:40:18 -04:00
Ben Busby	614dceeb70	Add fallback interface/search lang + cleanup Since the interface language defaults to IP geolocation by google, the default language is now set to english. Still not sure if this is the best solution, but at least temporarily should clear up some confusion for users with instances deployed in countries outside of their own. Also performed some minor cleanup: - Updated name of strip_blocked_sites to clean_query - Added clean_query to list of jinja template functions - Ensured site block list doesn't contain duplicate filters	2021-06-04 11:09:30 -04:00
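
The leading-slash check described in commit ef98d85dc5 can be illustrated with a minimal sketch. It is not taken from the Whoogle source: the function name `is_internal_redirect` and its parameters are hypothetical, and it assumes only what the commit message states, namely that Google delivers some internal links as "?q=/support" style params and that the "q" value should be compared against the user's original query before treating it as a redirect.

```python
from urllib.parse import parse_qs, urlparse


def is_internal_redirect(href: str, original_query: str) -> bool:
    """Guess whether a result href points at a Google-internal page.

    Hypothetical helper for illustration only; not Whoogle's actual code.
    """
    q_values = parse_qs(urlparse(href).query).get("q", [])
    if not q_values:
        return False
    q = q_values[0]
    # A "q" value with a leading slash looks like an internal path such as
    # "/support", unless it is just the user's own query echoed back
    # (for example on a "next page" link).
    return q.startswith("/") and q != original_query
```

With this check, a next-page link for the query "/e/OS apps" is kept as a normal search, while a link carrying "q=/support" for an unrelated query is still treated as a redirect.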


101 Commits (70dc750c7aa5af43af0780a7b16e7a207f70bcb4)