whoogle-search

Author	SHA1	Message	Date
xatier	b1e468ff01	Fix bug in title/url blocking regex (#969) Fix the exception `AttributeError: 'Filter' object has no attribute 'block_url'` introduced in this commit [1]. `self.block_title` and `self.block_url` were members of the Filter object [2], but not anymore after commit [1]. This bug can be reproduced by setting WHOOGLE_CONFIG_BLOCK_URL to a non-empty string. [1] `10a15e06e1` [2] `284a8102c8`	2023-03-14 11:22:53 -06:00
Ben Busby	fb8a2ea325	Include prefs arg in footer navigation Navigating between pages of results now includes the user's preferences string, which allows them to retain their config for a particular instance between result pages. Fixes #960	2023-02-21 09:57:44 -07:00
Ben Busby	991fe6d910	Exclude subdomain in Medium->Scribe redirects Medium redirects needed further cleanup to account for instances where a link contains a subdomain that would not make sense in a Farside redirect link. Fixes #947	2023-02-04 16:36:16 -07:00
MoistCat	08aa1ab8f1	Handle missing result div in filter (#911) Replaced "find_all()[0]" with "find", which yields only one result. Added a check to ensure result_div exists before searching for results.	2022-12-29 15:17:34 -07:00
Ben Busby	fd85f1573a	Refactor site alt link replacement Replacing result links and text when site alts are enabled is now part of its own function, and handles replacement of link location and link description separately. Fixes #880	2022-12-05 13:28:29 -07:00
João	1aad47f2af	Fix bad internal redirection for google links (#850)	2022-09-20 11:10:27 -06:00
Ben Busby	73dd5b80b5	Remove google prefs link for mismatched language queries Queries performed in a different language than what is configured contain a result div that prompts the user to configure their language preferences using google's preferences page. Since we want all language configuration to occur on Whoogle only, we can safely remove this result div. Fixes #444 Fixes #386	2022-08-01 13:46:06 -06:00
Ben Busby	78614877f2	Fix redirect for misspelled queries starting with `/` Fixes #818	2022-08-01 12:12:55 -06:00
Joao A. Candido Ramos	0d2d5fff5d	Fixes handling of maps (#792) * fixes map url, e.g. when no q parameter is given * move maps_args from results to filter where it is used	2022-06-27 12:33:08 -06:00
Joao A. Candido Ramos	d05ec08abf	Remove wildcard imports (#791)	2022-06-24 10:51:15 -06:00
Joao A. Candido Ramos	ddb8931e68	Fix image links not being opened in new tab (#790) Most image links, and links that are not handled by Whoogle, were not opening in new tabs. This allows links that are not related to the application to open in new tabs.	2022-06-24 10:50:14 -06:00
Ben Busby	65796fd1a5	Counter latest result page style changes Google updated their styling of the result page, which broke some components of Whoogle's result page styling (namely the result div backgrounds for dark mode). The GClasses class has been updated to keep track of what class names have been updated to, and roll them back to a value that works for Whoogle. A function was added that loops through new class names and replaces them with their older counterparts.	2022-06-09 16:35:02 -06:00
Ben Busby	ef98d85dc5	Ensure searches with a leading slash are treated as queries A user reported a bug where searches with a leading slash (in this case: "/e/OS apps") were interpreted as a Google specific link when clicking the next page of results. This was due to the behavior that Google's search results exhibit, where internal links for pages like support.google.com are delivered with params like "?q=/support" rather than a direct link. This fixes that scenario by checking the "q" param value against the user's original query to ensure they don't match before assuming that the result is intended as a redirect. Fixes #776	2022-06-03 14:03:57 -06:00
Joao A. Candido Ramos	fb6627a9cc	Remove duplicated handling of /url result links (#769) It appears that result links beginning with '/url' were mistakenly committed with an inefficient filtering process in its place. With the way the code is structured, this less effective '/url' link filter took precedence over the previous link filter, and also caused users with the "open link in new tab" config enabled to no longer have access to that feature. Fixes #769	2022-05-25 11:37:34 -06:00
invis-z	9bcd9931f7	Replace leading slash for image links (#762) The leading slash was previously removed without noticing it was part of a string replacement in #734. This caused the href of "View Image" to contain a leading "/", which is wrong.	2022-05-25 11:18:17 -06:00
Ben Busby	fb600d6fc8	Improve G page distinction between footer and results Pages in the Whoogle footer that by default route to Google pages were previously being removed, but caused results that also routed to similar pages to no longer be accessible. This was due to the removal of the '/url' endpoint that Google uses for each result. To fix this, the result link is now parsed so that the domain of the result can be checked against the disallowed G page list. Since results are delivered in a "/url?q=<domain>" format -- even for pages to Google's own products -- and the footer links are formatted as "<product>.google.com", footer links are removed and result links are parsed correctly. Fixes #747	2022-05-16 09:53:48 -06:00
invis-z	b4d9f1f5e5	Remove "/" before endpoints & tags (#734) Removes the leading slash before imgres and other endpoints Fix #733	2022-04-27 14:25:14 -06:00
Ben Busby	a9b675cd24	Strip trailing slash on root url in filter If a trailing slash is defined here, it causes the Whoogle instance to redirect these element requests back to the home page, causing unwanted behavior.	2022-04-20 14:55:19 -06:00
gdm85	6d362ca5c7	Add support for relative search results (#715) * Relativization of search results * Fix JavaScript error when opening images * Replace single-letter logo and remove sign-in link * Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app redirects, such as the `/session` redirect performed on the first visit to the Whoogle home page. Co-authored-by: Ben Busby <contact@benbusby.com>	2022-04-18 15:27:45 -06:00
Ben Busby	9317d9217f	Support proxying results through Whoogle (aka "anonymous view") (#682) * Expand `/window` endpoint to behave like a proxy The `/window` endpoint was previously used as a type of proxy, but only for removing Javascript from the result page. This expands the existing functionality to allow users to proxy search result pages (with or without Javascript) through their Whoogle instance. * Implement filtering of remote content from css * Condense NoJS feature into Anonymous View Enabling NoJS now removes Javascript from the Anonymous View, rather than creating a separate option. * Exclude 'data:' urls from filter, add translations The 'data:' url must be allowed in results to view certain elements on the page, such as stars for review based results. Add translations for the remaining languages. * Add cssutils to requirements	2022-04-13 11:29:07 -06:00
Ben Busby	797372ecaa	Ignore blank alts if site alt config is enabled If the alt for a particular service is blank, the original source is used instead. Example: 1. Site alts enabled in config 2. User wants wikipedia links, not wikiless 3. WHOOGLE_ALT_WIKI set to "" 4. All available alt links redirected to farside, except wikipedia Fixes #704	2022-03-30 14:46:33 -06:00
Ben Busby	f7e3650728	Only remove G links in footer Links that were directed at G domains were previously removed universally, when really they only needed to be removed from the footer to reduce possible confusion caused by mixed Whoogle and G links. Fixes #656	2022-03-01 12:48:33 -07:00
DUO Labs	b2c048af92	Fix `collapse_sections` for `MINIMAL_MODE` (#654)	2022-02-11 14:44:08 -07:00
DUO Labs	7c5094d37b	Check for soup body in `remove_site_blocks` (#651) Fixes error with `remove_site_blocks` in the Images tab	2022-02-11 14:42:11 -07:00
DUO Labs	502067addc	Clean "Show more results" of all site blocks (#646)	2022-02-08 10:57:00 -07:00
Joao A. Candido Ramos	11099f7b1d	Use consistent header for all result types (#535) Introduces a header for switching between result types (i.e. "All", "News", etc) that is consistent between the different result types. Previously, image results had a tab header that was formatted in a drastically different manner, which was jarring when switching from a different result page to the Images page. Created a G class enum to reference class names returned in search results. As noted in the class doc, this should only be used/updated as a last resort, as class names change frequently. For some instances, such as replacing the tbm tab, it's a lot easier to just replace by header name than attempting to replace it based on how the element is structured. Also updated a few styles to revert the latest styling changes being applied by Google. Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <contact@benbusby.com>	2022-02-07 10:47:25 -07:00
DUO Labs	500942cb99	Update minimal mode for new Google formatting (#637) Google's latest formatting changes broke the modifications made when enabling `WHOOGLE_MINIMAL`. This updates the result filtering to work with the new changes. Fixes #634	2022-02-02 12:57:05 -07:00
Nitish Yadav	fc50359752	Improve formatting of collapsible infobox (#612)	2022-01-18 13:47:35 -07:00
Ben Busby	10a15e06e1	Fix incorrect request type for image searches Previously had hardcoded POST requests for all requests that didn't use the header template (which currently is only the image tab). Also refactored how the Filter class works. It now requires a valid Config model to be provided, which is then set up as a class var that the filtering functions can use as needed, rather than setting specific values from the config as individual values (which was confusing and sloppy). Fixes #561	2021-12-06 21:39:50 -07:00
Ben Busby	e06ff85579	Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which led to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.	2021-11-17 19:35:30 -07:00
Fabian Schilling	9ad1d60a47	Improve URL parsing for full size images (#521) Skip URLs that are not two-element lists Fixes #520	2021-11-02 16:22:24 -06:00
Ben Busby	90441b2668	Add WHOOGLE_MINIMAL to docs, tweak min mode logic Activating minimal mode should also remove all collapsed sections, if any are found. WHOOGLE_MINIMAL now documented in readme and app.json (for heroku).	2021-10-26 10:38:20 -06:00
DUO Labs	543f2b2a01	Add a "minimal mode" for condensing results (#485) If WHOOGLE_MINIMAL is set, all non-link results are removed from the view.	2021-10-26 10:35:12 -06:00
Ben Busby	8f70236403	Update domains used for scribe.rip replacements The levelup.gitconnected.com site is a Medium site that can also be replaced with scribe.rip whenever privacy respecting site alternatives are enabled in the config. Also modified how link descriptions are updated when that config is enabled (before it was missing replacements on quite a few descriptions).	2021-10-23 23:23:37 -06:00
Yadomin	284a8102c8	Block by result title or url using regex (#473) Allows blocking search results using a regex filter for either result title or result url	2021-10-20 20:01:04 -06:00
Laurent le Beau-Martin	1a3790c7b1	Only open external links in a new tab (#380)	2021-08-24 09:06:41 -06:00
Ben Busby	38c38a772f	Find valid parent element when collapsing result content Previously if a result element marked for collapsing didn't have a valid "parent" element, the collapsing was skipped altogether. This loops through child elements until a valid parent is found (or if one isn't found, the element will not be collapsed).	2021-07-04 15:20:19 -04:00
Ben Busby	afd01820bb	Collapse long result sections into details/summary elements Sections such as "People also asked" and "related searches" typically take up a lot of room on the results page, and don't always have the most useful information. This checks for result elements with more than 7 child divs, extracts the section title, and wraps all elements in a "details" element that can be expanded/collapsed by the user. Note that this functionality existed previously (albeit not implemented as well), but due to changes in how Google returns searches (switching from using <h2> elements for section headers to <span> or <div> elements), the approach to collapsing these sections needed to be updated.	2021-06-23 18:59:57 -04:00
Ben Busby	d894bd347d	Handle error when parsing image result url	2021-06-16 10:40:18 -04:00
Ben Busby	614dceeb70	Add fallback interface/search lang + cleanup Since the interface language defaults to IP geolocation by Google, the default language is now set to English. Still not sure if this is the best solution, but at least temporarily should clear up some confusion for users with instances deployed in countries outside of their own. Also performed some minor cleanup: - Updated name of strip_blocked_sites to clean_query - Added clean_query to list of jinja template functions - Ensured site block list doesn't contain duplicate filters	2021-06-04 11:09:30 -04:00
Ben Busby	cbe32a081e	Hotfix: extract only 'q' element from query string Occasionally the search results will contain links with arguments such as 'dq', which was being erroneously used in attempts to extract the 'q' element from query strings. This enforces that only links with '?q=' or '&q=' (elements with a standalone 'q' arg) will have the element extracted. I also refactored the naming of this element once extracted to be just 'q'. Although this seems counterintuitive, it makes a little more sense since this element is the one we're extracting. It's a vague url arg name, but it is what it is. Bump version to 0.5.2 for hotfix release	2021-05-29 12:22:37 -04:00
Ben Busby	43faaee77f	Hotfix: remove site filter for maps links The new site filter breaks links to Maps results, so filter.py needed to be updated to handle these links as a unique case. A new method was introduced to easily remove any "-site:..." filters from the query, which is now also used to format queries in the header template rather than manually removing the blocked site list within the template itself. Bumps version to 0.5.1 for releasing the bugfix Fixes #329	2021-05-27 12:01:57 -04:00
Joao A. Candido Ramos	448efb8f2a	Add "view image" functionality (#268) * add view image option * prevent whoogle links from opening in a new tab. * remove view image template on mobile requests * change loop values to be more robust to the number of images * Update app/templates/imageresults.html * fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width." * Update app/templates/imageresults.html * remove hardcoded string from template * Add view image config var to app.json * Add view image config var to whoogle.env Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-05-21 11:19:45 -04:00
Ben Busby	1030118d0b	Expand custom css theming support Also adds new default dark theme designed by @gripped.	2021-04-09 11:00:02 -04:00
Ben Busby	0b9600b564	Expand custom css variables and functionality Squashed commit of the following: commit 37e22d2945b077a94d9997d064f4355ff8819bae Author: Ben Busby <benbusby@protonmail.com> Date: Mon Apr 5 10:27:05 2021 -0400 Pass user config to logo template commit 2406fee05c3e221112fbe802fbf2ecca1df99127 Author: Ben Busby <benbusby@protonmail.com> Date: Mon Apr 5 10:24:54 2021 -0400 Fix incorrect contrast text in dark theme commit 91dd677e22c2e99819123154e03e9f519f95a9bd Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 17:21:38 2021 -0400 Remove inline onclicks, fix svg sizing commit 91bbf9c0fae36febd6a6a0d8e6a560babe8622d5 Merge: 72637df b1227bd Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 15:35:37 2021 -0400 Merge remote-tracking branch 'origin/develop' into custom-css-tweaks commit 72637df213f4b9e83e4b58fe76973de02f63ec8e Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 11:38:38 2021 -0400 Use svg logo w/ custom styling on results pages commit 666a7ceac4a6e4d3fe1975dcee91e6094b66149e Author: Ben Busby <benbusby@protonmail.com> Date: Fri Apr 2 11:10:37 2021 -0400 Split whoogle-accent into whoogle-element-bg and whoogle-logo See discussion on #247	2021-04-05 11:00:56 -04:00
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	2021-04-05 11:00:56 -04:00
Ben Busby	8ad8e66d37	Improve static typing throughout repo Eventually this should be part of a separate mypy ci build, but right now it's just a general guideline. Future commits and PRs should be validated for static typing wherever possible. For reference, the testing commands used for this commit were: mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/ mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/	2021-04-05 11:00:56 -04:00
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils has been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	64567a63ea	Ensure G logo doesn't appear in mobile img results Adds a separate check to remove all images sourced from www.gstatic.com, which is where the mobile logo in particular is coming from.	2021-04-05 11:00:56 -04:00
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	2020-12-29 18:43:42 -05:00

91 Commits (f970b62f12708edad04d63fddbf7e5babd161879)