Using `format` for formatting bang queries caused a KeyError for some
searches, such as !hd (HUDOC). In that example, the URL returned in the
bangs json was `http://...#{%22fulltext%22:[%22{}%22]...`, where
standard formatting would not work due to the misidentification of
"fulltext" as a formatting key.
The logic has been updated to just replace the first occurrence of "{}"
in the URL returned by the bangs dict.
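A rough sketch of the failure and the fix. The URL below is a made-up
stand-in for the truncated one above, and `build_bang_url` is a
hypothetical helper name, not necessarily what the code uses:

```python
# Illustrative URL only; the real bang URL is elided in the message above.
BANG_URL = 'https://example.org/#{%22fulltext%22:[%22{}%22]}'


def build_bang_url(bang_url: str, query: str) -> str:
    """Substitute the query for the first '{}' only, leaving other braces alone."""
    return bang_url.replace('{}', query, 1)


try:
    BANG_URL.format('test')
except KeyError as err:
    # str.format reads '%22fulltext%22' as a named replacement field
    print('format failed with KeyError:', err)

print(build_bang_url(BANG_URL, 'test'))
```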
Fixes #513
Because the response is now rebuilt as a new bsoup object when bolding
search query terms, creating an IP card for "my ip" searches threw an
error caused by how that new bsoup object was initialized for the
"my ip" card. The response is now passed in as a string instead.
Fixes #504
DDG style bang searches can now have the bang (!) at the end of
the search (e.g. "bologna w!" will now redirect to wikipedia just like
"bologna !w" would)
Since the request class is loaded prior to values being read from the
user's dotenv, the WHOOGLE_RESULT_PER_PAGE var wasn't being used for
searches.
This moves the definition of the base search url to be initialized in the
request class to address this issue.
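A minimal sketch of the idea, assuming a simplified request class. The
attribute name, default value and URL layout here are illustrative
assumptions, not a copy of the real class:

```python
import os


class Request:
    """Toy stand-in for the request class; names here are hypothetical."""

    def __init__(self):
        # Read at construction time, after the dotenv has been loaded,
        # rather than once at module import time.
        per_page = os.getenv('WHOOGLE_RESULT_PER_PAGE', '10')
        self.search_url = (
            f'https://www.google.com/search?gbv=1&num={per_page}&q='
        )
```

A module-level constant would have captured the value of the variable
before the dotenv was read, which is the behavior described above.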
Fixes #497
variables.css doesn't need to be loaded by any template, since
WHOOGLE_CONFIG_STYLE loads those values by default when not set
explicitly. Loading the stylesheet caused the logo colors to be
persistent unless set individually.
Sorry @gripped for sneaking all of this unnecessary color in...
Fixes #492
This modifies the search result page by bolding all appearances
of any word in the original query. If portions of the query are in
quotes (e.g. "ice cream"), only exact matches of the sequence of
words will be made bold.
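The matching rule can be sketched on plain text as follows. This is an
approximation under stated assumptions: `bold_query_terms` is a
hypothetical name, and it works on a string rather than on the parsed
result page.

```python
import re


def bold_query_terms(text: str, query: str) -> str:
    """Wrap query words (or quoted phrases) found in text with <b> tags."""
    # Quoted sections are kept whole; everything else is split on whitespace.
    terms = [phrase or word
             for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query)]
    if not terms:
        return text
    # Longest terms first so a phrase wins over one of its own words.
    terms.sort(key=len, reverse=True)
    pattern = re.compile(
        r'\b(' + '|'.join(re.escape(term) for term in terms) + r')\b',
        re.IGNORECASE)
    # A single pass avoids re-matching inside tags that were just added.
    return pattern.sub(r'<b>\1</b>', text)
```

Note that the word boundary check means terms beginning or ending with
punctuation would not match in this sketch.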
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
Activating minimal mode should also remove all collapsed sections, if
any are found.
WHOOGLE_MINIMAL now documented in readme and app.json (for heroku).
I've gotten a bit bored of the current light/dark themes, so I'm
switching the default theme over to the Doppelganger theme, which is a
better template/jumping off point for users to use when creating custom
themes since it also provides examples for coloring each of the Whoogle
logo letters.
The levelup.gitconnected.com site is a Medium site that can also be
replaced with scribe.rip whenever privacy respecting site alternatives
are enabled in the config.
Also modified how link descriptions are updated when that config is
enabled (before it was missing replacements on quite a few
descriptions).
This introduces a new UI element for displaying the client IP
address when a search for "my ip" is used.
Note that this does not show the IP address seen by Google
if Whoogle is deployed remotely. It uses `request.remote_addr`
to display the client IP address in the UI, not the actual address
of the server (which is what Google sees in requests sent from
remote Whoogle instances).
scribe.rip is a privacy respecting front end for medium.com. This
feature allows medium.com results to be replaced with scribe.rip links,
and works for both regular medium.com domains as well as user specific
subdomains (e.g. user.medium.com).
[scribe.rip website](https://scribe.rip)
[scribe.rip source code](https://git.sr.ht/~edwardloveall/scribe)
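A minimal sketch of the host replacement, assuming a plain swap of the
domain while keeping the path. `to_scribe` and `MEDIUM_HOSTS` are
hypothetical names, and whether scribe.rip resolves every resulting
path is not verified here:

```python
from urllib.parse import urlparse, urlunparse

# Hosts treated as Medium sites in this sketch.
MEDIUM_HOSTS = ('medium.com', 'levelup.gitconnected.com')


def to_scribe(link: str) -> str:
    """Point medium.com links (including user subdomains) at scribe.rip."""
    parts = urlparse(link)
    host = parts.netloc.lower()
    is_medium = host in MEDIUM_HOSTS or host.endswith('.medium.com')
    if not is_medium:
        return link
    return urlunparse(parts._replace(netloc='scribe.rip'))
```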
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
Used in header templates for navigating back to the home page when
behind a reverse proxy config where the app is running from a subpath of
a domain (e.g. "https://something/whoogle/")
Fixes #403
There are a few conventional choices, but this one should be friendly
and generally accepted by local readers.
The previous version is still comprehensible but less commonly used
(perhaps more common in Japanese documents) and may give local users pause.
Restricting form-action to 'self' in the content security policy
prevented Chrome (and likely other browsers) from using !bangs on the
home page.
Fixes #408
Previously if a result element marked for collapsing didn't have a valid
"parent" element, the collapsing was skipped altogether. This loops
through child elements until a valid parent is found (or if one isn't
found, the element will not be collapsed).
On app init, short hashes are generated from file checksums to use for
cache busting. These hashes are added into the full file name and used
to symlink to the actual file contents. These symlinks are loaded in the
jinja templates for each page, and can tell the browser to load a new
file if the hash changes.
This is only in place for css and js files, but can be extended in the
future for other file types if needed.
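A rough sketch of the approach, assuming an md5 checksum and an 8
character short hash. The function names and the build directory are
hypothetical:

```python
import hashlib
import os


def cache_busted_name(path: str, digest_len: int = 8) -> str:
    """Return e.g. 'main.1a2b3c4d.css' for 'main.css', based on file contents."""
    with open(path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()[:digest_len]
    root, ext = os.path.splitext(os.path.basename(path))
    return f'{root}.{digest}{ext}'


def link_build_file(path: str, build_dir: str) -> str:
    """Symlink the hashed file name to the real file and return the link path."""
    os.makedirs(build_dir, exist_ok=True)
    link = os.path.join(build_dir, cache_busted_name(path))
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(path), link)
    return link
```

Because the hash is derived from the file contents, an unchanged file
keeps the same name across restarts and stays cached by the browser.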
Introduces a new config element and environment variable
(WHOOGLE_CONFIG_THEME) for setting the theme of the app. Rather than
just having either light or dark, this allows a user to have their
instance use their current system light/dark preference to determine the
theme to use.
As a result, the dark mode setting (and WHOOGLE_CONFIG_DARK) have been
deprecated, but will still work as expected until a system theme has
been chosen.
Sections such as "People also asked" and "related searches" typically
take up a lot of room on the results page, and don't always have the
most useful information. This checks for result elements with more than
7 child divs, extracts the section title, and wraps all elements in a
"details" element that can be expanded/collapsed by the user.
Note that this functionality existed previously (albeit not implemented
as well), but due to changes in how Google returns searches (switching
from using <h2> elements for section headers to <span> or <div>
elements), the approach to collapsing these sections needed to be
updated.
* Add support for Lingva translations in results
Searches that contain the word "translate" and are normal search queries
(i.e. not news/images/video/etc) now create an iframe to a Lingva url to
translate the user's search using their configured search language.
The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or
will fall back to the official Lingva instance url (lingva.ml).
For more info, visit https://github.com/TheDavidDelta/lingva-translate
* Add basic test for lingva results
* Allow user specified lingva instances through csp frame-src
* Fix pep8 issue
A recent issue brought up a good point about how the latest changes to
setting the default language to English break functionality for bilingual
users. The change was likely not the best solution for users who were
being affected by IP geolocation on their instances -- the right
solution for that would be to configure the interface/search language to
their preference instead.
The requests library requires both 'http' and 'https' values in any
included proxy dict, and whoogle was previously copying the http proxy
to https for simplicity. The assumption was that if the underlying
request wasn't able to connect via https, it would default to http
(otherwise why have the requirement to specify both?)
This led to connectivity issues for users with http only proxies as of
the latest urllib and requests package versions, which are a lot more
strict with connections over https. With the latest versions, if an
https connection cannot be made, the library returns an error.
As a result, the new proxy dict must look something like this for plain
http proxies:
{'http': 'http://domain.tld:port', 'https': 'http://domain.tld:port'}
where both http and https are identical, but both are still required.
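As a small sketch, the dict can be built from a single proxy URL.
`build_proxies` and its parameters are hypothetical names used only for
illustration:

```python
def build_proxies(proxy_type: str, proxy_loc: str) -> dict:
    """Return a proxy dict that reuses one proxy URL for both schemes."""
    url = f'{proxy_type}://{proxy_loc}'
    return {'http': url, 'https': url}


print(build_proxies('http', 'domain.tld:8080'))
```

A dict of this shape is what the `proxies` argument of the requests
library expects.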
Since the interface language defaults to IP geolocation by Google, the
default language is now set to English. Still not sure if this is the
best solution, but at least temporarily should clear up some confusion
for users with instances deployed in countries outside of their own.
Also performed some minor cleanup:
- Updated name of strip_blocked_sites to clean_query
- Added clean_query to list of jinja template functions
- Ensured site block list doesn't contain duplicate filters
Occasionally the search results will contain links with arguments such
as 'dq', which was being erroneously used in attempts to extract the 'q'
element from query strings. This enforces that only links with '?q=' or
'&q=' (elements with a standalone 'q' arg) will have the element
extracted.
I also refactored the naming of this element once extracted to be just
'q'. Although this seems counterintuitive, it makes a little more sense
since this element is the one we're extracting. It's a vague url arg
name, but it is what it is.
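One way to illustrate the standalone 'q' requirement is with the
standard library's query string parser. This is a swapped-in technique
for illustration, not the '?q=' / '&q=' check described above, and
`extract_q` is a hypothetical name:

```python
from urllib.parse import parse_qs, urlparse


def extract_q(link: str) -> str:
    """Return the value of the standalone 'q' argument, or '' if absent."""
    params = parse_qs(urlparse(link).query)
    # 'dq', 'oq' and similar keys are separate entries and are ignored.
    return params.get('q', [''])[0]
```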
Bump version to 0.5.2 for hotfix release
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.
Bumps version to 0.5.1 for releasing the bugfix
Fixes #329
* Replace hardcoded strings using translation json file
This introduces a new "translations.json" file under app/static/settings
that is loaded on app init and uses the user config value for interface
language to determine the appropriate strings to use in Whoogle-specific
elements of the UI (primarily only on the home page).
* Verify interface lang can be used for localization
Check the configured interface language against the available
localization dict before attempting to use, otherwise fall back to
English.
Also expanded language names in the languages json file.
* Add test for validating translation language keys
Also adds Spanish translation to json (the only non-English language I
can add and reasonably validate on my own).
* Validate all translations against original keyset, update readme
Readme has been updated to include basic contributing guidelines for
both code and translations.
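The fallback and key validation described above could look roughly like
this. The dict shape, the 'lang_xx' style keys and both function names
are assumptions made for the example:

```python
# Hypothetical shape for the loaded translations; keys and strings are made up.
TRANSLATIONS = {
    'lang_en': {'search': 'Search', 'config': 'Configuration'},
    'lang_es': {'search': 'Buscar', 'config': 'Configuración'},
}


def get_translation(lang: str, translations: dict = TRANSLATIONS,
                    default: str = 'lang_en') -> dict:
    """Return the strings for lang, falling back to the default language."""
    return translations[lang] if lang in translations else translations[default]


def find_incomplete(translations: dict = TRANSLATIONS,
                    reference: str = 'lang_en') -> list:
    """List languages whose keys differ from the reference keyset."""
    expected = set(translations[reference])
    return [lang for lang, strings in translations.items()
            if set(strings) != expected]
```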
* add view image option
* prevent whoogle links from opening in a new tab.
* remove view image template on mobile requests
* change loop values to be more robust to the number of images
* Update app/templates/imageresults.html
* fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width."
* Update app/templates/imageresults.html
* remove hardcoded string from template
* Add view image config var to app.json
* Add view image config var to whoogle.env
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <benbusby@protonmail.com>
The wget method seemed to have a possible issue with creating endless
index.html copies (despite being specified to output to console only),
so this has been updated to use curl instead.
Also uses new non-authenticated "healthz" route to perform the
healthcheck.
Fix #316
Fix #313
The previous method of removing all site filters from the search query
removed the last letter of the search. This only applies the substring
filter if any site filters are present in the query.
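One plausible way a bug like this arises (a guess for illustration, not
confirmed by the code) is slicing on the result of `str.find`, which
returns -1 when the substring is absent. Both function names below are
hypothetical:

```python
def strip_site_filters_buggy(query: str) -> str:
    # str.find returns -1 when '-site:' is absent, so this slice drops
    # the final character of a query that has no filters.
    return query[:query.find('-site:')].strip()


def strip_site_filters(query: str) -> str:
    # Only slice when a site filter is actually present.
    if '-site:' not in query:
        return query
    return query[:query.find('-site:')].strip()
```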
Fixes #306
* Block websites in search results via user config
Adds a new config field "Block" to specify a comma separated list of
websites to block in search results. This is applied for all searches.
* Add test for blocking sites from search results
* Document WHOOGLE_CONFIG_BLOCK usage
* Strip '-site:' filters from query in header template
The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc). This prevents the filter from appearing in
all except "images", since the image category uses a separate header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
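A minimal sketch of turning the comma separated block list into
"-site:" filters, assuming they are simply appended to the query.
`apply_block_list` is a hypothetical name:

```python
def apply_block_list(query: str, block: str) -> str:
    """Append a '-site:' filter for each entry of a comma separated list."""
    sites = []
    for site in block.split(','):
        site = site.strip()
        if site and site not in sites:  # skip blanks and duplicates
            sites.append(site)
    return ' '.join([query] + [f'-site:{site}' for site in sites])


print(apply_block_list('cats', 'example.com, example.org'))
```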
* Add option to disable changing of configuration
Introduces a test to ensure the correct response code is found when
attempting to update the config when disabled, and ensure default config
is unchanged when posting a new config dict.
Attempting to update the config using the API when disabled now returns
a 403 code + redirect.
Co-authored-by: Ben Busby <benbusby@protonmail.com>
The search language is now set using the WHOOGLE_CONFIG_SEARCH_LANGUAGE
environment variable. Interface language is still set using
WHOOGLE_CONFIG_LANGUAGE.
Fixes #260
Enforces 0 margin for the search input form on the result page, which
removes the weird gap that is seen by default.
Also made minor changes to the border styling. Desktop searches now have
a single bottom border in dark mode rather than an all around border,
and the border around the mobile search result input was removed
entirely.
This was unfortunately a bit more complex than just adding an HTML reset
button, since reset buttons only "reset" input content to its original
value rather than clearing it. This doesn't work for Whoogle's needs,
since inputs on search result pages are auto populated with the search
content as their default value.
A reset button was introduced anyways, but is controlled by a few lines
of javascript to allow completely clearing the search input. The button
will only appear on mobile searches.
At the moment, it isn't particularly pretty, but is functional. It uses
just a plain "x" character and is always visible on mobile search result
pages. This leaves plenty of room for improvement moving forward.
Fixes #291
The recent change to cast bool config vars as ints to handle a '0' or
'1' value was shortsighted, since it doesn't allow for instances where
the variable is set to an empty value (or '' or any invalid/non-int
value).
This introduces a read_config_bool method for reading values that should
be a '0' or '1', but will default to False if not a digit (otherwise the
value will be cast as bool(int(value)) if "value" is a digit str).
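Based on the description above, the method could look roughly like
this. The signature and the '0' default are assumptions:

```python
import os


def read_config_bool(var: str) -> bool:
    """Read a '0'/'1' style environment variable as a bool."""
    val = os.getenv(var, '0')
    if val.isdigit():
        return bool(int(val))
    # Empty or otherwise non-digit values are treated as disabled.
    return False
```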
Fixes #288
Config boolean environment variables need to be cast to ints, since
they are set or unset using 0 and 1. Previously they were interpreted as
(pseudocode) read_var(name, default=False), which meant that setting
CONFIG_VAR=0 would enable that variable since Python reads environment
variables as strings, and '0' is truthy. This updates the previous logic
to (still pseudocode) int(read_var(name, default='0')).
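The pseudocode above translates to something like the following, using
`os.getenv` as a stand-in for read_var and CONFIG_VAR as a placeholder
variable name:

```python
import os

os.environ['CONFIG_VAR'] = '0'

# Non-empty strings are truthy, so the raw value reads as "enabled".
enabled_before = bool(os.getenv('CONFIG_VAR', False))

# Casting through int first gives the intended result.
enabled_after = bool(int(os.getenv('CONFIG_VAR', '0')))

print(enabled_before, enabled_after)
```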
Fixes #279
Both light and dark themes have been updated to remove the leftover
hardcoded values (mostly related to the search suggestion styling).
See discussion in #247.
The logging from imported modules (stem, in particular) has caused quite
a few users to assume there are errors where there aren't any. The logs
from stem also aren't helpful, as everything in the library works as
expected despite the implication from the logs that it is not working.
Randomizing the "Mozilla" portion of the user agent changed the
character encoding to GB2312. Setting it to plain "Mozilla" enforces
UTF-8 encoding.
Bump to version 0.4.1 for release of bug fix
Fixes#267