* Block websites in search results via user config
Adds a new config field "Block" to specify a comma separated list of
websites to block in search results. This is applied for all searches.
* Add test for blocking sites from search results
* Document WHOOGLE_CONFIG_BLOCK usage
* Strip '-site:' filters from query in header template
The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc). This prevents the filter from appearing in
all except "images", since the image category uses a separate header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.
Fixes#250Fixes#90
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.
Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
The BeautifulSoup constructur in gen_nojs needed to explicitly set
features='lxml' to silence a warning from the library.
Also temporarily disabled the site alts test since the results are too
unreliable. This should be moved to a unit test instead.
* Add ability to configure site alts w/ env vars
Site alternatives (i.e. twitter.com -> nitter.net) can now be configured
using environment variables:
WHOOGLE_ALT_TW='nitter.net' # twitter alt
WHOOGLE_ALT_YT='invidio.us' # youtube alt
WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt
Updated testing to confirm results have been modified.
* Add site alt vars to docker settings and readme
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.
Verbatim search and option to ignore search autocorrect are now supported as well.
Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.
This doesn't include support for Google Maps -> Open Street Maps, that
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
Updated to ensure a child span element is available before running a
test to verify the correct time range for the result. Need to come up
with a better way of ensuring uniform results across multiple tests,
since otherwise periodic changes in the returned results can cause tests
to fail.
* Major refactor of requests and session management
- Switches from pycurl to requests library
- Allows for less janky decoding, especially with non-latin character
sets
- Adds session level management of user configs
- Allows for each session to set its own config (people are probably
going to complain about this, though not sure if it'll be the same
number of people who are upset that their friends/family have to share
their config)
- Updates key gen/regen to more aggressively swap out keys after each
request
* Added ability to save/load configs by name
- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs
* Result formatting and removal of unused elements
- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings
* Minor change to button label
* Fixed issue with "de-pickling" of flask session
Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').
* Switched to pickling saved configs
* Updated ad/sponsored content filter and conf naming
Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files
Sponsored content now removed by basic string matching of span content
* Version bump to 0.2.0
* Fixed request.send return style
* Added country and safe search config options
* Updated handling of parser error in results test
* Improved handling of default country
* Added 1px empty gif fallback as a replacement for images that fail to load
For datetime spans in time-filtered search results, anything less than 7
characters or more than 15 can be guaranteed to not be properly
formatted dates (either "mm dd yyyy" or "xx days/months/weeks ago")
Was previously checking for non-inclusive max number of days (i.e.
filtering by past month would return a failed test if the result was
from exactly 31 days ago)