787 Commits (aaf90b52bb919fa3c6146c807a6419f676301ca6)

Author SHA1 Message Date
Ben Busby a58f70ca7e
Fix wikipedia->wikiless domain replacement
Was previously using wikipedia.com not wikipedia.org, causing wikiless
replacements to not occur.

Fixes #686
2022-03-21 10:01:21 -06:00
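A minimal sketch of the kind of check fixed in a58f70ca7e, not the project's actual code. The function name and the `wikiless.example` host are hypothetical, and dropping the language subdomain is a simplification.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical alternative front end; a real deployment would configure this.
ALT_WIKI_HOST = "wikiless.example"


def replace_wikipedia(url: str, alt_host: str = ALT_WIKI_HOST) -> str:
    """Point Wikipedia links at an alternative host; leave other URLs alone."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    # Match wikipedia.org and its language subdomains (en.wikipedia.org, ...).
    # Comparing against wikipedia.com instead would never match real links.
    if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
        return urlunparse(parts._replace(netloc=alt_host))
    return url
```

Matching on the parsed host rather than on a substring of the whole URL also avoids rewriting unrelated links that merely mention Wikipedia in their path or query.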
Ben Busby 2a0ad8796c
Switch to defusedxml for xml parsing
xml.etree.ElementTree.fromstring is considered insecure, see:
https://docs.python.org/3/library/xml.etree.elementtree.html

The defusedxml package contains several Python-only workarounds and
fixes for denial of service and other vulnerabilities in Python's XML
libraries: https://github.com/tiran/defusedxml

Fixes #670
2022-03-01 12:54:32 -07:00
Ben Busby f7e3650728
Only remove G links in footer
Links that were directed at G domains were previously removed
universally, when really they only needed to be removed from the footer
to reduce possible confusion caused by mixed Whoogle and G links.

Fixes #656
2022-03-01 12:48:33 -07:00
Ben Busby 69f845a047
Add test for empty bang behavior
Also fix pep8 issue
2022-03-01 12:13:40 -07:00
Ben Busby 809520ec70
Fallback to home page for empty bang searches
Bang searches without an actual query (e.g. just searching "!gh") will
now redirect to the home page. Previously these redirected to the bang's
result URL with an empty search term, which some users found
undesirable.

Fixes #595
2022-03-01 12:06:59 -07:00
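The fallback described in 809520ec70 could look roughly like the sketch below. The bang table, the `{}` placeholder convention, and the function name are assumptions made for illustration only.

```python
from typing import Dict
from urllib.parse import quote_plus

# Hypothetical subset of a bang table; "{}" marks where the query goes.
BANGS: Dict[str, str] = {"!gh": "https://github.com/search?q={}"}
HOME = "/"


def resolve_bang(query: str, bangs: Dict[str, str] = BANGS) -> str:
    """Return a redirect target for a bang search, or "" if no bang matches."""
    words = query.split()
    for word in words:
        if word in bangs:
            remainder = " ".join(w for w in words if w != word)
            if not remainder:
                # Bang with no search term: fall back to the home page.
                return HOME
            return bangs[word].format(quote_plus(remainder))
    return ""
```

Under these assumptions, `resolve_bang("!gh")` returns the home page, while `resolve_bang("!gh whoogle")` returns the filled-in bang URL.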
Ben Busby b28fa86e33
Update ad filter
Recent changes to ads in search results caused Whoogle to display ads
for certain searches. In particular, ads recently started appearing
grouped into one div, as opposed to a single ad per div. This was
accompanied by the div label "ads" (instead of just "ad"), which threw
off the existing ad filter. The ad keyword blacklist has been updated
accordingly, and has been enhanced to only check against alpha chars for
each label.

This only seems to have affected English language searches, and only for
very specific searches.
2022-02-25 23:02:58 -07:00
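As a rough illustration of the label check described in b28fa86e33: strip everything except alphabetic characters, then compare against a keyword list. The keyword set and function name below are hypothetical, not the project's actual list.

```python
# Hypothetical keyword list; the real filter's contents are not shown here.
AD_KEYWORDS = frozenset({"ad", "ads", "sponsored"})


def is_ad_label(label: str, keywords=AD_KEYWORDS) -> bool:
    """Compare only the alphabetic characters of a label to the keyword list."""
    alpha_only = "".join(ch for ch in label if ch.isalpha()).lower()
    return alpha_only in keywords
```

Comparing the whole alpha-only string, rather than searching for a substring, keeps ordinary words such as "Address" from being treated as ad labels.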
jan Anja 5069838e69
Configure setup() using setup.cfg (#667)
Dependencies are not read from requirements.txt intentionally, so only
direct dependencies without version pinning are included.

Setuptools documentation:
https://setuptools.pypa.io/en/latest/userguide/declarative_config.html
2022-02-25 15:29:54 -07:00
Albony Cal c3634a5135
Upgrade Python image in Dockerfile (#669)
Vulnerable Python image upgraded to python:3.11.0a5-alpine
2022-02-23 09:33:46 -07:00
Ben Busby e72d8437f7
[Docker] Split config dir creation/set permissions
If the config dir already exists, setting the mode (`-m 777`) doesn't
actually work as it should. This change splits the command into two
separate commands for directory creation and enabling the directory to
be writable by all.

Fixes #658
2022-02-21 09:33:30 -07:00
Ben Busby 9984158ec1
Ensure valid str->float conv in currency calc
Currency amounts returned by Google occasionally include unicode
chars ('\xa0' noted in #642) which broke the currency calculator
included in the project. This ensures that only strings that can be
converted to float are ever used in the conversion.

Fixes #642
2022-02-17 16:33:44 -07:00
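One way to guard the conversion described in 9984158ec1 is sketched below. The helper names are hypothetical, and rejecting `nan`/`inf` is an extra assumption that the commit message does not mention.

```python
import math
from typing import Optional


def parse_amount(text: str) -> Optional[float]:
    """Return text as a float, or None if it is not a usable number."""
    try:
        value = float(text)
    except ValueError:
        # e.g. "1\xa0000.50": a non-breaking space inside the digits.
        return None
    # Assumption: "nan" and "inf" parse as floats but are useless as amounts.
    return value if math.isfinite(value) else None


def convert(amount_text: str, rate: float) -> Optional[float]:
    """Apply a conversion rate only when the amount parsed cleanly."""
    amount = parse_amount(amount_text)
    return None if amount is None else amount * rate
```

Returning `None` lets the caller skip rendering the calculator instead of raising in the middle of a request.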
Nitish Yadav 0e711beca7
Give `Accept-Language` div its own class (#659)
Fixes accidental assignment of "get-only" class to the
"Accept-Language" config option
2022-02-16 09:23:38 -07:00
Ben Busby 23402e27e1
Check for updates using 24 hour time delta
Rather than checking for an available update only on app init, the app
now checks at most once every 24 hours, on the first request received
after that period has elapsed.

This also now catches the requests.exceptions.ConnectionError that is
thrown if the app is initialized without an active internet connection.

Fixes #649
2022-02-14 12:19:02 -07:00
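The time-delta check described in 23402e27e1 can be sketched as follows. The class, its attributes, and the injectable `fetch_latest`/`now` callables are hypothetical; the sketch catches `OSError` because `requests.exceptions.ConnectionError` derives from it, which keeps the example free of third-party imports.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(hours=24)


class UpdateChecker:
    """Check for a newer version at most once per CHECK_INTERVAL."""

    def __init__(self, fetch_latest, current_version, now=datetime.now):
        self._fetch_latest = fetch_latest  # callable returning a version string
        self._now = now                    # injectable clock, for testing
        self.current_version = current_version
        self.last_checked = None
        self.has_update = False

    def maybe_check(self) -> bool:
        """Call on each request; only hits the network when the interval is up."""
        now = self._now()
        if self.last_checked is not None and now - self.last_checked < CHECK_INTERVAL:
            return self.has_update
        self.last_checked = now
        try:
            latest = self._fetch_latest()
        except OSError:
            # No connectivity: keep the previous answer and retry next interval.
            return self.has_update
        self.has_update = latest != self.current_version
        return self.has_update
```

Calling `maybe_check()` from a request hook means an idle instance makes no outbound calls at all.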
Ben Busby d33e8241dc
Fix "my ip" search regression
Removes dependency on class names for creating the "my ip" info card in
the results list for searches pertaining to the user's public IP.

Adds test to prevent this from happening again.

Note to anyone reading this and looking to contribute: please avoid
using hardcoded class names at all costs. This approach of
creating/removing content just results in issues if/when Google decides
to introduce/remove class names from the result page.

Fixes #657
2022-02-14 11:40:11 -07:00
DUO Labs b2c048af92
Fix `collapse_sections` for `MINIMAL_MODE` (#654)
2022-02-11 14:44:08 -07:00
DUO Labs 7c5094d37b
Check for soup body in `remove_site_blocks` (#651)
Fixes error with `remove_site_blocks` in the Images tab
2022-02-11 14:42:11 -07:00
Ben Busby c6c9965335
Add new public instances to txt list [skip ci]
Missing from #650
2022-02-10 12:32:57 -07:00
Kainoa Kanter 4eafe0a5b0
Add gowogle.voring.me as public instance (#650)
Also removes fosshost instance from readme

From @benbusby:
I'm unable to get in touch with fosshost support about the whoogle
instance being unavailable, and am no longer interested in
maintaining the instance due to the lack of communication.
2022-02-10 12:30:33 -07:00
Ben Busby 070c327642
Add public instance to instance list [skip ci]
https://whoogle.esmailelbob.xyz

Amendment to #647
2022-02-08 11:22:07 -07:00
Esmail EL BoB 558a627a73
Add new instance to readme [skip ci] (#647)
https://whoogle.esmailelbob.xyz
2022-02-08 11:20:23 -07:00
DUO Labs 502067addc
Clean "Show more results" of all site blocks (#646)
2022-02-08 10:57:00 -07:00
Joao A. Candido Ramos 11099f7b1d
Use consistent header for all result types (#535)
Introduces a header for switching between result types (e.g. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.

Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.

Also updated a few styles to revert the latest styling changes being
applied by Google.

Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
2022-02-07 10:47:25 -07:00
සයුරි | Sayuri 4aa94a5d75
Fix Sinhala translation for farside search (#594)
2022-02-04 16:16:56 -07:00
DUO Labs 500942cb99
Update minimal mode for new Google formatting (#637)
Google's latest formatting changes broke the modifications made when enabling
`WHOOGLE_MINIMAL`. This updates the result filtering to work with the new
changes.

Fixes #634
2022-02-02 12:57:05 -07:00
Ben Busby b393e68d1d
Fix incorrect min-width for mobile screen sizes
min-width was previously set to 736px for all screen sizes, which forced
content off screen for smaller devices such as mobile phones. This
modifies the search stylesheet to only apply a min-width style to
devices > 800px wide.
2022-02-01 20:36:53 -07:00
Ben Busby 63301efb28
Push images to ghcr.io
Alternative container registries like ghcr.io are a good option for anyone
seeking to avoid things like Docker Hub's recent rate-limiting changes.
2022-02-01 18:02:59 -07:00
Ben Busby e3394e29dd
Amend body width formatting in search css
`min-width` is a better field to override than `max-width`, since some
users prefer full width results.
2022-02-01 17:24:12 -07:00
Ben Busby 9ba73331aa
Override new Google search result formatting
There have been some recent formatting changes made by Google for search
results that do not look good (especially for dark themes). This
mostly overrides those styles to resemble the original Whoogle
result formatting.
2022-02-01 17:15:48 -07:00
Ben Busby 33f56bb0cb
Read `WHOOGLE_CONFIG_DISABLE` var as bool in app init
Fixes #636, which pointed out that the var was being interpreted as
"active" (config hidden) regardless of the value that was set.
2022-02-01 15:29:22 -07:00
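The bug class fixed in 33f56bb0cb is common: any non-empty environment string, including "0", is truthy in Python. A minimal sketch of a stricter reader follows; the helper name and the accepted spellings are assumptions.

```python
import os
from typing import Mapping

# Assumption: which spellings count as "on" is a choice made for this sketch.
TRUTHY = frozenset({"1", "true", "t", "yes", "on"})


def read_bool(name: str, default: bool = False,
              environ: Mapping[str, str] = os.environ) -> bool:
    """Interpret an environment variable as a boolean flag."""
    value = environ.get(name)
    if value is None:
        return default
    # bool("0") is True, so compare the text instead of relying on truthiness.
    return value.strip().lower() in TRUTHY
```

With this helper, `WHOOGLE_CONFIG_DISABLE=0` reads as `False` rather than as "set, therefore active".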
Ben Busby fef280a0c9
Add note for fosshost instance [skip ci]
The fosshost team decommissioned the region that Whoogle was hosted in,
but hasn't provided an option to transfer the domain record to the new VM. Until
that is fixed, the instance is inaccessible.
2022-02-01 12:39:10 -07:00
Ben Busby df6aa59fbf
Run buildx workflow on new tag
Fixes #630
2022-02-01 10:55:41 -07:00
Ben Busby 3918c60d87
Remove broken public instance [skip ci]
search.exonip.de now redirects to startpage

Fixes #635
2022-02-01 10:11:59 -07:00
Ben Busby 1af4566991
Bump version to 0.7.1
2022-01-26 10:41:41 -07:00
Ben Busby 4dd2c581ac
Add nightly container vuln scan
Introduces a new 'scan' workflow for scanning the main branch container for
vulnerabilities nightly. By default, this will fail for any 'medium' or higher
vulnerability.

Fixes #613
2022-01-25 13:52:43 -07:00
Ben Busby 9cbd7bd9d3
Remove bash dependency
Depending on bash wasn't strictly necessary, as the two minimal scripts
in the repo were both nearly POSIX-compliant anyway.

Aside from simplifying the repo's dependencies a little bit, this also
helps reduce the overall Docker image size as an added bonus.
2022-01-25 13:07:21 -07:00
Ben Busby 2e3c647591
Use `test` image tag for docker-compose tests
Also adds the ability to override the image in docker-compose.yml,
which allows the CI build to use the same image for all docker tests.
The default is still 'benbusby/whoogle-search' though.
2022-01-25 12:42:24 -07:00
Ben Busby 863cbb2b8d
Remove trailing whitespace
2022-01-25 12:31:19 -07:00
Ben Busby 72e5a227c8
Move bangs init to bg thread
Initializing the DDG bangs when running whoogle for the first time
creates an indeterminate amount of delay before the app becomes usable,
which makes usability tests (particularly w/ Docker) unreliable. This
moves the bang json init to a background thread and writes a temporary
empty dict to the bangs json file until the full bangs json can be used.
2022-01-25 12:28:06 -07:00
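The approach in 72e5a227c8 (placeholder file first, full data from a background thread) might be sketched like this. `fetch_bangs` stands in for whatever downloads and parses the bang list, and the temp-file swap is an extra precaution assumed here, not something the commit message describes.

```python
import json
import os
import threading


def init_bangs(path: str, fetch_bangs) -> threading.Thread:
    """Write an empty bangs file now, then fill it in from a background thread."""
    # Placeholder so readers of the file get valid JSON immediately.
    with open(path, "w") as f:
        json.dump({}, f)

    def worker():
        bangs = fetch_bangs()  # potentially slow network call
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(bangs, f)
        # Swap the file in one step so readers never see a half-written file.
        os.replace(tmp_path, path)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def load_bangs(path: str) -> dict:
    """Read whatever bang data is currently on disk."""
    with open(path) as f:
        return json.load(f)
```

The app can start serving requests as soon as `init_bangs` returns; bang searches simply find no matches until the real data lands.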
Ben Busby 6d178342ee
Refactor Docker CI workflows
Split previous docker test CI into one for PRs and one for triggering
the main buildx workflow that deploys new images to Docker Hub.

Note that this needs to be further refactored soon to use reusable
workflows. The main portion of docker/docker-compose tests is duplicated
between the new main + test workflows.
2022-01-25 11:42:29 -07:00
nakoo 0b70962e0c
Fix docker-compose.yml permission errors (#623)
2022-01-25 11:06:46 -07:00
ras07 ecb4277e69
Run container as non-root `whoogle` user (#617)
Creates a non-root user ("whoogle"), and runs the container as that user.
2022-01-21 13:51:51 -07:00
ras07 09a0039a38
Make `/config` directory writable by all (#616)
The `/config` directory needs to be writable by all in order to run the container
as a non-root user.
2022-01-21 12:16:51 -07:00
Nitish Yadav fc50359752
Improve formatting of collapsible infobox (#612)
2022-01-18 13:47:35 -07:00
DUO Labs 257e3f33ef
Skip loading autocomplete.js if `WHOOGLE_AUTOCOMPLETE=0` (#611)
Bypasses autocomplete.js if `WHOOGLE_AUTOCOMPLETE` is set to 0
2022-01-18 13:39:56 -07:00
Ben Busby 4dd01cdfda
Fix Dockerfile syntax errors
2022-01-14 10:05:24 -07:00
DUO Labs 74cb48086c
Introduce site alts for imgur and wikipedia (#609)
* Add `WHOOGLE_ALT_IMG` for a replacement for imgur.

* Add `WHOOGLE_ALT_WIKI` for Wikipedia
2022-01-14 09:59:03 -07:00
Ben Busby ded787547a
Exclude opensearch route from session validation
Fixes #588
2022-01-11 10:50:35 -07:00
domokosdcs0 31f4c00aee
Add new instance [skip ci] (#604)
https://whoogle.dcs0.hu
2022-01-11 10:06:57 -07:00
Ben Busby f4b65be876
Catch invalid XML in suggestion response
As reported in #593, the XML response body returned for search
suggestions can apparently contain invalid XML elements. This catches
the error and returns an empty suggestion list instead of erroring.

Fixes #593
2021-12-28 11:38:18 -07:00
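A sketch of the error handling described in f4b65be876. The response shape (`<suggestion data="..."/>` elements) and the function name are assumptions. The standard-library parser is used only to keep the example dependency-free; a later commit in this log (2a0ad8796c) switches the project to defusedxml.

```python
import xml.etree.ElementTree as ET
from typing import List


def parse_suggestions(body: str) -> List[str]:
    """Extract suggestion strings, returning [] if the XML is malformed."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Invalid XML: degrade to "no suggestions" rather than erroring.
        return []
    # Assumed shape: <suggestion data="..."/> elements somewhere in the tree.
    return [el.attrib["data"] for el in root.iter("suggestion")
            if "data" in el.attrib]
```

Autocomplete is a convenience feature, so an empty list is a safer failure mode than an exception surfacing to the user.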
Ben Busby 362b6a75c8
Include plaintext instance list in repo [skip ci]
Including a list of instances that are easily machine-readable allows
services such as Farside (https://github.com/benbusby/farside) to read
these and have an up to date list of valid instances.
2021-12-23 17:24:11 -07:00
Ben Busby 8c92b381a2
Remove default country param
The country URL param ('gl') is no longer set to 'US' by default, and is
omitted from the search entirely unless explicitly set by the user. This
change was made in an attempt to cut back on the number of captchas
encountered by self-hosting users, some of whom reported fewer captchas
after removing this configuration setting.

Fixes #558
2021-12-23 17:01:49 -07:00
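The behavior described in 8c92b381a2 amounts to building the query string without `gl` unless a country was chosen. A minimal sketch, in which the function name and signature are hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(query: str, country: Optional[str] = None) -> str:
    """Build search URL params, including 'gl' only when a country is set."""
    params = {"q": query}
    if country:
        # No default such as 'US': an unset or empty country omits the param.
        params["gl"] = country
    return urlencode(params)
```

Treating an empty string the same as `None` means a blank config field also results in no `gl` parameter being sent.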