whoogle-search/test/test_results.py

from bs4 import BeautifulSoup
from app.filter import Filter
from app.utils.session import generate_user_key
from datetime import datetime
from dateutil.parser import ParserError, parse


def get_search_results(data):
    """Return the divs in the filtered response that look like results."""
    secret_key = generate_user_key()
    soup = Filter(user_key=secret_key).clean(
        BeautifulSoup(data, 'html.parser'))

    main_divs = soup.find('div', {'id': 'main'})
    assert len(main_divs) > 1

    result_divs = []
    for div in main_divs:
        # Result divs should only have 1 inner div
        if (len(list(div.children)) != 1
                or not div.findChild()
                or 'div' not in div.findChild().name):
            continue

        result_divs.append(div)

    return result_divs


def test_get_results(client):
    rv = client.get('/search?q=test')
    assert rv.status_code == 200

    # Depending on the search, there can be more
    # than 10 result divs
    results = get_search_results(rv.data)
    assert 10 <= len(results) <= 15


def test_post_results(client):
    rv = client.post('/search', data=dict(q='test'))
    assert rv.status_code == 200

    # Depending on the search, there can be more
    # than 10 result divs
    results = get_search_results(rv.data)
    assert 10 <= len(results) <= 15


# TODO: Unit test the site alt method instead -- the results returned
# are too unreliable for this test in particular.
# def test_site_alts(client):
    # rv = client.post('/search', data=dict(q='twitter official account'))
    # assert b'twitter.com/Twitter' in rv.data

    # client.post('/config', data=dict(alts=True))
    # assert json.loads(client.get('/config').data)['alts']

    # rv = client.post('/search', data=dict(q='twitter official account'))
    # assert b'twitter.com/Twitter' not in rv.data
    # assert b'nitter.net/Twitter' in rv.data


def test_recent_results(client):
    times = {
        'past year': 365,
        'past month': 31,
        'past week': 7
    }

    for time, num_days in times.items():
        rv = client.post('/search', data=dict(q='test :' + time))
        result_divs = get_search_results(rv.data)

        current_date = datetime.now()
        for div in [d for d in result_divs if d.find('span')]:
            date_span = div.find('span').decode_contents()
            if not date_span or len(date_span) > 15 or len(date_span) < 7:
                continue

            try:
                date = parse(date_span)
                # Date can have a little bit of wiggle room
                assert (current_date - date).days <= (num_days + 5)
            except ParserError:
                pass