tests/test_e2e.py is the main test suite. Tests typically call parse(url, ...) to fetch a live page, then extract(text) and assert on keys in the returned dict. Some tests pass custom headers or cookies.
Every extraction method (each named entry in schemes in schemes.py) should have at least one end-to-end test in tests/test_e2e.py that exercises a real URL (or the public JSON endpoint Maigret uses) and asserts on extracted fields.
- Add the test in the same commit as a new or changed scheme when possible (see CONTRIBUTING.md).
- Name the test function `test_<something>_e2e` or follow the existing `test_<site>` pattern.
- In the docstring, put the exact scheme name(s) from `schemes.py` (one per line) so `revision.py` can link tests to methods in `METHODS.md`.
- If a site blocks GitHub Actions (captchas, geo, bot walls), mark the test `@pytest.mark.github_failed` or `@pytest.mark.rate_limited` and document why. The test still counts for local runs and for coverage intent; CI uses `-m 'not github_failed and not rate_limited'`.
Where a live call is too flaky, add a fast offline check in a small module test (e.g. tests/test_socid_improvements.py with a saved HTML/JSON snippet) in addition to the e2e policy above, not as a full substitute.
Cookie-based scenarios may use files under tests/ (e.g. *.cookies); the default CI run excludes tests whose names match cookies (see below).
Defined in pyproject.toml ([tool.pytest.ini_options]):
| Marker | Meaning |
|---|---|
| `github_failed` | Request or site behavior often fails from GitHub Actions runners (blocks, geo, etc.). Excluded in CI. |
| `rate_limited` | Anti-bot, captcha, or rate limits. Excluded in CI. |
| `requires_cookies` | Needs authenticated cookies. Documented for selective runs. |
Use @pytest.mark.skip for temporarily broken tests; reasons appear in revision.py output when regenerating METHODS.md.
From the root README:
`python3 -m pytest tests/test_e2e.py -n 10 -k 'not cookies' -m 'not github_failed and not rate_limited'`

- `-n 10`: requires pytest-xdist (listed in `test-requirements.txt`) for parallel workers. Omit `-n 10` if you did not install xdist.
- The filters match what CI runs; `-n 10` only adds optional parallelism for speed.
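To make the effect of the two filters concrete, here is a simplified model of the selection, not pytest's actual implementation. `FakeTest` and `selected_in_ci` are hypothetical names used only for illustration, and the sketch assumes that `-k` does a case-insensitive substring match against the test name and the names of its markers (real pytest considers further keywords, such as module and class names).

```python
from dataclasses import dataclass, field


@dataclass
class FakeTest:
    """Hypothetical stand-in for a collected test item."""
    name: str
    markers: set = field(default_factory=set)


# Markers excluded by: -m 'not github_failed and not rate_limited'
EXCLUDED_MARKERS = {"github_failed", "rate_limited"}


def selected_in_ci(test: FakeTest) -> bool:
    """Approximate whether the CI filters would keep this test."""
    # -k 'not cookies': drop the test if any keyword contains "cookies".
    # Only the test name and marker names are modelled as keywords here.
    keywords = {test.name, *test.markers}
    if any("cookies" in keyword.lower() for keyword in keywords):
        return False
    # -m 'not github_failed and not rate_limited': drop excluded markers.
    return not (test.markers & EXCLUDED_MARKERS)


if __name__ == "__main__":
    for test in (
        FakeTest("test_example_site"),
        FakeTest("test_example_with_cookies"),
        FakeTest("test_blocked_site", {"github_failed"}),
    ):
        print(test.name, selected_in_ci(test))
```

Under this model a test is dropped either by its name (anything containing `cookies`) or by one of the two excluded markers; everything else runs.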
Minimal dependency set for tests is in test-requirements.txt (pytest, rerun plugin, xdist). Runtime library deps remain in requirements.txt.
Helper script that turns lines of the form key: value into assert info.get("key") == "value" patterns in test_e2e.py (macOS sed syntax). Use after pasting CLI output into the test file as documented in CONTRIBUTING.md.
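For readers who are not on macOS, the same transformation can be sketched in Python. This is an illustrative stand-in, not the helper script itself: the function name `lines_to_asserts` and the handling of edge cases (lines without a colon are skipped, the key is everything before the first colon) are assumptions.

```python
import json
import re

# "key: value" where the key is everything before the first colon.
LINE_RE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.*?)\s*$")


def lines_to_asserts(text: str, var: str = "info") -> list:
    """Turn 'key: value' lines into assert statements on a dict named `var`."""
    asserts = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # blank lines and lines without a colon are ignored
        # json.dumps yields a double-quoted, escaped string literal
        key = json.dumps(match.group("key"), ensure_ascii=False)
        value = json.dumps(match.group("value"), ensure_ascii=False)
        asserts.append(f"assert {var}.get({key}) == {value}")
    return asserts


if __name__ == "__main__":
    pasted = "uid: 42\nusername: alice\nurl: https://example.com/a"
    print("\n".join(lines_to_asserts(pasted)))
```

Only the first colon separates key from value, so URL values such as `https://example.com/a` survive intact.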
.github/workflows/python-package.yml runs on pushes and pull requests to master:
- Python 3.10, 3.11, 3.12
- flake8: syntax/undefined-name checks; complexity/length as warnings (`setup.cfg` ignores `E501` for line length)
- mypy: type checking with `mypy socid_extractor/` (stub overrides in `pyproject.toml`)
- pytest: `pytest -k 'not cookies' -m 'not github_failed and not rate_limited' --reruns 3 --reruns-delay 30` (pytest-rerunfailures for flaky network tests)
Publishing to PyPI on release is handled by .github/workflows/python-publish.yml using python -m build.
Run from the repository root:
`python revision.py`

It:

- Reads pytest marker descriptions from `pyproject.toml`
- Loads tests from `tests/test_e2e.py` and schemes from `socid_extractor/schemes.py`
- Associates tests with scheme names via docstrings (method name per line) or heuristic name matching
- Overwrites `METHODS.md` with a table of methods, test links, and notes (markers, skip reasons)
- Prints how many methods have no matching test
Keep docstrings in tests aligned with scheme names in schemes when you want accurate coverage reporting.
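The docstring convention can be illustrated with a minimal sketch of how such linking might work. This is not the code in `revision.py`: the function names, the scheme names (`Example API`, `Example HTML`, `Other site`), and the exact-match rule are assumptions, and the heuristic name matching is left out.

```python
import inspect


def scheme_names_from_docstring(test_func, known_schemes):
    """Return known scheme names that appear one per line in the docstring."""
    doc = inspect.getdoc(test_func) or ""
    lines = [line.strip() for line in doc.splitlines()]
    return [line for line in lines if line in known_schemes]


def find_untested(tests, known_schemes):
    """Return scheme names that no test docstring mentions, sorted."""
    covered = set()
    for func in tests:
        covered.update(scheme_names_from_docstring(func, known_schemes))
    return sorted(set(known_schemes) - covered)


# Hypothetical test whose docstring lists two scheme names.
def test_example_site():
    """Example API
    Example HTML
    """


if __name__ == "__main__":
    # Hypothetical scheme table; real names come from schemes.py.
    schemes = {"Example API": {}, "Example HTML": {}, "Other site": {}}
    print(scheme_names_from_docstring(test_example_site, schemes))
    print(find_untested([test_example_site], schemes))
```

Because the match is exact in this sketch, a docstring line that differs from the scheme name by even one character would leave that scheme reported as untested, which is why the names need to stay aligned.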
setup.cfg configures flake8 with ignore = E501 (line length). CI still runs additional flake8 passes as defined in the workflow file.