RSPEED-2943: add observability metrics for auth, authorization, and quota#1640
RSPEED-2943: add observability metrics for auth, authorization, and quota#1640major wants to merge 3 commits into
Conversation
WalkthroughAdds bounded-duration Prometheus metrics and normalization helpers, records auth/authorization/quota check metrics with timing, wires these helpers into authentication/authorization flows and response/infer endpoints, and extends unit tests to cover metric recording and label-bounding behavior. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
2221254 to
7668c4b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 535-562: The _check_infer_quota function currently logs raw
strings ("skipped","failure","success") and only catches HTTPException; change
it to use the exported recording constants (e.g.,
recording.QUOTA_RESULT_SKIPPED, recording.QUOTA_RESULT_FAILURE,
recording.QUOTA_RESULT_SUCCESS) in the three record_quota_check calls, and add a
broad except Exception block around check_tokens_available that records
recording.QUOTA_RESULT_ERROR with the elapsed time before re-raising the
exception; keep the existing HTTPException except as-is but use the constant
there too. Use the existing symbols _check_infer_quota, check_tokens_available,
and recording.record_quota_check to locate and modify the code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e9a8ec74-e971-491a-8e93-5dd57f734cd2
📒 Files selected for processing (6)
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/app/endpoints/test_responses.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Pylinter
- GitHub Check: mypy
- GitHub Check: unit_tests (3.13)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: integration_tests (3.12)
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (6)
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Usepytest-mockfor AsyncMock objects in unit tests
Use markerpytest.mark.asynciofor async tests
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/metrics/test_recording.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/metrics/test_recording.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax:str | int
UseOptional[Type]for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Useasync deffor I/O operations and external API calls
Uselogger.debug()for detailed diagnostic information
Uselogger.info()for general information about program execution
Uselogger.warning()for unexpected events or potential problems
Uselogger.error()for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes use ABC with@abstractmethoddecorators
Complete type annotations for all class attributes, use specific types, notAny
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/app/endpoints/rlsapi_v1.pysrc/metrics/__init__.pysrc/app/endpoints/responses.pysrc/metrics/recording.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI dependencies:
from fastapi import APIRouter, HTTPException, Request, status, Depends
Files:
src/app/endpoints/rlsapi_v1.pysrc/app/endpoints/responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/rlsapi_v1.pysrc/app/endpoints/responses.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/app/endpoints/rlsapi_v1.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/rlsapi_v1.pysrc/app/endpoints/responses.py
🔇 Additional comments (8)
tests/unit/app/endpoints/test_responses.py (1)
223-244: LGTM!The test correctly verifies that unexpected quota backend failures (RuntimeError) are re-raised after recording metrics with the appropriate arguments (endpoint_path="/v1/responses", quota_type="user_id", result="error", and a non-negative duration).
src/app/endpoints/responses.py (2)
141-170: LGTM!The quota check wrapper correctly:
- Times the quota check using
time.monotonic()- Records metrics with bounded labels before re-raising exceptions
- Uses the exported constants for type safety (
QUOTA_TYPE_USER_ID,QUOTA_RESULT_*)- Handles both expected (
HTTPException) and unexpected (Exception) quota backend failures
311-316: LGTM!Moving
endpoint_pathassignment earlier and using it consistently for quota metrics is clean.src/metrics/__init__.py (2)
11-24: LGTM!The bucket distribution covers sub-second latencies well (1ms to 5s), which is appropriate for quota backend checks. Including
float("inf")as the final bucket follows Prometheus histogram best practices.
76-92: LGTM!The quota metrics are well-designed:
- Labels are documented as bounded for cardinality safety
- Counter and Histogram share the same label set for consistent querying
- Descriptive metric names follow the existing
ls_prefix conventiontests/unit/metrics/test_recording.py (1)
165-218: LGTM!Excellent test coverage for the new
record_quota_checkfunction:
- The fixture cleanly patches the logger for failure assertions
- Parametrized test efficiently covers both counter and histogram failure paths
- The bounds test verifies that unexpected inputs are normalized to safe fallback values ("customer-123" → "user_id", "timeout" → "error")
src/metrics/recording.py (2)
17-55: LGTM!Well-designed label bounding:
- Constants provide a single source of truth for valid label values
frozensetensures immutability of the allowed sets- Normalization functions guarantee bounded cardinality for Prometheus metrics
- Fallback values ("user_id" and "error") are sensible defaults
155-177: LGTM!The
record_quota_checkhelper follows the established pattern from other recording functions:
- Normalizes inputs before recording to prevent cardinality explosion
- Updates both counter and histogram atomically
- Catches expected metric update exceptions and logs them without disrupting the request
7668c4b to
a45bbd9
Compare
tisnik
left a comment
There was a problem hiding this comment.
Please ignore docs/demos directory completely - the sources are incorrect there on purpose.
c64fde1 to
17bdda5
Compare
6971abe to
4811b0d
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 392-397: Replace the hard-coded string "/v1/responses" used for
quota labeling with the shared constant ENDPOINT_PATH_RESPONSES: update the
local variable endpoint_path (and any other occurrences around
_check_response_quota calls, e.g., the similar block at lines matching the
second occurrence) to reference ENDPOINT_PATH_RESPONSES instead of the literal;
ensure _check_response_quota(user_id, endpoint_path) is passed the constant so
metrics use the centralized constant name and stay in sync with constants.py.
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 578-615: The _check_infer_quota function currently has only a
single-line docstring; update it to a Google-style docstring including
Parameters (Args) for request: Request, auth: AuthTuple, and endpoint_path: str
with brief descriptions, a Returns section stating Optional[str] and what a
returned quota_id or None means, and a Raises section documenting that it may
raise HTTPException (when quota check fails) and other Exception types
(caught/re-raised during quota checking); also mention side effects (it records
quota metrics via recording.record_quota_check and calls check_tokens_available)
so readers understand behavior.
In `@src/authentication/api_key_token.py`:
- Around line 67-77: Change the extract_user_token exception handling in the
api_key_token flow to (1) compare the failure cause against well-defined
constants instead of a fragile hardcoded substring: import and use the cause
constants defined in utils (the same ones used by utils.extract_user_token) when
deciding the reason label, and (2) broaden the try/except to mirror jwk_token.py
by catching any Exception in addition to HTTPException so you can call
record_auth_metrics(AUTH_MOD_APIKEY_TOKEN, "failure", "unexpected_error",
start_time) for non-HTTP errors before re-raising; keep the existing pathway
that maps the specific HTTPException to "missing_header" vs "missing_token"
using the utils constants and then re-raise the original exception.
In `@src/authentication/k8s.py`:
- Around line 393-424: The helper _populate_kube_admin_uid currently mutates the
V1UserInfo parameter (user.uid); change it to return the resolved UID (str)
instead of setting user.uid in-place, and update the call site in __call__ to
assign the returned value to user.uid (or to a local user_uid variable) where it
is consumed; preserve the existing exception handling and metrics inside
_populate_kube_admin_uid but remove any assignment to user, and ensure all
callers (e.g., __call__) use the returned uid and perform any casting/assignment
there.
In `@src/authentication/noop_with_token.py`:
- Around line 72-80: The current except block derives the failure reason by
stringifying exc.detail; instead, derive it from structured data when available:
in the except HTTPException as exc handler, inspect exc.detail—if it's a dict
prefer exc.detail.get("reason") or exc.detail.get("code") (or
exc.detail.get("error")) to set reason, otherwise fall back to the previous
string-based mapping; then call record_auth_metrics(AUTH_MOD_NOOP_WITH_TOKEN,
"failure", reason, start_time). Update the logic around extract_user_token,
HTTPException, and record_auth_metrics to use the structured-datum first
approach so metric labels don't depend on exception wording.
In `@src/metrics/recording.py`:
- Around line 19-64: ALLOWED_AUTH_REASONS is a manually maintained frozenset and
can get out of sync with reason strings used elsewhere; add a clear comment
above ALLOWED_AUTH_REASONS documenting that it must mirror all auth reason
literals and reference AUTH_REASON_UNKNOWN as the fallback, then add a CI/test
assertion (e.g., a unit test or lint rule) that scans the codebase for all
occurrences of auth-reason string literals and fails if any literal is not
present in ALLOWED_AUTH_REASONS (or conversely, build ALLOWED_AUTH_REASONS from
a single enum/source-of-truth like an Action-style enum and update code to
import that enum) so future additions are either auto-included or rejected by
CI.
- Around line 409-417: The current record_quota_check implementation wraps both
metrics updates in one try/except so if the counter succeeds but the histogram
observe() raises, the counter is incremented while the histogram is silently
skipped and the log wrongly implies both failed; change this by splitting into
two independent try/except blocks around
metrics.quota_checks_total.labels(...).inc() and
metrics.quota_check_duration_seconds.labels(...).observe(...) respectively, and
on each except catch the same exceptions but log a metric-specific warning
(e.g., "Failed to increment quota_checks_total" and "Failed to observe
quota_check_duration_seconds") including exc_info=True; update the tests
(test_record_quota_check_updates_metrics_and_logs_errors) to assert the specific
warning messages for each failure path.
- Around line 160-164: The fallback in normalize_quota_type currently returns
QUOTA_TYPE_USER_ID for unknown inputs which conflates genuine user_id quotas
with unexpected types; change the fallback to a dedicated QUOTA_TYPE_UNKNOWN
constant (or the string "unknown") and update the function's docstring, any
tests that assert behavior for unknown quota types, and any callers/tests that
expect the old user_id fallback; ensure you reference ALLOWED_QUOTA_TYPES in the
check and replace the return of QUOTA_TYPE_USER_ID with QUOTA_TYPE_UNKNOWN
throughout the tests and docs so unknown quota types are bucketed explicitly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 18b1cea9-9e4e-49a6-8058-063575246370
📒 Files selected for processing (20)
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.pysrc/authentication/api_key_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.pysrc/authentication/utils.pysrc/authorization/middleware.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/app/endpoints/test_responses.pytests/unit/app/endpoints/test_rlsapi_v1.pytests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/authorization/test_middleware.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/authorization/test_middleware.pytests/unit/authentication/test_utils.pytests/unit/app/endpoints/test_responses.pytests/unit/authentication/test_jwk_token.pytests/unit/app/endpoints/test_rlsapi_v1.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_k8s.pytests/unit/metrics/test_recording.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/authentication/noop.pysrc/authentication/api_key_token.pysrc/authentication/utils.pysrc/authentication/jwk_token.pysrc/authentication/noop_with_token.pysrc/app/endpoints/responses.pysrc/authorization/middleware.pysrc/authentication/k8s.pysrc/app/endpoints/rlsapi_v1.pysrc/metrics/recording.pysrc/metrics/__init__.pysrc/authentication/rh_identity.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.py
🔇 Additional comments (12)
src/metrics/__init__.py (1)
25-179: LGTM — metric declarations, label sets, and bucket constants are well-formed.All six new metrics carry only bounded, non-PII labels (
auth_module,result,reason,action,endpoint,quota_type), bucket selections are appropriately sub-second for fast pre-request paths, and the addedFinal[Histogram]annotation onllm_inference_duration_secondsis a good tightening.tests/unit/authorization/test_middleware.py (1)
325-354: LGTM — resilience test is well-structured and covers the right failure mode.src/authentication/noop.py (1)
54-67: LGTM — timing and metric recording integrate cleanly with the existing flow.src/authentication/utils.py (1)
44-71: LGTM —record_auth_metricscorrectly centralises auth observability with proper error isolation.tests/unit/app/endpoints/test_responses.py (1)
282-302: LGTM — cleanly exercises the unexpected-error path and asserts all relevant call arguments.tests/unit/authentication/test_jwk_token.py (2)
507-527: LGTM —time.monotonicpatch is correctly scoped toauthentication.jwk_tokenand the start-time assertion properly validates what's forwarded torecord_auth_metrics.
559-609: LGTM —invalid_user_id_tokenfixture and exception-chain assertion cover a previously untested validation path cleanly.src/authentication/jwk_token.py (2)
146-230: New auth helpers are well-structured — LGTM.Error handling is thorough and exception ordering is correct:
_get_jwk_set_for_authcorrectly ordersaiohttp.ClientError→json.JSONDecodeError→JoseErrorwith no masking_validate_jwk_claimscorrectly placesExpiredTokenErrorbeforeJoseErrorin the hierarchy_get_required_claimcorrectly checks both type and emptiness of the claim value
258-317:__call__metrics wiring is correct — LGTM.
start_timeis correctly scoped to each request. Since every helper records metrics before raising, there is no risk of double-recording or metric skipping on any error path. The success metric at line 316 is only reached when all helpers return cleanly.src/authentication/k8s.py (1)
541-618:__call__metrics wiring looks correct — LGTM.Health-probe and metrics skips are correctly guarded, and every error path (missing token, token review error, invalid token, SAR failures, unauthorized) records a bounded metric before raising. The success metric at line 612 is only reached after all checks pass.
tests/unit/metrics/test_recording.py (2)
197-230:test_counter_recorders_update_metrics_and_log_errors— correct two-phase approach — LGTM.The success/no-warning assertion in phase 1 followed by
reset_mock()+side_effectassignment for phase 2 correctly avoids any stale state interaction:reset_mock()by default does not clearside_effect, but since noside_effectis set during phase 1, there is nothing to carry over.
396-428:test_record_quota_check_updates_metrics_and_logs_errorsverifies both failure points — LGTM.The
@pytest.mark.parametrize("failing_metric", ["counter", "histogram"])correctly exercises both the counter-raises and histogram-raises paths. Becauserecord_quota_checkuses a singletry/except, a counter failure means the histogram is never called, while a histogram failure means the counter succeeded. Both scenarios result in exactly onewarningcall — which this test correctly asserts.
Signed-off-by: Major Hayden <major@redhat.com>
Signed-off-by: Major Hayden <major@redhat.com>
Signed-off-by: Major Hayden <major@redhat.com>
0eccec0 to
0f1afb7
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Line 600: Replace the string literal "disabled" fallback for quota_type with
the shared constant recording.QUOTA_TYPE_DISABLED: change the assignment that
reads quota_type = configuration.rlsapi_v1.quota_subject or "disabled" to use
recording.QUOTA_TYPE_DISABLED instead, and ensure the recording symbol is
imported/available in the module so the constant can be referenced (update
imports if necessary).
In `@src/metrics/__init__.py`:
- Around line 25-68: Three identical tuples (AUTH_DURATION_BUCKETS,
AUTHORIZATION_DURATION_BUCKETS, QUOTA_CHECK_DURATION_BUCKETS) should be
consolidated: create a single MIDDLEWARE_DURATION_BUCKETS tuple containing the
repeated values, then set AUTH_DURATION_BUCKETS = AUTHORIZATION_DURATION_BUCKETS
= QUOTA_CHECK_DURATION_BUCKETS = MIDDLEWARE_DURATION_BUCKETS. Also update any
histogram registrations that currently pass the old tuples (e.g., the histograms
for auth_duration_seconds, authorization_duration_seconds, and
quota_check_duration_seconds) to reference MIDDLEWARE_DURATION_BUCKETS so all
three metrics use the shared constant.
In `@src/metrics/recording.py`:
- Around line 72-76: The function normalize_auth_module currently returns the
auth-result constant AUTH_RESULT_UNKNOWN; update it to return the correct
auth-module constant AUTH_MODULE_UNKNOWN (or the literal "unknown" if
AUTH_MODULE_UNKNOWN is not defined) in normalize_auth_module so auth-module
normalization uses the proper identifier rather than an auth-result constant;
ensure any necessary constant import or definition is present and keep the
return semantics unchanged.
- Around line 165-169: The function normalize_quota_type currently returns the
magic literal "unknown" on fallback; change it to return a named constant (e.g.,
QUOTA_TYPE_UNKNOWN) instead for consistency with other normalizers; update the
return to use the existing or newly added constant and ensure the constant value
equals "unknown" and is exported/defined alongside ALLOWED_QUOTA_TYPES so
callers can discover the fallback value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 34c2d62b-7808-4fdd-8d52-3b7d7a1d9075
📒 Files selected for processing (20)
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.pysrc/authentication/api_key_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.pysrc/authentication/utils.pysrc/authorization/middleware.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/app/endpoints/test_responses.pytests/unit/app/endpoints/test_rlsapi_v1.pytests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/authorization/test_middleware.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/authentication/noop.pysrc/app/endpoints/responses.pysrc/authentication/api_key_token.pysrc/authentication/noop_with_token.pysrc/authorization/middleware.pysrc/authentication/k8s.pysrc/authentication/utils.pysrc/metrics/__init__.pysrc/authentication/jwk_token.pysrc/metrics/recording.pysrc/app/endpoints/rlsapi_v1.pysrc/authentication/rh_identity.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/authentication/test_jwk_token.pytests/unit/app/endpoints/test_responses.pytests/unit/app/endpoints/test_rlsapi_v1.pytests/unit/authorization/test_middleware.pytests/unit/authentication/test_utils.pytests/unit/authentication/test_k8s.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_rh_identity.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.py
🔇 Additional comments (39)
src/metrics/__init__.py (1)
122-179: LGTM!src/metrics/recording.py (3)
93-136: LGTM!
323-361: LGTM!
363-426: LGTM!src/authentication/utils.py (1)
15-77: LGTM!src/authentication/api_key_token.py (1)
68-103: LGTM!src/authentication/jwk_token.py (2)
146-231: LGTM!
281-317: LGTM!src/authentication/k8s.py (4)
384-390: LGTM!
393-440: LGTM!
442-521: LGTM!
574-632: LGTM!src/authentication/noop.py (1)
54-67: LGTM!src/authentication/noop_with_token.py (1)
69-99: LGTM!src/authentication/rh_identity.py (1)
9-10: LGTM!Also applies to: 15-15, 18-18, 30-32, 42-64, 106-112, 145-150, 177-183, 372-379, 385-385, 395-395, 407-407, 418-418, 423-433, 436-449, 465-465
src/authorization/middleware.py (1)
3-3: LGTM!Also applies to: 22-22, 32-55, 154-195
src/app/endpoints/responses.py (1)
141-170: LGTM!Also applies to: 392-393, 397-397, 449-449
src/app/endpoints/rlsapi_v1.py (1)
578-599: LGTM!Also applies to: 601-633, 793-804
tests/unit/app/endpoints/test_responses.py (1)
21-21: LGTM!Also applies to: 282-303
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
1188-1204: LGTM!Also applies to: 1234-1264
tests/unit/authentication/test_jwk_token.py (3)
507-528: LGTM!
559-572: LGTM!
594-610: LGTM!tests/unit/authentication/test_k8s.py (2)
1204-1225: LGTM!
1141-1202: ⚡ Quick winThe test correctly reflects the implementation. The codebase uses a unified
record_auth_metricsfunction with differentiated reason codes for both authentication and authorization failures. This is an intentional design pattern—no separaterecord_authorization_metricsfunction exists, and authorization checks appropriately record metrics through the unified interface using the"authorization_check_error"reason code.tests/unit/authentication/test_rh_identity.py (3)
257-264: LGTM!Also applies to: 289-292
439-481: LGTM!
567-569: LGTM!Also applies to: 606-614, 651-651, 687-695
tests/unit/authentication/test_utils.py (3)
47-60: LGTM!
62-75: LGTM!
77-92: LGTM!tests/unit/authorization/test_middleware.py (1)
325-355: LGTM!tests/unit/metrics/test_recording.py (7)
12-21: LGTM!
197-231: LGTM!
244-251: LGTM!Also applies to: 266-266, 269-269
295-372: LGTM!
374-394: LGTM!
396-435: LGTM!
437-450: LGTM!
| exhausted its token allowance (HTTP 429). | ||
| """ | ||
| quota_id = _resolve_quota_subject(request, auth) | ||
| quota_type = configuration.rlsapi_v1.quota_subject or "disabled" |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Use the shared disabled quota label constant.
quota_type falls back to a string literal ("disabled"). Use recording.QUOTA_TYPE_DISABLED to prevent label drift if constants change.
♻️ Proposed fix
- quota_type = configuration.rlsapi_v1.quota_subject or "disabled"
+ quota_type = (
+ configuration.rlsapi_v1.quota_subject or recording.QUOTA_TYPE_DISABLED
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/app/endpoints/rlsapi_v1.py` at line 600, Replace the string literal
"disabled" fallback for quota_type with the shared constant
recording.QUOTA_TYPE_DISABLED: change the assignment that reads quota_type =
configuration.rlsapi_v1.quota_subject or "disabled" to use
recording.QUOTA_TYPE_DISABLED instead, and ensure the recording symbol is
imported/available in the module so the constant can be referenced (update
imports if necessary).
| AUTH_DURATION_BUCKETS: Final[tuple[float, ...]] = ( | ||
| 0.001, | ||
| 0.005, | ||
| 0.01, | ||
| 0.025, | ||
| 0.05, | ||
| 0.1, | ||
| 0.25, | ||
| 0.5, | ||
| 1.0, | ||
| 2.5, | ||
| 5.0, | ||
| float("inf"), | ||
| ) | ||
|
|
||
| AUTHORIZATION_DURATION_BUCKETS: Final[tuple[float, ...]] = ( | ||
| 0.001, | ||
| 0.005, | ||
| 0.01, | ||
| 0.025, | ||
| 0.05, | ||
| 0.1, | ||
| 0.25, | ||
| 0.5, | ||
| 1.0, | ||
| 2.5, | ||
| 5.0, | ||
| float("inf"), | ||
| ) | ||
|
|
||
| QUOTA_CHECK_DURATION_BUCKETS: Final[tuple[float, ...]] = ( | ||
| 0.001, | ||
| 0.005, | ||
| 0.01, | ||
| 0.025, | ||
| 0.05, | ||
| 0.1, | ||
| 0.25, | ||
| 0.5, | ||
| 1.0, | ||
| 2.5, | ||
| 5.0, | ||
| float("inf"), | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consolidate the three identical bucket definitions.
AUTH_DURATION_BUCKETS, AUTHORIZATION_DURATION_BUCKETS, and QUOTA_CHECK_DURATION_BUCKETS are byte-identical. Extract them into a single MIDDLEWARE_DURATION_BUCKETS constant and reference it in all three metrics.
♻️ Proposed consolidation
-AUTH_DURATION_BUCKETS: Final[tuple[float, ...]] = (
+MIDDLEWARE_DURATION_BUCKETS: Final[tuple[float, ...]] = (
0.001,
0.005,
0.01,
0.025,
0.05,
0.1,
0.25,
0.5,
1.0,
2.5,
5.0,
float("inf"),
)
-AUTHORIZATION_DURATION_BUCKETS: Final[tuple[float, ...]] = (
- 0.001,
- 0.005,
- 0.01,
- 0.025,
- 0.05,
- 0.1,
- 0.25,
- 0.5,
- 1.0,
- 2.5,
- 5.0,
- float("inf"),
-)
-
-QUOTA_CHECK_DURATION_BUCKETS: Final[tuple[float, ...]] = (
- 0.001,
- 0.005,
- 0.01,
- 0.025,
- 0.05,
- 0.1,
- 0.25,
- 0.5,
- 1.0,
- 2.5,
- 5.0,
- float("inf"),
-)
+# Alias for backward compatibility and clarity
+AUTH_DURATION_BUCKETS = MIDDLEWARE_DURATION_BUCKETS
+AUTHORIZATION_DURATION_BUCKETS = MIDDLEWARE_DURATION_BUCKETS
+QUOTA_CHECK_DURATION_BUCKETS = MIDDLEWARE_DURATION_BUCKETSThen update the histogram declarations to reference the common constant:
auth_duration_seconds: Final[Histogram] = Histogram(
"ls_auth_duration_seconds",
"Authentication duration",
["auth_module", "result"],
- buckets=AUTH_DURATION_BUCKETS,
+ buckets=MIDDLEWARE_DURATION_BUCKETS,
)(Apply the same change to authorization_duration_seconds and quota_check_duration_seconds.)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/metrics/__init__.py` around lines 25 - 68, Three identical tuples
(AUTH_DURATION_BUCKETS, AUTHORIZATION_DURATION_BUCKETS,
QUOTA_CHECK_DURATION_BUCKETS) should be consolidated: create a single
MIDDLEWARE_DURATION_BUCKETS tuple containing the repeated values, then set
AUTH_DURATION_BUCKETS = AUTHORIZATION_DURATION_BUCKETS =
QUOTA_CHECK_DURATION_BUCKETS = MIDDLEWARE_DURATION_BUCKETS. Also update any
histogram registrations that currently pass the old tuples (e.g., the histograms
for auth_duration_seconds, authorization_duration_seconds, and
quota_check_duration_seconds) to reference MIDDLEWARE_DURATION_BUCKETS so all
three metrics use the shared constant.
| def normalize_auth_module(auth_module: str) -> str: | ||
| """Return a bounded authentication module label.""" | ||
| if auth_module in SUPPORTED_AUTHENTICATION_MODULES: | ||
| return auth_module | ||
| return AUTH_RESULT_UNKNOWN |
There was a problem hiding this comment.
normalize_auth_module returns wrong constant.
Line 76 returns AUTH_RESULT_UNKNOWN (value: "unknown"), which happens to have the correct string value, but semantically should return a dedicated AUTH_MODULE_UNKNOWN constant or the literal "unknown". This creates confusion since auth-result constants are being used for auth-module normalization.
🔧 Proposed fix
+AUTH_MODULE_UNKNOWN: Final[str] = "unknown"
+
AUTH_RESULT_SUCCESS: Final[str] = "success"
AUTH_RESULT_FAILURE: Final[str] = "failure"
AUTH_RESULT_SKIPPED: Final[str] = "skipped"
-AUTH_RESULT_UNKNOWN: Final[str] = "unknown"
AUTH_REASON_UNKNOWN: Final[str] = "unknown"
ALLOWED_AUTH_RESULTS: Final[frozenset[str]] = frozenset(
{
AUTH_RESULT_SUCCESS,
AUTH_RESULT_FAILURE,
AUTH_RESULT_SKIPPED,
}
)
def normalize_auth_module(auth_module: str) -> str:
"""Return a bounded authentication module label."""
if auth_module in SUPPORTED_AUTHENTICATION_MODULES:
return auth_module
- return AUTH_RESULT_UNKNOWN
+ return AUTH_MODULE_UNKNOWN🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/metrics/recording.py` around lines 72 - 76, The function
normalize_auth_module currently returns the auth-result constant
AUTH_RESULT_UNKNOWN; update it to return the correct auth-module constant
AUTH_MODULE_UNKNOWN (or the literal "unknown" if AUTH_MODULE_UNKNOWN is not
defined) in normalize_auth_module so auth-module normalization uses the proper
identifier rather than an auth-result constant; ensure any necessary constant
import or definition is present and keep the return semantics unchanged.
| def normalize_quota_type(quota_type: str) -> str: | ||
| """Return a bounded quota type label for Prometheus cardinality safety.""" | ||
| if quota_type in ALLOWED_QUOTA_TYPES: | ||
| return quota_type | ||
| return "unknown" |
There was a problem hiding this comment.
normalize_quota_type returns magic string instead of constant.
Line 169 returns the literal "unknown" instead of a named constant. This is inconsistent with other normalizers (e.g., AUTH_REASON_UNKNOWN) and makes the fallback behavior less discoverable.
🔧 Proposed fix
QUOTA_TYPE_USER_ID: Final[str] = "user_id"
QUOTA_TYPE_ORG_ID: Final[str] = "org_id"
QUOTA_TYPE_SYSTEM_ID: Final[str] = "system_id"
QUOTA_TYPE_DISABLED: Final[str] = "disabled"
+QUOTA_TYPE_UNKNOWN: Final[str] = "unknown"
QUOTA_RESULT_SUCCESS: Final[str] = "success"
QUOTA_RESULT_FAILURE: Final[str] = "failure"
QUOTA_RESULT_SKIPPED: Final[str] = "skipped"
QUOTA_RESULT_ERROR: Final[str] = "error"
ALLOWED_QUOTA_TYPES: Final[frozenset[str]] = frozenset(
{
QUOTA_TYPE_USER_ID,
QUOTA_TYPE_ORG_ID,
QUOTA_TYPE_SYSTEM_ID,
QUOTA_TYPE_DISABLED,
+ QUOTA_TYPE_UNKNOWN,
}
)
def normalize_quota_type(quota_type: str) -> str:
"""Return a bounded quota type label for Prometheus cardinality safety."""
if quota_type in ALLOWED_QUOTA_TYPES:
return quota_type
- return "unknown"
+ return QUOTA_TYPE_UNKNOWN🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/metrics/recording.py` around lines 165 - 169, The function
normalize_quota_type currently returns the magic literal "unknown" on fallback;
change it to return a named constant (e.g., QUOTA_TYPE_UNKNOWN) instead for
consistency with other normalizers; update the return to use the existing or
newly added constant and ensure the constant value equals "unknown" and is
exported/defined alongside ALLOWED_QUOTA_TYPES so callers can discover the
fallback value.
Description
Adds bounded Prometheus metrics across the three pre-request middleware layers on
/v1/responsesand/v1/infer:Each commit is self-contained and can be reviewed independently.
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit