Observability: Fix log levels and add tenant/request context to all logs#795
Conversation
… across multiple modules

- Changed logger.error to logger.warning in threads.py for thread validation and API key configuration checks.
- Updated users.py to log user creation and update attempts with existing emails as warnings.
- Adjusted middleware.py to include the correlation ID in Sentry tags (see the sketch after this list).
- Modified provider validation and credential checks in providers.py to use warning level.
- Changed logging in sentry_filters.py to include more HTTP methods.
- Updated telemetry.py to add request log context for organization and project IDs.
- Changed error logging to warning in various CRUD operations, including assistants, credentials, and evaluations.
- Updated response handling in response.py to log OpenAI API errors as warnings.
- Adjusted logging in services related to LLM and evaluations to use warning level for non-critical issues.
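As a rough illustration of the middleware change above: a minimal sketch of tagging Sentry events with the request correlation ID, assuming the `asgi-correlation-id` package (whose `correlation_id` context var appears elsewhere in this PR) and `sentry_sdk`. The exact shape of the code in middleware.py is an assumption.

```python
import sentry_sdk
from asgi_correlation_id import correlation_id


def tag_sentry_with_correlation_id() -> None:
    # Only tag when a request-scoped correlation ID exists;
    # outside a request, correlation_id.get() returns None.
    cid = correlation_id.get()
    if cid:
        sentry_sdk.set_tag("correlation_id", cid)
```

Calling this from the request middleware means every Sentry event raised during that request carries the same ID that appears in the logs, which makes cross-referencing straightforward.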
Important: Review skipped. Auto reviews are limited based on label configuration. 🏷️ Required labels (at least one) (1)

Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Path: .coderabbit.yaml | Review profile: CHILL | Plan: Pro | Run ID:

You can disable this status message by setting the
📝 Walkthrough

Adds request-scoped service-name and tenant log context, expands service-name-aware logging (including Cron and Celery workers), centralizes span traceability for LLM jobs, and systematically downgrades many error-level logs to warnings without changing control flow or API behavior.

Changes

Observability Infrastructure & Logging Config
LLM Jobs Traceability
Global Log Severity Downgrades
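The request-scoped tenant context can be pictured as a contextvars-backed payload that a logging filter copies onto each record. The sketch below is illustrative only: `_log_context_var` and `set_request_log_context` mirror names quoted in the review comments further down, but the filter wiring is an assumption, not the PR's actual code.

```python
import logging
from contextvars import ContextVar
from typing import Optional

_log_context_var: ContextVar[dict] = ContextVar("log_context", default={})


def set_request_log_context(
    org_id: Optional[str] = None, project_id: Optional[str] = None
) -> None:
    # Copy-on-write so concurrent requests never mutate a shared dict.
    payload = dict(_log_context_var.get())
    if org_id is not None:
        payload["org_id"] = str(org_id)
    if project_id is not None:
        payload["project_id"] = str(project_id)
    _log_context_var.set(payload)


class TenantContextFilter(logging.Filter):
    """Copy org_id/project_id from the context var onto each log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in _log_context_var.get().items():
            setattr(record, key, value)
        return True  # enrich, never drop
```

A formatter that references `%(org_id)s` would still need a default for records emitted outside a request (startup, Celery boot), which is one reason the PR also threads an explicit service name through `configure_logging`.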
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant API as API Server
    participant Logger as ServiceName Logger
    participant Telemetry as OpenTelemetry
    participant Sentry as Sentry
    participant Celery as Celery Worker
    participant LLM as LLM Provider
    API->>Logger: configure_logging(service_name)
    API->>Telemetry: setup_telemetry(service_name)
    API->>Logger: request start (cron? -> set service_name)
    API->>Telemetry: set_request_log_context(org_id, project_id)
    API->>Sentry: set_tag(correlation_id) (if present)
    API->>Celery: start_chain_job (records span attrs)
    Celery->>Telemetry: _set_traceability_attributes(llm.job_id,...)
    Celery->>LLM: execute call (provider span)
    LLM-->>Celery: response / error
    Celery->>Telemetry: set span error / record exception (on failure)
    Celery->>Logger: emit warning/error (downgraded where applicable)
    Celery->>Sentry: capture_exception (if applicable)
```
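To make the `_set_traceability_attributes` step concrete, here is a minimal sketch using the OpenTelemetry Python API. The helper name and the `llm.job_id` key come from the diagram above; the other attribute keys and the failure helper are assumptions.

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode


def _set_traceability_attributes(job_id: str, provider: str, model: str) -> None:
    # Enrich whatever span is currently active (the Celery task span here).
    span = trace.get_current_span()
    if not span.is_recording():
        return  # no-op when tracing is disabled or no span is active
    span.set_attribute("llm.job_id", job_id)
    span.set_attribute("llm.provider", provider)
    span.set_attribute("llm.model", model)


def _record_span_failure(exc: Exception) -> None:
    # Mirrors the "set span error / record exception (on failure)" step above.
    span = trace.get_current_span()
    if span.is_recording():
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR, str(exc)))
```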
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
backend/app/crud/evaluations/dataset.py (1)
84-93: ⚠️ Potential issue | 🟠 Major

Handle only the duplicate-name constraint as 409; let other integrity failures follow a separate path.

Right now every `IntegrityError` is treated as a duplicate-name conflict, which can return misleading 409s for unrelated DB integrity problems.

🔧 Proposed fix
```diff
     except IntegrityError as e:
         session.rollback()
-        logger.warning(
-            f"[create_evaluation_dataset] Duplicate dataset name | name={name} | {e}",
-        )
-        raise HTTPException(
-            status_code=409,
-            detail=f"Dataset with name '{name}' already exists in this "
-            "organization and project. Please choose a different name.",
-        )
+        if "uq_evaluation_dataset_name_org_project" in str(e):
+            logger.warning(
+                f"[create_evaluation_dataset] Duplicate dataset name | name={name}"
+            )
+            raise HTTPException(
+                status_code=409,
+                detail=f"Dataset with name '{name}' already exists in this "
+                "organization and project. Please choose a different name.",
+            )
+        logger.error(
+            f"[create_evaluation_dataset] Integrity error while creating dataset | "
+            f"name={name} | org_id={organization_id} | project_id={project_id}",
+            exc_info=True,
+        )
+        raise HTTPException(status_code=500, detail="Failed to save dataset metadata")
```

backend/app/crud/document/document.py (1)
102-127: 🛠️ Refactor suggestion | 🟠 Major

Add an explicit return type to `DocumentCrud.update`.

`DocumentCrud.update` returns a `Document` but has no return annotation. Please annotate it explicitly while this method is being touched.

Suggested diff

```diff
-    def update(self, document: Document):
+    def update(self, document: Document) -> Document:
```

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/document/document.py` around lines 102 - 127, Add an explicit return type annotation to the DocumentCrud.update method: change its signature from def update(self, document: Document) to def update(self, document: Document) -> Document so it clearly indicates it returns a Document; ensure any necessary imports or forward references for Document are present and update any type-checking comments if used.

backend/app/crud/evaluations/langfuse.py (2)
271-275: ⚠️ Potential issue | 🟠 Major

Avoid logging raw question text in failure logs.

On Line 274, logging part of `item["question"]` can leak sensitive user content into logs. Log `question_id` (or a masked value) instead.

Proposed patch
```diff
-            logger.warning(
-                f"[upload_dataset_to_langfuse] Failed to upload item | "
-                f"duplicate={duplicate_num + 1} | "
-                f"question={item['question'][:50]}... | {e}"
-            )
+            logger.warning(
+                f"[upload_dataset_to_langfuse] Failed to upload item | "
+                f"duplicate={duplicate_num + 1} | "
+                f"question_id={question_id} | error={e}"
+            )
```

As per coding guidelines, prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/langfuse.py` around lines 271 - 275, The warning log in upload_dataset_to_langfuse currently includes raw question text (item['question']) which can leak sensitive data; change the logger.warning call in upload_dataset_to_langfuse to use a non-sensitive identifier (e.g., item['question_id'] or a masked version via mask_string(item['question'])) instead of item['question'] and ensure the message is prefixed with the function name in square brackets (e.g., logger.warning(f"[upload_dataset_to_langfuse] ...")). Keep the rest of the context (duplicate count and exception e) intact and use mask_string(...) for any sensitive-value display if no question_id is available.
392-399: ⚠️ Potential issue | 🟠 Major

Distinguish 404 "run not found" errors from auth/network/server failures.

Line 392 catches all exceptions indiscriminately and always raises `ValueError` claiming the run is missing. This masks authentication failures, network issues, and server errors. Check the exception's `status_code` attribute (both Langfuse `NotFoundError` and `ApiError` expose it) to differentiate: only convert 404 responses to "run not found"; re-raise other errors.

Proposed fix
```diff
     try:
         dataset_run = langfuse.api.datasets.get_run(dataset_name, run_name)
     except Exception as e:
-        logger.warning(
-            f"[fetch_trace_scores_from_langfuse] Run not found in Langfuse | "
-            f"dataset={dataset_name} | run={run_name} | error={e}"
-        )
-        raise ValueError(
-            f"Run '{run_name}' not found in Langfuse dataset '{dataset_name}'"
-        )
+        status_code = getattr(e, "status_code", None) or getattr(
+            getattr(e, "response", None), "status_code", None
+        )
+        if status_code == 404:
+            logger.warning(
+                f"[fetch_trace_scores_from_langfuse] Run not found in Langfuse | "
+                f"dataset={dataset_name} | run={run_name}"
+            )
+            raise ValueError(
+                f"Run '{run_name}' not found in Langfuse dataset '{dataset_name}'"
+            ) from e
+        logger.error(
+            f"[fetch_trace_scores_from_langfuse] Failed to fetch run | "
+            f"dataset={dataset_name} | run={run_name} | error={e}",
+            exc_info=True,
+        )
+        raise
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/langfuse.py` around lines 392 - 399, The except block in fetch_trace_scores_from_langfuse currently catches all exceptions and always raises a ValueError claiming the run is missing; change it to inspect the caught exception (e) for HTTP status info (check e.status_code and/or isinstance(e, NotFoundError or ApiError) as exported by the Langfuse client) and only convert to ValueError when the status_code == 404 (or the exception is a NotFoundError); for any other status codes or exceptions (auth/network/server errors) re-raise the original exception after logging the error with the same context (dataset_name/run_name).

backend/app/utils.py (1)
276-283: ⚠️ Potential issue | 🟠 Major

Keep internal 500 paths at `error` level.

These branches catch unexpected runtime failures and then raise `HTTPException(status_code=500)`. Logging them as warnings can hide true server faults from error-based alerting.

Suggested patch
```diff
-        logger.warning(
+        logger.error(
             f"[get_openai_client] Failed to configure OpenAI client. | project_id: {project_id} | error: {str(e)}",
             exc_info=True,
         )
@@
-        logger.warning(
+        logger.error(
             f"[get_langfuse_client] Failed to configure Langfuse client. | project_id: {project_id} | error: {str(e)}",
             exc_info=True,
         )
```

Also applies to: 316-323
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/utils.py` around lines 276 - 283, The logged 500-level error paths in utils.py are using logger.warning, which should be logger.error so server faults trigger error-based alerting; update the logging calls in the get_openai_client error handler (the logger.warning that includes "[get_openai_client] Failed to configure OpenAI client") to logger.error and preserve exc_info=True and the error detail format, and make the same change for the second occurrence mentioned around the other 500-raising block (the similar logger.warning at lines ~316-323) so all unexpected runtime failures that raise HTTPException(status_code=500) are logged at error level.
🧹 Nitpick comments (6)
backend/app/crud/document/document.py (1)
47-54: Simplify local exception flow by removing immediate raise-and-catch blocks.

The code raises an exception and immediately catches the same type only to log and re-raise at multiple locations, adding nesting without changing behavior.
The review identified 3 instances, but the codebase contains at least 4 matches in this file alone:
- Lines 48-54: `ValueError` for negative skip
- Lines 59-65: `ValueError` for negative limit
- Lines 89-98: `ValueError` for mismatched document count (not mentioned in the original review)
- Lines 110-116: `PermissionError` for document ownership

Additionally, `backend/app/crud/rag/open_ai.py:153-160` contains the same pattern with `ValueError` for retries validation.

Suggested refactoring approach
Remove the try-except wrapper and log before raising:
```diff
         if skip < 0:
-            try:
-                raise ValueError(f"Negative skip: {skip}")
-            except ValueError as err:
-                logger.warning(
-                    f"[DocumentCrud.read_many] Invalid skip value | {{'project_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}",
-                )
-                raise
+            logger.warning(
+                f"[DocumentCrud.read_many] Invalid skip value | {{'project_id': {self.project_id}, 'skip': {skip}, 'error': 'Negative skip: {skip}'}}",
+            )
+            raise ValueError(f"Negative skip: {skip}")
```

Apply the same pattern to all 4 instances in document.py and the related instance in open_ai.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/document/document.py` around lines 47 - 54, Remove the raise-then-immediately-catch pattern and instead log then raise directly: in DocumentCrud.read_many (the negative skip and negative limit checks), the mismatched document count check, and the document ownership PermissionError check, replace the try/except blocks with a single logger.warning/error call (including self.project_id, skip/limit/count/ownership context and the error text) immediately followed by raising the appropriate exception (ValueError or PermissionError); apply the same change in the OpenAI retries validation function in backend/app/crud/rag/open_ai.py (the ValueError for retries) so each case logs context and then raises without wrapping the raise in a try/except.

backend/app/core/telemetry.py (1)
69-75: Consider clearing tenant keys when `None` is passed.

Right now, `org_id`/`project_id` are only overwritten, never removed. If this helper is invoked multiple times in one context, stale tenant values can persist.

Proposed refactor
```diff
     payload = dict(current)
     if org_id is not None:
         payload["org_id"] = str(org_id)
+    else:
+        payload.pop("org_id", None)
     if project_id is not None:
         payload["project_id"] = str(project_id)
+    else:
+        payload.pop("project_id", None)
     _log_context_var.set(payload)
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/telemetry.py` around lines 69 - 75, The current logic on _log_context_var only overwrites org_id/project_id but never removes them, causing stale tenant IDs to persist; update the helper that builds payload (using current and payload) so that when org_id is None you remove payload["org_id"] if present (payload.pop("org_id", None)) and likewise remove payload["project_id"] when project_id is None, otherwise set them to str(org_id)/str(project_id); then call _log_context_var.set(payload) as before.

backend/app/services/collections/providers/registry.py (1)
66-69: This unsupported-provider branch is currently unreachable.

Line 47 already rejects unsupported providers, so this warning path won't execute in the current control flow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/collections/providers/registry.py` around lines 66 - 69, The warning + ValueError block at the end of get_llm_provider is unreachable because an earlier validation already rejects unsupported providers; remove this redundant branch (the logger.warning and raise ValueError) from get_llm_provider so the function only contains the actual provider handling logic, or if you intended a fallback here instead, move the unsupported-provider check out of the earlier validator and let get_llm_provider perform the validation and raise the error.

backend/app/services/llm/jobs.py (1)
466-466: Consider adding a fallback for `correlation_id.get()` for consistency.

Other locations (lines 121, 171) use `correlation_id.get() or "N/A"` as a fallback. Since `execute_llm_call` may run in Celery background tasks where the correlation ID middleware context might not be available, consider adding a similar fallback.

Suggested fix
```diff
-    trace_id = correlation_id.get()
+    trace_id = correlation_id.get() or "N/A"
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/jobs.py` at line 466, The tracer correlation_id retrieval at trace_id = correlation_id.get() may return None in Celery tasks; update the assignment in the execute_llm_call scope to use a fallback like correlation_id.get() or "N/A" so trace_id is always a string; locate the trace_id variable in jobs.py (the execute_llm_call / related job function) and replace the raw get() call with the same fallback pattern used elsewhere (e.g., lines using correlation_id.get() or "N/A") to ensure consistent logging/trace handling in background tasks.

backend/app/crud/rag/open_ai.py (1)
57-60: Consider retaining `exc_info=True` for debugging OpenAI cleanup failures.

The log level downgrade to warning is appropriate, but removing `exc_info=True` loses the stack trace, which can be valuable for diagnosing intermittent cleanup failures.

Suggested fix
```diff
         except OpenAIError as err:
             logger.warning(
                 f"[ResourceCleaner.call] OpenAI error during cleanup | {{'cleaner_type': '{self}', 'resource': '{resource}', 'error': '{str(err)}'}}",
+                exc_info=True,
             )
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/rag/open_ai.py` around lines 57 - 60, The logger.warning in ResourceCleaner.call currently logs OpenAIError without the stack trace; update the except OpenAIError as err handler so the logger.warning call for logger.warning(f"[ResourceCleaner.call] OpenAI error during cleanup | {{'cleaner_type': '{self}', 'resource': '{resource}', 'error': '{str(err)}'}}", ...) includes exc_info=True (or passes exc_info=err) to retain the exception traceback for debugging; keep the message format and warning level but add exc_info to the logger.warning invocation.

backend/app/api/routes/model_evaluation.py (1)
146-149: Minor: Log prefix uses singular `evaluate_model` but the function is `evaluate_models`.

The log level downgrade is appropriate for input validation failures.
Optional fix for consistency
```diff
         logger.warning(
-            f"[evaluate_model] No fine tuning IDs provided | project_id:{current_user.project_.id}"
+            f"[evaluate_models] No fine tuning IDs provided | project_id:{current_user.project_.id}"
         )
```

As per coding guidelines, log messages should be prefixed with the function name in square brackets.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/model_evaluation.py` around lines 146 - 149, The warning log uses the wrong function name prefix; update the logger.warning call inside the evaluate_models function to use the correct prefix "[evaluate_models]" (keep the existing message and project id context via current_user.project_.id) so the log follows the convention of using the function name in square brackets; leave the raised HTTPException unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/routes/users.py`:
- Around line 53-55: The logs currently print raw email PII (e.g., in
logger.warning inside create_user_endpoint and similar spots); replace direct
email usage with the mask_string utility and ensure each log message is prefixed
with the function name in square brackets (e.g., use mask_string(user_in.email)
in the logger call in create_user_endpoint), and apply the same change to
update_user_me, register_user, and update_user_endpoint so all email logging
uses mask_string(sensitive_value) and follows the "[function_name] Message"
prefix convention.
In `@backend/app/core/util.py`:
- Line 43: The log call logger.warning(f"Failed to configure OpenAI client:
{str(e)}") is missing the required function-name prefix; update that warning to
include the current function's name in square brackets (e.g.
"[configure_openai_client] Failed to configure OpenAI client: {str(e)}") by
inserting the actual function name where this line resides before the message so
it follows the project log format.
In `@backend/app/crud/assistants.py`:
- Around line 86-92: Update the two logger.warning calls in
verify_vector_store_ids_exist to follow the logging guideline: prefix messages
with the function name in square brackets and do not log raw vector_store_id;
instead call mask_string(vector_store_id). For example, change the "Vector store
ID ... not found" and the "Failed to verify ..." warnings to use
logger.warning(f"[verify_vector_store_ids_exist] Vector store ID
{mask_string(vector_store_id)} not found") and
logger.warning(f"[verify_vector_store_ids_exist] Failed to verify vector store
ID {mask_string(vector_store_id)}: {e}") respectively, keeping the exception
variable e for context.
In `@backend/app/crud/evaluations/embeddings.py`:
- Around line 169-171: The log prefix is incorrect: replace the hardcoded
"[parse_embedding_batch_results]" prefix in the logger.warning call inside
parse_embedding_results with the actual function name
"[parse_embedding_results]" so the message reads like
logger.warning(f"[parse_embedding_results] Trace {trace_id} had error:
{error_msg}"), keeping the existing variables (trace_id, error_msg) and logger
usage intact for consistent log filtering.
In `@backend/app/crud/model_evaluation.py`:
- Around line 29-33: The conditional in create_model_evaluation is inverted:
change the check that currently reads against fine_tuning_job.fine_tuned_model
and fine_tuning_job.test_data_s3_object to instead detect a missing model or
missing test data (e.g., use "not fine_tuning_job.fine_tuned_model or
fine_tuning_job.test_data_s3_object is None"), and update the logger.warning
message to accurately state which resource is missing (reference
fine_tuning_job, request.fine_tuning_id and project_id) before raising the
HTTPException(404, ...) so the log and exception reflect the actual missing
resource(s).
In `@backend/app/services/response/response.py`:
- Around line 149-152: The log prefix is incorrect: replace the hardcoded
"[process_response_task]" in the logger.warning call inside generate_response
with the actual function name or a dynamic prefix that uses "generate_response";
locate the logger.warning invocation in generate_response (the one logging
OpenAI API errors with exc_info=True) and update the message prefix to
"[generate_response]" (or build it from the function name) so log messages
correctly reflect the function.
---
Outside diff comments:
In `@backend/app/crud/document/document.py`:
- Around line 102-127: Add an explicit return type annotation to the
DocumentCrud.update method: change its signature from def update(self, document:
Document) to def update(self, document: Document) -> Document so it clearly
indicates it returns a Document; ensure any necessary imports or forward
references for Document are present and update any type-checking comments if
used.
In `@backend/app/crud/evaluations/langfuse.py`:
- Around line 271-275: The warning log in upload_dataset_to_langfuse currently
includes raw question text (item['question']) which can leak sensitive data;
change the logger.warning call in upload_dataset_to_langfuse to use a
non-sensitive identifier (e.g., item['question_id'] or a masked version via
mask_string(item['question'])) instead of item['question'] and ensure the
message is prefixed with the function name in square brackets (e.g.,
logger.warning(f"[upload_dataset_to_langfuse] ...")). Keep the rest of the
context (duplicate count and exception e) intact and use mask_string(...) for
any sensitive-value display if no question_id is available.
- Around line 392-399: The except block in fetch_trace_scores_from_langfuse
currently catches all exceptions and always raises a ValueError claiming the run
is missing; change it to inspect the caught exception (e) for HTTP status info
(check e.status_code and/or isinstance(e, NotFoundError or ApiError) as exported
by the Langfuse client) and only convert to ValueError when the status_code ==
404 (or the exception is a NotFoundError); for any other status codes or
exceptions (auth/network/server errors) re-raise the original exception after
logging the error with the same context (dataset_name/run_name).
In `@backend/app/utils.py`:
- Around line 276-283: The logged 500-level error paths in utils.py are using
logger.warning, which should be logger.error so server faults trigger
error-based alerting; update the logging calls in the get_openai_client error
handler (the logger.warning that includes "[get_openai_client] Failed to
configure OpenAI client") to logger.error and preserve exc_info=True and the
error detail format, and make the same change for the second occurrence
mentioned around the other 500-raising block (the similar logger.warning at
lines ~316-323) so all unexpected runtime failures that raise
HTTPException(status_code=500) are logged at error level.
---
Nitpick comments:
In `@backend/app/api/routes/model_evaluation.py`:
- Around line 146-149: The warning log uses the wrong function name prefix;
update the logger.warning call inside the evaluate_models function to use the
correct prefix "[evaluate_models]" (keep the existing message and project id
context via current_user.project_.id) so the log follows the convention of using
the function name in square brackets; leave the raised HTTPException unchanged.
In `@backend/app/core/telemetry.py`:
- Around line 69-75: The current logic on _log_context_var only overwrites
org_id/project_id but never removes them, causing stale tenant IDs to persist;
update the helper that builds payload (using current and payload) so that when
org_id is None you remove payload["org_id"] if present (payload.pop("org_id",
None)) and likewise remove payload["project_id"] when project_id is None,
otherwise set them to str(org_id)/str(project_id); then call
_log_context_var.set(payload) as before.
In `@backend/app/crud/document/document.py`:
- Around line 47-54: Remove the raise-then-immediately-catch pattern and instead
log then raise directly: in DocumentCrud.read_many (the negative skip and
negative limit checks), the mismatched document count check, and the document
ownership PermissionError check, replace the try/except blocks with a single
logger.warning/error call (including self.project_id, skip/limit/count/ownership
context and the error text) immediately followed by raising the appropriate
exception (ValueError or PermissionError); apply the same change in the OpenAI
retries validation function in backend/app/crud/rag/open_ai.py (the ValueError
for retries) so each case logs context and then raises without wrapping the
raise in a try/except.
In `@backend/app/crud/rag/open_ai.py`:
- Around line 57-60: The logger.warning in ResourceCleaner.call currently logs
OpenAIError without the stack trace; update the except OpenAIError as err
handler so the logger.warning call for logger.warning(f"[ResourceCleaner.call]
OpenAI error during cleanup | {{'cleaner_type': '{self}', 'resource':
'{resource}', 'error': '{str(err)}'}}", ...) includes exc_info=True (or passes
exc_info=err) to retain the exception traceback for debugging; keep the message
format and warning level but add exc_info to the logger.warning invocation.
In `@backend/app/services/collections/providers/registry.py`:
- Around line 66-69: The warning + ValueError block at the end of
get_llm_provider is unreachable because an earlier validation already rejects
unsupported providers; remove this redundant branch (the logger.warning and
raise ValueError) from get_llm_provider so the function only contains the actual
provider handling logic, or if you intended a fallback here instead, move the
unsupported-provider check out of the earlier validator and let get_llm_provider
perform the validation and raise the error.
In `@backend/app/services/llm/jobs.py`:
- Line 466: The tracer correlation_id retrieval at trace_id =
correlation_id.get() may return None in Celery tasks; update the assignment in
the execute_llm_call scope to use a fallback like correlation_id.get() or "N/A"
so trace_id is always a string; locate the trace_id variable in jobs.py (the
execute_llm_call / related job function) and replace the raw get() call with the
same fallback pattern used elsewhere (e.g., lines using correlation_id.get() or
"N/A") to ensure consistent logging/trace handling in background tasks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6895a805-d798-4cdf-ba45-ea0179becebf
📒 Files selected for processing (39)
- backend/app/api/deps.py
- backend/app/api/routes/credentials.py
- backend/app/api/routes/model_evaluation.py
- backend/app/api/routes/organization.py
- backend/app/api/routes/project.py
- backend/app/api/routes/threads.py
- backend/app/api/routes/users.py
- backend/app/core/middleware.py
- backend/app/core/providers.py
- backend/app/core/sentry_filters.py
- backend/app/core/telemetry.py
- backend/app/core/util.py
- backend/app/crud/assistants.py
- backend/app/crud/config/version.py
- backend/app/crud/credentials.py
- backend/app/crud/document/document.py
- backend/app/crud/evaluations/batch.py
- backend/app/crud/evaluations/dataset.py
- backend/app/crud/evaluations/embeddings.py
- backend/app/crud/evaluations/langfuse.py
- backend/app/crud/fine_tuning.py
- backend/app/crud/model_evaluation.py
- backend/app/crud/organization.py
- backend/app/crud/project.py
- backend/app/crud/rag/open_ai.py
- backend/app/crud/stt_evaluations/dataset.py
- backend/app/crud/tts_evaluations/dataset.py
- backend/app/services/collections/providers/registry.py
- backend/app/services/evaluations/validators.py
- backend/app/services/llm/chain/chain.py
- backend/app/services/llm/chain/executor.py
- backend/app/services/llm/jobs.py
- backend/app/services/llm/providers/eai.py
- backend/app/services/llm/providers/oai.py
- backend/app/services/llm/providers/sai.py
- backend/app/services/response/response.py
- backend/app/services/stt_evaluations/batch_job.py
- backend/app/services/tts_evaluations/batch_job.py
- backend/app/utils.py
💤 Files with no reviewable changes (2)
- backend/app/api/routes/project.py
- backend/app/api/routes/organization.py
```python
        logger.warning(
            f"[create_user_endpoint] Attempting to create user with existing email | email: {user_in.email}"
        )
```
Potential PII exposure: email addresses logged without masking.
Email addresses are PII and are being logged directly. Consider using mask_string to protect user privacy. This pattern repeats in lines 84, 151, and 202.
Suggested fix for this and similar occurrences
```diff
+from app.utils import mask_string
+
 ...
     logger.warning(
-        f"[create_user_endpoint] Attempting to create user with existing email | email: {user_in.email}"
+        f"[create_user_endpoint] Attempting to create user with existing email | email: {mask_string(user_in.email)}"
     )
```

Apply similar masking to:

- Line 84: `update_user_me`
- Line 151: `register_user`
- Line 202: `update_user_endpoint`

As per coding guidelines: Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from app.utils import mask_string

...
        logger.warning(
            f"[create_user_endpoint] Attempting to create user with existing email | email: {mask_string(user_in.email)}"
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/routes/users.py` around lines 53 - 55, The logs currently
print raw email PII (e.g., in logger.warning inside create_user_endpoint and
similar spots); replace direct email usage with the mask_string utility and
ensure each log message is prefixed with the function name in square brackets
(e.g., use mask_string(user_in.email) in the logger call in
create_user_endpoint), and apply the same change to update_user_me,
register_user, and update_user_endpoint so all email logging uses
mask_string(sensitive_value) and follows the "[function_name] Message" prefix
convention.
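For reference, a plausible shape for the `mask_string` helper the guideline relies on. This is a guess at its behavior, not the actual implementation in `app.utils`:

```python
def mask_string(value: str, visible: int = 2) -> str:
    """Keep the first `visible` characters and mask the remainder."""
    if not value:
        return ""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# e.g. mask_string("user@example.com") -> "us**************"
```

Whatever the real signature, the point of the guideline stands: log just enough of the value to correlate events without persisting raw PII.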
```diff
         return client, True
     except Exception as e:
-        logger.error(f"Failed to configure OpenAI client: {str(e)}")
+        logger.warning(f"Failed to configure OpenAI client: {str(e)}")
```
Add the required function-name prefix to this log message.
This line should follow the project log format ([function_name] ...).
Proposed fix
- logger.warning(f"Failed to configure OpenAI client: {str(e)}")
+ logger.warning(f"[configure_openai] Failed to configure OpenAI client: {str(e)}")As per coding guidelines: Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/core/util.py` at line 43, The log call logger.warning(f"Failed to
configure OpenAI client: {str(e)}") is missing the required function-name
prefix; update that warning to include the current function's name in square
brackets (e.g. "[configure_openai_client] Failed to configure OpenAI client:
{str(e)}") by inserting the actual function name where this line resides before
the message so it follows the project log format.
| logger.warning(f"Vector store ID {vector_store_id} not found in OpenAI.") | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail=f"Vector store ID {vector_store_id} not found in OpenAI.", | ||
| ) | ||
| except openai.OpenAIError as e: | ||
| logger.error(f"Failed to verify vector store ID {vector_store_id}: {e}") | ||
| logger.warning(f"Failed to verify vector store ID {vector_store_id}: {e}") |
Make warning logs guideline-compliant in verify_vector_store_ids_exist.
Line 86 and Line 92 should include the function-name prefix and avoid logging raw vector_store_id values.
Suggested fix
- logger.warning(f"Vector store ID {vector_store_id} not found in OpenAI.")
+ logger.warning(
+ f"[verify_vector_store_ids_exist] Vector store ID {mask_string(vector_store_id)} not found in OpenAI."
+ )
...
- logger.warning(f"Failed to verify vector store ID {vector_store_id}: {e}")
+ logger.warning(
+ f"[verify_vector_store_ids_exist] Failed to verify vector store ID {mask_string(vector_store_id)}: {e}"
+ )As per coding guidelines, "Prefix all log messages with the function name in square brackets: logger.info(f\"[function_name] Message {mask_string(sensitive_value)}\")."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger.warning(f"Vector store ID {vector_store_id} not found in OpenAI.") | |
| raise HTTPException( | |
| status_code=400, | |
| detail=f"Vector store ID {vector_store_id} not found in OpenAI.", | |
| ) | |
| except openai.OpenAIError as e: | |
| logger.error(f"Failed to verify vector store ID {vector_store_id}: {e}") | |
| logger.warning(f"Failed to verify vector store ID {vector_store_id}: {e}") | |
| logger.warning( | |
| f"[verify_vector_store_ids_exist] Vector store ID {mask_string(vector_store_id)} not found in OpenAI." | |
| ) | |
| raise HTTPException( | |
| status_code=400, | |
| detail=f"Vector store ID {vector_store_id} not found in OpenAI.", | |
| ) | |
| except openai.OpenAIError as e: | |
| logger.warning( | |
| f"[verify_vector_store_ids_exist] Failed to verify vector store ID {mask_string(vector_store_id)}: {e}" | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/assistants.py` around lines 86 - 92, Update the two
logger.warning calls in verify_vector_store_ids_exist to follow the logging
guideline: prefix messages with the function name in square brackets and do not
log raw vector_store_id; instead call mask_string(vector_store_id). For example,
change the "Vector store ID ... not found" and the "Failed to verify ..."
warnings to use logger.warning(f"[verify_vector_store_ids_exist] Vector store ID
{mask_string(vector_store_id)} not found") and
logger.warning(f"[verify_vector_store_ids_exist] Failed to verify vector store
ID {mask_string(vector_store_id)}: {e}") respectively, keeping the exception
variable e for context.
```python
            logger.warning(
                f"[parse_embedding_batch_results] Trace {trace_id} had error: {error_msg}"
            )
```
Use the actual function name in the log prefix.
At Line 170, the prefix is [parse_embedding_batch_results], but this log is inside parse_embedding_results. Please align the prefix with the function name for consistent log filtering.
Suggested fix
```diff
-            logger.warning(
-                f"[parse_embedding_batch_results] Trace {trace_id} had error: {error_msg}"
-            )
+            logger.warning(
+                f"[parse_embedding_results] Trace {trace_id} had error: {error_msg}"
+            )
```

As per coding guidelines, "Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/embeddings.py` around lines 169 - 171, The log
prefix is incorrect: replace the hardcoded "[parse_embedding_batch_results]"
prefix in the logger.warning call inside parse_embedding_results with the actual
function name "[parse_embedding_results]" so the message reads like
logger.warning(f"[parse_embedding_results] Trace {trace_id} had error:
{error_msg}"), keeping the existing variables (trace_id, error_msg) and logger
usage intact for consistent log filtering.
```diff
     if fine_tuning_job.fine_tuned_model and fine_tuning_job.test_data_s3_object is None:
-        logger.error(
+        logger.warning(
             f"[create_model_evaluation] No fine tuned model or test data found for the given fine tuning ID | fine_tuning_id={request.fine_tuning_id}, project_id={project_id}"
         )
         raise HTTPException(404, "Fine tuned model not found")
```
Logic issue: condition appears inverted.
The condition `fine_tuning_job.fine_tuned_model and fine_tuning_job.test_data_s3_object is None` checks if the model EXISTS but test data is missing. However, the log message and exception say "No fine tuned model or test data found". The log message is misleading.
Suggested log message fix
```diff
     if fine_tuning_job.fine_tuned_model and fine_tuning_job.test_data_s3_object is None:
         logger.warning(
-            f"[create_model_evaluation] No fine tuned model or test data found for the given fine tuning ID | fine_tuning_id={request.fine_tuning_id}, project_id={project_id}"
+            f"[create_model_evaluation] Test data not found for fine tuned model | fine_tuning_id={request.fine_tuning_id}, project_id={project_id}"
         )
         raise HTTPException(404, "Fine tuned model not found")
```
raise HTTPException(404, "Fine tuned model not found")Or verify if the condition logic itself needs correction.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/model_evaluation.py` around lines 29 - 33, The conditional
in create_model_evaluation is inverted: change the check that currently reads
against fine_tuning_job.fine_tuned_model and fine_tuning_job.test_data_s3_object
to instead detect a missing model or missing test data (e.g., use "not
fine_tuning_job.fine_tuned_model or fine_tuning_job.test_data_s3_object is
None"), and update the logger.warning message to accurately state which resource
is missing (reference fine_tuning_job, request.fine_tuning_id and project_id)
before raising the HTTPException(404, ...) so the log and exception reflect the
actual missing resource(s).
```python
        logger.warning(
            f"[process_response_task] OpenAI API error: {error_message}",
            exc_info=True,
        )
```
Log prefix mismatch: function is generate_response, not process_response_task.
The log message uses [process_response_task] but the enclosing function is generate_response. As per coding guidelines, log messages should be prefixed with the actual function name.
Suggested fix
```diff
         logger.warning(
-            f"[process_response_task] OpenAI API error: {error_message}",
+            f"[generate_response] OpenAI API error: {error_message}",
             exc_info=True,
         )
```
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        logger.warning(
            f"[generate_response] OpenAI API error: {error_message}",
            exc_info=True,
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/response/response.py` around lines 149 - 152, The log
prefix is incorrect: replace the hardcoded "[process_response_task]" in the
logger.warning call inside generate_response with the actual function name or a
dynamic prefix that uses "generate_response"; locate the logger.warning
invocation in generate_response (the one logging OpenAI API errors with
exc_info=True) and update the message prefix to "[generate_response]" (or build
it from the function name) so log messages correctly reflect the function.
Codecov Report

❌ Patch coverage is

📢 Thoughts on this report? Let us know!
…bs and configuring logging in main application
🧹 Nitpick comments (1)
backend/app/celery/celery_app.py (1)
24-27: ⚡ Quick win

Avoid hardcoded `"kaapi-celery"` in multiple call sites.

Use a single settings-backed value for Celery service naming to prevent drift and allow env-based overrides.
Proposed refactor
```diff
 # backend/app/core/config.py
 class Settings(BaseSettings):
     ...
+    CELERY_SERVICE_NAME: str = "kaapi-celery"
```

```diff
 # backend/app/celery/celery_app.py
 @setup_logging.connect
 def configure_celery_logging(**_: object) -> None:
-    configure_logging(service_name="kaapi-celery")
+    configure_logging(service_name=settings.CELERY_SERVICE_NAME)

 def _initialize_worker_observability() -> None:
     global _telemetry_initialized, _sentry_initialized, _flush_hook_registered
-    configure_logging(service_name="kaapi-celery")
+    configure_logging(service_name=settings.CELERY_SERVICE_NAME)
```

Also applies to: 32-32
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/celery/celery_app.py` around lines 24 - 27, Replace the hardcoded "kaapi-celery" string with a settings-backed config and use it in both call sites: import your app settings (e.g., from settings import settings or from config import AppSettings) and expose a setting like CELERY_SERVICE_NAME (or SERVICE_NAME_CELERY) then update configure_celery_logging to call configure_logging(service_name=settings.CELERY_SERVICE_NAME) and the other occurrence (the second configure_logging call referenced in the review) to use the same settings.CELERY_SERVICE_NAME so the service name is configurable via environment/config.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@backend/app/celery/celery_app.py`:
- Around line 24-27: Replace the hardcoded "kaapi-celery" string with a
settings-backed config and use it in both call sites: import your app settings
(e.g., from settings import settings or from config import AppSettings) and
expose a setting like CELERY_SERVICE_NAME (or SERVICE_NAME_CELERY) then update
configure_celery_logging to call
configure_logging(service_name=settings.CELERY_SERVICE_NAME) and the other
occurrence (the second configure_logging call referenced in the review) to use
the same settings.CELERY_SERVICE_NAME so the service name is configurable via
environment/config.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 01dc28c0-568d-4a51-ac81-c4864a631395
📒 Files selected for processing (6)
- backend/app/api/routes/cron.py
- backend/app/celery/celery_app.py
- backend/app/core/config.py
- backend/app/core/logger.py
- backend/app/core/middleware.py
- backend/app/main.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/core/middleware.py
…nd streamline error handling
… across multiple modules
Summary
Target issue is #421
Explain the motivation for making this change. What existing problem does the pull request solve?
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Improvements