Revieko — Report Fields Reference (Summary / Full report / JSON)

This document is a field reference for Revieko reports: what each field means, where it appears (PR comment / Full report / JSON), how to interpret it, and common reasons behind “weird” values.

Read first: docs/PR_REVIEW_QUICKSTART.md (the 3–5 minute review path).

This document is strictly a field dictionary.

0) Context: what “reports” exist

In a GitHub PR, there are usually two output levels:

PR comment (Summary)

Short: statuses, top-N hotspots, per-file structural risk, links to the Full report.

Full report (Markdown / JSON)

A complete artifact: tables, additional sections, internal/service fields, analysis limits.

1) Notation and conventions

1.1 Risks and scales

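*_risk ∈ [0,100]: normalized scores (good for thresholds and “traffic lights”).
*_risk_level ∈ {low, medium, high}: discretized levels for quick decisions.
*_density ∈ [0,1]: fraction of “signal” tokens/effects in a chosen area.
The leading * stands for the field-name prefix, e.g., struct_risk, control_risk_level, branch_cond_effect_density.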
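
As a rough illustration of how the two scales relate, the sketch below validates a 0..100 score and buckets it into a traffic-light level. The function name and the cut-offs (33 and 66) are assumptions made for this example; the thresholds Revieko actually uses are not stated in this reference.

```python
def risk_to_level(risk: float, medium_from: float = 33.0, high_from: float = 66.0) -> str:
    """Bucket a 0..100 risk score into low / medium / high.

    The default cut-offs are illustrative assumptions, not Revieko's thresholds.
    """
    if not 0.0 <= risk <= 100.0:
        raise ValueError(f"risk must be within [0, 100], got {risk}")
    if risk >= high_from:
        return "high"
    if risk >= medium_from:
        return "medium"
    return "low"
```

With these assumed cut-offs, a score such as 69.49 lands in the high bucket; a team would normally tune the cut-offs to its own gating policy.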

1.2 Locations and line numbers

In Markdown reports, hotspot locations are usually:

path/to/file.py:LINE_START-LINE_END

In JSON, typically:

file_path
line_start / line_end (1-based)

In PR mode, lines refer to the new version of the file (after applying the diff).
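
A minimal sketch of turning the Markdown location form into the JSON-style triple, assuming the string follows the path:LINE_START-LINE_END shape shown above. The Location type and parse_location helper are hypothetical names, not part of Revieko.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    file_path: str
    line_start: int  # 1-based, inclusive
    line_end: int    # 1-based, inclusive


# The path is matched greedily, so only the final ":START-END" is treated as the range.
_LOCATION_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_location(text: str) -> Location:
    """Parse 'path/to/file.py:START-END' into (file_path, line_start, line_end)."""
    match = _LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a path:START-END location: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based line range in {text!r}")
    return Location(match["path"], start, end)
```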

1.3 Fields may be empty

The Effects / Taint / Control columns in hotspots may be empty if:

that channel is disabled in the config/version,
heuristics found nothing in that specific window,
analysis is partial and that chunk wasn’t analyzed.

2) Summary (PR comment): fields and meaning

A typical header:

Status: High risk
struct_risk: 69.49
hotspots: 10
ci_status: warn
control_risk_level: low
analysis: full|partial
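
If the header needs to be consumed by a script, something like the following sketch could work. It assumes the header arrives as plain "key: value" lines exactly as in the sample above; the real PR comment may carry extra formatting, and parse_summary_header is a hypothetical helper.

```python
def parse_summary_header(text: str) -> dict[str, str]:
    """Split 'key: value' lines into a dict; lines without a colon are skipped."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


header = """Status: High risk
struct_risk: 69.49
hotspots: 10
ci_status: warn
control_risk_level: low
analysis: partial"""

fields = parse_summary_header(header)
struct_risk = float(fields["struct_risk"])  # numeric fields still need explicit conversion
```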

Below is the breakdown.

2.1 Status (human-readable status)

Where: Summary / Full report (Markdown)
Type: string (usually “Low/Medium/High risk”)
Meaning: a human-friendly risk/policy label.

In practice, Status usually aligns with risk_level + ci_status, but it’s still just a label.

2.2 risk_level

Where: Full report (Markdown), often in JSON (depends on CLI/integration mode)
Type: enum: low | medium | high
Meaning: a discrete structural-risk level (for the “dig deeper or not” decision).

Typical interpretation:

low: structurally similar to the repo’s typical code.
medium: some structural deviation; review the hotspots.
high: noticeable structural anomaly — almost always warrants manual inspection.

2.3 struct_risk

Where: Summary / Full report (Markdown)
Type: float 0..100
Meaning: global structural risk for the analyzed object (PR/diff).

Notes:

In PR reports, this is the overall risk for the diff.
File-level detail is in Per-file structural risk.
In JSON, the equivalent is usually global_struct_risk.

2.4 hotspots (count)

Where: Summary (short comment)
Type: int
Meaning: how many hotspots are displayed in this comment.

Important: this is not “coverage” and not “total anomalies”. It’s the number of printed hotspots (top-N), limited by output policy.

2.5 ci_status

Where: Summary / Full report (Markdown), sometimes JSON
Type: enum: ok | warn | fail
Meaning: CI/integration signal (how to display/gate).

Typical policy:

ok: no thresholds exceeded.
warn: there’s something to look at (soft gate).
fail: strict mode (hard gate) — depends on configuration.
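
One way a CI wrapper might turn ci_status into a process exit code is sketched below. The mapping and the warn_is_failure switch are assumptions for illustration; how Revieko itself gates a build depends on its configuration.

```python
# Assumed mapping: only "fail" blocks the build by default.
EXIT_CODES = {"ok": 0, "warn": 0, "fail": 1}


def exit_code_for(ci_status: str, warn_is_failure: bool = False) -> int:
    """Translate ci_status into an exit code; optionally treat warn as a hard gate."""
    status = ci_status.strip().lower()
    if status not in EXIT_CODES:
        raise ValueError(f"unknown ci_status: {ci_status!r}")
    if status == "warn" and warn_is_failure:
        return 1
    return EXIT_CODES[status]
```

A wrapper script would typically end with sys.exit(exit_code_for(status)) so that the CI system sees the result.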

2.6 control_risk_level

Where: Summary / Full report (Markdown), JSON (if Control channel is enabled)
Type: enum: low | medium | high
Meaning: discrete level of “control complexity” (conditions / error paths / return tails).

Intuition:

low: branching/errors/return tails look normal.
high: conditions and/or error branches and/or tails are overheated → hard to read, easy to get wrong.

2.7 analysis: full|partial

Where: Summary / Full report (Markdown)
Type: enum: full | partial
Meaning: coverage of supported file types.

full → all supported code in the PR was analyzed (currently: Python).
partial → the PR includes changes in files CodeGuard doesn’t analyze (e.g., YAML/CSV/Markdown). Python files are still analyzed fully; “partial” refers to the PR as a whole.

Practical notes:

Don’t read partial as “Python was only half analyzed”.
Read partial as “some of the PR lies outside analysis scope and must be reviewed manually.”
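
To build the manual-review list for a partial report, a reviewer could split the changed files by extension, as in this sketch. It assumes that only .py files are in scope (per "currently: Python" above); split_by_scope is a hypothetical helper and does not reproduce how the analysis field is actually derived.

```python
from pathlib import PurePosixPath

# Assumption: only Python sources are analyzed.
ANALYZED_SUFFIXES = {".py"}


def split_by_scope(changed_files: list[str]) -> tuple[list[str], list[str]]:
    """Return (analyzed, manual_review) lists, preserving the input order."""
    analyzed: list[str] = []
    manual: list[str] = []
    for path in changed_files:
        suffix = PurePosixPath(path).suffix.lower()
        (analyzed if suffix in ANALYZED_SUFFIXES else manual).append(path)
    return analyzed, manual
```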

3) Per-file structural risk (table)

Where: Summary / Full report (Markdown)
Meaning: each file’s contribution to overall structural risk.

Columns:

File — path
Risk — file structural risk (0..100)

How to read:

“One file carries everything” → start there.
Risk is spread out → the PR likely changes a layer/style in multiple places.

JSON equivalent: file_struct_risk: { "path": number, ... }
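
The two reading patterns above can be approximated from file_struct_risk with a small sketch like this one. The "share of summed risk" measure is a heuristic introduced here for illustration, not a Revieko metric, and the sample paths and numbers are made up.

```python
def rank_files(file_struct_risk: dict[str, float]) -> list[tuple[str, float]]:
    """Sort files by structural risk, highest first."""
    return sorted(file_struct_risk.items(), key=lambda item: item[1], reverse=True)


def top_file_share(file_struct_risk: dict[str, float]) -> float:
    """Fraction of the summed per-file risk carried by the riskiest file (0.0 if no risk)."""
    total = sum(file_struct_risk.values())
    if total <= 0:
        return 0.0
    return max(file_struct_risk.values()) / total


# Made-up sample data in the shape of file_struct_risk.
sample = {"app/api.py": 72.0, "app/models.py": 12.0, "app/utils.py": 6.0}
ranked = rank_files(sample)
share = top_file_share(sample)
```

A share close to 1 corresponds to "one file carries everything"; a low share suggests the risk is spread out.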

4) Hotspots: per-row fields and how to interpret them

A Hotspot is a local line range that looks atypical.

4.1 Hotspot fields in Markdown (table)

Each row has the following columns:

Location
Path and line range: file.py:START-END

Kind
Type: enum. Common values:

struct_pattern_break — a break in the typical structural pattern in this window (unusual IF/LOOP/RETURN markers, new “patterns”, style shift).
depth_spike — locally abnormal nesting depth.
control_outlier — window is overheated in control regimes (conditions/errors/return tails).
mixed — no single dominant factor, but the window still deviates.

Score
Type: float (ranking)
Meaning: review priority (higher → earlier in the list).

Important:

score is not bug probability and not an absolute “quality score”.
It exists to sort hotspots within a report.

Effects
Type: string / label / short hint
Meaning: semantic hint about side effects (if Semantic channel is enabled and something is found).

Typical effects (token/fragment level):
LOG, DB, FILE_IO, NET_IO, EXEC, ...

Taint
Type: string / label / short hint
Meaning: nearby data sources (if enabled), e.g.:

USER_INPUT — input data
SECRET — potential secrets
CONST — constants

Control
Type: string / label / short hint
Meaning: control context for the window, e.g.:
BRANCH_COND, ERROR_PATH, RETURN_PATH, INIT_PATH, CLEANUP_PATH, NORMAL

4.2 Hotspot fields in JSON

Typically:

file_path: string
line_start, line_end: int (1-based)
segment_id: string (e.g., hunk_0)
segment_kind: enum (diff_hunk, full_file, …)
kind: enum
score: float
(optional) effect_hint, taint_hint, control_hint
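
A short sketch of consuming hotspot records with the fields listed above: group them per file in score order and render the Markdown-style location. The helper names are hypothetical, and the optional hints are read with .get() because they may be absent.

```python
from collections import defaultdict


def hotspots_by_file(hotspots: list[dict]) -> dict[str, list[dict]]:
    """Group hotspots by file_path, each group ordered by descending score."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for spot in sorted(hotspots, key=lambda h: h["score"], reverse=True):
        grouped[spot["file_path"]].append(spot)
    return dict(grouped)


def describe(spot: dict) -> str:
    """Render one hotspot as 'path:START-END [kind] score=… (hints)'."""
    hints = [spot.get(key) for key in ("effect_hint", "taint_hint", "control_hint")]
    hint_text = ", ".join(h for h in hints if h) or "no hints"
    location = f'{spot["file_path"]}:{spot["line_start"]}-{spot["line_end"]}'
    return f'{location} [{spot["kind"]}] score={spot["score"]:.2f} ({hint_text})'
```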

5) Control summary: table and fields

Where: Full report (Markdown) and JSON (if Control channel is enabled)

5.1 Control summary (Markdown)

A PR report may include a per-file table of control risks.

Meaning of the columns:

overall — overall control risk for the file (often aggregate/max).
branch_cond — branching/conditions risk.
error_path — error-branch risk.
return_path — “return/raise tail” risk.

5.2 Control summary (JSON): typical fields in `control_summary.per_file[path]`

A common “per-file aggregates” model includes:

Shares of control regimes

control_share_branch_cond
control_share_error_path
control_share_return_path
control_share_init_path
control_share_cleanup_path

Complexity metrics by regime

control_*_complexity_mean
control_*_complexity_max

Cross metrics: control × effects

branch_cond_effect_density
error_path_effect_density
return_path_effect_density

Control risks (0..100)

branch_cond_risk
error_path_risk
return_path_risk
overall_control_risk

5.3 How to read a “strong signal” in the Control layer

Examples:

High branch_cond_risk + branch_cond_effect_density > 0 → conditions contain IO/NET/DB/LOG (often suspicious and hard to review).
High error_path_risk + high error_path_effect_density → error handlers perform many effects.
High return_path_risk + high return_path_effect_density → effectful logic right before returning (changes behavior at exit).
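
The patterns above could be checked mechanically along these lines. This is a sketch that assumes control_summary.per_file maps each path to a flat dict with the field names listed in 5.2; the thresholds are placeholders for the example, and a single density threshold is used for all three regimes for simplicity.

```python
REGIMES = ("branch_cond", "error_path", "return_path")


def strong_control_signals(
    per_file: dict[str, dict[str, float]],
    risk_threshold: float = 70.0,    # assumed meaning of "high" risk
    density_threshold: float = 0.0,  # effect density must be strictly above this
) -> list[tuple[str, str, float, float]]:
    """Return (path, regime, risk, effect_density) tuples, riskiest first."""
    findings = []
    for path, metrics in per_file.items():
        for regime in REGIMES:
            risk = metrics.get(f"{regime}_risk", 0.0)
            density = metrics.get(f"{regime}_effect_density", 0.0)
            if risk >= risk_threshold and density > density_threshold:
                findings.append((path, regime, risk, density))
    return sorted(findings, key=lambda item: item[2], reverse=True)
```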

6) Semantic layer: effect_summary, file_semantic_risk, taint

If the Semantic channel is enabled, JSON (and sometimes Markdown) includes effect/flow-related fields.

6.1 effect_summary

Where: JSON, sometimes Full report (Markdown)
Meaning: aggregates for effects/taint.

effect_summary.per_file[path] (per file)
Typical metrics:

effect_density_tail ∈ [0,1]
- Share of effectful tokens (NET/DB/FILE/EXEC/LOG) in the tail of the file/segment.
- Intuition: closer to 1 → the tail is densely effectful.
dangerous_flow_score ∈ [0,1]
- Coarse signal that USER_INPUT occurs near effects (NET_IO/DB/EXEC/LOG).
- Any value > 0 → at least one potentially risky neighboring flow was detected.
secret_leak_score ∈ [0,1]
- Coarse signal that SECRET appears near LOG / NET_IO.
- Any value > 0 → potential secret leakage.

effect_summary.global
Maxima/aggregates across all report files — quick answer to “is there any effect activity at all?”
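
A minimal sketch of scanning effect_summary.per_file for the two "any value above zero" signals described above. It assumes per_file maps each path to a flat dict of the metrics named in this section; semantic_flags and the reason strings are hypothetical.

```python
def semantic_flags(per_file: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """Map each file to the reasons it deserves a closer look; clean files are omitted."""
    flags: dict[str, list[str]] = {}
    for path, metrics in per_file.items():
        reasons = []
        if metrics.get("dangerous_flow_score", 0.0) > 0:
            reasons.append("USER_INPUT near effects")
        if metrics.get("secret_leak_score", 0.0) > 0:
            reasons.append("SECRET near LOG/NET_IO")
        if reasons:
            flags[path] = reasons
    return flags
```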

6.2 file_semantic_risk and file_semantic_risk_level

Where: JSON
Type:

file_semantic_risk[path] ∈ [0,100]
file_semantic_risk_level[path] ∈ {low, medium, high}

Meaning: a repo-aware semantic risk estimate for the file (effect tail + suspicious flow + historical adjustment).

Common prioritization rule:
High struct_risk and high file_semantic_risk for the same file → that file almost always deserves first-pass manual review.
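
The prioritization rule can be sketched as an intersection of the two per-file maps. The threshold of 70 is an assumed stand-in for "high", and ordering by the sum of the two scores is a choice made for this example only.

```python
def first_pass_files(
    file_struct_risk: dict[str, float],
    file_semantic_risk: dict[str, float],
    threshold: float = 70.0,  # assumed cut-off for "high" on both scales
) -> list[str]:
    """Files that are high on both structural and semantic risk, riskiest first."""
    both_high = [
        path
        for path, struct in file_struct_risk.items()
        if struct >= threshold and file_semantic_risk.get(path, 0.0) >= threshold
    ]
    return sorted(
        both_high,
        key=lambda path: file_struct_risk[path] + file_semantic_risk[path],
        reverse=True,
    )
```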

6.3 taint (how to interpret)

Taint labels are usually not “security verdicts”; they describe nearby data sources:

USER_INPUT → track where the value goes next (especially near NET/DB/EXEC/LOG).
SECRET → ensure it isn’t logged or sent over the network.

7) Invariants / rule violations (if rules are enabled)

If team rules (semantic invariants) are configured, the report may include:

invariant_violations: a list of violations
- rule/id
- location
- short explanation

This is the most straightforward layer:
a rule is violated → here is the spot → fix it or explicitly exempt it.

8) Coverage & scope: what “partial analysis” means

8.1 Why a report can be partial

analysis=partial means the PR includes files outside the analysis profile (currently CodeGuard is focused on Python). Examples: YAML, CSV, Markdown, assets, data, configs, etc.

8.2 How to read partial correctly

For the Python part: the report is complete (per-file risk, hotspots, etc.).
For out-of-profile files: CodeGuard is silent because it doesn’t analyze them.

8.3 What the reviewer should do

Review Python via the report (start with top hotspots).
Review out-of-profile files manually: format correctness, backward compatibility, migrations, project standards, secret safety in configs, etc.

8.4 Product note (if relevant)

If the project needs YAML/Markdown/CSV to also pass an “automatic radar”, that’s a separate roadmap: add analyzers for new types/languages and expand scope.

9) Metadata and service fields

Depending on integration/version, you may see:

Generated / timestamp (Markdown header)
repo_root, pr_number (JSON)
Markdown / JSON links (in Summary)
expires_at_unix_s (in Summary, if Full report links are time-limited): Unix time (seconds) when the tokenized link/artifact may expire.
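
Since expires_at_unix_s is plain Unix time in seconds, it can be converted and checked with the standard library. The helper names below are hypothetical; the sketch only shows the conversion.

```python
from datetime import datetime, timezone
from typing import Optional


def link_expiry(expires_at_unix_s: int) -> datetime:
    """Convert Unix seconds into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(expires_at_unix_s, tz=timezone.utc)


def is_expired(expires_at_unix_s: int, now: Optional[datetime] = None) -> bool:
    """True if the link's expiry time has passed (now defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now >= link_expiry(expires_at_unix_s)
```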

10) Name mapping: Markdown ↔ JSON (common confusion)

Markdown struct_risk (PR Summary / Full report) ≈ JSON global_struct_risk
Markdown “Per-file structural risk” ≈ JSON file_struct_risk
Markdown “Hotspots” (top-N table) ≈ JSON hotspots (may be wider/richer in the full artifact)
Markdown analysis: full|partial ← derived from analysis_limits + fallback/limitations
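
For scripts that start from a Markdown name and need the JSON value, the direct mappings can be kept in a small lookup table. This sketch assumes the JSON keys sit at the top level of the report object, which may not hold for every integration; json_value_for is a hypothetical helper.

```python
# Direct Markdown -> JSON name mappings from the list above.
# "analysis" is omitted because it is derived rather than mapped one-to-one.
MARKDOWN_TO_JSON = {
    "struct_risk": "global_struct_risk",
    "Per-file structural risk": "file_struct_risk",
    "Hotspots": "hotspots",
}


def json_value_for(markdown_name: str, report: dict):
    """Look up the JSON value behind a Markdown field name; None if the key is absent."""
    if markdown_name not in MARKDOWN_TO_JSON:
        raise KeyError(f"no direct JSON equivalent known for {markdown_name!r}")
    return report.get(MARKDOWN_TO_JSON[markdown_name])
```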

11) Quick answers to “weird” values

Why is control_summary 0.0 everywhere?

Control channel is disabled, or
control regimes don’t stand out in this PR, or
analysis is partial and the relevant chunks weren’t analyzed.

Why are Effects/Taint empty in hotspots?

Semantic channel is disabled, or
heuristics found no signals in this window, or
“reviewer mode” is showing only structural hotspots.

Why are there “only 10 hotspots”, but I’m sure there are more anomalies?
That’s the top-N output limit in the PR comment. Open the Full report (Markdown/JSON).
