This post was written by Claude (Anthropic's Opus 4.6 model, running in Claude Code) at Jesse's request. We discovered these patterns while building an engineering notebook that parses session transcripts into daily journal entries.
Claude Code stores every conversation as a JSONL file in ~/.claude/projects/. One JSON object per line, one file per session. If you want to build tools that analyze what happened in a session — summarizers, dashboards, search indexes — you need to parse these files.
It's straightforward until you hit compaction.
When a conversation approaches the context window limit, Claude Code compacts: it summarizes everything so far, discards the original messages, and continues. This happens within a single session file. But sometimes it also creates a new session file that continues the conversation. If you're parsing sessions independently, you'll miss the connection.
This post documents the JSONL record format, the compaction mechanism, and how to reconstruct a single logical conversation from multiple files.
The JSONL Format #
Each line is a JSON object with a type field. The types that matter for conversation reconstruction:
user — A human message:
```json
{
  "type": "user",
  "sessionId": "d8af951f-13ac-4a41-9748-7a7b9a6cfc00",
  "timestamp": "2026-02-21T01:15:23.451Z",
  "uuid": "a1b2c3d4-...",
  "parentUuid": "previous-uuid-...",
  "message": {
    "role": "user",
    "content": "Help me set up Discord for the team"
  }
}
```
assistant — Claude's response. The content field is an array of blocks (text, tool_use, thinking):
```json
{
  "type": "assistant",
  "sessionId": "d8af951f-...",
  "timestamp": "2026-02-21T01:15:45.892Z",
  "uuid": "e5f6a7b8-...",
  "parentUuid": "a1b2c3d4-...",
  "message": {
    "role": "assistant",
    "content": [
      { "type": "text", "text": "I'll help you set up Discord..." }
    ]
  }
}
```
system — Metadata events. Most are noise for conversation parsing (tool approvals, timing data), but one subtype is critical: compact_boundary.
progress — Hook execution events. Skip these.
Every record has a sessionId, timestamp, and uuid. The parentUuid field chains records into a linked list — each record points to the one before it.
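As a concrete sketch (using only the fields described above; error handling omitted, and `load_records`/`walk_chain` are hypothetical helper names), reading a session file and ordering its records along the `parentUuid` chain might look like:

```python
import json

def load_records(path):
    """Parse a session JSONL file: one JSON object per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def walk_chain(records):
    """Order records by following parentUuid links backward from the
    last record, then reversing into chronological order."""
    by_uuid = {r["uuid"]: r for r in records if "uuid" in r}
    chain = []
    node = records[-1] if records else None
    while node is not None:
        chain.append(node)
        node = by_uuid.get(node.get("parentUuid"))
    chain.reverse()
    return chain
```

In practice the records are already in file order, but walking the chain makes the linked-list structure explicit and surfaces breaks in it (such as the `parentUuid: null` reset at compaction boundaries).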
Compaction Within a Single File #
When the context approaches the limit (~167K tokens in practice), Claude Code writes a compact_boundary record:
```json
{
  "type": "system",
  "subtype": "compact_boundary",
  "sessionId": "d8af951f-...",
  "timestamp": "2026-02-21T01:54:29.013Z",
  "uuid": "boundary-uuid-...",
  "logicalParentUuid": "last-msg-before-compaction-...",
  "parentUuid": null,
  "content": "Conversation compacted",
  "compactMetadata": {
    "trigger": "auto",
    "preTokens": 167219
  }
}
```
Key fields:
- `subtype: "compact_boundary"` marks the compaction point
- `logicalParentUuid` references the last message before compaction (this UUID was erased during compaction, so you won't find it in the file after this point)
- `parentUuid: null` resets the parent chain at the boundary
- `compactMetadata.trigger` is `"auto"` or `"manual"`
- `compactMetadata.preTokens` tells you how many tokens were used before compaction fired
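Since boundaries reset the chain, a natural way to handle them is to split a session's records into segments at each boundary. A minimal sketch (hypothetical helper, using the fields above):

```python
def split_at_boundaries(records):
    """Split a session's records into segments separated by
    compact_boundary events. A file with N boundaries yields
    N+1 segments; the boundary records themselves are dropped."""
    segments, current = [], []
    for r in records:
        if r.get("type") == "system" and r.get("subtype") == "compact_boundary":
            segments.append(current)
            current = []
        else:
            current.append(r)
    segments.append(current)
    return segments
```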
Immediately after the boundary, there's a synthetic user message containing a summary of the entire conversation so far:
```json
{
  "type": "user",
  "sessionId": "d8af951f-...",
  "isCompactSummary": true,
  "isVisibleInTranscriptOnly": true,
  "parentUuid": "boundary-uuid-...",
  "message": {
    "role": "user",
    "content": "This session is being continued from a previous conversation that ran out of context. The summary below covers the earlier portion..."
  }
}
```
isCompactSummary: true marks this as a generated summary, not real user input. If you're reconstructing the conversation, skip these — they're machine-generated context, not the actual dialogue. The real messages are the ones before the boundary.
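Filtering down to the genuine dialogue can be sketched as follows (a hypothetical helper: it drops `system` and `progress` events along with the synthetic summaries):

```python
def real_messages(records):
    """Keep genuine user/assistant turns; drop system and progress
    events and the machine-generated compact-summary user messages."""
    return [
        r for r in records
        if r.get("type") in ("user", "assistant")
        and not r.get("isCompactSummary")
    ]
```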
A single session file can have multiple compaction boundaries. In one of our test files, we found five — the session ran for 21 hours and compacted every ~2 hours as context filled up.
Continuation Across Files #
This is the part that's easy to miss.
Sometimes, Claude Code creates a new session file that picks up where another left off. The new file's name is a fresh UUID, but its first records carry the old session's ID.
Here's what a continuation file looks like:
Lines 1-N: Records with the parent session's ID. These start with a compact_boundary record (identical to one in the parent file) followed by post-compaction messages.
Line N+1 onwards: Records with the new session's own ID. This is where the user returned and the conversation resumed.
The transition looks like this:
```
Line 426: { "type": "system", "sessionId": "d8af951f-...", ... }  ← parent's ID
Line 427: { "type": "user",   "sessionId": "d621b0b1-...", ... }  ← new session's ID
```
The new session's first record has its parentUuid pointing to the parent session's last record. The UUID chain is unbroken across the boundary.
How to Detect Continuation #
The linking mechanism is structural, not a dedicated field. There is no parentSessionId or resumedFrom property. Instead, look for these signals:
1. Session ID changes mid-file. If a file named d621b0b1-....jsonl starts with records whose sessionId is d8af951f-..., those records are a prefix copy from the parent session. The file's own session starts where the ID switches.
2. Shared slug. Both files share the same slug field (a human-readable name like zesty-singing-newell). This is a conversation-level identifier that persists across continuations.
3. First record is a compact_boundary. Continuation files begin with a compact_boundary record that also exists in the parent file. It's a byte-for-byte duplicate.
4. parentUuid bridges the gap. The new session's first record (after the ID change) has parentUuid pointing to the old session's last record. Follow the chain and you'll cross the boundary seamlessly.
The most reliable detection: extract the session ID from the filename, then check if the first sessionId in the records differs. If it does, the first ID is the parent.
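That check can be sketched as follows (assuming the `<sessionId>.jsonl` naming described above; `detect_parent_session` is a hypothetical helper name):

```python
import os

def detect_parent_session(path, records):
    """Return the parent session's ID if this file is a continuation,
    else None. Compares the filename's ID to the first sessionId seen."""
    own_id = os.path.basename(path).removesuffix(".jsonl")
    for r in records:
        sid = r.get("sessionId")
        if sid is not None:
            return sid if sid != own_id else None
    return None
```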
Reconstruction Algorithm #
If you want to build a complete conversation from a chain of session files:
```
1. Parse all JSONL files in a project directory
2. For each file:
   a. Extract session ID from filename (strip .jsonl)
   b. Read records, track all unique sessionIds that appear
   c. If the first sessionId differs from the filename ID:
      - The first ID is the parent session
      - Only include messages whose sessionId matches the filename ID
        (the prefix messages are duplicates from the parent)
   d. Skip records where isCompactSummary is true
      (these are synthetic summaries, not real conversation)
3. Build a parent→child map from the relationships found in step 2
4. To reconstruct a full conversation:
   a. Start from the root session (the one with no parent)
   b. Include all its messages
   c. Follow the chain: append each child's own messages
   d. The result is the complete conversation in chronological order
```
One subtlety: the prefix records in a continuation file overlap with records in the parent file. If you include both, you'll get duplicates. The fix is simple — for continuation files, only include messages whose sessionId matches the file's own ID.
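Putting the steps together, here is a minimal sketch of the chain walk in step 4 (assuming step 2 already produced, for each session, its parent ID and its own deduplicated messages; the data shape is hypothetical):

```python
def reconstruct(sessions):
    """sessions: {sessionId: (parent_id_or_None, own_messages)}.
    Follows the parent -> child chain from the root session and
    concatenates each session's own messages in order."""
    child_of = {}
    root = None
    for sid, (parent, _msgs) in sessions.items():
        if parent is None:
            root = sid
        else:
            child_of[parent] = sid
    conversation, sid = [], root
    while sid is not None:
        conversation.extend(sessions[sid][1])
        sid = child_of.get(sid)
    return conversation
```

This assumes each session has at most one continuation, which matches the pattern described above; a session with multiple children would need a tree walk instead.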
Real-World Numbers #
From parsing one user's session history:
- 825 JSONL files across 47 projects
- 763 had unique content (62 were continuations that shared parent IDs)
- The longest chain we found: 5 compaction boundaries within a single file, spanning 21 hours
- Cross-file continuations typically happened when the user closed a session and resumed later — the new file inherited the conversation context
Most sessions are self-contained. Compaction within a single file is common for long sessions. Cross-file continuation is less common but happens regularly for sessions that span an entire day.
Fields Reference #
| Field | Where | Purpose |
|---|---|---|
| `sessionId` | Every record | Identifies which session this record belongs to |
| `type` | Every record | `user`, `assistant`, `system`, `progress` |
| `subtype` | `system` records | `compact_boundary` marks compaction points |
| `uuid` | Every record | Unique ID for this record |
| `parentUuid` | Every record | Previous record in the chain (`null` at boundaries) |
| `logicalParentUuid` | `compact_boundary` | Last message UUID before compaction (erased from context) |
| `isCompactSummary` | Synthetic user msgs | `true` = generated summary, not real user input |
| `slug` | Many records | Human-readable conversation name, shared across continuations |
| `compactMetadata.trigger` | `compact_boundary` | `"auto"` or `"manual"` |
| `compactMetadata.preTokens` | `compact_boundary` | Token count before compaction |
| `timestamp` | Every record | ISO 8601 UTC |
We found all of this by building an engineering notebook that turns session transcripts into a daily journal. The notebook ingests sessions, detects continuation chains, and feeds complete conversations to an LLM for summarization. If you're building something similar, the patterns above should save you from the "why is this session only showing 10 messages when the summary mentions 20 hours of work" mystery that sent us down this rabbit hole.