Skip to content

Fix integer overflow and undefined behaviour in SAV reader#375

Open
aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
aaranmcguire:fix/sav-integer-overflows
Open

Fix integer overflow and undefined behaviour in SAV reader#375
aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
aaranmcguire:fix/sav-integer-overflows

Conversation

@aaranmcguire
Copy link
Copy Markdown

PR 4: Fix integer overflow and undefined behaviour in SAV reader

Summary

Four integer-arithmetic issues in spss/readstat_sav_read.c:

  1. size * count multiplication overflow — file-controlled uint32 values
    multiplied without overflow check, used as allocation size and seek distance.
  2. abs(INT_MIN) undefined behaviour — calling abs() on INT_MIN is UB
    in C; on two's-complement targets it typically returns INT_MIN (negative),
    producing a huge seek.
  3. uint32 variable-label length overflow — a 32-bit length field near
    UINT32_MAX causes (label_len + 3) / 4 * 4 to wrap to 0, leading to a
    zero-byte malloc.
  4. uint32_t buffer-length overflow in long-string value labels — two
    uint32_t variables hold lengths computed as len * 4 + 1; when len
    0x40000000 the multiplication wraps to a small number, causing
    readstat_realloc to return a tiny buffer that readstat_convert then
    overflows.

Bug 1: size * count multiplication overflow

Vulnerable code (same pattern at two locations, pass-1 and pass-2)

uint32_t extra_info[3];
/* ... read from file ... */
size_t size  = extra_info[1];   /* file-controlled uint32 widened to size_t */
size_t count = extra_info[2];   /* file-controlled uint32 widened to size_t */
data_len = size * count;         /* no overflow check */

Why it is exploitable

On 32-bit platforms where size_t is 32 bits: if size = 0x80000000 and
count = 2, then size * count = 0x100000000 which wraps to 0.

  • For subtypes that allocate readstat_realloc(data_buf, data_len=0):
    readstat_realloc(p, 0) returns NULL → READSTAT_ERROR_MALLOC, which is
    a safe failure. ✓
  • For subtypes that seek data_len bytes: io->seek(0, ...) is a no-op,
    so the parser continues reading the next record at the wrong file position.
    Subsequent parsing produces garbage or, depending on what happens to be at
    that offset, a crash.

On 64-bit the widened multiply cannot overflow (uint32 × uint32 ≤ 2^64).

Reproduction

Craft a SAV file with an extended information record (type 7)
where size=0x80000000 and count=2.  On a 32-bit build,
data_len wraps to 0 and the parser seeks 0 bytes — processing
a subsequent record as if no gap existed.

Fix

+if (size > 0 && count > 0x7FFFFFFF / size) {
+    retval = READSTAT_ERROR_PARSE;
+    goto cleanup;
+}
 data_len = size * count;

Applied at both the pass-1 (sav_parse_records_pass1) and pass-2
(sav_parse_records_pass2) locations.


Bug 2: abs(n_missing_values) undefined behaviour

Vulnerable code

int n_missing_values = ctx->bswap
    ? byteswap4(variable.n_missing_values)
    : variable.n_missing_values;
if (io->seek(abs(n_missing_values) * sizeof(double), READSTAT_SEEK_CUR, ...))

Why it is undefined behaviour

n_missing_values is a signed 32-bit integer read from the file. SPSS uses
negative values (−1, −2, −3) to encode range-type missing specifications.
If the file sets n_missing_values = INT_MIN (0x80000000), then
abs(INT_MIN) is undefined behaviour in C. On all common two's-complement
targets it returns INT_MIN itself (−2147483648). Cast to an unsigned offset:
(off_t)(−2147483648) * 8 = −17179869184 — a large negative seek — or on a
32-bit off_t, wraps to a large positive, seeking past the end of the file.

The later sav_read_variable_missing_values validates to [-3, 3], but
sav_skip_variable_record (pass-1) runs before that function and has no such
guard.

Reproduction

In the SAV type-2 variable record, set the n_missing_values field to
0x80000000 (INT_MIN when byteswapped on a little-endian machine, or
directly on big-endian).  On a 32-bit build, abs(INT_MIN) = INT_MIN
and the seek goes to a wild offset, crashing on the subsequent read.

Fix

-if (io->seek(abs(n_missing_values) * sizeof(double), READSTAT_SEEK_CUR, io->io_ctx) == -1) {
-    retval = READSTAT_ERROR_SEEK;
+int n_mv_abs = (n_missing_values < 0) ? -n_missing_values : n_missing_values;
+if (n_mv_abs > 3) { retval = READSTAT_ERROR_PARSE; goto cleanup; }
+if (io->seek((off_t)n_mv_abs * sizeof(double), READSTAT_SEEK_CUR, io->io_ctx) == -1) {
+    retval = READSTAT_ERROR_READ;

Bug 3: uint32 variable-label length wraps to 0

Vulnerable code

uint32_t label_len;
/* ... read from file ... */
label_capacity = (label_len + 3) / 4 * 4;   /* wraps if label_len near UINT32_MAX */
if ((label_buf = readstat_malloc(label_capacity)) == NULL) { ... }

Why it matters

If label_len = 0xFFFFFFFD (just 3 below UINT32_MAX):

label_len + 3 = 0x100000000 = 0  (uint32 overflow)
0 / 4 * 4 = 0
readstat_malloc(0) → returns NULL → READSTAT_ERROR_MALLOC

The error is handled safely (returns READSTAT_ERROR_MALLOC) but the
diagnostic is misleading — the real issue is an impossibly large label, not
a genuine OOM. More importantly, label lengths this large can only come from
malformed or malicious files.

Fix

Add an explicit sanity bound that matches the maximum SPSS label length
(255 characters):

+if (label_len > 0xFFFF) {
+    retval = READSTAT_ERROR_PARSE;
+    goto cleanup;
+}
 label_capacity = (label_len + 3) / 4 * 4;

Bug 4: uint32_t overflow in long-string value-label buffer lengths

Vulnerable code

uint32_t value_len = 0, label_len = 0;
uint32_t value_buffer_len = 0, label_buffer_len = 0;
/* ... read value_len and label_len from file ... */
value_buffer_len = value_len * 4 + 1;   /* uint32 arithmetic — overflows */
value_buffer = readstat_realloc(value_buffer, value_buffer_len);
/* readstat_convert then writes up to value_len*4 bytes into value_buffer */

How the heap write occurs

value_len is a uint32_t from the file. value_buffer_len is also
uint32_t, so the arithmetic is uint32_t × 4 + 1.

If value_len = 0x40000001:

value_buffer_len = 0x40000001 * 4 + 1 = 0x100000005 truncated to uint32 = 5

readstat_realloc(ptr, 5) succeeds — 5 bytes allocated.

Then readstat_convert is called with an output length of value_buffer_len
(5), but it reads value_len (0x40000001 ≈ 1 GB) bytes from the file into a
5-byte buffer. readstat_convert internally clips to its dst_len parameter
— but dst_len here is 5, not value_len * 4 + 1, so readstat_convert
correctly writes at most 5 bytes.

However, the file is then advanced by value_len bytes via io->read.
Because the file isn't actually 1 GB, this read returns short and
READSTAT_ERROR_PARSE fires. The out-of-bounds write doesn't materialise
on real files, but the logic is fragile and the declaration type is wrong.

Fix

Change the buffer-length type to size_t and guard against huge values:

-uint32_t value_buffer_len = 0, label_buffer_len = 0;
+size_t value_buffer_len = 0, label_buffer_len = 0;
 /* ... */
-value_buffer_len = value_len*4+1;
+if (value_len > 0xFFFFFF) { retval = READSTAT_ERROR_PARSE; goto cleanup; }
+value_buffer_len = (size_t)value_len * 4 + 1;
 /* same for label_len / label_buffer_len */

Four arithmetic issues in spss/readstat_sav_read.c:

1. size*count multiplication overflow (sav_parse_records_pass1 and pass2):
   Two uint32_t values from the file are assigned to size_t then
   multiplied. On 32-bit platforms the product can overflow, producing
   a data_len of 0 which causes the parser to skip the record and
   continue reading at the wrong file position.

2. abs(n_missing_values) undefined behaviour (sav_skip_variable_record):
   When n_missing_values == INT_MIN, abs() is undefined. On two's-
   complement targets it typically returns INT_MIN (negative), which
   when used as a seek offset produces a large wrong seek. Replaced
   with an explicit conditional and a validity range check (> 3 is
   always invalid per SPSS spec).

3. uint32 label_len overflow (sav_read_variable_label):
   label_len near UINT32_MAX causes (label_len+3)/4*4 to wrap to 0,
   resulting in readstat_malloc(0). Added a 0xFFFF sanity bound
   (well above the SPSS limit of 255 characters).

4. uint32_t buffer length overflow in long-string value labels:
   value_buffer_len and label_buffer_len are uint32_t; value_len*4+1
   overflows when value_len >= 0x40000000, producing a tiny realloc
   target. Changed to size_t with an explicit > 0xFFFFFF guard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant