Chapter 9 · Lesson 2

Unions & Complex Types

Union Syntax and Memory Layout

A union is like a struct, but all members occupy the same memory location. Only one member holds a valid value at any time:

union Data {
    int   i;
    float f;
    char  c[4];
};

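The size of a union is at least the size of its largest member, rounded up as needed to satisfy the strictest member alignment. All members start at the same address. Writing one member and then reading a different one reinterprets the stored bytes as the new type. This is always safe through a character-type member such as c; for other members the result depends on the representation of both types (see the type-punning use case below).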

Tagged Unions (Discriminated Unions)

A union alone does not track which member is currently active. The classic C pattern wraps a union in a struct along with an enum tag:

typedef struct {
    enum { TAG_INT, TAG_FLOAT, TAG_STRING } tag;
    union {
        int   i;
        float f;
        char *s;
    } value;
} Variant;

This pattern implements a dynamically-typed value — the same technique used internally by JSON parsers, interpreters, and virtual machines.

Use Cases

Memory-efficient storage: When you have a large collection of items that are each one of several types, a tagged union needs only the tag plus the memory of the largest variant, rather than a separate field for every possible type.

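Type punning: Reading the bytes of a float through a char array or an integer union member gives you access to the raw bit representation. C permits this: since C99 (as clarified by Technical Corrigendum 3) and in C11, reading a member other than the one last written reinterprets the stored bytes as the new type, although the result can be a trap representation. C++ treats the same read as undefined behavior, so code that must compile as both languages should copy the bytes with memcpy instead.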

union FloatBits { float f; unsigned int u; };
union FloatBits fb = { .f = 1.0f };
printf("bits of 1.0f = 0x%08X\n", fb.u);  /* 0x3F800000 */

Flexible Array Members (C99)

A struct can end with an incomplete array member of unspecified size. The array's space is not counted in sizeof, and you must allocate extra space for it with malloc:

typedef struct {
    int  count;
    int  data[];   /* flexible array member — must be last */
} IntArray;

IntArray *arr = malloc(sizeof(IntArray) + 5 * sizeof(int));
arr->count = 5;

Anonymous Structs and Unions (C11)

C11 allows a struct or union member that has no name. Its sub-members are accessed directly on the containing type:

typedef struct {
    int type;
    union {            /* anonymous union — no member name */
        int   ival;
        float fval;
    };
} Value;

Value v;
v.type = 0;
v.ival = 42;   /* accessed directly, not v.u.ival */

Anonymous structs/unions reduce nesting in code that deals with complex data layouts, such as SIMD vector types and network packet headers.

Code Examples

Union Size Demonstration

#include <stdio.h>
#include <string.h>

union Sampler {
    char     c;        /* 1 byte  */
    short    s;        /* 2 bytes */
    int      i;        /* 4 bytes */
    double   d;        /* 8 bytes */
    char     bytes[8]; /* 8 bytes — for raw access */
};

void print_bytes(const unsigned char *p, int n) {
    for (int i = 0; i < n; i++) printf("%02X ", p[i]);
    printf("\n");
}

int main(void) {
    printf("Sizes of individual members:\n");
    printf("  char:   %zu byte(s)\n", sizeof(char));
    printf("  short:  %zu byte(s)\n", sizeof(short));
    printf("  int:    %zu byte(s)\n", sizeof(int));
    printf("  double: %zu byte(s)\n", sizeof(double));
    printf("\nsizeof(union Sampler) = %zu bytes (= largest member)\n\n",
           sizeof(union Sampler));

    union Sampler u;

    /* Write an int, read the raw bytes */
    u.i = 0x12345678;
    printf("u.i = 0x12345678, raw bytes: ");
    print_bytes((unsigned char *)&u, 4);

    /* Write a double, all 8 bytes change */
    u.d = 3.14;
    printf("u.d = 3.14,        raw bytes: ");
    print_bytes((unsigned char *)&u, 8);

    /* Write a char: typically only the first byte changes
       (the standard leaves the other bytes unspecified) */
    u.c = 'A';
    printf("u.c = 'A',         raw bytes: ");
    print_bytes((unsigned char *)&u, 8);
    printf("  (typically only byte 0 changed; rest are remnants of u.d)\n");

    return 0;
}

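On typical platforms the union's size is set by the double member (8 bytes). All members share those 8 bytes: writing u.i touches only the first 4, and writing u.c only the first. The byte order printed for u.i depends on the machine's endianness (78 56 34 12 on a little-endian system). After u.c = 'A', bytes 1-7 typically still hold the remnants of u.d = 3.14, although the C standard leaves the bytes outside the stored member unspecified.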

Tagged Union (Variant Type)

#include <stdio.h>
#include <string.h>

/* A dynamically-typed value — the pattern used in JSON parsers, scripting engines */
typedef enum {
    VAL_NULL,
    VAL_BOOL,
    VAL_INT,
    VAL_FLOAT,
    VAL_STRING
} ValType;

typedef struct {
    ValType type;
    union {
        int    boolean;   /* 0 = false, 1 = true  */
        long   integer;
        double floating;
        char   string[32];
    } as;
} Value;

/* Constructors */
Value val_null(void)         { return (Value){ VAL_NULL }; }
Value val_bool(int b)        { Value v = {VAL_BOOL}; v.as.boolean = !!b; return v; }
Value val_int(long i)        { Value v = {VAL_INT}; v.as.integer = i; return v; }
Value val_float(double d)    { Value v = {VAL_FLOAT}; v.as.floating = d; return v; }
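Value val_string(const char *s) {
    Value v = {VAL_STRING};
    /* strncpy does not terminate when s fills the buffer, so do it explicitly */
    strncpy(v.as.string, s, sizeof(v.as.string) - 1);
    v.as.string[sizeof(v.as.string) - 1] = '\0';
    return v;
}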

void print_value(const Value *v) {
    switch (v->type) {
        case VAL_NULL:    printf("null");               break;
        case VAL_BOOL:    printf("%s", v->as.boolean ? "true" : "false"); break;
        case VAL_INT:     printf("%ld", v->as.integer); break;
        case VAL_FLOAT:   printf("%.6g", v->as.floating); break;
        case VAL_STRING:  printf("\"%s\"", v->as.string); break;
    }
}

int main(void) {
    Value vals[] = {
        val_null(),
        val_bool(1),
        val_int(42),
        val_float(3.14159),
        val_string("hello")
    };
    const char *type_names[] = {"null", "bool", "int", "float", "string"};

    printf("%-8s  %s\n", "Type", "Value");
    printf("%-8s  %s\n", "--------", "-------");
    for (int i = 0; i < 5; i++) {
        printf("%-8s  ", type_names[vals[i].type]);
        print_value(&vals[i]);
        printf("\n");
    }

    printf("\nsizeof(Value) = %zu bytes\n", sizeof(Value));
    return 0;
}

The tagged union uses a ValType enum to track which union member is currently active. Each constructor sets both the tag and the appropriate union member. print_value switches on the tag to read the correct member. This is the core pattern behind dynamically-typed value representations in C.

Type Punning with Union (Float Bits)

#include <stdio.h>
#include <stdint.h>

/* C permits type punning through a union (the stored bytes are
   reinterpreted); C++ does not, so use memcpy for code shared with C++.
   This example assumes a 32-bit IEEE 754 float. */
union FloatBits {
    float    f;
    uint32_t u;
    struct {
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign     : 1;
    } parts;  /* WARNING: bit field layout is implementation-defined */
};

void inspect_float(float value) {
    union FloatBits fb = { .f = value };
    printf("%.6g:\n", value);
    printf("  hex bits : 0x%08X\n", (unsigned)fb.u);
    printf("  sign     : %u\n", (unsigned)fb.parts.sign);
    printf("  exponent : %u (biased), actual = %d\n",
           (unsigned)fb.parts.exponent, (int)fb.parts.exponent - 127);
    printf("  mantissa : 0x%06X\n", (unsigned)fb.parts.mantissa);
}

/* Fast inverse square root (famous Quake III algorithm) */
float fast_inv_sqrt(float number) {
    union { float f; uint32_t i; } y = { .f = number };
    y.i = 0x5F3759DFu - (y.i >> 1);   /* Newton's method seed via bit magic */
    y.f = y.f * (1.5f - 0.5f * number * y.f * y.f);   /* one Newton iteration */
    return y.f;
}

int main(void) {
    inspect_float(1.0f);
    printf("\n");
    inspect_float(-0.5f);
    printf("\n");
    inspect_float(3.14159265f);

    printf("\nFast inverse square root:\n");
    float tests[] = {4.0f, 9.0f, 16.0f, 100.0f};
    for (int i = 0; i < 4; i++) {
        float v = tests[i];
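        /* __builtin_sqrtf is a GCC/Clang extension; portable code would
           include <math.h> and call sqrtf */
        printf("  1/sqrt(%.0f) = approx %.6f  (exact %.6f)\n",
               v, fast_inv_sqrt(v), 1.0f / __builtin_sqrtf(v));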
    }
    return 0;
}

IEEE 754 single precision: 1 sign bit, 8 exponent bits (biased by 127), 23 mantissa bits. Union punning lets us inspect the raw bits. The famous Quake fast inverse square root uses the integer representation of a float to compute an excellent starting approximation for Newton's method — a beautiful example of low-level bit manipulation enabling practical optimisation.

Quick Quiz

1. What is sizeof(union U) if U has members: char c, int i, double d on a typical 64-bit system?

2. What is the purpose of the enum tag in a tagged union pattern?

3. What must be true about a flexible array member in a C99 struct?
