Brain Entrainment Audio — Implementation Guide

Jeremy Allen



This guide describes a language-agnostic approach to generating long-form (1–3 hour) audio files that deliver 40 Hz gamma entrainment via the GENUS (Gamma ENtrainment Using Sensory stimuli) protocol. The system mixes a synthetic entrainment layer with a natural ambient soundscape and outputs an encoded audio file suitable for daily listening.


Background: What the System Is Doing

Daily exposure to auditory stimulation pulsed at 40 Hz elicits a measurable Auditory Steady-State Response (ASSR) in the brain. In primate studies (PNAS 2026), 1-hour daily sessions over 7 days produced a 205% increase in amyloid-beta clearance into cerebrospinal fluid — a mechanism of active interest in Alzheimer’s prevention research.

The “40 Hz” refers to the pulse or modulation rate, not the carrier frequency. Two delivery strategies are implemented:

| Strategy | Mechanism | Hardware requirement |
|---|---|---|
| Sine additive | A pure 40 Hz sine wave layered additively under ambient audio | Premium headphones whose rated frequency response extends down to about 35 Hz |
| Pink noise AM | Pink noise (1/f) amplitude-modulated at 40 Hz | Any hardware, including earbuds and laptop speakers |

The sine approach matches the exact protocol used in the 2026 MDPI/PNAS studies. The AM approach delivers the 40 Hz rate via modulation of audible-frequency noise, sidestepping sub-bass hardware limitations.


System Architecture

The pipeline has five independent components, each with a single responsibility:

Source audio file
        │
        ▼
 [Source Loader] ─────────────────────────────────┐
                                                   │
[Entrainment Generator]  (pink AM or sine)         │
        │                                          │
        └─────────────────────────────► [Mixer] ───┘
                                           │
                                    [Normalizer]
                                           │
                                      [Exporter]
                                           │
                                   Output audio file

Each stage operates on raw floating-point PCM arrays. Codec work happens only at the edges of the pipeline: the source loader decodes to PCM on the way in, and encoding happens only at the final export stage.


Stage 1: Source Loader

The source loader decodes any ambient audio file (nature recordings, etc.) into a raw float32 stereo PCM array at a specified sample rate.

Critical requirement: The source audio must be resampled to exactly match the pipeline’s target sample rate (typically 44100 Hz). Nature recordings sourced from the web are commonly 48000 Hz. If the rates differ, the arrays will have different lengths and the mixer will produce incorrect output or crash.

Looping: If the source file is shorter than the requested render duration, it must be looped by repeating the decoded array end to end until it reaches the required length, then truncated to exactly N frames.

Interface:

load(path, sample_rate, duration_sec) → float32 stereo array of shape (N, 2)
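A minimal sketch of this interface, assuming ffmpeg is on the PATH and is used for both decoding and resampling. The helper name loop_to_length is hypothetical, and the ffmpeg arguments shown are one reasonable way to request raw float32 stereo output, not necessarily the ones the original implementation uses.

```python
import subprocess

import numpy as np


def loop_to_length(pcm: np.ndarray, n: int) -> np.ndarray:
    """Repeat a (frames, 2) array end to end, then cut it to exactly n frames."""
    if len(pcm) == 0:
        raise ValueError("decoded source is empty")
    reps = -(-n // len(pcm))  # ceiling division
    return np.tile(pcm, (reps, 1))[:n]


def load(path: str, sample_rate: int, duration_sec: float) -> np.ndarray:
    """Decode, resample to sample_rate, force stereo, and loop to the render length."""
    n = int(duration_sec * sample_rate)
    cmd = [
        "ffmpeg", "-v", "error", "-i", path,
        "-f", "f32le", "-acodec", "pcm_f32le",   # raw little-endian float32
        "-ac", "2", "-ar", str(sample_rate),     # stereo, resampled to the target rate
        "-",                                     # write to stdout
    ]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    pcm = np.frombuffer(raw, dtype="<f4").reshape(-1, 2)
    return loop_to_length(pcm, n)
```

Letting ffmpeg resample during decode means the array that reaches the mixer is already at the pipeline rate, so a 48000 Hz source cannot produce a length mismatch.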

Implementation notes:


Stage 2: Entrainment Generators

Both generators return a float32 stereo array of shape (N, 2) where N = floor(duration_sec × sample_rate). Values must be in [−1.0, 1.0] before mix-level scaling. The left and right channels are identical (mono-in-stereo).

Generator A: Sine Additive

The simplest possible approach. A pure, unmodulated sine wave at 40 Hz, RMS-normalized to a target dB level.

Formula:

t = linspace(0, duration_sec, N, endpoint=False)
mono = sin(2π × 40 × t)

RMS normalization:

target_rms = 10^(level_db / 20)
mono = mono × (target_rms / rms(mono))
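The two formulas above can be sketched in numpy as follows. The function names and the default level_db of −14 are assumptions made for illustration; only the 40 Hz frequency and the RMS scaling rule come from the text.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square level, accumulated in float64 for accuracy."""
    return float(np.sqrt(np.mean(np.square(x, dtype=np.float64))))


def sine_additive(duration_sec, sample_rate=44100, freq_hz=40.0, level_db=-14.0):
    """Return an (N, 2) float32 array holding an RMS-normalized sine on both channels."""
    n = int(duration_sec * sample_rate)
    t = np.linspace(0.0, duration_sec, n, endpoint=False)
    mono = np.sin(2.0 * np.pi * freq_hz * t)
    mono *= 10.0 ** (level_db / 20.0) / rms(mono)  # scale to the target RMS
    return np.column_stack([mono, mono]).astype(np.float32)
```

At −14 dB the sine peaks at roughly 0.28, well inside the [−1.0, 1.0] range the generators are required to respect.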

Key parameters:

No modulation, no processing. Clean sine only.


Generator B: Pink Noise AM (Amplitude Modulation)

Pink noise amplitude-modulated by a sine wave envelope. The AM creates a 40 Hz “breathing pulse” that the auditory system can follow regardless of which part of the audible band the playback hardware reproduces.

Step 1: Generate pink noise (Voss-McCartney algorithm)

The Voss-McCartney algorithm produces a 1/f (pink) spectrum by summing multiple rows of white noise, each updated at a different octave interval:

For each row r in 0..15:
    row r updates its value every 2^r samples
    between update points, the row holds its last value (forward-fill)

pink = sum of all 16 rows
pink = pink / max(|pink|)   ← normalize to [-1, 1]
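The row scheme above can be sketched as a short reference version. This is an illustrative reading of the pseudocode, assuming each row draws uniform values in [−1, 1]; the text does not specify the white-noise distribution, and the function name is hypothetical. Holding a value for 2^r samples is expressed here with np.repeat.

```python
import numpy as np


def voss_mccartney(n, rows=16, seed=None):
    """Sum `rows` held white-noise rows; row r takes a new value every 2**r samples."""
    rng = np.random.default_rng(seed)
    pink = np.zeros(n, dtype=np.float64)
    for r in range(rows):
        period = 2 ** r
        n_values = -(-n // period)  # ceiling division: values needed to cover n samples
        values = rng.uniform(-1.0, 1.0, size=n_values)
        pink += np.repeat(values, period)[:n]  # hold each value for `period` samples
    return pink / np.max(np.abs(pink))  # normalize to [-1, 1]
```

Rows with long hold times contribute mostly low-frequency energy, which is what tilts the summed spectrum toward 1/f.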

Performance optimization — tile instead of generate full duration:
Voss-McCartney is expensive for long durations. Instead, generate exactly 60 seconds of pink noise (“the seed”), then tile it:

pink = tile(pink_seed_60s, ceil(N / len(seed)))[:N]

Because amplitude modulation is applied after tiling, the envelope is computed over the full render timeline rather than per 60-second tile. The modulation therefore stays phase-continuous across each loop point, which makes the seam hard to notice.

Step 2: Apply amplitude modulation

Modulate with a smooth sine-wave envelope (not hard on/off clicks). With a modulation depth of 0.8, the envelope swings between a floor of 0.2 and a ceiling of 1.0 at 40 Hz.

Formula:

floor = 1.0 − depth                               # = 0.2
modulator = floor + depth × 0.5 × (1 + sin(2π × 40 × t))
           = 0.6 + 0.4 × sin(2π × 40 × t)
           
           At sin = +1:  0.6 + 0.4 = 1.0  (ceiling)
           At sin = −1:  0.6 − 0.4 = 0.2  (floor)

modulated = pink × modulator
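The tiling and modulation steps can be sketched together as below. The function names are hypothetical; the envelope follows the formula above, and the time axis is built from sample indices so the envelope runs continuously across tile boundaries.

```python
import numpy as np


def am_envelope(n, sample_rate=44100, mod_freq_hz=40.0, depth=0.8):
    """Sine envelope that swings between (1 - depth) and 1.0."""
    t = np.arange(n) / sample_rate
    floor = 1.0 - depth
    return floor + depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * mod_freq_hz * t))


def modulate(pink_seed, n, sample_rate=44100, mod_freq_hz=40.0, depth=0.8):
    """Tile a short pink-noise seed to n samples, then apply the envelope."""
    reps = -(-n // len(pink_seed))  # ceiling division
    pink = np.tile(pink_seed, reps)[:n]
    return pink * am_envelope(n, sample_rate, mod_freq_hz, depth)
```

For a 1-hour render at 44100 Hz, a 60-second seed is tiled exactly 60 times before the envelope is applied.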

Step 3: RMS normalize

Same as sine additive — scale to level_db target RMS, clip to [−1.0, 1.0].

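Key parameters: modulation_freq_hz (40.0), modulation_depth (0.8), and level_db (−14.0), as set in the example preset.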


Stage 3: Mixer

The mixer performs a weighted additive blend of all layers. Each layer is a (float32 array, linear weight) pair.

Formula:

output = sum(array_i × weight_i  for each layer i)

The mixer does not normalize — all levels are set by the caller via weights. Typical values:

| Layer | Weight |
|---|---|
| Pink noise AM | 0.14 |
| Nature source | 0.80 |

Or for sine additive:

| Layer | Weight |
|---|---|
| 40 Hz sine | 0.08 |
| Nature source | 1.0 |

Hard requirement: All input arrays must have identical shapes before mixing. The loader and generators must both target the same (N, 2) dimensions.
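A minimal sketch of the weighted sum with the shape requirement enforced up front. The function name and error messages are illustrative assumptions; the blend itself is the formula given above.

```python
import numpy as np


def mix(layers):
    """Weighted additive blend of (array, weight) pairs that all share one shape."""
    layers = list(layers)
    if not layers:
        raise ValueError("at least one layer is required")
    shape = layers[0][0].shape
    out = np.zeros(shape, dtype=np.float32)
    for array, weight in layers:
        if array.shape != shape:
            raise ValueError(f"layer shape {array.shape} does not match {shape}")
        out += np.float32(weight) * array.astype(np.float32, copy=False)
    return out
```

Failing fast on a shape mismatch turns a silent resampling or looping bug into an immediate, readable error.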


Stage 4: Normalizer

After mixing, the combined signal is RMS-normalized to a final target level, then hard-clipped to a peak ceiling so that no sample exceeds full scale.

Algorithm:

  1. Compute RMS of the entire mixed array
  2. Scale so RMS equals target_rms = 10^(target_db / 20) (typically −14 dBFS)
  3. Hard-clip to ±peak_limit (typically ±0.95)

scale = target_rms / rms(mixed)
normalized = clip(mixed × scale, -0.95, +0.95)
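The normalizer can be sketched in a few lines. The function name and the guard against a silent mix are assumptions added for illustration; the defaults mirror the typical values quoted above.

```python
import numpy as np


def normalize(mixed, target_db=-14.0, peak_limit=0.95):
    """Scale the mix to the target RMS, then hard-clip to the peak limit."""
    level = np.sqrt(np.mean(np.square(mixed, dtype=np.float64)))
    if level == 0.0:
        raise ValueError("cannot normalize a silent mix")
    scale = 10.0 ** (target_db / 20.0) / level
    return np.clip(mixed * scale, -peak_limit, peak_limit).astype(np.float32)
```

Note that the clip runs after scaling, so material with a high crest factor can end up slightly below the target RMS once its peaks are cut.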

The final encode step (AAC/Opus) may introduce minor inter-sample peaks above 0.95 during reconstruction — this is expected and not a problem at these levels.


Stage 5: Exporter

The exporter encodes the normalized float32 PCM array to a compressed audio container.

Approach:

  1. Write the float32 array to a temporary WAV file (PCM_16 format)
  2. Call ffmpeg to encode WAV → output container (AAC/M4A at 256 kbps recommended for 1-hour files)
  3. Delete the temporary WAV

The temporary WAV must be cleaned up even on failure (use a finally block or equivalent).
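The three steps and the cleanup guarantee can be sketched as follows. This sketch writes the temporary WAV with Python's standard-library wave module rather than a dedicated audio library, purely to keep the example dependency-free. The function names, the injectable run parameter, and the exact ffmpeg arguments are illustrative assumptions.

```python
import os
import subprocess
import tempfile
import wave

import numpy as np


def ffmpeg_encode_cmd(wav_path, out_path, bitrate="256k"):
    """Build an ffmpeg command line that encodes a WAV file to AAC."""
    return ["ffmpeg", "-y", "-v", "error", "-i", wav_path,
            "-c:a", "aac", "-b:a", bitrate, out_path]


def export(pcm, sample_rate, out_path, bitrate="256k", run=subprocess.run):
    """Write pcm (N, channels) floats to a temp 16-bit WAV, encode it, always clean up."""
    pcm16 = (np.clip(pcm, -1.0, 1.0) * 32767.0).astype("<i2")
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        with wave.open(wav_path, "wb") as wav:
            wav.setnchannels(pcm16.shape[1])
            wav.setsampwidth(2)              # 16-bit PCM
            wav.setframerate(sample_rate)
            wav.writeframes(pcm16.tobytes()) # row-major bytes are interleaved frames
        run(ffmpeg_encode_cmd(wav_path, out_path, bitrate), check=True)
    finally:
        os.remove(wav_path)                  # runs on success and on failure
```

Passing the runner as a parameter makes the cleanup path easy to exercise without invoking ffmpeg.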

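Output format choices: AAC in an M4A container at 256 kbps is the default for broad device compatibility; Opus is an option where smaller files matter more than compatibility.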


Preset System

A preset file fully describes one render: which source file to use, how long to render, which generator layers to include, their parameters and mix weights, and where to write the output.

Example preset (TOML):

name         = "pink_40hz_river"
source       = "input/river.opus"
duration_sec = 3600          # 1 hour
sample_rate  = 44100
output       = "output/pink_40hz_river_1h.m4a"
output_bitrate = "256k"

[[layers]]
generator         = "pink_noise_am"
level             = 0.14               # mix weight
modulation_freq_hz = 40.0
modulation_depth  = 0.8
level_db          = -14.0

[[layers]]
generator = "source"
level     = 0.80                       # mix weight

The render entrypoint reads the preset, instantiates generators, calls each one, mixes, normalizes, and exports. A --test flag overrides duration_sec to 180 (3 minutes) and writes to a scratch directory instead of the output directory, allowing fast iteration without a full 1-hour render.


Waveform Validation

An independent analysis script validates rendered files without importing any pipeline code. It decodes the output file via ffmpeg and runs the following checks:

| Check | What it measures | Why it matters |
|---|---|---|
| A | Sample rate, channel count, duration | Detects resampling failures |
| B | Peak amplitude ≤ 0.999 | Detects normalization failure |
| C | RMS between −16 and −8 dBFS | Detects silent or over-loud output |
| D | DC offset < ±0.01 | Detects pink/brown noise filter drift |
| E (sine) | FFT peak at target frequency ≥15 dB above spectral floor | Confirms sine layer present and dominant |
| F (pink AM) | Hilbert envelope FFT peak within ±0.5 Hz of target | Confirms AM rate correct |
| G (pink AM) | Modulation depth 0.30–0.90 | Confirms modulation formula correct |
| H (pink AM) | Modulation floor 0.02–0.50 normalized | Confirms signal never fully silences |
| I (pink AM) | Log-log spectral slope −1.8 to −0.3 | Confirms pink noise (not white or brown) |
| J | High-band (1k–8k Hz) within 25 dB of low-band | Confirms nature track present |
| K | Entrainment layer offset within expected range vs. mix RMS | Confirms mix weights correct |
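As an illustration, checks B, C, D, and F might be sketched as below. This is not the original analysis script: the function names are hypothetical, the search band for the envelope peak is an assumption, and the analytic signal is built directly with numpy's FFT (zeroing negative frequencies and doubling positive ones), which is the standard FFT construction of a Hilbert-transform envelope.

```python
import numpy as np


def basic_checks(mono):
    """Checks B, C, D from the table: peak, RMS window, DC offset."""
    peak = float(np.max(np.abs(mono)))
    rms_db = 20.0 * np.log10(np.sqrt(np.mean(mono ** 2)))
    dc = float(np.mean(mono))
    return {
        "B_peak": peak <= 0.999,
        "C_rms": -16.0 <= rms_db <= -8.0,
        "D_dc": abs(dc) < 0.01,
    }


def envelope_peak_hz(mono, sample_rate, lo_hz=1.0, hi_hz=100.0):
    """Check F: frequency of the strongest component in the amplitude envelope."""
    n = len(mono)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(mono) * weights)
    envelope = np.abs(analytic)
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(freqs[band][np.argmax(spectrum[band])])
```

Check F then passes when abs(envelope_peak_hz(signal, sample_rate) - 40.0) <= 0.5. A few seconds of audio is enough, since the frequency resolution is the reciprocal of the analyzed duration.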

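The analysis uses numpy for FFT-based measurements and scipy for the Hilbert transform, Butterworth bandpass filters, and linear regression.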


Choosing a Strategy

| Hardware | Approach | Reason |
|---|---|---|
| Premium over-ear headphones (Sony, Bose, Sennheiser) | Sine additive | Hardware reproduces 40 Hz; matches 2026 study protocol |
| AirPods / earbuds / laptop speakers | Pink noise AM | Sub-bass limited; AM delivers the 40 Hz rate at audible frequencies |
| Any target below 40 Hz (delta, theta) | Pink noise AM | Even premium headphones can’t reproduce 1–4 Hz sine |
| Unknown/mixed hardware | Layer both | Sine handles capable hardware, AM covers the rest |

Citations

  1. Iaccarino, H.F. et al. (2016). “Gamma frequency entrainment attenuates amyloid load and modifies microglia.” Nature, 540, 230–235.
  2. Hu, X. et al. (2026). “Long-term effects of forty-hertz auditory stimulation as a treatment of Alzheimer’s disease: Insights from an aged monkey model study.” PNAS, 123(2).
  3. “Can Soundscapes Carry 40 Hz for Gamma Entrainment?: Evidence from a Pilot EEG Study.” MDPI Applied Sciences / Healthcare, 14(4), 512 (2026).

Our Implementation Details

Language and runtime: Python 3.12, managed with uv for fast, reproducible dependency installation.

Libraries:

| Library | Purpose |
|---|---|
| numpy >= 1.26 | All PCM array generation, manipulation, FFT analysis |
| scipy >= 1.13 | Hilbert transform, Butterworth bandpass filters, linear regression (analysis only) |
| soundfile >= 0.12 | Writing intermediate WAV files before ffmpeg encode |
| tomllib | Preset parsing (stdlib in Python 3.11+, no install needed) |

ffmpeg and ffprobe are required as external system tools — they handle all codec I/O (decoding source audio, encoding output files, and analysis metadata reads). No Python audio codec library is used.

Memory: A 1-hour render at 44100 Hz stereo float32 takes approximately 1.27 GB of raw PCM per layer (3600 s × 44100 samples/s × 2 channels × 4 bytes), so peak usage is roughly 2.5 GB with two layers allocated simultaneously during mixing, before counting the mixed output array. A system with 4 GB of available RAM is workable; 8 GB or more is recommended if running multiple renders in parallel.
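The size of one full-length PCM array can be checked with quick arithmetic. The helper name is hypothetical; the inputs are the render parameters stated in this guide.

```python
def pcm_bytes(duration_sec, sample_rate=44100, channels=2, bytes_per_sample=4):
    """Bytes needed for one float32 PCM array of the given duration."""
    return int(duration_sec * sample_rate) * channels * bytes_per_sample


one_layer = pcm_bytes(3600)
print(f"one layer:  {one_layer / 1e9:.2f} GB")      # one layer:  1.27 GB
print(f"two layers: {2 * one_layer / 1e9:.2f} GB")  # two layers: 2.54 GB
```

Any intermediate float64 arrays (for example a time axis built before casting to float32) cost twice as much per sample, which is worth keeping in mind when sizing a machine.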

Pink noise generation: Two implementations of Voss-McCartney are maintained — a reference loop version and a vectorized fast version. The fast version builds each octave row via index-scatter and numpy.maximum.accumulate for forward-fill, then sums all rows. Generation time for the 60-second seed at 44100 Hz is under 1 second on typical hardware.
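The scatter-and-forward-fill idea can be sketched for a single row as follows. This is an illustrative reconstruction from the description above, not the maintained fast version: the function names are hypothetical and the uniform value distribution is an assumption.

```python
import numpy as np


def held_row(n, period, rng):
    """One octave row: a new random value every `period` samples, held in between."""
    update_at = np.arange(0, n, period)
    values = np.zeros(n)
    values[update_at] = rng.uniform(-1.0, 1.0, size=update_at.size)  # index-scatter
    last_update = np.zeros(n, dtype=np.int64)
    last_update[update_at] = update_at
    # A running maximum over the scattered indices gives, for every sample,
    # the position of the most recent update (the forward-fill step).
    last_update = np.maximum.accumulate(last_update)
    return values[last_update]


def pink_fast(n, rows=16, seed=None):
    """Sum all rows and normalize to [-1, 1]."""
    rng = np.random.default_rng(seed)
    pink = sum(held_row(n, 2 ** r, rng) for r in range(rows))
    return pink / np.max(np.abs(pink))
```

The running-maximum trick works because update positions are strictly increasing, so the largest index seen so far is always the latest one.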

Render time: A full 1-hour render (pink AM + source layer) takes approximately 15–30 seconds on a modern CPU, dominated by ffmpeg’s source decode and final encode steps. The numpy pipeline itself is fast enough that it is not the bottleneck.

Output format: AAC in an M4A container at 256 kbps. This is the preferred format for broad device compatibility (iOS, Windows, macOS, Android). Opus is also supported for smaller file sizes where compatibility is not a concern.

Test workflow: A --test flag renders 3 minutes to a scratch directory, enabling rapid iteration without waiting for a full 1-hour encode. The analysis script is always run on test output before any long render.