Jeremy Allen
This guide describes a language-agnostic approach to generating long-form (1–3 hour) audio files that deliver 40 Hz gamma entrainment via the GENUS protocol. The system mixes a synthetic entrainment layer with a natural ambient soundscape and outputs an encoded audio file suitable for daily listening.
Daily exposure to auditory stimulation pulsed at 40 Hz elicits a measurable Auditory Steady-State Response (ASSR) in the brain. In primate studies (PNAS 2026), 1-hour daily sessions over 7 days produced a 205% increase in amyloid-beta clearance into cerebrospinal fluid — a mechanism of active interest in Alzheimer’s prevention research.
The “40 Hz” refers to the pulse or modulation rate, not the carrier frequency. Two delivery strategies are implemented:
| Strategy | Mechanism | Hardware requirement |
|---|---|---|
| Sine additive | A pure 40 Hz sine wave layered additively under ambient audio | Premium headphones rated ≥35 Hz |
| Pink noise AM | Pink noise (1/f) amplitude-modulated at 40 Hz | Any hardware including earbuds, laptop speakers |
The sine approach matches the exact protocol used in the 2026 MDPI/PNAS studies. The AM approach delivers the 40 Hz rate via modulation of audible-frequency noise, sidestepping sub-bass hardware limitations.
The pipeline has four independent components, each with a single responsibility:
    Source audio file
           │
           ▼
    [Source Loader] ─────────────────────────────────┐
                                                     │
    [Entrainment Generator] (pink AM or sine)        │
           │                                         │
           └────────────────────────────► [Mixer] ───┘
                                             │
                                       [Normalizer]
                                             │
                                        [Exporter]
                                             │
                                     Output audio file
Each stage operates on raw floating-point PCM arrays. No codec-specific logic lives inside the pipeline — encoding only happens at the final export stage.
The source loader decodes any ambient audio file (nature recordings, etc.) into a raw float32 stereo PCM array at a specified sample rate.
Critical requirement: The source audio must be resampled to exactly match the pipeline’s target sample rate (typically 44100 Hz). Nature recordings sourced from the web are commonly 48000 Hz. If the rates differ, the arrays will have different lengths and the mixer will produce incorrect output or crash.
Looping: If the source file is shorter than the requested render duration, it must be seamlessly looped by repeating the decoded array until it reaches the required length.
Interface:
load(path, sample_rate, duration_sec) → float32 stereo array of shape (N, 2)
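A minimal sketch of this interface, assuming ffmpeg is on the PATH and NumPy is available. The helper name loop_to_length and the extra ffmpeg flags (-v error, -nostdin) are illustrative additions, not part of the original design. The sketch holds the whole decoded file in memory, which is fine for short ambient sources.

```python
import math
import subprocess

import numpy as np


def loop_to_length(arr: np.ndarray, n_frames: int) -> np.ndarray:
    """Repeat a (frames, 2) array along the time axis, then trim to n_frames."""
    if len(arr) == 0:
        raise ValueError("decoded source is empty")
    reps = math.ceil(n_frames / len(arr))
    # (reps, 1) tiles the time axis only; a scalar reps would tile the channel axis.
    return np.tile(arr, (reps, 1))[:n_frames]


def load(path: str, sample_rate: int, duration_sec: float) -> np.ndarray:
    """Decode, resample, and loop a source file to exactly N stereo frames."""
    n_frames = math.floor(duration_sec * sample_rate)
    cmd = [
        "ffmpeg", "-v", "error", "-nostdin",
        "-i", path,
        "-ar", str(sample_rate),  # resample to the pipeline rate
        "-ac", "2",               # force stereo
        "-f", "f32le", "-",       # raw float32 little-endian PCM on stdout
    ]
    raw = subprocess.run(cmd, check=True, capture_output=True).stdout
    pcm = np.frombuffer(raw, dtype="<f4").reshape(-1, 2)
    return loop_to_length(pcm, n_frames)
```

Because the resample happens inside ffmpeg, the returned array always has the pipeline's frame count regardless of the source file's native rate.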
Implementation notes:
- Decode with ffmpeg, piping raw float32 little-endian PCM to stdout:
  ffmpeg -i <path> -ar <sr> -ac 2 -f f32le -
- Reshape the decoded samples to (N, 2).
- Loop short sources along the time axis: tile(arr, ceil(N / len(arr)))[:N]

Both generators return a float32 stereo array of shape (N, 2) where N = floor(duration_sec × sample_rate). Values must be in [−1.0, 1.0] before mix-level scaling. The left and right channels are identical (mono-in-stereo).
The simplest possible approach. A pure, unmodulated sine wave at 40 Hz, RMS-normalized to a target dB level.
Formula:
t = linspace(0, duration_sec, N, endpoint=False)
mono = sin(2π × 40 × t)
RMS normalization:
target_rms = 10^(level_db / 20)
mono = mono × (target_rms / rms(mono))
Key parameters:
- freq_hz: entrainment frequency (40.0 for gamma)
- level_db: RMS target, typically −12 dBFS

No modulation, no processing. Clean sine only.
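The formula and RMS normalization above could be sketched in NumPy as follows. The function names sine_layer and rms are illustrative, and the time axis is built as sample index divided by sample rate, which equals the linspace form whenever duration_sec × sample_rate is an integer.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square level, computed in float64 for accuracy."""
    return float(np.sqrt(np.mean(x.astype(np.float64) ** 2)))


def sine_layer(duration_sec: float, sample_rate: int,
               freq_hz: float = 40.0, level_db: float = -12.0) -> np.ndarray:
    """Pure sine at freq_hz, RMS-normalized to level_db, as (N, 2) float32."""
    n = int(np.floor(duration_sec * sample_rate))
    t = np.arange(n, dtype=np.float64) / sample_rate
    mono = np.sin(2.0 * np.pi * freq_hz * t)
    target_rms = 10.0 ** (level_db / 20.0)
    mono *= target_rms / rms(mono)
    # Identical left and right channels (mono-in-stereo).
    return np.column_stack([mono, mono]).astype(np.float32)
```

At −12 dBFS RMS the sine peaks at roughly 0.355, well inside the [−1.0, 1.0] range the generators must respect.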
Pink noise amplitude-modulated by a sine wave envelope. The AM creates a 40 Hz “breathing pulse” detectable by the auditory system at any frequency range.
The Voss-McCartney algorithm produces a 1/f (pink) spectrum by summing multiple rows of white noise, each updated at a different octave interval:
For each row r in 0..15:
row r updates its value every 2^r samples
between update points, the row holds its last value (forward-fill)
pink = sum of all 16 rows
pink = pink / max(|pink|) ← normalize to [-1, 1]
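One simple way to realize the row scheme above, assuming NumPy: draw one value per hold period and expand it with np.repeat. This is a sketch of the deterministic-interval variant described here, not necessarily the author's implementation; the function name and the uniform white-noise source are assumptions.

```python
import numpy as np


def voss_mccartney(n_samples: int, n_rows: int = 16, seed=None) -> np.ndarray:
    """Sum n_rows held white-noise rows; row r changes every 2**r samples."""
    rng = np.random.default_rng(seed)
    pink = np.zeros(n_samples, dtype=np.float64)
    for r in range(n_rows):
        hold = 2 ** r
        n_values = -(-n_samples // hold)  # ceiling division
        row = rng.uniform(-1.0, 1.0, n_values)
        # Repeating each value `hold` times is the forward-fill between updates.
        pink += np.repeat(row, hold)[:n_samples]
    pink /= np.max(np.abs(pink))  # normalize to [-1, 1]
    return pink.astype(np.float32)
```

Rows with long hold periods contribute only low-frequency energy, which is what tilts the summed spectrum toward 1/f.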
Performance optimization: tile instead of generating the full duration.
Voss-McCartney is expensive for long durations. Instead, generate exactly 60 seconds of pink noise (“the seed”), then tile it:
pink = tile(pink_seed_60s, ceil(N / len(seed)))[:N]
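Because amplitude modulation is applied after tiling, the 40 Hz envelope is computed over the full render timeline and stays phase-continuous across every 60-second loop point. The seam is only a step in the underlying noise, which is hard to hear in a broadband random signal.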
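Modulate with a smooth sine-wave envelope (not hard on/off clicks). The envelope has:
- a ceiling of 1.0
- a floor of 1 − depth (0.2 at the default depth)
- a modulation depth of 0.8 (80%)

Formula: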
floor = 1.0 − depth # = 0.2
modulator = floor + depth × 0.5 × (1 + sin(2π × 40 × t))
= 0.6 + 0.4 × sin(2π × 40 × t)
At sin = +1: 0.6 + 0.4 = 1.0 (ceiling)
At sin = −1: 0.6 − 0.4 = 0.2 (floor)
modulated = pink × modulator
Level scaling is the same as for sine additive: scale to the level_db target RMS, then clip to [−1.0, 1.0].
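Putting the tiling, envelope, and level steps together, a hedged NumPy sketch might look like this. It assumes the seed is a 1-D mono array already normalized to [−1, 1]; the function names am_envelope and pink_am_layer are illustrative.

```python
import numpy as np


def am_envelope(n: int, sample_rate: int,
                freq_hz: float = 40.0, depth: float = 0.8) -> np.ndarray:
    """Sine envelope swinging between (1 - depth) and 1.0."""
    t = np.arange(n, dtype=np.float64) / sample_rate
    floor = 1.0 - depth
    return floor + depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * freq_hz * t))


def pink_am_layer(seed: np.ndarray, duration_sec: float, sample_rate: int,
                  freq_hz: float = 40.0, depth: float = 0.8,
                  level_db: float = -14.0) -> np.ndarray:
    """Tile the seed, modulate, scale to level_db RMS, clip, return (N, 2)."""
    n = int(np.floor(duration_sec * sample_rate))
    reps = -(-n // len(seed))  # ceiling division
    pink = np.tile(seed, reps)[:n].astype(np.float64)
    # The envelope spans the whole render, so it never restarts at a tile seam.
    modulated = pink * am_envelope(n, sample_rate, freq_hz, depth)
    target_rms = 10.0 ** (level_db / 20.0)
    modulated *= target_rms / np.sqrt(np.mean(modulated ** 2))
    modulated = np.clip(modulated, -1.0, 1.0)
    return np.column_stack([modulated, modulated]).astype(np.float32)
```

Any mono noise array works as the seed for experimentation; in the pipeline it would be the 60-second Voss-McCartney output.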
Key parameters:
- modulation_freq_hz: AM rate (40.0 for gamma)
- modulation_depth: 0.8 (80%)
- level_db: RMS target of the pink layer alone, typically −14 dBFS
- seed_duration_sec: 60 seconds (performance optimization; never change mid-render)

The mixer performs a weighted additive blend of all layers. Each layer is a (float32 array, linear weight) pair.
Formula:
output = sum(array_i × weight_i for each layer i)
The mixer does not normalize — all levels are set by the caller via weights. Typical values:
| Layer | Weight |
|---|---|
| Pink noise AM | 0.14 |
| Nature source | 0.80 |
Or for sine additive:
| Layer | Weight |
|---|---|
| 40 Hz sine | 0.08 |
| Nature source | 1.0 |
Hard requirement: All input arrays must have
identical shapes before mixing. The loader and generators must both
target the same (N, 2) dimensions.
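A small sketch of the weighted sum with the shape requirement enforced up front, assuming NumPy. The function name mix and the error messages are illustrative.

```python
import numpy as np


def mix(layers) -> np.ndarray:
    """Weighted sum of (array, weight) pairs; all arrays must share one shape."""
    layers = list(layers)
    if not layers:
        raise ValueError("at least one layer is required")
    shape = layers[0][0].shape
    out = np.zeros(shape, dtype=np.float32)
    for i, (arr, weight) in enumerate(layers):
        if arr.shape != shape:
            raise ValueError(f"layer {i} has shape {arr.shape}, expected {shape}")
        out += arr * np.float32(weight)  # no normalization here by design
    return out
```

Failing early on a shape mismatch turns a silent resampling mistake into an explicit error before any audio is written.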
After mixing, the combined signal is RMS-normalized to a final target level, then hard-limited to prevent clipping.
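Algorithm:
1. Compute the target RMS: target_rms = 10^(target_db / 20) (typically −14 dBFS)
2. Scale the mix so that its RMS matches the target
3. Hard-limit the result to ±peak_limit (typically ±0.95)

scale = target_rms / rms(mixed)
normalized = clip(mixed × scale, -0.95, +0.95)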
The final encode step (AAC/Opus) may introduce minor inter-sample peaks above 0.95 during reconstruction — this is expected and not a problem at these levels.
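The normalization algorithm could be written in NumPy roughly as below. The function name normalize and the silent-input guard are assumptions added for illustration.

```python
import numpy as np


def normalize(mixed: np.ndarray, target_db: float = -14.0,
              peak_limit: float = 0.95) -> np.ndarray:
    """Scale to target_db RMS, then hard-limit to +/- peak_limit."""
    target_rms = 10.0 ** (target_db / 20.0)
    current = float(np.sqrt(np.mean(mixed.astype(np.float64) ** 2)))
    if current == 0.0:
        raise ValueError("cannot normalize a silent mix")
    scaled = mixed * np.float32(target_rms / current)
    return np.clip(scaled, -peak_limit, peak_limit).astype(np.float32)
```

Clipping happens after scaling, so the RMS of a heavily limited signal ends up slightly below the target; for typical ambient material the limiter touches few samples.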
The exporter encodes the normalized float32 PCM array to a compressed audio container.
Approach:
- Write the PCM array to a temporary WAV file.
- Invoke ffmpeg to encode WAV → output container (AAC/M4A at 256 kbps recommended for 1-hour files).

The temporary WAV must be cleaned up even on failure (use a finally block or equivalent).
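A sketch of the export step with cleanup in a finally block. It assumes the soundfile package and an ffmpeg build that includes the aac and libopus encoders; the helper build_encode_cmd, the CODECS mapping, and the float WAV subtype are illustrative choices rather than the documented implementation.

```python
import os
import subprocess
import tempfile

# Assumed mapping from output extension to ffmpeg audio encoder.
CODECS = {".m4a": "aac", ".opus": "libopus"}


def build_encode_cmd(wav_path: str, out_path: str, bitrate: str = "256k") -> list:
    """Build the ffmpeg argument list for encoding a WAV to out_path."""
    ext = os.path.splitext(out_path)[1].lower()
    if ext not in CODECS:
        raise ValueError(f"unsupported output extension: {ext!r}")
    return ["ffmpeg", "-y", "-v", "error", "-i", wav_path,
            "-c:a", CODECS[ext], "-b:a", bitrate, out_path]


def export(pcm, sample_rate: int, out_path: str, bitrate: str = "256k") -> None:
    """Write pcm to a temporary WAV, encode it, and always remove the WAV."""
    import soundfile as sf  # third-party; imported here so the helper above has no dependency

    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        sf.write(wav_path, pcm, sample_rate, subtype="FLOAT")
        subprocess.run(build_encode_cmd(wav_path, out_path, bitrate), check=True)
    finally:
        os.remove(wav_path)  # runs on success and on failure
```

Keeping command construction in its own function makes the codec choice easy to inspect without running an encode.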
Output format choices:
- .m4a (AAC): wide compatibility, excellent quality at 256k
- .opus: smaller file, slightly less device compatibility

A preset file fully describes one render: which source file to use, how long to render, which generator layers to include, their parameters and mix weights, and where to write the output.
Example preset (TOML):
name = "pink_40hz_river"
source = "input/river.opus"
duration_sec = 3600 # 1 hour
sample_rate = 44100
output = "output/pink_40hz_river_1h.m4a"
output_bitrate = "256k"
[[layers]]
generator = "pink_noise_am"
level = 0.14 # mix weight
modulation_freq_hz = 40.0
modulation_depth = 0.8
level_db = -14.0
[[layers]]
generator = "source"
level = 0.80 # mix weight

The render entrypoint reads the preset, instantiates generators, calls each one, mixes, normalizes, and exports. A --test flag overrides duration_sec to 180 and writes to a scratch directory instead of the output directory, allowing fast iteration without a full 1-hour render.
An independent analysis script validates rendered files without importing any pipeline code. It decodes the output file via ffmpeg and runs the following checks:
| Check | What it measures | Why it matters |
|---|---|---|
| A | Sample rate, channel count, duration | Detects resampling failures |
| B | Peak amplitude ≤ 0.999 | Detects normalization failure |
| C | RMS between −16 and −8 dBFS | Detects silent or over-loud output |
| D | DC offset < ±0.01 | Detects pink/brown noise filter drift |
| E (sine) | FFT peak at target frequency ≥15 dB above spectral floor | Confirms sine layer present and dominant |
| F (pink AM) | Hilbert envelope FFT peak within ±0.5 Hz of target | Confirms AM rate correct |
| G (pink AM) | Modulation depth 0.30–0.90 | Confirms modulation formula correct |
| H (pink AM) | Modulation floor 0.02–0.50 normalized | Confirms signal never fully silences |
| I (pink AM) | Log-log spectral slope −1.8 to −0.3 | Confirms pink noise (not white or brown) |
| J | High-band (1k–8k Hz) within 25 dB of low-band | Confirms nature track present |
| K | Entrainment layer offset within expected range vs. mix RMS | Confirms mix weights correct |
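As an illustration of how a few of these checks could be computed on a decoded array, here is a NumPy-only sketch of checks B, C, D, and the envelope-rate measurement behind check F. The function names are hypothetical, and the analytic-signal envelope is an FFT-based stand-in for a library Hilbert transform.

```python
import numpy as np


def basic_checks(pcm: np.ndarray) -> dict:
    """Checks B, C, D on a decoded (N, 2) float array."""
    x = pcm.astype(np.float64)
    peak = float(np.max(np.abs(x)))
    rms_db = 20.0 * float(np.log10(np.sqrt(np.mean(x ** 2))))
    dc = float(np.max(np.abs(np.mean(x, axis=0))))
    return {
        "B_peak": peak <= 0.999,
        "C_rms": -16.0 <= rms_db <= -8.0,
        "D_dc": dc < 0.01,
    }


def envelope(x: np.ndarray) -> np.ndarray:
    """Magnitude of the analytic signal (negative frequencies zeroed)."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def am_rate_hz(mono: np.ndarray, sample_rate: int,
               lo_hz: float = 1.0, hi_hz: float = 100.0) -> float:
    """Strongest envelope frequency in [lo_hz, hi_hz], for check F."""
    env = envelope(mono)
    env = env - np.mean(env)
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / sample_rate)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(freqs[band][np.argmax(spectrum[band])])
```

Check F would then pass when the returned rate is within ±0.5 Hz of the target. Running the FFT over a few seconds of audio rather than the full file keeps the analysis fast while still giving sub-hertz resolution.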
The analysis uses:
- ffprobe for container metadata (sample rate, channels, duration)

| Hardware | Approach | Reason |
|---|---|---|
| Premium over-ear headphones (Sony, Bose, Sennheiser) | Sine additive | Hardware reproduces 40 Hz; matches 2026 study protocol |
| AirPods / earbuds / laptop speakers | Pink noise AM | Sub-bass limited; AM delivers the 40 Hz rate at audible frequencies |
| Any target below 40 Hz (delta, theta) | Pink noise AM | Even premium headphones can’t reproduce 1–4 Hz sine |
| Unknown/mixed hardware | Layer both | Sine handles capable hardware, AM covers the rest |
Language and runtime: Python 3.12, managed with uv for fast, reproducible dependency installation.
Libraries:
| Library | Purpose |
|---|---|
| numpy >= 1.26 | All PCM array generation, manipulation, FFT analysis |
| scipy >= 1.13 | Hilbert transform, Butterworth bandpass filters, linear regression (analysis only) |
| soundfile >= 0.12 | Writing intermediate WAV files before ffmpeg encode |
| tomllib | Preset parsing (stdlib in Python 3.11+, no install needed) |
ffmpeg and ffprobe are required as external
system tools — they handle all codec I/O (decoding source audio,
encoding output files, and analysis metadata reads). No Python audio
codec library is used.
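Memory: A 1-hour render at 44100 Hz stereo float32 is approximately 1.27 GB of raw PCM per layer (3600 × 44100 × 2 channels × 4 bytes). With two layers allocated simultaneously during mixing, peak usage is roughly 2.5 GB, and more if the mixer writes into a separate output array. A system with 4 GB of available RAM is workable when the mix is accumulated in place; 8 GB or more is recommended if running multiple renders in parallel.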
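The footprint of a single stereo float32 array follows directly from the frame count. A quick check, with pcm_bytes as a hypothetical helper name:

```python
def pcm_bytes(duration_sec: float, sample_rate: int,
              channels: int = 2, bytes_per_sample: int = 4) -> int:
    """Raw PCM size in bytes for one array of the given duration."""
    return int(duration_sec * sample_rate) * channels * bytes_per_sample


one_layer = pcm_bytes(3600, 44100)
print(f"{one_layer / 1e9:.2f} GB")      # 1.27 GB
print(f"{one_layer / 2**30:.2f} GiB")   # 1.18 GiB
```

Multiplying by the number of arrays alive at the same time (layers plus any separate mix buffer) gives a rough peak estimate for a render.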
Pink noise generation: Two implementations of
Voss-McCartney are maintained — a reference loop version and a
vectorized fast version. The fast version builds each octave row via
index-scatter and numpy.maximum.accumulate for
forward-fill, then sums all rows. Generation time for the 60-second seed
at 44100 Hz is under 1 second on typical hardware.
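The scatter-then-forward-fill idea can be sketched as below, assuming NumPy. This illustrates the technique named above rather than reproducing the maintained implementation; held_row and pink_fast are hypothetical names.

```python
import numpy as np


def held_row(n_samples: int, period: int, rng) -> np.ndarray:
    """One octave row: a new random value every `period` samples, held between."""
    update_idx = np.arange(0, n_samples, period)
    row = np.zeros(n_samples)
    row[update_idx] = rng.uniform(-1.0, 1.0, update_idx.size)  # index-scatter
    # Forward-fill: record each update's own index, then carry the most
    # recent update index forward with a running maximum.
    last_update = np.zeros(n_samples, dtype=np.int64)
    last_update[update_idx] = update_idx
    last_update = np.maximum.accumulate(last_update)
    return row[last_update]


def pink_fast(n_samples: int, n_rows: int = 16, seed=None) -> np.ndarray:
    """Sum n_rows held rows and normalize to [-1, 1]."""
    rng = np.random.default_rng(seed)
    pink = np.zeros(n_samples)
    for r in range(n_rows):
        pink += held_row(n_samples, 2 ** r, rng)
    return (pink / np.max(np.abs(pink))).astype(np.float32)
```

The running maximum works because update indices only increase, so each position ends up pointing at the latest update at or before it. The same pattern would also handle irregular update positions.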
Render time: A full 1-hour render (pink AM + source layer) takes approximately 15–30 seconds on a modern CPU, dominated by ffmpeg’s source decode and final encode steps. The numpy pipeline itself is fast enough that it is not the bottleneck.
Output format: AAC in an M4A container at 256 kbps. This is the preferred format for broad device compatibility (iOS, Windows, macOS, Android). Opus is also supported for smaller file sizes where compatibility is not a concern.
Test workflow: A --test flag renders 3
minutes to a scratch directory, enabling rapid iteration without waiting
for a full 1-hour encode. The analysis script is always run on test
output before any long render.