Julius, fast PyTorch based DSP for audio and 1D signals


Julius contains different Digital Signal Processing algorithms implemented with PyTorch, so that they are differentiable and available on CUDA. Note that all the modules implemented here can be used with TorchScript.

For now, I have implemented:

Along with that, you might find useful utilities in:

Representation of the convolution filters used for the efficient resampling.


  • 28/07/2021: 0.2.5 released: support for setting a custom output length when resampling.
  • 22/06/2021: 0.2.4 released: adding highpass and band pass filters. Extra linting and type checking of the code. New implementation, up to x6 faster FFT convolutions and more efficient memory usage.
  • 26/01/2021: 0.2.2 released: fixing normalization of filters in lowpass and resample to avoid leaking very low frequencies. Switch from zero padding to replicate padding (uses first/last value instead of 0) to avoid discontinuities with strong artifacts.
  • 20/01/2021: the Julius implementation of resampling is now officially part of Torchaudio.


Julius requires Python 3.6. To install:
pip3 install -U julius


See the Julius documentation for detailed usage. Below are a few examples to get you started quickly:

import julius
import torch

signal = torch.randn(6, 4, 1024)

# Resample from a sample rate of 100 to 70. The old and new sample rate must be integers,
# and resampling will be fast if they form an irreducible fraction with small numerator
# and denominator (here 10 and 7). Any shape is supported, last dim is time.
resampled_signal = julius.resample_frac(signal, 100, 70)

# Low pass filter with a 0.1 * sample_rate cutoff frequency.
low_freqs = julius.lowpass_filter(signal, 0.1)

# Fast convolutions with FFT, useful for large kernels
conv = julius.FFTConv1d(4, 10, 512)
convolved = conv(signal)

# Decomposition over frequency bands in the waveform domain
bands = julius.split_bands(signal, n_bands=10, sample_rate=100)
# Decomposition with n_bands frequency bands evenly spaced in mel space.
# Input shape can be [*, T], output will be [n_bands, *, T].
random_eq = (torch.rand(10, 1, 1, 1) * bands).sum(0)



This is an implementation of the sinc resample algorithm by Julius O. Smith. It is the same algorithm as the one used in resampy, but to run efficiently on GPU it is limited to fractional changes of the sample rate. It will be fast if the old and new sample rates are small after dividing them by their GCD. For instance, going from a sample rate of 2000 to 3000 (2 and 3 after removing the GCD) will be extremely fast, while going from 20001 to 30001 will not. Julius resampling is faster than resampy even on CPU, and when running on GPU it makes resampling a completely negligible part of your pipeline (except of course for weird cases like going from a sample rate of 20001 to 30001).
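
To check in advance whether a given pair of sample rates falls in the fast case, you can reduce them by their GCD yourself. The snippet below is a minimal sketch using only the standard library; `reduced_rates` is a hypothetical helper name for illustration, not part of Julius.

```python
from math import gcd
from typing import Tuple


def reduced_rates(old_sr: int, new_sr: int) -> Tuple[int, int]:
    """Divide both sample rates by their greatest common divisor."""
    divisor = gcd(old_sr, new_sr)
    return old_sr // divisor, new_sr // divisor


# Small reduced values suggest a fast resampling, large ones a slow one.
for old_sr, new_sr in [(100, 70), (2000, 3000), (20001, 30001)]:
    print((old_sr, new_sr), "->", reduced_rates(old_sr, new_sr))
```

The first two pairs reduce to small integers, while 20001 and 30001 share no common factor and stay as they are.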


Computing convolutions with very large kernels (>= 128) and a stride of 1 can be much faster using FFT. This implements the same API as PyTorch's regular 1D convolution but with an FFT backend. Dilation and groups are not supported. FFTConv1d will be faster on CPU even for relatively small tensors (a few dozen channels, kernel size of 128). On CUDA, due to the higher parallelism, regular convolution can be faster in many cases, but for kernel sizes above 128, or for a large number of channels or a large batch size, FFTConv1d will eventually be faster (basically when you no longer have idle cores that can hide the true complexity of the operation).
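
The principle behind an FFT backend can be illustrated without any dependency. The sketch below is not the Julius implementation: it is a pure Python toy (with a hypothetical name, `fft_conv1d_sketch`) that computes the same "valid" cross-correlation as a direct loop by multiplying spectra, assuming a single channel, a stride of 1, and a kernel no longer than the signal.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_conv1d(signal, kernel):
    """Direct 'valid' cross-correlation, what deep learning calls convolution."""
    size = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(size))
            for i in range(len(signal) - size + 1)]


def fft_conv1d_sketch(signal, kernel):
    """Same result as direct_conv1d, computed through the frequency domain."""
    n = 1
    while n < len(signal):
        n *= 2
    # Zero pad both inputs to a common power-of-two length.
    spec_signal = fft(list(signal) + [0.0] * (n - len(signal)))
    spec_kernel = fft(list(kernel) + [0.0] * (n - len(kernel)))
    # Correlation in time is multiplication by the conjugate spectrum.
    product = [a * b.conjugate() for a, b in zip(spec_signal, spec_kernel)]
    result = fft(product, inverse=True)
    # Keep only the outputs that are free of circular wrap-around.
    valid = len(signal) - len(kernel) + 1
    return [value.real / n for value in result[:valid]]
```

The direct loop costs on the order of signal length times kernel size, while the FFT route costs on the order of n log n regardless of kernel size, which is why it pays off for large kernels.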


Classical Finite Impulse Response windowed sinc lowpass filter. It will use FFT convolutions automatically if the filter size is large enough. This is the basic block from which you can build high pass and band pass filters (see
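
As a rough illustration of what a windowed sinc lowpass filter does, here is a dependency-free sketch. It is not the Julius code: the Hann window, the default `half_size` of 20, and the function names are assumptions chosen for the example. It does follow two points from the release notes above: the taps are normalized to sum to 1, and the signal is padded by replicating its first and last values.

```python
import math


def lowpass_taps(cutoff, half_size):
    """Hann-windowed sinc taps.

    cutoff is a fraction of the sample rate (0 < cutoff < 0.5).
    """
    size = 2 * half_size + 1
    taps = []
    for i in range(size):
        t = i - half_size
        x = 2 * cutoff * t
        sinc = 1.0 if t == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Hann window with non-zero end points, symmetric around the center.
        window = 0.5 * (1 - math.cos(2 * math.pi * (i + 1) / (size + 1)))
        taps.append(2 * cutoff * sinc * window)
    total = sum(taps)
    # Unit sum, so that constant (0 Hz) signals pass through unchanged.
    return [tap / total for tap in taps]


def lowpass_sketch(signal, cutoff, half_size=20):
    """Filter a list of samples, keeping the input length."""
    taps = lowpass_taps(cutoff, half_size)
    # Replicate padding: repeat the first and last values instead of zeros.
    padded = [signal[0]] * half_size + list(signal) + [signal[-1]] * half_size
    return [sum(padded[i + j] * taps[j] for j in range(len(taps)))
            for i in range(len(signal))]
```

With these choices a constant signal comes out unchanged, while a signal alternating between 1 and -1 (the highest representable frequency) is strongly attenuated away from the edges.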



Decomposition of a signal over frequency bands in the waveform domain. This can be useful, for instance, to perform parametric EQ (see the usage examples above).
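
To get a feel for what "evenly spaced in mel space" means for the band edges, the sketch below computes cutoff frequencies with equal widths on the mel scale. It assumes the common formula mel = 2595 * log10(1 + f / 700) and bands covering 0 Hz up to the Nyquist frequency. The function names are hypothetical, and the exact band edges used by Julius may differ.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel (assumed formula, see the paragraph above)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_cutoffs(n_bands, sample_rate):
    """Return the n_bands - 1 cutoffs (in Hz) that split [0, Nyquist]
    into n_bands bands of equal width on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    return [mel_to_hz(top * i / n_bands) for i in range(1, n_bands)]


# Same parameters as the split_bands call in the usage examples.
print(mel_spaced_cutoffs(n_bands=10, sample_rate=100))
```

Because the mel scale is close to linear well below 700 Hz, the cutoffs for a sample rate of 100 are almost uniformly spaced in Hz; at audio sample rates the upper bands become much wider than the lower ones.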


You can find speed tests (and comparisons to reference implementations) in the benchmark. The CPU benchmarks are run on a 2020 MacBook Pro with a 2.4 GHz 8-core Intel i9 CPU. The GPU benchmarks are run on an Nvidia V100 with 16GB of memory. We also check the validity of our implementations against reference ones like


Running tests

Clone this repository, then

pip3 install '.[dev]'

To run the benchmarks:

pip3 install '.[dev]'
python3 -m bench.gen


Julius is released under the MIT license.


This package is named in honor of Julius O. Smith, whose books and website were a gold mine of information for me to learn about DSP. Go check out his website if you want to learn more about DSP.
