© 2025 DXAssist. All rights reserved
Send your questions here via email.
DX Assist – Support
We’re working hard to add new tools, refine performance, and respond to your feedback — so don’t hesitate to reach out if you spot a bug, have a suggestion, or just want to say hi.
Changelog:
Version 1.1.1
Fixed a bug that caused issues when processing a track with no regions.
The trial version supports 3 minutes of material, but it doesn't have to be from the beginning of the session.
Version 1.1.0
Vertical Processing has been completely redesigned — it no longer loses regions and selects the appropriate versions more accurately. The Trial version now also supports Vertical Processing.
A new parameter has been added: Separation Level.
Version 1.0.1
Vertical Processing checkbox is now disabled by default.
Logo link in the top bar now correctly points to dxassist.me instead of aiaudio.tech.
Version 1.0.0 – initial release
First public release of DXAssist.
Core features: AAF import, dialogue detection, audio stripping, parameter control, Vertical Processing (beta), trial mode limitations
Below you’ll find a handful of useful details and known issues we’re currently working on.
1. Parameters.
Probability determines how strict DXAssist is when detecting speech. Higher values cause elements like breaths, drawn-out word beginnings, and other speech-like sounds to be considered unnecessary and removed. It's recommended to experiment with settings in the range of 20% to 80%.
Strip Start and End Pad add a specified number of frames at the beginning and end of each region after processing. This is useful if you plan to add fade-ins and fade-outs later. However, keep in mind that setting the values too high may cause the regions to merge again.
Min Strip Gap defines the minimum gap that should remain between the cut regions. If you find the cuts between words too sharp and want to maintain smoother continuity, increase this value. However, as with the pads, setting it too high may result in regions being merged.
Vertical Processing is a new feature that processes material after the initial “horizontal” cuts are made. At this stage, audio bleed from other tracks may remain, since the program may recognize bleed from secondary lavaliers as speech. When Vertical Processing is enabled, the program attempts to choose the version that should be kept.
A video explaining how to use Vertical Processing can be found in the Workflow section.
2. Supported AAF formats.
At this stage, DXAssist supports the WAV format contained within AAF files. We do not currently support MXF or AIFF formats. Additionally, due to the nature of the algorithm, we only support time division based on frames, not milliseconds. Expanding this functionality is one of the features we plan to develop in the future.
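Because the pad and gap parameters are expressed in frames, it can help to know roughly how much time a given value covers. The snippet below is only an arithmetic sketch: the function name `frames_to_ms` is made up for this example, and the frame rate is an assumption about your own session, not something DXAssist reports.

```python
def frames_to_ms(frames: int, fps: float) -> float:
    """Convert a frame count to milliseconds at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames * 1000.0 / fps

# Example: a 2-frame pad in a 25 fps session, and a 6-frame gap at 24 fps.
print(frames_to_ms(2, 25))  # 80.0
print(frames_to_ms(6, 24))  # 250.0
```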
3. Troubleshooting
If the program reads the AAF file but stops at the first clip and does not continue processing, this means that the AAF contains formats that DXAssist currently does not support, such as MXF or AIFF. In this case, the best solution is to open the AAF in Pro Tools and re-export it using WAV embedded. A video explaining how to do this can be found in the Workflow section.
-
If nothing happens after pressing the Process button in the trial version, it may indicate that there is no content within the first 3 minutes of the AAF — which is the time range the trial version processes.
To test the app properly, you should prepare an AAF file that contains audio from the very beginning.
-
Since we aimed to make the algorithm support all languages and work fully offline (DXAssist runs without internet access and only requires it to activate the license), there may be cases where it doesn't recognize short words, drawn-out syllables, or various accents.
It may also fail to recognize singing as speech and remove it, along with breaths and similar vocal noises. We're continuing to improve the algorithm, but for now, you may need to manually correct such cases.
If you have any comments, suggestions, or anything you'd like to share with us, please feel free to write to us at support@dxassist.me.
Your input truly helps shape DXAssist.
The DXAssist Team
Parameters

In version 1.1.0 of the software, you can choose which algorithms to run by selecting the corresponding checkboxes.
Horizontal Processing scans the tracks from start to finish of the AAF file, non-destructively removing regions that do not contain human speech. This process uses an advanced speech recognition model.
Vertical Processing, on the other hand, analyzes tracks vertically — across layers — and keeps only the best available versions.
You can use each process independently or combine them for optimal results.

Probability determines how strict the speech recognition model is when evaluating audio.
Lower values (e.g. 10–20%) allow breaths and subtle vocal sounds to be kept, but may also leave in unwanted noise or fragments. Higher values (around 70–80%) retain only clearly recognized speech — ideal for clean and well-recorded dialogue.
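To illustrate why a low Probability keeps breaths while a high one keeps only clear speech, here is a minimal sketch. It assumes the model produces one speech probability per frame, which is an assumption made for the example; the function name `regions_above_threshold` and the scores are hypothetical and do not describe DXAssist's actual internals.

```python
from typing import List, Tuple

def regions_above_threshold(probs: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where prob >= threshold."""
    regions = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # a region opens here
        elif p < threshold and start is not None:
            regions.append((start, i))     # the region closes before frame i
            start = None
    if start is not None:                  # region runs to the end of the input
        regions.append((start, len(probs)))
    return regions

# Hypothetical scores: a breath (0.3) followed by clear speech (0.9).
scores = [0.05, 0.3, 0.3, 0.1, 0.9, 0.9, 0.9, 0.1]
print(regions_above_threshold(scores, 0.2))  # [(1, 3), (4, 7)]  breath and speech kept
print(regions_above_threshold(scores, 0.8))  # [(4, 7)]          only clear speech kept
```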

Separation Level defines the threshold at which the Vertical Processing algorithm decides to keep multiple audio signals. If the value is set to 0, only the single best signal will be retained. If it's set to 5, the algorithm will keep additional sounds whose loudness difference falls within that range. This gives you control over how selective the process should be — whether you prefer to let the algorithm decide everything, or retain more options for manual decision-making later.
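The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: loudness is taken to be measured in dB, the track names are invented, and `select_signals` is a hypothetical helper, not part of DXAssist.

```python
from typing import Dict, List

def select_signals(loudness_db: Dict[str, float], separation_level: float) -> List[str]:
    """Keep the loudest track plus any track within separation_level of it."""
    if not loudness_db:
        return []
    best = max(loudness_db.values())
    # Note: with separation_level 0, exact ties with the loudest track are also kept.
    return [name for name, level in loudness_db.items()
            if best - level <= separation_level]

# Hypothetical loudness values for one moment in the session.
levels = {"lav_A": -18.0, "lav_B": -21.5, "boom": -30.0}
print(select_signals(levels, 0))  # ['lav_A']
print(select_signals(levels, 5))  # ['lav_A', 'lav_B']
```

With a value of 0 only the loudest signal survives; raising the value keeps the second lavalier for a manual decision later.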

Minimum Strip Gap is a parameter that defines the minimum number of frames between regions required to keep them separate. Lower values will result in many short, individual regions — sometimes even single words.
Higher values will merge nearby regions, preserving more fluid, continuous sentences. This setting can be adjusted according to your personal preference and editing style.
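The effect of Minimum Strip Gap can be pictured with a short sketch. It assumes regions are stored as (start frame, end frame) pairs and that two regions are joined when the gap between them is smaller than the setting; both the representation and the helper name `merge_by_min_gap` are assumptions made for illustration.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start_frame, end_frame), end exclusive

def merge_by_min_gap(regions: List[Region], min_gap: int) -> List[Region]:
    """Merge neighbouring regions separated by fewer than min_gap frames."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < min_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))  # join with previous region
        else:
            merged.append((start, end))
    return merged

# Hypothetical regions: two words 2 frames apart, then a later phrase.
words = [(0, 10), (12, 20), (40, 55)]
print(merge_by_min_gap(words, 1))  # [(0, 10), (12, 20), (40, 55)]
print(merge_by_min_gap(words, 5))  # [(0, 20), (40, 55)]
```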


Strip Start & End Pad defines how many frames are added to the beginning and end of each region that remains after Horizontal Processing.
Higher values will extend the preserved regions, which may reduce the precision of the cuts made by the algorithm, especially in dense dialogue sequences.
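A small sketch shows how padding extends regions and why large pads make neighbouring regions run together. The (start frame, end frame) representation and the helper name `pad_regions` are assumptions for the example, and the sketch does not clamp to the end of the session.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start_frame, end_frame), end exclusive

def pad_regions(regions: List[Region], start_pad: int, end_pad: int) -> List[Region]:
    """Extend each region by the pads, joining regions that now touch or overlap."""
    padded: List[Region] = []
    for start, end in sorted(regions):
        start, end = max(0, start - start_pad), end + end_pad
        if padded and start <= padded[-1][1]:
            # The padded region reaches into the previous one: join them.
            padded[-1] = (padded[-1][0], max(padded[-1][1], end))
        else:
            padded.append((start, end))
    return padded

# Hypothetical regions separated by 10 frames.
words = [(10, 20), (30, 40)]
print(pad_regions(words, 2, 2))  # [(8, 22), (28, 42)]
print(pad_regions(words, 5, 5))  # [(5, 45)]
```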