### Jitsi-webrtc-vad-wrapper

This repository contains a Java wrapper around the native VAD engine that is part of the WebRTC native code package (https://webrtc.org/native-code/), as provided in the fork at https://github.com/dpirch/libfvad.

### Building

For the Maven package to function, two native shared libraries are required.

The first library, `libfvad.so`, is the shared library of the VAD engine. It can be compiled on Ubuntu 18.04 by running the script `compile_libfvad.sh`. See https://github.com/dpirch/libfvad for more details.

The second library, `webrtcvadwrapper.so`, is JNI C++ code wrapping `libfvad.so`. The headers can be (re)generated using `generate_jni_headers.sh`, and the library can be compiled using `compile_libwebrtcvadwrapper.sh`.

To install locally, simply run `mvn install`.

For now, a folder containing the shared library files needs to be added manually to the `java.library.path` property of any application using this Maven module. This can be done, for example, by setting the environment variable `LD_LIBRARY_PATH` to `/path/to/shared/libraries:$LD_LIBRARY_PATH`.

### Using the wrapper

The VAD engine requires mono, 16-bit PCM audio with a sample rate of 8, 16, 32 or 48 kHz as input. The input should be an audio segment of 10, 20 or 30 milliseconds. When the audio is sampled at 16 kHz, the input array should thus have a length of 160, 320 or 480 samples, respectively.

The voice activity detection can run in 4 different modes, ranging from 0 to 3. Mode 0 is very strict: when the VAD predicts that a segment is speech, the probability that it actually is speech is higher. Mode 3 is very aggressive: when the VAD predicts speech, that probability is lower. Modes 1 and 2 gradually decrease this probability in between.

The example below shows the creation of a `WebRTCVad` object which accepts 16 kHz audio and runs in mode 1. A sketch of how a longer recording can be split into such segments is given at the end of this README.

```java
import org.jitsi.webrtcvadwrapper.WebRTCVad;

class Example
{
    public static void main(String[] args)
    {
        // A single segment of 10, 20 or 30 ms of 16 kHz, 16-bit PCM audio.
        int[] linear16Audio = new int[] { /* 160, 320 or 480 integer values */ };

        // Create a VAD that accepts 16 kHz audio and runs in mode 1.
        WebRTCVad vad = new WebRTCVad(16000, 1);
        boolean isSpeechSegment = vad.isSpeech(linear16Audio);
    }
}
```

### Credit

Thanks to Daniel Pirch and the WebRTC project authors for providing the VAD engine.
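### Example: splitting a longer recording into segments

The wrapper classifies one 10, 20 or 30 ms segment at a time, so a longer recording has to be windowed first. The sketch below is a minimal illustration of that idea, assuming 16 kHz mono, 16-bit PCM audio already available as an `int[]`, and reusing the `WebRTCVad` constructor and `isSpeech(int[])` call from the example above. The class name and the way the audio is obtained are illustrative only; a window of 320 samples corresponds to 20 ms at 16 kHz.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.jitsi.webrtcvadwrapper.WebRTCVad;

class StreamingExample
{
    public static void main(String[] args)
    {
        // Hypothetical input: one second of 16 kHz, 16-bit PCM mono audio.
        int[] recording = new int[16000];

        // 20 ms at 16 kHz = 320 samples per segment.
        int segmentLength = 320;

        // VAD accepting 16 kHz audio, running in mode 1.
        WebRTCVad vad = new WebRTCVad(16000, 1);

        // Run the VAD on each consecutive 20 ms window of the recording.
        List<Boolean> decisions = new ArrayList<>();
        for (int start = 0; start + segmentLength <= recording.length; start += segmentLength)
        {
            int[] segment
                = Arrays.copyOfRange(recording, start, start + segmentLength);
            decisions.add(vad.isSpeech(segment));
        }

        System.out.println(
            "Segments classified as speech: "
                + decisions.stream().filter(Boolean::booleanValue).count()
                + " of " + decisions.size());
    }
}
```

Trailing samples that do not fill a complete segment are simply dropped in this sketch; a real application might buffer them until enough audio has arrived.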