
Split Voices

 
Ranch Hand
Posts: 94
Hi,

I have an MP3 file that contains speech in two languages.

For example, assume John is one speaker and Mary is another. John knows German and Mary knows English.

John is giving a message and Mary is translating. When John speaks one sentence, Mary translates it, and so on.

All of the audio is in a single file.

I want to save the German speech in one file and the English speech in another.

More generally, any audio containing two given languages needs to be separated into two different MP3 files.

Is there a way to do this in Java?

Thanks
 
Rancher
Posts: 4801
50
I'm sure you could do it in Java, but it's going to be a bit complex (this is the Beginning forum, and that was an understatement), and likely to involve things like machine learning to pick out the different languages and/or voices.

A similar concept:
https://stackoverflow.com/questions/605586/how-to-split-male-and-female-voices-from-an-audio-filein-c-or-java
 
Bartender
Posts: 242
27
How much manual effort is allowed?

If there should be no manual effort, this will be extremely difficult, and I don't think anyone here is going to be able to give you a direct answer on how to do it. Machine learning algorithms are not very reliable when you don't have a huge dataset and powerful computing to process it, and achieving 95% accuracy would be an amazing result for this. Even that would likely be unacceptable for this application, since it would result in a decent chunk of lost or misplaced translation (and you probably won't achieve 95% accuracy).

If you can take some manual input, it's simple: just have a human enter the timestamps at which each person starts speaking, and then divide the audio file at those timestamps.
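
To make the timestamp idea concrete, here is a minimal sketch. It assumes the MP3 has already been decoded to raw PCM (for example by exporting a WAV), since the standard Java Sound API does not decode MP3 without an additional decoder. The class name `TimestampSplitter` and its parameters are made up for illustration, and it assumes the two speakers strictly alternate, with speaker 0 talking first.

```java
import java.io.ByteArrayOutputStream;

class TimestampSplitter {

    /**
     * Splits raw PCM bytes into two byte arrays, one per speaker.
     * changeSeconds holds the hand-entered times (ascending, in seconds)
     * at which the speaker changes. Speaker 0 is assumed to talk first.
     */
    static byte[][] split(byte[] pcm, double[] changeSeconds, float frameRate, int frameSize) {
        ByteArrayOutputStream[] out = { new ByteArrayOutputStream(), new ByteArrayOutputStream() };
        int speaker = 0;
        int start = 0;
        for (int i = 0; i <= changeSeconds.length; i++) {
            // Convert the timestamp to a byte offset that falls on a frame boundary.
            // After the last timestamp, the segment runs to the end of the audio.
            int end = (i < changeSeconds.length)
                    ? (int) Math.min(pcm.length, Math.round(changeSeconds[i] * frameRate) * frameSize)
                    : pcm.length;
            end = Math.max(end, start); // guard against out-of-order timestamps
            out[speaker].write(pcm, start, end - start);
            start = end;
            speaker = 1 - speaker; // alternate between the two speakers
        }
        return new byte[][] { out[0].toByteArray(), out[1].toByteArray() };
    }
}
```

For a real WAV file, the frame rate and frame size would come from `AudioFormat.getFrameRate()` and `AudioFormat.getFrameSize()`, and each resulting array would still need to be written back out with a proper header.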
 
Saloon Keeper
Posts: 15550
364
It becomes even more difficult if the speech overlaps, as is common with German voice-overs in interviews with English speakers.

Writing a library that performs this task automatically is practically impossible if you don't already have quite some knowledge in the area of artificial intelligence and signal processing. And if you did, you probably wouldn't have to ask this question here.

I think your best bet is to use a search engine to see if you can find a library that does this for you. Looking for one that is written in Java will significantly reduce your chance of finding one, so you might consider looking for a library written in any language and then using JNI/JNA to invoke it from your Java application.
 
Saloon Keeper
Posts: 27817
196

Stephan van Hulst wrote:
Writing a library that performs this task automatically is practically impossible if you don't already have quite some knowledge in the area of artificial intelligence and signal processing. And if you did, you probably wouldn't have to ask this question here.



Also a LOT of work. Most people attempting such a task would look towards existing signal-processing libraries first, not try to whip up something from scratch. And at that, the libraries in question would likely enlist specialized hardware such as a good GPU to do the work.

You could make a crude attempt by doing frequency analysis, since male and female voices tend to occupy different parts of the spectrum. Better, if the voices were on separate audio (stereo) tracks, the job would be simplified. Finally, if you simply wanted to break apart a set of distinct phrases and translations such as a language-learning recording, it should be easy to simply split at the silent points between them.

But those last two aren't really what I'd do in Java. There's a program called Audacity that's much better suited to that sort of stuff. And probably - with suitable plugins - the first option as well.
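
For anyone who still wants to try the split-at-silence idea in Java, a rough sketch follows. It assumes the audio has already been decoded to 16-bit mono samples, and the class name `SilenceFinder`, the `threshold`, and the `minRun` parameters are all hypothetical values that would need tuning against a real recording. It only finds candidate cut points; it does not decide which language each phrase is in.

```java
import java.util.ArrayList;
import java.util.List;

class SilenceFinder {

    /**
     * Returns the midpoint (as a sample index) of every silent run that is at
     * least minRun samples long. A sample counts as silent when its absolute
     * value is at most threshold. Silence at the very start or end of the
     * audio is ignored, because it is not a gap between two phrases.
     */
    static List<Integer> splitPoints(short[] samples, int threshold, int minRun) {
        List<Integer> points = new ArrayList<>();
        int runStart = -1; // index where the current silent run began, or -1 while in speech
        for (int i = 0; i < samples.length; i++) {
            boolean quiet = Math.abs(samples[i]) <= threshold;
            if (quiet) {
                if (runStart < 0) {
                    runStart = i;
                }
            } else if (runStart >= 0) {
                // A silent run just ended at i; keep it if it is long enough
                // and did not begin at the very first sample.
                if (runStart > 0 && i - runStart >= minRun) {
                    points.add(runStart + (i - runStart) / 2);
                }
                runStart = -1;
            }
        }
        return points;
    }
}
```

At a 44.1 kHz sample rate, a `minRun` of roughly 22050 would correspond to half a second of silence. Real recordings have background noise, so a fixed amplitude threshold is only a starting point.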
 
Joseph Michael
Ranch Hand
Posts: 94
Hi Tim,

#1 - How can I identify whether the voices are on separate audio (stereo) tracks in the existing MP3 file?
#2 - Assuming the voices are on separate tracks, how can I easily split and join them?

Thanks in advance
 
Tim Holloway
Saloon Keeper
Posts: 27817
196
  • Likes 2
The easiest way to tell if the voices are on separate tracks is to listen to them on a stereo device. If one person speaks through the left speaker/earphone and the other speaks through the right speaker/earphone, then you're on the right track. I'm assuming 2-channel stereo, of course. Surround sound is overkill for stuff like this.

The most accurate way to tell is to run Audacity and load up the MP3. When you feed a stereo MP3 file to Audacity, it will display each track as a separate graph. Saving those tracks to separate files is then trivial. You can also snip out the silent parts of a track or set of tracks. Audacity is very powerful.
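
If the split has to happen in Java rather than in Audacity, separating the two channels of decoded stereo audio is mostly bookkeeping. The sketch below assumes 16-bit interleaved stereo PCM (left sample, right sample, left sample, and so on), which is what a typical stereo WAV export contains. The class name `ChannelSplitter` is made up for illustration.

```java
class ChannelSplitter {

    /**
     * De-interleaves 16-bit stereo PCM (L R L R ...) into two mono byte
     * arrays: index 0 holds the left channel, index 1 the right channel.
     * Any trailing bytes that do not form a complete frame are dropped.
     */
    static byte[][] split(byte[] stereo) {
        final int bytesPerSample = 2;              // 16-bit samples
        final int frameSize = 2 * bytesPerSample;  // one left plus one right sample
        int frames = stereo.length / frameSize;
        byte[] left = new byte[frames * bytesPerSample];
        byte[] right = new byte[frames * bytesPerSample];
        for (int f = 0; f < frames; f++) {
            int src = f * frameSize;
            int dst = f * bytesPerSample;
            System.arraycopy(stereo, src, left, dst, bytesPerSample);
            System.arraycopy(stereo, src + bytesPerSample, right, dst, bytesPerSample);
        }
        return new byte[][] { left, right };
    }
}
```

For a WAV file, `AudioSystem.getAudioFileFormat(file).getFormat().getChannels()` reports the channel count. A result of 2 only says the file is stereo, not that each speaker is actually isolated on one channel, so listening or looking at the waveforms is still needed to confirm that.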
 