June 19, 2011

Scorx Ion
Lab Rat
15 posts

[Solved] Passing QByteArray to 3rd party function

 

Hello,

I have written a program that records audio and stores it in a QByteArray buffer. I can verify the buffer is holding the correct audio data by playing that buffer in the same program. I am trying to use CMU’s pocketsphinx to do speech recognition on this audio buffer. To do this I have to pass the QByteArray as the second parameter of the following function:

    int ps_process_raw(ps_decoder_t *ps, int16 const *data, size_t n_samples, int no_search, int full_utt);

I have tried several different ways to pass the buffer into pocketsphinx but I haven’t been successful. At best I get unexpected and inconsistent results (processing the same buffer multiple times yields different speech detection strings). At worst I get seg faults after trying to process multiple buffers or the same buffer multiple times.

I’ve been reading up on serialization of data and wonder if this is the correct way to pass the data. However I don’t exactly understand how it works. I’ve tried the following methods in my recognizer class:

1) This option doesn’t give consistent or even correct results but doesn’t cause a seg fault:

    ps_process_raw(m_speechDecoder, (qint16 *)((m_byteArray[buffer].data())), m_byteArrayUsed[buffer], 0, 0);

2) In this approach I copy the data into a temporary QByteArray and then try to pass it using QDataStream. I make a copy because the data stream destroys the byte array it is passed.

    QByteArray storage;
    QDataStream storageIn(&storage, QIODevice::WriteOnly);
    storageIn.setByteOrder(QDataStream::LittleEndian);
    storageIn.writeRawData(m_byteArray[buffer], m_byteArrayUsed[buffer]);

    QDataStream storageOut(&storage, QIODevice::ReadOnly);
    storageOut.setByteOrder(QDataStream::LittleEndian);

    QVector<qint16> vectMethod1;
    vectMethod1.resize(m_byteArrayUsed[buffer] / 2);
    storageOut.readRawData((char*)vectMethod1.data(), m_byteArrayUsed[buffer] / 2);

    procError = ps_process_raw(m_speechDecoder, vectMethod1.data(), m_byteArrayUsed[buffer], 0, 0);

3) This was a different way I thought of using a data stream. It also causes seg faults:

    AUDIO_PLAY_DEBUG << "Processing speech with QDataStream translation";
    QDataStream stream(&(m_byteArray[buffer]), QIODevice::ReadOnly);
    stream.setByteOrder(QDataStream::LittleEndian);
    QVector<qint16> vectMethod2;
    vectMethod2.resize(m_byteArrayUsed[buffer] / 2);
    for (int i = 0; i < (m_byteArrayUsed[buffer] / 2); i++)
        stream >> vectMethod2[i];

    procError = ps_process_raw(m_speechDecoder, vectMethod2.data(), m_byteArrayUsed[buffer], 0, 0);

What am I missing here? It seems like this should be a very simple thing to accomplish.

Thanks for any help!

edit: I forgot to mention that my buffer is actually an array of QByteArrays. When one element of the buffer is filled with recorded audio it is passed to the speech processor while the next buffer array element captures more audio data. The variable buffer is the index of the element that needs to be processed.
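
One detail that applies to all three attempts above: the n_samples parameter of ps_process_raw counts 16-bit samples, not bytes. The attempts pass m_byteArrayUsed[buffer], which appears to be a byte count (an assumption, based on it being halved when the vectors are sized). If so, the decoder is told to read twice as much data as exists, which could account for the seg faults. A minimal sketch of the conversion, using a hypothetical helper name that belongs to neither Qt nor pocketsphinx:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Qt or pocketsphinx): converts a byte
// count into the number of 16-bit samples stored in that many bytes.
std::size_t sampleCountFromBytes(std::size_t byteCount)
{
    // 16-bit PCM should always arrive as a whole number of samples.
    assert(byteCount % sizeof(std::int16_t) == 0);
    return byteCount / sizeof(std::int16_t);
}
```

Under that assumption, the third argument in each call would become sampleCountFromBytes(m_byteArrayUsed[buffer]) instead of the raw byte count.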

5 replies

June 20, 2011

Franzk
Lab Rat
833 posts

The fact that you cast a char (quint8) pointer to a quint16 pointer or try to have the compiler solve the problem for you slightly worries me. Here you’re telling the compiler to read through the array as if it were a quint16 array and shut up in the process (you’re breaking strict aliasing rules). You’re probably going to have more luck using an approach like

    QVector<quint16> out;

    for (int i = 0; i < in.size(); ++i) {
        out << in.at(i); // maybe amplify the signal if necessary
    }

    procError = ps_process_raw(m_speechDecoder, out.data(), m_byteArrayUsed[buffer], 0, 0);

What are your in- and output types?
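
Note that the loop above widens every byte into its own 16-bit value, so it produces one sample per byte. If the buffer already holds 16-bit PCM, an alternative that also avoids the pointer cast is to copy the bytes into int16 storage with std::memcpy. The sketch below is Qt-free: std::vector<char> stands in for QByteArray, the function name is hypothetical, and it assumes the host byte order matches the byte order of the recorded data, because the bytes are copied unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies raw PCM bytes into int16 storage without
// casting the char pointer. std::vector<char> stands in for QByteArray.
// A trailing odd byte, if any, is ignored.
std::vector<std::int16_t> copyBytesToSamples(const std::vector<char>& bytes)
{
    const std::size_t count = bytes.size() / sizeof(std::int16_t);
    std::vector<std::int16_t> samples(count);
    if (count > 0) {
        // memcpy is the sanctioned way to reinterpret bytes as another type.
        std::memcpy(samples.data(), bytes.data(), count * sizeof(std::int16_t));
    }
    assert(samples.size() * sizeof(std::int16_t) <= bytes.size());
    return samples;
}
```

The size of the returned vector is already a sample count, so it can be passed on as such.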

By the way, code can be highlighted with the @-tag.

 Signature 

“Horse sense is the thing a horse has which keeps it from betting on people.”—W.C. Fields

http://www.catb.org/~esr/faqs/smart-questions.html

June 22, 2011

Scorx Ion
Lab Rat
15 posts

Thanks for the reply.

I had never heard of the strict aliasing rule before, but I found a good article [cellperformance.beyond3d.com] about it.

The method you suggested yielded different results than my other attempts. The first time I copy a given buffer into a QVector and pass it to pocketsphinx, I get NULL back as the recognized speech string. Every subsequent time I recopy the same buffer, I get a string back that is erroneous or inconsistent. Note that the NULL string is returned the first time every buffer is processed.

With regard to input and output data types, I am trying to follow the pocketsphinx example documentation [cmusphinx.sourceforge.net] as closely as possible. The documentation recommends using an audio stream read in from an audio file that is “a single-channel (monaural), little-endian, unheadered 16-bit signed PCM audio file sampled at 16000 Hz.” I set my audio format accordingly and then adjust it to the nearest format the device supports (this only changes the frequency). The biggest difference is that my stream is coming from an audio device instead of a file.

With this audio format, shouldn’t my QAudioInput give me a QIODevice that is storing the data as 16 bit signed integers? Perhaps I should be transferring the data from this mystery QIODevice into a different data structure than QByteArray.

June 22, 2011

Franzk
Lab Rat
833 posts

Gilgrum wrote:
With this audio format, shouldn’t my QAudioInput give me a QIODevice that is storing the data as 16 bit signed integers? Perhaps I should be transferring the data from this mystery QIODevice into a different data structure than QByteArray.
You should be able to get 16 bit signed integers, but a QByteArray is always an array of bytes and therefore stores 8 bit values. That means you are the one who has to solve the little/big endian problem (your program might run on a big endian machine).

I’m not entirely sure how I would do this, but I think I would end up using QDataStream.

    QByteArray recordedAudio = ...;
    QVector<qint16> samples;

    QDataStream dataStream(recordedAudio);
    dataStream.setByteOrder(QDataStream::LittleEndian);
    while (!dataStream.atEnd()) {
        qint16 sample;
        dataStream >> sample;
        samples << sample;
    }

    procError = ps_process_raw(m_speechDecoder, samples.data(), m_byteArrayUsed[buffer], 0, 0);

I haven’t tested this code and I think it could be improved performance wise (if necessary), but I hope this approach at least brings you closer to the desired result.

Note that you might be able to use QDataStream directly on the QIODevice you got from QAudioInput, depending on how the io device has been opened (read/write), but that may make your program more complex to understand.
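
For readers who want to see what the little-endian QDataStream loop does to the bytes, here is a rough Qt-free equivalent. It is only a sketch: std::vector<unsigned char> stands in for QByteArray and the function name is hypothetical. Each pair of bytes is combined low byte first, which gives the same sample values on little- and big-endian hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes little-endian signed 16-bit PCM from raw
// bytes, independent of the host's own byte order.
std::vector<std::int16_t> decodeLittleEndian16(const std::vector<unsigned char>& bytes)
{
    std::vector<std::int16_t> samples;
    samples.reserve(bytes.size() / 2);
    // A trailing odd byte, if any, is ignored.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte first, high byte second: little-endian order.
        const int raw = bytes[i] | (bytes[i + 1] << 8);        // 0..65535
        // Map the upper half of the range to negative values (two's complement).
        const int value = raw < 0x8000 ? raw : raw - 0x10000;
        assert(value >= -32768 && value <= 32767);
        samples.push_back(static_cast<std::int16_t>(value));
    }
    return samples;
}
```

The size of the returned vector is the number of samples, which is the unit ps_process_raw expects for n_samples.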

July 27, 2011

Scorx Ion
Lab Rat
15 posts

I know this is an older post now, but I’ve come up with a solution. I thought I should share it in case anyone else stumbles upon this. Feel free to skip to the bottom for tldr if you don’t care for the explanation.

From reading through the source code for the QIODevice returned by QAudioInput->start(), it is abundantly clear that QIODevice captures the data as whatever type you specify in your QAudioFormat. Also, the QByteArray is designed to store random data and gives access to that data using the smallest pointer (char *). I presume the QByteArray is written with the assumption that the programmer using it will know what data is being fed into it and will therefore know what to coerce (type cast) the return type to. Additionally, I surmise that QDataStream is mostly designed for use on data streams with mixed data types. There is no reason to use the QDataStream in my specific example.

With this information in hand, an analysis of my code revealed 2 mistakes I had made:
1) Somehow I was specifying Unsigned int in my audio format and then later casting the data to signed int. Changing the audio format to Signed int, the correct format, fixed this issue. I also changed the cast used when feeding the 3rd party library to (qint16 *)(m_byteArray.data()).
2) I was trying to record at a frequency of 16000 Hz as suggested by my audio processing library (Sphinx), but my hardware would only support 8000 Hz. I found out I have to configure my audio processing library to use 8000 Hz instead to fix this discrepancy.
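
To illustrate why the first mistake garbles the audio, the sketch below compares reading an unsigned 16-bit sample as if it were signed with actually converting it. It assumes the common convention that unsigned PCM is centred on 32768 (silence), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// What a plain cast effectively does: keeps the same 16 bits and reads
// them as a two's complement signed value.
int reinterpretBitsAsSigned(std::uint16_t u)
{
    const int raw = u;
    return raw < 0x8000 ? raw : raw - 0x10000;
}

// Assumed convention: unsigned 16-bit PCM is centred on 32768, so a real
// conversion to signed PCM subtracts that offset.
int unsignedPcmToSigned(std::uint16_t u)
{
    const int value = static_cast<int>(u) - 32768;
    assert(value >= -32768 && value <= 32767);
    return value;
}
```

Under that assumption, silence recorded as 32768 becomes the most negative sample when merely reinterpreted, while the real conversion maps it to 0. Recording in the signed format from the start, as described above, avoids the conversion entirely.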

All in all, it was very frustrating that the issues turned out to be so minor. But at least I now have highly accurate speech recognition due to the wonders of Sphinx and Qt.

tldr; I was specifying Unsigned int in my audio format when I meant I wanted Signed int. Additionally, my record frequency wasn’t matched up with the frequency my audio processor expected. Fixing those fixed the problem.

July 27, 2011

Franzk
Lab Rat
833 posts

Good to see you solved it.

Note that you’re going to have to take into account the endianness of the system when you type-cast. The QDataStream solution passes that responsibility to Qt. Your audio data is stored in little endian format (unsigned short i = 1; is laid out in memory as the bytes 0x01 0x00), but if you use this solution on a big endian system (unsigned short i = 1; is laid out in memory as the bytes 0x00 0x01) you’re going to see weird results all over again. I don’t know if you’re going to run into this, but take note of it in any case.
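
As a rough illustration of that caveat, the following Qt-free sketch detects the host byte order at run time and swaps the bytes of a 16-bit value only when the host is big endian. All three function names are hypothetical. In real Qt code, qFromLittleEndian from <QtEndian> is probably the better fit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte of a 16-bit value is
// stored first in memory.
bool hostIsLittleEndian()
{
    const std::uint16_t one = 1;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &one, 1);
    return firstByte == 1;
}

// Exchanges the two bytes of a 16-bit value.
std::uint16_t swapBytes16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

// Converts a value whose bytes were copied from little-endian data into
// the host's byte order.
std::uint16_t littleEndianToHost16(std::uint16_t v)
{
    return hostIsLittleEndian() ? v : swapBytes16(v);
}
```

On a little endian machine littleEndianToHost16 returns its argument unchanged, which is why the plain cast happens to work there.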
