The two most critical performance criteria when evaluating a watermarking approach are:
  • Transparency - An ideal watermark should be imperceptible.
  • Persistence - An ideal watermark should persist through all forms of audio manipulation, whether legitimate (robustness) or malicious (security).
Unfortunately, establishing either transparency or persistence beyond any doubt is equivalent to proving a negative; a challenge I most gladly leave to smart salesmen.
In addition to these two core concerns, there are a couple of more practical issues:
  • Data payload - An ideal watermark should convey all the required data within any arbitrary and small portion of the content.
  • Computational requirements - An ideal watermarking software solution should be lean and mean.
TRANSPARENCY
Performing rigorous listening tests was beyond the limited scope of this shareware's development cycle. In the absence of any sophisticated psychoacoustic model, I am fairly confident that a reasonably trained listener should be able to detect perceptual differences between an original file and its encoded version in a high quality A/B/X listening environment. In other words, don't expect the EyM Audio Watermark to be truly transparent.

Yet the more relevant question you should ask is whether its audio quality is good enough for your application. If you target consumer grade listening environments, where people might be satisfied with the audio quality of an MP3-compressed recording, then I doubt anyone would complain about any audible artifacts introduced by this watermarking process. Ultimately though, only you can be the judge of that and I encourage you to freely experiment with various configurations of this shareware to find a compromise between transparency and robustness that may best fit your needs.

PERSISTENCE
Lossy audio compression:
The primary persistence goal for this shareware was that the watermarks should survive most popular kinds of lossy compression such as MP3. This was confirmed through early tests of the system.
It was further quantified using a set of 100 files (summing up to about 7 hours of mostly pop music) which were watermarked with an 8-byte payload (plus time stamp) at the default gain 1.0, compressed to MP3 at 128 kbps using a popular media player and decompressed back to 44.1 kHz WAVE files using the same media player. The decoder had no trouble recovering the 8-byte payload from these "post compression" audio files.

D/A - A/D, resampling and acoustic coupling:
An early and casual test also showed that the EyM watermarks could survive through the acoustic coupling between a consumer grade speaker and a microphone. As a funny anecdote this experiment involved the "filming" of a CD player using a digital video camera.
Other tests included downsampling stages applied to 44.1 kHz watermarked audio files. Based on 100 files (summing up to about 7 hours of mostly pop music), the decoder had no trouble recovering an 8-byte payload after downsampling those to 32 kHz, 24 kHz, or 22.05 kHz. The decoder had a harder time with 16 kHz downsampling, as it failed to retrieve any code from 24 of these songs. Finally, at 11.025 kHz downsampling, the decoder failed to retrieve the 8-byte payload from any of the 100 songs. This was certainly expected given the poor quality of the audio material at that point. Note also that by audio watermarking standards, 8 bytes is a significant payload and it is possible that these low sampling rate results could be improved by either using a smaller payload or increasing the watermark gain. (This series of tests was conducted using the default gain value 1.0.)

Time scaling:
A widely known vulnerability of spread spectrum is synchronization. Accurate timing is the inevitable price to pay for the robustness benefits of spreading data in the frequency domain and the EyM Audio Watermarking decoder is not immune to it. Note however that there is a difference between fooling the watermark decoder and actually removing the watermark. So while speeding up or slowing down the audio recording might confuse the decoder, chances are that the watermark may still be there for a more thorough forensic decoder to find (one that may use the original recording to detect and compensate for any time scaling that might have occurred).

DATA PAYLOAD
The arbitrary binary payload you provide is turned into a data packet that includes some synchronization, error checking and the optional 12-bit time stamp information. Because of this fixed overhead, the effective data rate at which you mark your audio is a function of your payload size.

For instance, conveying a payload of only 1 byte (8 bits) with an optional time stamp will create a data packet that gets repeated at a 1.7-second interval throughout the audio stream (i.e. the equivalent of a 4.7 bps data rate). On the other hand, conveying a payload of 8 bytes (64 bits) without time stamp will create a data packet that gets repeated at a 3.4-second interval throughout the audio stream (i.e. the equivalent of an 18.8 bps data rate).
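The two figures above follow from dividing the payload size in bits by the packet repeat interval. As a quick sanity check, here is a minimal sketch of that arithmetic; the function name is purely illustrative and is not part of the EyM software.

```python
def effective_bit_rate(payload_bytes: int, repeat_interval_s: float) -> float:
    """Payload bits conveyed per second when one packet repeats every repeat_interval_s."""
    if repeat_interval_s <= 0:
        raise ValueError("repeat interval must be positive")
    return payload_bytes * 8 / repeat_interval_s


# The two configurations quoted in the text.
print(round(effective_bit_rate(1, 1.7), 1))  # 4.7
print(round(effective_bit_rate(8, 3.4), 1))  # 18.8
```

Only the payload bits are counted here; the synchronization, error checking and time stamp bits are overhead and do not appear in these rates.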

But thinking of the data payload efficiency in terms of its encoding data rate can be misleading. Indeed the audio stream can be thought of as a tempestuous and intermittent data channel. Not all embedded watermarks will go through, even if the audio material is not degraded. This is why the same data gets repeated throughout the audio stream, allowing the decoder to use an expected redundancy in order to retrieve that payload despite the intermittence of the data channel. From this perspective it is easier to understand why shorter packets have a better chance to "get through" while providing a higher amount of redundancy that further helps their recovery.
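To make the redundancy argument more concrete, the sketch below counts how many complete packets fit in a short excerpt and then applies a deliberately naive model in which every packet gets through independently with the same probability. That model, the 0.3 success probability and the function names are all assumptions chosen for illustration; they do not describe how the actual decoder combines repeated packets.

```python
import math


def packet_repetitions(duration_s: float, repeat_interval_s: float) -> int:
    """Number of complete packets that fit in duration_s of audio."""
    return math.floor(duration_s / repeat_interval_s)


def chance_of_recovery(repetitions: int, per_packet_success: float) -> float:
    """Toy model (assumption): packets succeed independently with equal probability."""
    return 1.0 - (1.0 - per_packet_success) ** repetitions


excerpt_s = 10.0
short_packets = packet_repetitions(excerpt_s, 1.7)  # 1-byte payload interval from the text
long_packets = packet_repetitions(excerpt_s, 3.4)   # 8-byte payload interval from the text
print(short_packets, long_packets)  # 5 2

p = 0.3  # hypothetical per-packet success rate, not a measured value
print(round(chance_of_recovery(short_packets, p), 2))  # 0.83
print(round(chance_of_recovery(long_packets, p), 2))   # 0.51
```

The per-packet probability is held equal for both packet sizes only to isolate the effect of redundancy. If shorter packets are also individually more likely to get through, the gap would be wider still.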

And hence the following rule of thumb: you should always carefully consider the nature of your application in order to identify the smallest watermark payload that will fit your need. As a hint, don't necessarily think of that payload as explicit information but rather as a simple identifier that can be subsequently linked to specific information. A 4-byte (32 bits) identifier can be used to distinguish between over 4 billion objects.
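As an illustration of the identifier approach, the sketch below packs a numeric ID into a 4-byte payload and resolves it against a lookup table after decoding. The catalog, its fields and the helper names are hypothetical, and the big-endian byte order is an arbitrary choice; none of it reflects any interface of the EyM software.

```python
# Hypothetical catalog kept outside the audio: the watermark only carries the key.
catalog = {
    1001: {"title": "Demo track", "licensee": "Example Radio"},
}


def id_to_payload(identifier: int, size_bytes: int = 4) -> bytes:
    """Pack an identifier into a fixed-size binary payload (big-endian by assumption)."""
    return identifier.to_bytes(size_bytes, "big")


def payload_to_id(payload: bytes) -> int:
    """Recover the identifier from a decoded payload."""
    return int.from_bytes(payload, "big")


print(2 ** 32)  # 4294967296 distinct values fit in 4 bytes

payload = id_to_payload(1001)
print(payload.hex())                    # 000003e9
print(catalog[payload_to_id(payload)])  # the record linked to that identifier
```

Keeping the descriptive information in the table rather than in the watermark means the payload stays small even if the linked record grows.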

COMPUTATIONAL REQUIREMENTS
CPUs have gotten so fast that the most noticeable bottleneck for this kind of process can be disk access for the WAVE file I/O. This is certainly noticeable on the 3 GHz Pentium 4 laptop I'm using, which has a 7200 rpm disk.

Processing speed:
On this platform, watermark encoding (including file I/O) seems to happen at around 30X for a 44.1kHz stereo WAV file. The processing should scale linearly with the number of channels and the sampling rate so encoding a 22.05 kHz mono WAV file should be 4 times faster (i.e. 120X).
Still on the same platform, the watermark decoding seems to happen somewhere between 30X and 50X depending on the file. The decoding process also scales linearly with the number of channels, but its scaling is not quite linear with the sampling rate as it benefits from some internal down-sampling.
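The linear scaling described for the encoder can be turned into a rough estimator. The sketch below takes the 30X figure for 44.1 kHz stereo as its reference point and assumes the scaling is exactly linear, which the text only states as an expectation. The names are illustrative, and the result applies to the encoder only, since the decoder does not scale linearly with the sampling rate.

```python
# Reference point quoted in the text: about 30X real time for 44.1 kHz stereo.
REF_SPEED_X = 30.0
REF_SAMPLE_RATE_HZ = 44100
REF_CHANNELS = 2


def estimated_encode_speed(sample_rate_hz: float, channels: int) -> float:
    """Estimated encoder speed as a multiple of real time, assuming exact linear scaling."""
    return REF_SPEED_X * (REF_SAMPLE_RATE_HZ / sample_rate_hz) * (REF_CHANNELS / channels)


def estimated_encode_time_s(duration_s: float, sample_rate_hz: float, channels: int) -> float:
    """Estimated wall-clock seconds needed to encode duration_s of audio."""
    return duration_s / estimated_encode_speed(sample_rate_hz, channels)


print(estimated_encode_speed(22050, 1))               # 120.0
print(estimated_encode_time_s(7 * 3600, 44100, 2))    # 840.0 seconds for 7 hours of stereo audio
```

These numbers are tied to the test laptop described above; a different CPU or disk would shift the reference speed.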

Memory-wise, the watermark encoding stage is a bit more memory intensive than the decoding stage. Indeed, the encoder will try to maintain the entirety of the audio material in memory in order to avoid clipping before it decides to commit the output to disk. That audio is maintained internally at a resolution of 32 bits per sample, so depending on the length of your audio recording the encoder's memory usage could become significant.
The decoding stage never reads more than about a second's worth of audio at a time and therefore it benefits from a relatively small memory footprint.
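For a sense of scale, the sketch below estimates the size of the audio buffer from the 32 bits per sample mentioned above. It counts raw sample storage only and ignores any other working memory. The decoder figure further assumes the same 32-bit representation and a one-second window, neither of which is stated precisely in the text. The function name is illustrative.

```python
BYTES_PER_SAMPLE = 4  # 32 bits per sample, as stated for the encoder


def audio_buffer_bytes(duration_s: float, sample_rate_hz: int, channels: int) -> int:
    """Raw bytes needed to hold duration_s of audio at 32 bits per sample."""
    return int(duration_s * sample_rate_hz) * channels * BYTES_PER_SAMPLE


# Encoder: a whole 5-minute 44.1 kHz stereo recording held in memory.
encoder_bytes = audio_buffer_bytes(300, 44100, 2)
print(encoder_bytes)                          # 105840000
print(round(encoder_bytes / (1024 ** 2), 1))  # 100.9 (MiB)

# Decoder: roughly one second of audio at a time (32-bit storage is an assumption here).
print(audio_buffer_bytes(1, 44100, 2))        # 352800
```

The encoder's footprint grows in proportion to the recording length, while the decoder's stays roughly constant.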