pvsanal — Generate an fsig from a mono audio source ain, using phase vocoder overlap-add analysis.
Generate an fsig from a mono audio source ain, using phase vocoder overlap-add analysis.
ifftsize -- The FFT size in samples. Need not be a power of two (though these are especially efficient), but must be even. Odd numbers are rounded up internally. ifftsize determines the number of analysis bins in fsig, as ifftsize/2 + 1. For example, where ifftsize = 1024, fsig will contain 513 analysis bins, ordered linearly from the fundamental to Nyquist. The fundamental of analysis (which in principle gives the lowest resolvable frequency) is determined as sr/ifftsize. Thus, for the example just given and assuming sr = 44100, the fundamental of analysis is 43.07Hz. In practice, due to the phase-preserving nature of the phase vocoder, the frequency of any bin can deviate bilaterally, so that DC components are recorded. Given a strongly pitched signal, frequencies in adjacent bins can bunch very closely together, around partials in the source, and the lowest bins may even have negative frequencies.
As a rule, the only reason to use a non power-of-two value for ifftsize would be to match the known fundamental frequency of a strongly pitched source. Values with many small factors can be almost as efficient as power-of-two sizes; for example: 384, for a source pitched at around low A=110Hz.
ioverlap -- The distance in samples (“hop size”) between overlapping analysis frames. As a rule, this needs to be at least ifftsize/4, e.g. 256 for the example above. ioverlap determines the underlying analysis rate, as sr/ioverlap. ioverlap does not require to be a simple factor of ifftsize; for example a value of 160 would be legal. The choice of ioverlap may be dictated by the degree of pitch modification applied to the fsig, if any. As a rule of thumb, the more extreme the pitch shift, the higher the analysis rate needs to be, and hence the smaller the value for ioverlap. A higher analysis rate can also be advantageous with broadband transient sounds, such as drums (where a small analysis window gives less smearing, but more frequency-related errors).
Note that it is possible, and reasonable, to have distinct fsigs in an orchestra (even in the same instrument), running at different analysis rates. Interactions between such fsigs is currently unsupported, and the fsig assignment opcode does not allow copying between fsigs with different properties, even if the only difference is in ioverlap. However, this is not a closed issue, as it is possible in theory to achieve crude rate conversion (especially with regard to in-memory analysis files) in ways analogous to time-domain techniques.
iwinsize -- The size in samples of the analysis window filter (as set by iwintype). This must be at least ifftsize, and can usefully be larger. Though other proportions are permitted, it is recommended that iwinsize always be an integral multiple of ifftsize, e.g. 2048 for the example above. Internally, the analysis window (Hamming, von Hann) is multiplied by a sinc function, so that amplitudes are zero at the boundaries between frames. The larger analysis window size has been found to be especially important for oscillator bank resynthesis (e.g. using pvsadsyn), as it has the effect of increasing the frequency resolution of the analysis, and hence the accuracy of the resynthesis. As noted above, iwinsize determines the overall latency of the analysis/resynthesis system. In many cases, and especially in the absence of pitch modifications, it will be found that setting iwinsize=ifftsize works very well, and offers the lowest latency.
iwintype -- The shape of the analysis window. Currently only two choices are implemented:
0 = Hamming window
1 = von Hann window
Both are also supported by the PVOC-EX file format. The window type is stored as an internal attribute of the fsig, together with the other parameters (see pvsinfo). Other types may be implemented later on (e.g. the Kaiser window, also supported by PVOC-EX), though an obvious alternative is to enable windows to be defined via a function table. The main issue here is the constraint of f-tables to power-of-two sizes, so this method does not offer a complete solution. Most users will find the Hamming window meets all normal needs, and can be regarded as the default choice.
iformat -- (optional) The analysis format. Currently only one format is implemented by this opcode:
0 = amplitude + frequency
This is the classic phase vocoder format; easy to process, and a natural format for oscillator-bank resynthesis. It would be very easy (tempting, one might say) to treat an fsig frame not purely as a phase vocoder frame but as a generic additive synthesis frame. It is indeed possible to use an fsig this way, but it is important to bear in mind that the two are not, strictly speaking, directly equivalent.
Other important formats (supported by PVOC-EX) are:
1 = amplitude + phase
2 = complex (real + imaginary)
iformat is provided in case it proves useful later to add support for these other formats. Formats 0 and 1 are very closely related (as the phase is “wrapped” in both cases - it is a trivial matter to convert from one to the other), but the complex format might warrant a second explicit signal type (a “csig”) specifically for convolution-based processes, and other processes where the full complement of arithmetic operators may be useful.
iinit -- (optional) Skip reinitialization. This is not currently implemented for any of these opcodes, and it remains to be seen if it is even practical.
Warning | |
---|---|
It is unsafe to use the same f-variable for both input and output of pvs opcodes. Using the same one might lead to undefined behavior on some opcodes. Use a different one on the left and right sides of the opcode. |