Speech coding – first pass cleaning and speaker identification
- Coding is done on this page: https://uelbabydev.com/onacsa-lab-audio-coding-phase-one-v3
- Short segments of lab audio data, 3-10 minutes long, are coded. The title of each vocalisation tells us everything we should need to know. E.g. 1001_2_pk_0013 means participant 1 (1001), visit number 2 (visits are at 5, 10, 15 and 36 months), the type of play (jp – joint play, sp – solo play, pk – puppet karaoke, ip – interrupted play, pb – peekaboo) and finally the vocalisation number (0013).
- Prior to coding, the VAD (automated Voice Activity Detector) identifies the voiced audio – and from this the onset (start) and offset (end) times for each vocalisation are calculated.
- There are approx. 100-200 separately identified vocal segments (‘vocalisations’) for each type of play.
- For each vocalisation on the page, you can play the vocalisation and an extended vocalisation, with one second added to the start and end. There is then a series of up to three questions about each segment.
- Relevant questions will pop up based on your responses. Simply select the response that best describes the audio. Once you have answered all the questions, click ‘Submit’.
- There is an option to choose ‘Who is coding this audio file?’, make sure to choose yourself as this is how we keep track of who has coded what and how much you have coded. In future we hope to make this automatic.
- Because of how the VAD is set up, there are quite a few ‘false positives’ (noise/rattle) where no-one is vocalising.
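The naming scheme described above can be unpacked mechanically. Below is a minimal Python sketch, assuming titles always take the form participant_visit_playtype_vocnumber, that visit numbers 1 to 4 map in order to the 5, 10, 15 and 36 month visits (an assumption; the guide only lists the ages), and that the play codes are the five listed. `parse_vocalisation_name`, `VocalisationId` and the constants are hypothetical names for illustration, not part of the coding page.

```python
import re
from dataclasses import dataclass

# Play-type codes as listed in this guide.
PLAY_TYPES = {
    "jp": "joint play",
    "sp": "solo play",
    "pk": "puppet karaoke",
    "ip": "interrupted play",
    "pb": "peekaboo",
}

# ASSUMPTION: visits 1-4 correspond, in order, to the 5, 10, 15 and 36 month visits.
VISIT_AGE_MONTHS = {1: 5, 2: 10, 3: 15, 4: 36}

# Hypothetical pattern: participant_visit_playtype_vocnumber, e.g. 1001_2_pk_0013.
NAME_RE = re.compile(r"^(\d+)_(\d+)_([a-z]{2})_(\d+)$")


@dataclass
class VocalisationId:
    participant: int
    visit: int
    age_months: int
    play_type: str
    voc_number: int


def parse_vocalisation_name(name: str) -> VocalisationId:
    """Split a vocalisation title into its parts (illustrative sketch only)."""
    cleaned = re.sub(r"\s+", "", name)  # drop any stray whitespace before matching
    match = NAME_RE.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised vocalisation name: {name!r}")
    participant, visit, play, voc = match.groups()
    visit_number = int(visit)
    if visit_number not in VISIT_AGE_MONTHS or play not in PLAY_TYPES:
        raise ValueError(f"unknown visit or play code in {name!r}")
    return VocalisationId(
        participant=int(participant),
        visit=visit_number,
        age_months=VISIT_AGE_MONTHS[visit_number],
        play_type=PLAY_TYPES[play],
        voc_number=int(voc),  # leading zeros are dropped: "0013" -> 13
    )


print(parse_vocalisation_name("1001_2_pk_0013"))
```

Unknown visit or play codes raise an error rather than being guessed, so a mistyped title is caught early.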
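Each extended vocalisation is the VAD segment with one second added to the start and end. A sketch of that arithmetic, assuming onset and offset are in seconds and that the padded window is clipped at the start (and, if known, the end) of the recording; the clipping behaviour and the `extended_window` name are assumptions for illustration only.

```python
from typing import Optional, Tuple

PAD_SECONDS = 1.0  # one second added to each side of the VAD segment


def extended_window(
    onset: float, offset: float, recording_length: Optional[float] = None
) -> Tuple[float, float]:
    """Return (start, end) of the extended vocalisation, in seconds.

    ASSUMPTION: the padded window is clipped to the recording boundaries;
    the guide does not say how the coding page handles the edges.
    """
    if offset < onset:
        raise ValueError("offset must not be earlier than onset")
    start = max(0.0, onset - PAD_SECONDS)
    end = offset + PAD_SECONDS
    if recording_length is not None:
        end = min(end, recording_length)
    return start, end


print(extended_window(12.5, 13.25))
print(extended_window(0.4, 2.0))
print(extended_window(58.5, 59.5, recording_length=60.0))
```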
Who is Talking?
For this question you only need to listen to the vocalisation, not the extended vocalisation. Even if someone else is speaking in the extended vocalisation, you should ignore them.
Infant. The participating infant is vocalising.
Mum. The participating mum is vocalising.
Co-vocalisation. This occurs naturally in speech when two or more speakers speak in overlap with each other: any part of one speaker’s vocalisation overlaps with any part of another speaker’s vocalisation. E.g. the overlap could be at the end or beginning (or both) of a vocalisation.
Or it could span the whole of one vocalisation (e.g. a mum who is trying to calm a crying baby).
The important thing is that there is no pause or gap between the vocalisations. If it is unclear whether the vocalisations are co-vocalisations or merged vocalisations, please code as co-vocalisations.
Merged vocalisations. This is where two (or more) separate vocalisations have been picked up and ‘merged’ together in error by the VAD.
The important thing is that the vocalisations are not in overlap, and there is a small pause between vocalisations. If it is unclear whether the vocalisations are co-vocalisations or merged vocalisations, please code as co-vocalisations.
Researcher. Code this if the researcher is present in the recording (whether they are speaking or not). The researcher normally visits twice, once to set up the wearable device and once to collect it; these times will not be qualitatively coded.
Researchers: Marta (Spanish) and Emily.
Rattle. High-pitched rattle sound, or clunking where the bell bashes against the inside of the wooden cage.
Quiet rattle: here the mum is speaking, so code it as mum, and when asked about noise, select rattle.
Shaker. The same as the rattle but the bell has been dulled, so you can still hear the shaking but can’t hear the bell.
Other Instrument or Toy
We sometimes use a xylophone or buzzer to help synchronise the audio. There are sometimes other noises, such as phones; please code these as other instrument/toy.
Noise. Anything unvoiced, e.g. clothes rustling or unidentified noise.
Audio clips are often very short and difficult to code. Always listen to the extended vocalisation to make sure that the sound is noise; if you are sure that it is not a vocalisation, code it as noise.
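The co-vocalisation versus merged-vocalisation rule can be stated as interval logic. The sketch below is purely illustrative: the coding page does not give per-speaker timings, so it assumes hypothetical (start, end) times in seconds for two different speakers inside one VAD segment, and `MIN_CLEAR_PAUSE` is a made-up threshold standing in for "a pause you can clearly hear". Any overlap, or a pause too short to be sure of, falls back to co-vocalisation, as the guide asks.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

# HYPOTHETICAL threshold: the shortest pause treated as clearly audible.
MIN_CLEAR_PAUSE = 0.1


def classify_pair(a: Interval, b: Interval) -> str:
    """Classify two speakers' vocalisations inside one VAD segment (sketch)."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start < b_end and b_start < a_end:
        return "co-vocalisation"  # any overlap, however brief
    # No overlap: the pause is whichever difference is non-negative.
    gap = max(b_start - a_end, a_start - b_end)
    if gap < MIN_CLEAR_PAUSE:
        return "co-vocalisation"  # unclear case: default to co-vocalisation
    return "merged"


print(classify_pair((0.0, 1.0), (0.8, 2.0)))  # overlap at the end of a
print(classify_pair((0.0, 3.0), (1.0, 2.0)))  # b lies wholly inside a
print(classify_pair((0.0, 1.0), (1.5, 2.0)))  # clear pause between them
```

In practice coders make this judgement by ear; the threshold simply makes the "if unclear, code as co-vocalisation" tie-break explicit.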
Is there noise during the vocalisation?
Note: sometimes the VAD will correctly identify a voice, but there will also be noise during the vocalisation.
For this question you only need to listen to the vocalisation, not the extended vocalisation. We don’t mind if there is noise in the extended vocalisation; we only care whether there is noise in the vocalisation itself.
Toy hitting table
Banging hands on table
Shaker – The same as the rattle but the bell has been dulled, so you can still hear the shaking but can’t hear the bell.
Other – code toys/instruments/phones etc. here if a vocalisation stopped you from coding them under ‘Who is Talking?’.
Is the start or end cut off?
This is only relevant if there is a significant cut that you can hear between the standard and extended vocalisation. For this question, please listen to both the vocalisation and the extended vocalisation.
Often you may find strings of vocalisations where one is cut off at the end and another is cut off at the start.
Code if the vocalisation is there in its entirety, i.e. none of the vocalisation has been cut off at the start or the end.
Code if the end has been cut off.
Beginning and end
Code if both start and end have been cut off.
Code if the start of the vocalisation has been cut off.
When the extended vocalisation plays, there is sometimes an extra infant vocalisation before the current vocalisation (you will know this because you will have just coded the previous vocalisation).
This is almost impossible to find, so don’t worry if you don’t find any.
Is this voc part of the previous voc?
Sometimes the VAD accidentally cuts a string of vocalisations into separate vocalisations. To help us put these back together, please mark yes if your current vocalisation is part of the previous vocalisation.
To determine this, listen to the extended vocalisation for the one you are working on and to the previous vocalisation. If you can hear the previous vocalisation in the current extended vocalisation, then the current vocalisation is part of the same vocalisation.
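The "part of the previous voc" check can be approximated from VAD timings. A rough sketch, assuming vocalisations are (onset, offset) pairs in seconds, sorted by onset within one recording: the previous vocalisation falls inside the current extended vocalisation when its offset is later than one second before the current onset. This is a timing heuristic only (it cannot tell whether the sound in that window is a voice or noise, so listening is still needed), and `is_part_of_previous` and `group_into_strings` are hypothetical helpers.

```python
from typing import List, Tuple

Voc = Tuple[float, float]  # (onset, offset) in seconds
PAD_SECONDS = 1.0  # padding on each side of the extended vocalisation


def is_part_of_previous(previous: Voc, current: Voc) -> bool:
    """True if the previous vocalisation reaches into the current extended window."""
    return previous[1] > current[0] - PAD_SECONDS


def group_into_strings(vocs: List[Voc]) -> List[List[Voc]]:
    """Join consecutive vocalisations (sorted by onset) into strings."""
    strings: List[List[Voc]] = []
    for voc in vocs:
        if strings and is_part_of_previous(strings[-1][-1], voc):
            strings[-1].append(voc)  # continue the current string
        else:
            strings.append([voc])  # start a new string
    return strings


print(group_into_strings([(0.0, 1.2), (1.8, 2.5), (5.0, 6.0)]))
```

Here the first two vocalisations are 0.6 s apart and would be joined, while the third starts 2.5 s later and begins a new string.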