I'm always ready to be called upon to comment on linguistic issues for the media, but this situation was particularly difficult owing to the content of the video. I hope that the phonetics community is able to assist the authorities. I wish to make it clear that I am not carrying out the analysis myself.
The speaker displays many of the features of a British accent known widely as Multicultural London English (MLE), such as producing the vowels in e.g. FACE and PRICE as monophthongs, not dropping /h/ sounds (/h/-dropping is common in the London accent Cockney) and pronouncing voiced dental fricatives in e.g. "the" as [d]. There are also glottal stops, which are less common in Afro-Caribbean or African English accents, as well as /l/-vocalisation. The speaker also has a more syllable-timed speech rhythm: instead of pronouncing the phrase "from all walks of life" as /frəm ɔːl wɔːks əv laɪf/, he makes it sound more like [frɒm ɔː wɔːks ɒv lɐːf], with a full vowel in each syllable.
See HERE for information on the features of MLE (yes, it's Wikipedia, but a good summary).
The accent was identified chiefly by Professor Paul Kerswill and colleagues; Paul was at Reading, but is now at York via Lancaster.
It's impossible to say exactly how many speakers there are of this accent, but it is common among younger working-class speakers in the London area, and features of the accent have also been observed in other urban areas of the UK. It is not at all exclusive to speakers from an Afro-Caribbean background but is also spoken extensively by e.g. white and Asian speakers wishing to identify with a certain demographic / social group.
British impressionist and actor Alistair McGowan did a nice piece on MLE for the BBC's One Show, which you can view below.
I would say that the speaker in the clip is probably a UK or fully bilingual speaker of English rather than a second language learner or someone with an indigenised variety of English (e.g., Nigerian English). The speaker probably grew up in or near inner London and has probably been educated in the UK system. I would be surprised if he was from outside the greater London area, but this is an accent which is socioculturally attractive and so he may be from further afield. I would also suggest that he is lower middle-class rather than working-class as he sounds educated.
It should be noted, however, that we cannot actually see him speak in the film. Most of his face including his mouth is covered. It could, therefore, be a voice-over.
When I appeared on the BBC News Channel (I'm so sorry I don't have a clip of this to share) I was asked about forensic phonetic analysis of this speaker's voice. What we would need to be able to do this is a reference sample of a known speaker in order to make comparisons between that and the Foley video. As one of my colleagues, Martin Barry, points out, unless this speaker has spoken into a police microphone it will be almost impossible to carry out forensic speaker comparison successfully.
My picture from the AP session also appeared in the Los Angeles Times. You can view the online article HERE, which has comment from Martin Barry.