I am, of course, usually coming at it from the angle of someone pronouncing English.
As you doubtless know, accents such as Standard Southern British English (SSBE) have up to three consonants at the start of a syllable (onset consonants) and up to four consonants at the end (coda consonants), and a syllable usually has a vowel as its peak. The structure of the basic SSBE syllable can therefore be described as follows:
(C)(C)(C) V (C)(C)(C)(C)
where C is a consonant, V is the vowel at the peak, and brackets mark optional elements.
I'd recommend reading Roach (2009) chapter 8 or Cruttenden (2008) section 5.5 for a full description of what clusters are possible in syllable onsets and codas in SSBE. There's also a nice description on the Macquarie Linguistics pages. The main point I want to make here is that not all languages have syllables which are as complex as those of English (and English does not have the monopoly on complexity), and this mismatch can lead to problems with pronunciation just as much as not being able to produce a sound.
The thing which always surprises me - and perhaps it shouldn't - is that teachers of English from other language backgrounds often know nothing about the phonology of their own language. As a result, they do not understand that a learner's problem with pronouncing a sound in a particular position in the syllable is unlikely to be about producing the sound per se; it is more likely that the learner's language does not permit that sound in that position. If, for example, a learner is from a Chinese language background and that language permits only a zero coda (i.e., no consonants at the end of syllables) or only a nasal of some description in the coda, pronouncing any other consonant at the end of a syllable may be difficult, and pronouncing clusters is going to be an extreme challenge.
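To make the idea of a positional constraint concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Template` class, the `fits` function, and both templates are hypothetical, and `NASAL_CODA_ONLY` is a toy language, not a description of any real variety of Chinese. The SSBE-like template only counts consonants; it does not model which clusters are actually possible.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Template:
    """Toy syllable template: size limits plus an optional list of permitted codas."""
    max_onset: int
    max_coda: int
    coda_allowed: Optional[FrozenSet[str]] = None  # None = no restriction modelled


def fits(onset: Sequence[str], coda: Sequence[str], template: Template) -> bool:
    """Return True if the onset and coda respect the template's limits."""
    if len(onset) > template.max_onset or len(coda) > template.max_coda:
        return False
    if template.coda_allowed is not None:
        # Every coda consonant must be one the language permits in that position.
        return all(c in template.coda_allowed for c in coda)
    return True


# Hypothetical templates (assumptions, not descriptions of real grammars).
SSBE_LIKE = Template(max_onset=3, max_coda=4)
NASAL_CODA_ONLY = Template(max_onset=1, max_coda=1,
                           coda_allowed=frozenset({"m", "n", "ŋ"}))

# "strengths", assuming a pronunciation with the coda /ŋkθs/.
strengths = (["s", "t", "r"], ["ŋ", "k", "θ", "s"])
# "man": single onset consonant, nasal coda.
man = (["m"], ["n"])

print(fits(*strengths, SSBE_LIKE))        # True
print(fits(*strengths, NASAL_CODA_ONLY))  # False
print(fits(*man, NASAL_CODA_ONLY))        # True
```

The learner in this toy case can produce /s/, /t/ and /k/ perfectly well; it is the position that the template rejects.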
In addition, learners have different strategies for dealing with clusters. Some learners (e.g., Japanese speakers) will insert vowels between consonants in a cluster - this process is known as vowel epenthesis - in order to preserve as many consonants as possible. By comparison, Chinese speakers will often elide consonants so that the word fits Chinese syllable structure and keeps its original number of syllables. Here's a favourite comparison of mine: in Japan, MacDonald's, which is /məkˈdɒnəldz/ in SSBE, is known as "ma-ku-do-na-ru-do", but in Hong Kong it is known as "mak-do-nau", with a strongly glottalised and unreleased [k] in the first syllable. Japanese tends to preserve the consonants but Cantonese preserves the number of syllables.
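The two strategies can be contrasted in a small Python sketch. This is a deliberately crude toy, not a model of Japanese or Cantonese phonology: the function names, the rough romanisation `makdonaldz`, and the fixed filler vowel are all assumptions, and the deletion rule is blunter than the Hong Kong form above, which keeps an unreleased [k].

```python
VOWELS = set("aeiou")  # assumption: one letter per segment, rough romanisation


def epenthesize(segments, filler="u"):
    """Insert a filler vowel after every consonant not followed by a vowel."""
    out = []
    for i, seg in enumerate(segments):
        out.append(seg)
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg not in VOWELS and (nxt is None or nxt not in VOWELS):
            out.append(filler)
    return out


def delete_stranded(segments):
    """Drop every consonant that is not immediately followed by a vowel."""
    out = []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg in VOWELS or (nxt is not None and nxt in VOWELS):
            out.append(seg)
    return out


def syllable_count(segments):
    """Count vowels as a stand-in for the number of syllables."""
    return sum(seg in VOWELS for seg in segments)


word = list("makdonaldz")
with_vowels = epenthesize(word)
with_deletion = delete_stranded(word)

print("".join(with_vowels), syllable_count(with_vowels))      # makudonaluduzu 7
print("".join(with_deletion), syllable_count(with_deletion))  # madona 3
```

Epenthesis keeps every consonant at the cost of extra syllables, while deletion keeps the three syllables at the cost of consonants.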
In World Englishes, we often see patterns of syllable structure influenced by a speaker's L1 or the indigenous language(s) of the region in which English has been adopted. This may be why speakers of many varieties of English around the world drop third-person singular "-s"; its meaning is retrievable from the context, and it's a rather superfluous inflection which is likely to be dropped anyway in complex codas in many L2 Englishes.
Does it matter that clusters are simplified? Yes, it does, if intelligibility and therefore meaning are compromised. One is unlikely to be misunderstood if leaving off third-person singular "-s", but it becomes more of a problem in other contexts; when I heard a Hong Kong English pronunciation of MacDonald's (I'd asked what the student's favourite things were), I understood the speaker to have said Madonna, thanks largely to the lack of consonants at the end of the word.
What can teachers do about this? First, you need to be aware of the syllable constraints of the L1 of the learners you are going to be teaching, so you have an idea of whether they are used to complex syllable onsets and codas to start with. Even if they are not, chances are learners will be able to produce singleton consonants and some clusters in onset position with little difficulty, but codas are always more problematic.
One strategy, if coda consonants are a problem, is to try to "slide" the coda consonants into the next word; this doesn't always work, but it can also help learners with listening if they can understand that speech is a stream rather than a string of discrete words, and so it may well sound like coda consonants belong to the next word. For example, in a phrase such as "MacDonald's is my favourite", one could slide the final /z/ of MacDonald's into the start of the word "is" and it would then be a little more straightforward for the listener to retrieve.
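The "sliding" idea can be sketched in Python as well. This is only an illustration under stated assumptions: `slide_codas` is a hypothetical helper, the words are given in a rough one-letter-per-segment romanisation, and only a single final consonant is moved, and only when the next word begins with a vowel.

```python
VOWELS = set("aeiou")  # assumption: one letter per segment, rough romanisation


def slide_codas(words):
    """Move a word-final consonant onto the next word if that word starts with a vowel."""
    words = [list(w) for w in words]
    for i in range(len(words) - 1):
        cur, nxt = words[i], words[i + 1]
        if cur and nxt and cur[-1] not in VOWELS and nxt[0] in VOWELS:
            nxt.insert(0, cur.pop())
    return ["".join(w) for w in words]


# "MacDonald's is my ..." in the rough romanisation assumed here.
print(slide_codas(["makdonaldz", "iz", "mai"]))  # ['makdonald', 'ziz', 'mai']
```

The final consonant is not lost; it simply surfaces as the onset of the following syllable, which is the easier position for many learners.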
Cruttenden, A. (ed). (2008). Gimson's Pronunciation of English (7th ed.). London: Hodder Education.
Roach, P. (2009). English Phonetics and Phonology (4th ed.). Cambridge: Cambridge University Press.