So I'm making up an artificial language (don't ask me why, I don't know why, I think I'm possessed!). It is supposed to be a universal language with ease of learning as its highest directive, followed by logical simplicity and expandability. I've got the grammar and word structure down, but I'm stumped on sounds, consonants specifically, as I think I've settled on five vowels. I have two competing theories about which consonants to use. The first is simply to choose the most common consonants. Easy enough: Interlingua and Lojban claim to have done it already. But right away I see a problem. Let's take a look at Lojban's consonants:
Let's say you're a native Japanese speaker: the "v" and "b" sounds are going to be hard to differentiate, and you will have difficulty hearing the difference between the two! For a Cantonese speaker it's the "l" and "n" sounds, and so on. It looks like they noticed these problems and fixed the most obvious one, the "r"/"l" confusion, by putting in the less common trilled "r" sound; no one is going to confuse that for an "l".

This leads me to a different, counter-intuitive theory: choose sounds regardless of how uncommon they are, as long as they are unlikely to be confused with other sounds. It can take years, if it ever happens at all, for an adult learner of a foreign language to master telling similar sounds apart.[1],[2],[3] Yet how long does it take to learn to produce a new sound? I don't think it takes too long. When I was looking up consonants I found the click sounds, very rare sounds used only in a few Sub-Saharan languages, yet it took me only minutes to learn how to make them, with or without vowels before and/or after them. No one is going to confuse a "!" (a click-clack sound like horse hooves on pavement) or a "|" (a snapping sound like a branch breaking) for any other sound! So I made up a chart of 15 potential consonant sounds, chosen so that it would be difficult for the uninitiated to confuse one for another. The exact sound is boxed, and similar sounds that might be allowable mispronunciations are colored the same.
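To make the second theory concrete, here is a rough Python sketch of it as a selection rule. It only encodes the three confusions mentioned above ("b"/"v" and "r"/"l" for Japanese listeners, "l"/"n" for Cantonese listeners); the candidate order and the names `CONFUSABLE`, `is_confusable`, and `pick_distinct` are made up for illustration, and treating the two clicks as confusable with nothing is just my assumption from above, not a measured result.

```python
# Pairs that some group of listeners is said to confuse (taken from the post).
CONFUSABLE = {
    frozenset({"b", "v"}),  # Japanese listeners
    frozenset({"r", "l"}),  # Japanese listeners
    frozenset({"l", "n"}),  # Cantonese listeners
}


def is_confusable(a, b):
    """Return True if the two sounds appear together in CONFUSABLE."""
    return frozenset({a, b}) in CONFUSABLE


def pick_distinct(candidates):
    """Greedily keep each candidate that clashes with nothing kept so far.

    Candidates are visited in the order given, so earlier (preferred)
    sounds win when two sounds are confusable with each other.
    """
    kept = []
    for sound in candidates:
        if not any(is_confusable(sound, other) for other in kept):
            kept.append(sound)
    return kept


# Hypothetical candidate list: a few common consonants, then two clicks.
candidates = ["b", "v", "l", "n", "r", "!", "|"]
print(pick_distinct(candidates))  # ['b', 'l', '!', '|']
```

The greedy pass is order-dependent: putting "r" ahead of "l" would keep "r" and "n" and drop "l" instead. A real inventory would need confusion data for many more languages than the two mentioned here.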
Am I right or wrong?
[1]: Best, Catherine; Strange, W. (1992), "Effects of phonological and phonetic factors on cross-language perception of approximants", Journal of Phonetics 20: 305–330 http://www.haskins.yale.edu/sr/SR109/SR109_07.pdf
[2]: Logan, John; Lively, Scott; Pisoni, David (1991), "Training Japanese listeners to identify English /r/ and /l/: a first report", Journal of the Acoustical Society of America 89 (2): 874–886 http://www.ncbi.nlm.nih.gov/pubmed/2016438
[3]: Koyama, S.; Akahane-Yamada, R.; Gunji, A.; Kubo, R.; Roberts, T. P.; Yabe, H.; Kakigi, R. (2003), "Cortical evidence of the perceptual backward masking effect on /l/ and /r/ sounds from a following vowel in Japanese speakers", NeuroImage 18 (4): 962–974 http://www.ncbi.nlm.nih.gov/pubmed/12725771
