I think you are completely correct. I have never experienced these moving spots that white blood cells are said to produce. (I had some floaters years ago, but not recently.)
One thing you can do, at least with floaters, to convince yourself the cause is inside your eye, not some external object being seen, is simply to close only one eye. If it were something external, you would still see it with the eye that is still open.
If there are many Scheerer spots, their number should drop by half when one eye is closed, but that may be hard to notice. What I think would work is to have a spinning disk to look through, which has a couple of opaque sectors covering 50% of its area. Then the Scheerer spots should flicker on and off, each eye producing a separate set.
---------
One slightly related effect is that an image which does not move on the retina soon becomes invisible. (As MadAnthony surely knows, but most do not, there is always a very fine "jitter" moving the eyes, even when you stare as steadily as you can at some static scene.*) You do not see the shadows of all the blood vessels passing in front of your retina because these shadows do not move, not even slightly, but are fixed on the retina.
Thus if your heart were to stop for, say, a minute, all the brighter spots the white blood cells make on the retina would fall steadily on the same places on the retina and become invisible. The brain's processing of this highly cut-up retinal image (cut up by the retinal blood vessels) "fills in" the gaps, so that the perceived images of objects appear continuous.
* With modern computers and eye tracking, it is possible, and interesting, to view a simple image that has been made stationary on the retina. (The first time this was done, years ago, a tiny light, lens and piece of film, all on a lightweight post, were actually glued to the eye. The image it projected was then static on the retina.)
For example, if the image is a square of four black lines, soon one (or more) of the sides will become invisible (and later return). I.e. it disappears "piecewise". Why this is true I understand, but it is too complex to explain in detail. It has to do with the way V1 cells parse objects out of a continuous field of stimulation and the importance of Hubel & Wiesel's "line detectors" (and how they mutually reinforce the same orientation but suppress those that are near orthogonal).**
For example, at times you experience two parallel lines, then perhaps a square-sided U or C, etc.
** If the stationary image on the retina is a circle, this dynamic interaction between the Hubel & Wiesel line detectors does not occur, so it disappears as a unit and then returns as a unit.
BTW Hubel & Wiesel's line detectors do not really exist, despite earning a Nobel prize. They were misled into thinking they do (along with all the scientific community, many of whom still think they do, or like me continue to speak of them, for convenience). The stimulus that H&W showed the monkey with electrodes in V1 was almost always half black and half white, with a straight line dividing these areas. They rotated this screen to make the division line take different orientations. Some one orientation (out of the 180 degrees possible) made "cell A" maximally active, and 90 degrees more rotation made it fall silent (with smooth changes in activity during the rotation). Cell B, in contrast, had a different orientation of the division line for maximum activity, etc.
The truth is the cells were responding to "Gabor functions", which are the building blocks of a sort of limited-range Fourier transform. The cells, working together with their like-oriented neighbors, are much more sophisticated processors, detecting the wavelength of the Gabor function components as well as their orientation.
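For anyone who has not met a Gabor function: it is just a sinusoidal grating seen through a Gaussian "window". Here is a minimal sketch in Python of what I mean. The function names, the parameter names and the numbers are my own choices for illustration, not anything taken from the physiology papers.

```python
import math

def gabor(x, y, wavelength, theta, sigma, phase=0.0):
    """Value at (x, y) of a 2D Gabor function: a cosine grating under a Gaussian envelope.

    wavelength: period of the grating (same units as x and y)
    theta:      direction (radians) along which the grating varies
    sigma:      width of the Gaussian window (hypothetical, chosen by hand)
    """
    # Rotate the coordinates so the grating varies along direction theta.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
    carrier = math.cos(2.0 * math.pi * xr / wavelength + phase)
    return envelope * carrier

def gabor_patch(size, wavelength, theta, sigma):
    """Sample the Gabor function on a square grid centered on the origin."""
    half = size // 2
    return [[gabor(x, y, wavelength, theta, sigma)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

# A patch "tuned" to wavelength 8 and a 45 degree orientation.
patch = gabor_patch(size=33, wavelength=8.0, theta=math.pi / 4, sigma=4.0)
```

The two things a cell of this kind would be "tuned" to are exactly the two knobs here besides the window size: wavelength and theta.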
This was shown later by using a stimulus pattern consisting of several parallel contrasting lines (and screens with different but uniform spacing between the lines). Then it was harder to find cells that responded to it in any orientation.
The half dark, half bright stimulus H&W used, if decomposed into its Gabor function components, has all wavelengths present, just as a step function has all Fourier components. Thus no matter what wavelength a cell monitored by H&W was tuned to (as part of an analytic net making the Fourier-like Gabor transform), it responded.
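The difference between the two kinds of stimulus can be sketched numerically in one dimension. This is only a toy model under my own assumptions: the "cell" is an even/odd pair of Gabor filters, its response is the peak energy of that pair, the window width is set equal to the wavelength, and the "parallel lines" stimulus is simplified to a sinusoidal grating of wavelength 8. None of these choices comes from the experiments themselves.

```python
import math

def gabor_energy(signal, wavelength):
    """Peak normalized response of an even/odd Gabor filter pair to a 1D signal."""
    sigma = float(wavelength)      # assumption: window width equals the wavelength
    radius = int(4 * sigma)        # truncate the Gaussian window at 4 sigma
    k = 2.0 * math.pi / wavelength
    offsets = range(-radius, radius + 1)
    envelope = [math.exp(-d * d / (2.0 * sigma * sigma)) for d in offsets]
    even = [g * math.cos(k * d) for g, d in zip(envelope, offsets)]
    odd = [g * math.sin(k * d) for g, d in zip(envelope, offsets)]
    norm = sum(envelope)
    best = 0.0
    # Slide the filter pair over every position where it fits entirely.
    for c in range(radius, len(signal) - radius):
        window = signal[c - radius:c + radius + 1]
        e = sum(w * s for w, s in zip(even, window))
        o = sum(w * s for w, s in zip(odd, window))
        best = max(best, math.hypot(e, o) / norm)
    return best

N = 256
# Half dark, half bright: a single step in the middle.
edge = [-1.0 if i < N // 2 else 1.0 for i in range(N)]
# Uniformly spaced bars, simplified to a sinusoid of wavelength 8.
grating = [math.cos(2.0 * math.pi * i / 8.0) for i in range(N)]

for wavelength in (4, 8, 16):
    print(wavelength,
          round(gabor_energy(edge, wavelength), 3),
          round(gabor_energy(grating, wavelength), 3))
```

In this toy model the edge gives a clearly nonzero response in all three filters, while the grating drives essentially only the filter tuned to its own wavelength, which is the point about the step stimulus containing all wavelengths.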
SUMMARY: V1 is immediately making a transform of the 2D image (neural activity pattern) into an entirely different informational form, a sort of Fourier transform space (actually Gabor). This has one obvious advantage compared to staying in the 2D image form. The transform (strictly speaking, the amplitudes of its components) is NOT changed by a shift of the physical location of the neural activity.
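The shift property is easy to check for the ordinary Fourier transform, which I use here as a 1D stand-in for the Gabor case. A minimal sketch, with a made-up test pattern and my own helper names: the amplitudes of the components are identical for a pattern and a shifted copy of it, and only the phases (which is where the position information lives) differ.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(n)]

def amplitude_spectrum(signal):
    """Amplitude of each Fourier component, with the phase discarded."""
    return [abs(c) for c in dft(signal)]

def circular_shift(signal, k):
    """Move the whole pattern k places to the right, wrapping around."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

# Hypothetical activity pattern and the same pattern at another location.
pattern = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
shifted = circular_shift(pattern, 5)

same_amplitudes = all(
    abs(a - b) < 1e-9
    for a, b in zip(amplitude_spectrum(pattern), amplitude_spectrum(shifted))
)
print(same_amplitudes)
```

So anything downstream that looks only at the amplitudes gets the same answer wherever the pattern falls.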
That makes recognizing what it is a much easier problem. For example, in no way do you see the Playboy centerfold girl. You "see" her image's mathematical transform into a function space. Note that if you are told the amplitudes of all the Fourier components of an electrical signal, you know what it is made of, but not when it occurred. Likewise, what is ultimately processed further (mainly in the temporal lobes) for identification is recognized as, for example, a book, regardless of where the book is. There are no "retinotopic images" of that book in the temporal lobes; only the transformed information, in terms of the book's Gabor function components, is identified as a book.