Yes, first-person always seems harder to visualize. I think the reason lies with the "picture" you view while visualizing.
In third-person (e.g. a movie), the "picture" is very smooth. It keeps its rectangular frame and pans and zooms across the world, so you see the world and everything in it with smooth transitions, different views, and, most importantly in my opinion, stillness.
In first-person, as in everyday life, our eyes naturally jerk from point A to B to C, and we have a single view (e.g. through our eyes at exactly 5'2" height), with shakiness caused by our moving head.
Because first-person is more rigid and strict, whereas third-person has more possibilities, I feel that a lot of the time visualization happens in third-person.