In the language-learning process, young learners must map labels not only onto concrete objects but also onto abstract relational meanings, such as those expressed by verbs. This task requires the learner to discover the perspective that speakers take on objects and events in the world as they discuss them, and to determine which aspects of the world an utterance refers to. Several sources of information are implicated in this process (syntactic, contextual, and social cues to a speaker's interpretation of events), but given the complexity of linguistic processing, how and to what degree each source is used in different language-learning environments remains a largely open issue. This research examines the verbal responses of individuals describing events, the actions of individuals interpreting utterances, and the eye movements of individuals in each of these paradigms. It seeks to illuminate some of the processes underlying perspective-taking skills in language learning and the effects these skills have on the resolution of ambiguity in both vision and language, with particular emphasis on the contributions of social cues to a speaker's intentions and attentional state. Results demonstrate that the way speakers allocate their attention when apprehending events influences both their own interpretation of these events and the listener's ability to determine the correct interpretation of their referential utterances. Successful referential assignment in turn guides the listener/learner's word-learning inferences but contributes only minimally to syntactic ambiguity resolution, where linguistic factors such as lexical biases play a much more significant role. These findings demonstrate the flexible and probabilistic nature of language processing, in which emphasis shifts among different channels of input as the task at hand demands.