Conducted by a team at Edinburgh University, the research investigated the ability of multimodal large language models (MLLMs) to answer time-related questions from images of clocks and calendars. The AIs were tested on a variety of clock designs, including faces with Roman numerals, with and without second hands, and with differently coloured dials. The researchers found that the AI systems read the clock-hand positions correctly less than a quarter of the time.
Clocks with Roman numerals or stylised hands induced even more mistakes. According to the Edinburgh team, the AI systems performed no better when the second hand was removed, suggesting fundamental issues with hand detection and angle interpretation.
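For context, the geometry the models appear to struggle with is straightforward to state. A minimal sketch, assuming hand angles are measured in degrees clockwise from the 12 o'clock position (the function name and interface here are illustrative, not part of the study):

```python
def angles_to_time(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Convert clock-hand angles (degrees clockwise from 12) to a time.

    Assumes an analogue clock: the minute hand sweeps 6 degrees per
    minute, and the hour hand sweeps 30 degrees per hour plus a drift
    of 0.5 degrees per minute.
    """
    # Minute hand: 360 degrees / 60 minutes = 6 degrees per minute.
    minute = round(minute_angle / 6) % 60
    # Remove the hour hand's per-minute drift, then divide by 30.
    hour = round((hour_angle - minute * 0.5) / 30) % 12
    return hour, minute


# A hand at 105 degrees (hour) and 180 degrees (minute) reads as 3:30.
print(angles_to_time(105.0, 180.0))  # → (3, 30)
```

Reading a clock thus reduces to two steps, detecting each hand and mapping its angle to a number, and the study's results suggest current MLLMs fail at both.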
AI models were also tasked with…