Modality Translation: From Image To Sound

  • Jiaxuan Wang

Student thesis: Master of Philosophy


A software system that performs signal modality translation is described, tested and evaluated. Motivated by the goal of aiding visually impaired and blind people in navigation and location, the system takes a live video stream as its input, applies modality translation, and produces a live audio stream as its output. The generated audio signals are determined and modulated by statistical parameters derived from the image data. These parameters are calculated by methods including RGB channel splitting, the Fast Fourier Transform (FFT), grey-level co-occurrence matrix (GLCM) algorithms, and other feature extraction algorithms. The real-time modulation of the audio signals is based on chords composed of five tones, where each tone's frequency is determined by one or two statistical parameters and their corresponding coefficients, which were obtained through repeated tests and trials. In addition, a graphical user interface (GUI) track bar allows users to adjust two modulation factors within a fixed range. The work exploits real-time Digital Signal Processing (DSP) techniques and was implemented within a software development framework based on OpenCV. The system met all of its design and performance objectives: the output sound is continuous, generated in real time, easy to distinguish for each captured image or video stream, and pleasant to the ear. The system represents an initial exploration of using real-time DSP to modulate sound in response to live video data, and offers a potential route forward for developing a wide range of techniques and systems in the area of acoustic assistive technologies.
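The abstract names the building blocks of the image-to-sound pipeline but not their exact parameters. The following is a minimal sketch of that general idea, written in standard-library Python rather than the OpenCV framework the thesis used. It computes normalised RGB channel means, a GLCM contrast value, and the share of spectral energy outside the DC bin, then maps them to a five-tone chord. Everything specific here is a hypothetical assumption, not the author's method: the base chord (a C major ninth), the per-tone coefficients, the quantisation to 8 grey levels, the choice of GLCM contrast as the texture feature, and the ranges of the two modulation factors `depth` and `pitch`. The same applies to all function names. A naive DFT stands in for the FFT; it yields the same spectrum, only more slowly.

```python
import cmath
import math

# Hypothetical base chord (C major ninth, in Hz); the thesis does not list its tones.
BASE_TONES = (261.63, 329.63, 392.00, 493.88, 587.33)
LEVELS = 8  # hypothetical number of grey levels used for the GLCM


def channel_means(img):
    """Mean of each RGB channel, scaled to [0, 1] (the 'RGB channel split' statistic)."""
    n = sum(len(row) for row in img)
    return [sum(px[c] for row in img for px in row) / (255.0 * n) for c in range(3)]


def to_grey(img, levels=LEVELS):
    """Quantise BT.601 luma to `levels` integer grey levels."""
    return [[min(levels - 1, int((0.299 * r + 0.587 * g + 0.114 * b) * levels / 256))
             for (r, g, b) in row] for row in img]


def glcm_contrast(grey, levels=LEVELS):
    """Contrast of the normalised GLCM for the horizontal neighbour offset (0, 1)."""
    counts = [[0] * levels for _ in range(levels)]
    pairs = 0
    for row in grey:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            pairs += 1
    if pairs == 0:
        return 0.0
    return sum(counts[i][j] * (i - j) ** 2
               for i in range(levels) for j in range(levels)) / pairs


def ac_energy_ratio(grey):
    """Fraction of 2-D spectral energy outside the DC bin.

    A naive DFT (quadratic in the pixel count) stands in for the FFT here.
    """
    h, w = len(grey), len(grey[0])
    energies = []
    for u in range(h):
        for v in range(w):
            coeff = sum(grey[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                        for y in range(h) for x in range(w))
            energies.append(abs(coeff) ** 2)
    total = sum(energies)
    return 0.0 if total == 0 else 1.0 - energies[0] / total


def tone_frequencies(img, depth=0.5, pitch=1.0):
    """Map image statistics to five tone frequencies (Hz).

    `depth` and `pitch` stand in for the two user-adjustable modulation factors;
    both are clamped to fixed (hypothetical) ranges, as a GUI track bar would do.
    """
    depth = max(0.0, min(1.0, depth))
    pitch = max(0.5, min(2.0, pitch))
    r, g, b = channel_means(img)
    grey = to_grey(img)
    contrast = glcm_contrast(grey) / (LEVELS - 1) ** 2  # normalised to [0, 1]
    ac = ac_energy_ratio(grey)
    # Each tone is driven by one or two (parameter, coefficient) pairs; values are hypothetical.
    drivers = [((r, 0.2),), ((g, 0.2),), ((b, 0.2),), ((contrast, 0.3),),
               ((ac, 0.2), (contrast, 0.1))]
    return [pitch * base * (1.0 + depth * sum(p * c for p, c in terms))
            for base, terms in zip(BASE_TONES, drivers)]


def chord_block(freqs, n_samples=512, rate=44100, start=0):
    """Equal-weight sum of sines divided by the tone count, so every |sample| <= 1.

    `start` is the index of the first sample; passing the running sample count
    keeps consecutive blocks phase-continuous while the frequencies are unchanged.
    """
    return [sum(math.sin(2 * math.pi * f * (start + t) / rate) for f in freqs) / len(freqs)
            for t in range(n_samples)]


if __name__ == "__main__":
    # 4x4 test frame: left half black, right half white.
    frame = [[(0, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]
    freqs = tone_frequencies(frame)
    print([round(f, 1) for f in freqs])
    block = chord_block(freqs, n_samples=256)
    print(len(block), max(abs(s) for s in block))
```

In a live system, each video frame would be reduced to these statistics, and the resulting frequencies would drive the next audio block. Dividing the summed sines by the number of tones is one simple way to avoid clipping; how the thesis mixed or smoothed its chords between frames is not stated in the abstract.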
Date of Award: 31 Dec 2018
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Patrick Gaydecki & Anthony Peyton


Keywords
  • OpenCV
  • Signal Processing
  • Light to sound
  • Image to Sound
  • Image Processing
