Multimodal artificial intelligence seeks to fuse these different types of data and exploit their complementarity to improve understanding and performance in areas such as natural language processing, computer vision, and speech recognition.