Humans acquire a variety of manipulation skills in the real world by understanding the structure of their environment and integrating multiple sensory modalities. Learning a manipulation task through model-based reinforcement learning, in which a world model is built from multimodal sensor information, is an important step toward realizing intelligent agents that, like humans, can autonomously acquire diverse skills. In this paper, we verify experimentally that the learning speed for the Pick and Place task can be improved by attaching a tactile sensor to the end-effector of a robot arm and using its output as an input to the world model. We also discuss the need for a unified learning environment setup for manipulation tasks.