Synthetically Spoken STAIR
A spoken extension of the STAIR dataset (Japanese)
This dataset consists of synthetically spoken captions for the STAIR dataset. Following the same methodology as Chrupała et al. (2017), we generated speech for each caption of the STAIR dataset using Google’s Text-to-Speech API.
The dataset contains 616,767 spoken captions (more than 790 hours of speech). It includes the audio recordings but does not include the original images, which must be downloaded separately from the MSCOCO website.
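The two corpus figures above imply an average clip length. A quick arithmetic check, treating 790 hours as a lower bound (so the mean it yields is a lower bound too):

```python
# Back-of-the-envelope check of the corpus statistics quoted above.
# 790 hours is a lower bound ("more than 790 hours"), so the mean
# duration computed here is a lower bound as well.
NUM_CAPTIONS = 616_767
TOTAL_HOURS = 790

total_seconds = TOTAL_HOURS * 3600          # 2,844,000 s
mean_seconds = total_seconds / NUM_CAPTIONS
print(f"{mean_seconds:.2f} s per caption")  # prints "4.61 s per caption"
```

In other words, each spoken caption lasts roughly four to five seconds on average.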
The dataset is fully described in this paper (Havard et al., 2019) and can be downloaded from Zenodo. If you use this data in your own publications, please cite our ICASSP paper.
Sample
Caption | English translation |
---|---|
靴に入って猫がぐっすりと寝ている | The cat is sleeping soundly in his shoes |
玄関の靴の上に猫が座って靴をかじっている | A cat sits on the front door shoes and bites the shoes |
靴やサンダルにじゃれている猫 | Cat playing with shoes and sandals |
猫がサンダルや靴の上で寛いでいる | Cat is relaxing on sandals and shoes |
猫が靴やサンダルの上に乗って寝転んでいる | Cat is lying on shoes or sandals |
See the paired image on the MSCOCO website (image ID 100008). The English translations are given for illustration only and are not included in the dataset (they were generated with Google’s machine translation system). The Japanese caption text is not included either and must be downloaded from the STAIR website.
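Because neither the images nor the Japanese caption text ship with the audio, users have to join them on the MSCOCO image ID themselves. The sketch below groups caption strings by image ID. It assumes the STAIR annotations follow the COCO-style JSON layout (a top-level "annotations" list whose entries carry "image_id" and "caption" fields); the function name `captions_by_image` is hypothetical, and how the audio files map to caption or image IDs is not covered here.

```python
from collections import defaultdict


def captions_by_image(coco: dict) -> dict[int, list[str]]:
    """Group caption strings by MSCOCO image ID (hypothetical helper).

    Assumes a COCO-style annotation dict: a top-level "annotations" list
    whose entries carry "image_id" and "caption" keys.
    """
    grouped: dict[int, list[str]] = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


# Toy input built from the sample above; a real annotation file would be
# loaded with json.load instead.
sample = {
    "annotations": [
        {"image_id": 100008, "caption": "靴に入って猫がぐっすりと寝ている"},
        {"image_id": 100008, "caption": "玄関の靴の上に猫が座って靴をかじっている"},
        {"image_id": 100008, "caption": "靴やサンダルにじゃれている猫"},
        {"image_id": 100008, "caption": "猫がサンダルや靴の上で寛いでいる"},
        {"image_id": 100008, "caption": "猫が靴やサンダルの上に乗って寝転んでいる"},
    ]
}
by_image = captions_by_image(sample)
print(len(by_image[100008]))  # prints 5
```

Grouping first and joining later keeps the lookup per image cheap when pairing captions with image files.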
References
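- Chrupała, G., Gelderloos, L., & Alishahi, A. (2017). Representations of language in a model of visually grounded speech signal. In Proceedings of ACL 2017.
- Havard, W. N., Chevrot, J.-P., & Besacier, L. (2019). Models of visually grounded speech signal pay attention to nouns: A bilingual experiment on English and Japanese. In Proceedings of ICASSP 2019.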