ReadMe for accessing the IIIT-HWS dataset, which comprises handwritten synthetic word images.

==============Dataset Details================
Image Format: .png
Image Size: 48x128 px
#Word Images: 9M
Vocabulary Size: 90K
#Word Images/class: 100
#Synthetic fonts used for rendering: ~750

============Image Data======================
1. Extract the iiit-hws.tar.gz file.
2. Image Directory structure:
   Images_90K_Normalized//.png
   ..

=============Ground Truth====================
The label information and the train/val/test split are provided as a MATLAB struct, which can be accessed by loading the IIIT-HWS-90K.mat file from the MATLAB command line interface.

>> load IIIT-HWS-90K.mat
>> list

list =
    ALLlabels: [1x8817036 single]
     ALLnames: {1x8817036 cell}
      ALLtext: {1x88172 cell}
       TRNind: [6612779x1 single]
       VALind: [1322551x1 single]

Struct fields information:-
1. list.ALLtext: Ground truth text for each unique word in the vocabulary.
2. list.ALLlabels: Class id for each image in the dataset. It takes values in {1..#VocabularySize} and is the array index of the corresponding word in list.ALLtext.
3. list.ALLnames: Relative file path of each word image from the root word image directory. There is a 1-1 mapping between list.ALLnames and list.ALLlabels.
4. list.TRNind: Indices into list.ALLnames that select the word images used for training.
5. list.VALind: Same as list.TRNind, but for the validation set.

NOTE: In our ECCV paper, we used a subset of the IIIT-HWS dataset restricted to a 10K vocabulary. The ground truth for this subset can be obtained from the IIIT-HWS-10K.mat file kept in the same directory.

===========Citation===========
If you are using the dataset, please cite the following arXiv paper:-
1. Praveen Krishnan and C.V. Jawahar, Generating Synthetic Data for Text Recognition, arXiv preprint arXiv:1608.04224, 2016.

If you are comparing against our method for word spotting/recognition, please cite the following relevant papers:-
1. Praveen Krishnan, Kartik Dutta and C.V. Jawahar, Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text, ICFHR 2016.
2. Praveen Krishnan and C.V. Jawahar, Matching Handwritten Document Images, ECCV 2016.

============Contact=============
In case of any doubts, please contact the author using the details below:-
Author Name: Praveen Krishnan
Author Email: praveen.krishnan@research.iiit.ac.in

Doc Ver: v1.0
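
============Example: Resolving a Split (Python sketch)=============
For readers who consume the ground truth outside MATLAB, the sketch below shows how the struct fields described under Ground Truth fit together: an index from TRNind or VALind selects an entry of ALLnames and ALLlabels, and the label in turn selects a word in ALLtext. It is an illustration under stated assumptions, not part of the dataset tooling. The function name resolve_split and the toy file names and words are hypothetical. The sketch assumes that TRNind/VALind are 1-based (MATLAB convention), as the labels are stated to be. Reading the .mat file itself is left out because the appropriate reader (for example scipy.io.loadmat, or an HDF5 reader for v7.3 MAT-files) depends on how the file was saved.

```python
def resolve_split(all_names, all_labels, all_text, split_ind):
    """Return (relative_path, word) pairs for one split.

    Assumes the MATLAB convention: split_ind holds 1-based positions into
    all_names/all_labels, and each label is a 1-based index into all_text.
    """
    if len(all_names) != len(all_labels):
        raise ValueError("ALLnames and ALLlabels must have the same length")
    samples = []
    for ind in split_ind:
        # Values are stored as single-precision floats, so cast to int first.
        pos = int(ind) - 1                 # 1-based index -> 0-based position
        label = int(all_labels[pos]) - 1   # 1-based class id -> 0-based index
        samples.append((all_names[pos], all_text[label]))
    return samples


# Toy stand-in for the real struct (hypothetical paths and words).
all_text = ["apple", "bread", "chair"]
all_labels = [1.0, 1.0, 2.0, 3.0]
all_names = ["a/0.png", "a/1.png", "b/0.png", "c/0.png"]
trn_ind = [1.0, 3.0]
val_ind = [4.0]

train = resolve_split(all_names, all_labels, all_text, trn_ind)
val = resolve_split(all_names, all_labels, all_text, val_ind)
print(train)
print(val)
```

With the real data, the same function would be called with the arrays read from list.ALLnames, list.ALLlabels, list.ALLtext and list.TRNind or list.VALind; each returned path is relative to the root word image directory.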