The extracted images are saved in JPEG format, and each image has a unique ID (file name) that encodes the writer's gender,
home district, age, education level, and a serial number. The writer information is stored in the ID so that each writer
can be identified through it. As a result, the dataset can be used not only for handwriting recognition but also to
predict a writer's gender, age, and location. This can help investigators focus on a particular category of suspects and
supports other forensic purposes. The ID is built according to the following rules. The first digit indicates the
writer's gender: 0 means the writer is male and 1 means the writer is female. Next come the first 3 or 4 letters of the
writer's home district name, then the writer's age, then the education or occupation level (0 = primary level,
1 = high school level, 2 = college level, 3 = university level, 4 = other occupation), and finally the serial number.
The fields are separated by underscores (_). Here is an example.
In this example, the first digit shows that the image was written by a male writer, and the district field shows that he
is from Dhaka district. The next field, 20, is his age, the education field shows that he is a university student, and
the last field is the serial number within the male data.
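The naming rules above can be turned into a small parser that recovers the writer attributes from a file name. This is a minimal sketch, assuming that every file name has exactly five underscore-separated fields in the order described (gender, district prefix, age, level, serial) and a `.jpg` extension. The file name `0_Dhak_20_3_17.jpg`, the function `parse_image_name`, and the `WriterInfo` class are hypothetical illustrations, not part of the dataset's own tooling.

```python
from dataclasses import dataclass
from pathlib import Path

# Code tables taken from the naming rules described above.
GENDER = {"0": "male", "1": "female"}
LEVEL = {
    "0": "primary",
    "1": "high school",
    "2": "college",
    "3": "university",
    "4": "other occupation",
}


@dataclass(frozen=True)
class WriterInfo:
    gender: str
    district: str  # first 3 or 4 letters of the home district name
    age: int
    level: str
    serial: int


def parse_image_name(name: str) -> WriterInfo:
    """Hypothetical parser for IDs like '0_Dhak_20_3_17.jpg' (assumed layout)."""
    stem = Path(name).stem  # drop directories and the .jpg extension
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}: {name!r}")
    gender, district, age, level, serial = parts
    if gender not in GENDER:
        raise ValueError(f"unknown gender code {gender!r}")
    if not (3 <= len(district) <= 4 and district.isalpha()):
        raise ValueError(f"district field should be 3 or 4 letters: {district!r}")
    if level not in LEVEL:
        raise ValueError(f"unknown education/occupation code {level!r}")
    return WriterInfo(GENDER[gender], district, int(age), LEVEL[level], int(serial))


print(parse_image_name("0_Dhak_20_3_17.jpg"))
# WriterInfo(gender='male', district='Dhak', age=20, level='university', serial=17)
```

Validating each field separately means a malformed file name fails with a message that points at the offending field instead of silently producing wrong metadata.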
MatriVasha is the largest dataset of handwritten Bangla compound characters for research on handwritten Bangla compound character recognition. It contains 120 different types of compound characters in 306,464 images, of which 152,950 were written by male writers and 153,514 by female writers. Because the samples were collected with district authenticity, across age groups, and from an equal number of men and women, the dataset can also be used for other research, such as handwriting studies based on gender, age, or district.
Rabby A.S.A., Haque S., Islam M.S., Abujar S., Hossain S.A. (2019) Ekush: A Multipurpose and Multitype Comprehensive Database for Online Off-Line Bangla Handwritten Characters. In: Santosh K., Hegadi R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore
Cite MatriVasha: Ferdous, J., Karmaker, S., Rabby, A. S. A., and Hossain, S. A. (2020). MatriVasha: A multipurpose comprehensive database for Bangla handwritten compound characters. CoRR, abs/2005.02155