Technical Reports

Publications

Deep learning based question answering system in Bengali
Abstract

Recent advances in natural language processing have improved state-of-the-art performance on many tasks, including question answering (QA), for languages such as English. Bengali is ranked seventh among the world's languages and is spoken by about 300 million people worldwide. However, due to a lack of data and of active research on QA, similar progress has not been achieved for Bengali. Unlike English, Bengali has no large-scale benchmark QA dataset, no pretrained language model that can be adapted for Bengali question answering, and no established human baseline score for QA. In this work, we use state-of-the-art transformer models to train a QA system on a synthetic reading comprehension dataset translated from SQuAD 2.0, one of the most popular benchmark datasets in English. To evaluate our models, we collect a smaller human-annotated QA dataset from Bengali Wikipedia covering popular topics from Bangladeshi culture. Finally, we compare our models with human children through survey experiments to establish a benchmark score.

Status: Published
Received 17 Jun 2020, Accepted 03 Oct 2020, Published online: 23 Nov 2020
DOI: 10.1080/24751839.2020.1833136
Publication: Journal of Information and Telecommunication, Volume 5, 2021 - Issue 2, Taylor & Francis.
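
Reading comprehension systems trained on SQuAD 2.0 are conventionally scored with exact match (EM) and token-level F1 against the gold answer. The abstract does not name the metrics used, so the following is only a minimal sketch under that assumption. `normalize_answer`, `exact_match`, and `f1_score` are hypothetical helpers, and the normalization is simplified: the official SQuAD script also strips English articles, which do not apply to Bengali, and Bengali punctuation such as the danda is not handled here.

```python
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Simplified assumption: English article removal from the official SQuAD
    script is omitted, and non-ASCII punctuation (e.g. the Bengali danda)
    is left untouched.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # SQuAD 2.0 style: an unanswerable question has an empty gold answer,
    # which only an empty prediction matches.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "ঢাকা শহর" against the gold answer "ঢাকা" gives precision 1/2 and recall 1, so F1 is 2/3 while EM is 0. In a full evaluation, each prediction is usually scored against every available gold answer, the maximum is kept, and scores are averaged over all questions.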

Soil Analysis and Unconfined Compression Test Study Using Data Mining Techniques
Abstract

In this study, Random Forest Regressor, Linear Regression, Generalized Regression Neural Network (GRNN), and Fully Connected Neural Network (FCNN) models are used to predict the unconfined compression coefficient from the standard penetration test (SPT) N-value, depth, and soil type. The study focuses on the correlation between the undrained shear strength of clay (Cu) and the SPT N-value. The data come from Ward No. 14 in Mymensingh district and from Rangamati district, both in Bangladesh. Using these data, the study seeks to strengthen the correlation between the SPT N-value and Cu. It also assesses the goodness of this relationship by comparing it with unconfined compression strength values obtained by experts from field unconfined compression tests.

Status: Published
First Online: 19 November 2020
DOI: 10.1007/978-3-030-63119-2_4
Publication: International Conference on Computational Collective Intelligence, ICCCI 2020: Advances in Computational Collective Intelligence pp 38–48, Springer.
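
The study above centres on an empirical correlation between the SPT N-value and Cu. As a minimal sketch, assuming a straight-line model Cu = a + b·N (the abstract does not state the functional form), the least-squares fit and its Pearson correlation coefficient can be computed as follows. `fit_linear` and `pearson_r` are hypothetical helper names, and no data from the study are reproduced here.

```python
import math
from typing import Sequence, Tuple


def fit_linear(n_values: Sequence[float], cu_values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Cu = a + b * N; returns (a, b).

    Assumption: a linear model; the study may use a different form.
    """
    if len(n_values) != len(cu_values) or len(n_values) < 2:
        raise ValueError("need at least two paired (N, Cu) observations")
    mean_n = sum(n_values) / len(n_values)
    mean_cu = sum(cu_values) / len(cu_values)
    s_nc = sum((n - mean_n) * (c - mean_cu) for n, c in zip(n_values, cu_values))
    s_nn = sum((n - mean_n) ** 2 for n in n_values)
    if s_nn == 0:
        raise ValueError("all N-values are identical; slope is undefined")
    b = s_nc / s_nn
    a = mean_cu - b * mean_n
    return a, b


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)
```

To check a fitted relationship against test results, one could compute `pearson_r` between the predicted values a + b·N and the corresponding measured strengths; the machine learning models named in the abstract would replace the straight line with a learned function.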

Analysis of Soil and Various Geo-technical Properties using Data Mining Techniques
Abstract

In this study, General Regression Neural Network (GRNN), Artificial Neural Network (ANN), Fully Connected Neural Network (FCNN), Support Vector Regression (SVR), and Linear Regression (LR) models have been implemented to predict soil composition from the Standard Penetration Test (SPT) value and soil depth. The primary focus has been on determining a significant correlation between soil composition and both SPT value and depth. Data sets from Ward 14, Mymensingh district, Bangladesh, and from a construction project along the India-Myanmar border have been used. Eight soil types have been classified, namely fine sand, silty clay, clayey silt with fine sand, clayey silt, fine sand with silt, silty fine sand, sandy silt, and rubbish, and the probability of each soil type classification has been determined.

Status: Published
Date Added to IEEE Xplore: 18 September 2020
DOI: 10.1109/IS48319.2020.9199941
Publication: 2020 IEEE 10th International Conference on Intelligent Systems (IS).
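
Specht's GRNN is a one-pass kernel regression: each prediction is a Gaussian-weighted average of the training targets. A minimal sketch follows, assuming input vectors such as [SPT value, depth] that have already been scaled to comparable ranges; the function names and the bandwidth `sigma` are illustrative, not taken from the study. Applying the same weights to class labels yields per-class probability estimates, which is one possible way (an assumption here) to obtain soil-type classification probabilities.

```python
import math
from typing import Dict, List, Sequence


def _kernel_weights(train_x: Sequence[Sequence[float]], query: Sequence[float],
                    sigma: float) -> List[float]:
    """Pattern layer: Gaussian kernel on the squared Euclidean distance."""
    weights = []
    for x in train_x:
        d2 = sum((xi - qi) ** 2 for xi, qi in zip(x, query))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    return weights


def grnn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                 query: Sequence[float], sigma: float = 1.0) -> float:
    """Summation and output layers: kernel-weighted mean of the training targets."""
    weights = _kernel_weights(train_x, query, sigma)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


def grnn_class_probabilities(train_x: Sequence[Sequence[float]], train_labels: Sequence[str],
                             query: Sequence[float], sigma: float = 1.0) -> Dict[str, float]:
    """Same weights applied to one-hot labels: per-class probability estimates."""
    weights = _kernel_weights(train_x, query, sigma)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    probs: Dict[str, float] = {}
    for w, label in zip(weights, train_labels):
        probs[label] = probs.get(label, 0.0) + w / total
    return probs
```

The only trainable quantity is `sigma`: a small value makes the model reproduce the nearest training sample, while a large value pulls every prediction toward the global mean, so it is typically tuned on held-out data.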

Visualizing Bangla Word Embeddings using BERT
Abstract

BERT (Bidirectional Encoder Representations from Transformers) is a masked language model that has made a strong impact in the machine learning community by achieving state-of-the-art results on a wide range of NLP tasks, including question answering (SQuAD v1.1) and natural language inference (MNLI). In this project, we use BERT to extract features and word embedding vectors from Bangla text input.
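
BERT tokenizes text into WordPiece subwords and marks word-internal pieces with a `##` prefix, so one common way to get word-level embeddings is to average the hidden-state vectors of a word's subwords. The sketch below assumes the token strings and per-token vectors have already been extracted from some BERT layer (the extraction itself is not shown), pools subwords into word vectors, and compares words by cosine similarity, a usual step before projecting embeddings to 2-D for visualization. `pool_wordpieces`, `cosine`, and `SPECIAL_TOKENS` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

# BERT special tokens that do not belong to any word (assumed vocabulary).
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def pool_wordpieces(tokens: Sequence[str],
                    vectors: Sequence[Sequence[float]]) -> List[Tuple[str, List[float]]]:
    """Merge '##' continuation pieces into whole words and average their vectors."""
    groups: List[Tuple[str, List[Sequence[float]]]] = []
    for tok, vec in zip(tokens, vectors):
        if tok in SPECIAL_TOKENS:
            continue
        if tok.startswith("##") and groups:
            word, vecs = groups[-1]
            groups[-1] = (word + tok[2:], vecs + [vec])
        else:
            groups.append((tok, [vec]))
    pooled = []
    for word, vecs in groups:
        dim = len(vecs[0])
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        pooled.append((word, mean))
    return pooled


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Averaging is only one pooling choice; taking the first subword's vector is another convention, and the two can give noticeably different neighbourhoods in a visualization.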

Text Summarization on COVID-19 Articles
Abstract

Summarization has long been a significant area of concern in natural language processing, mostly due to its dependence on human intervention. In this study, we use BART, a denoising autoencoder, to perform abstractive summarization on a COVID-19 dataset, with ROUGE scores as the evaluation metric. The study focuses on medical articles about COVID-19, with the aim of supporting research during the pandemic.
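
ROUGE-N measures n-gram overlap between a generated summary and a reference summary. As a minimal sketch with whitespace tokenization, lowercasing, and no stemming, precision, recall, and F1 for ROUGE-N can be computed as below. Established ROUGE implementations tokenize and optionally stem differently, so scores from this hypothetical `rouge_n` helper will not match them exactly.

```python
from collections import Counter
from typing import Dict, List


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams; empty if the text is shorter than n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N precision, recall, and F1."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as in the other text.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Recall rewards covering the reference's content, while precision penalizes padding, which is why F1 is usually reported alongside ROUGE-1 and ROUGE-2.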

Analyzing Automatic Text Summarization: Where Are We and What's Next
Abstract

Summarization is the task of condensing a text into a shorter version that contains the principal information from the source document. Recent NLP research has made considerable progress toward high-quality summaries. This paper is an extensive analysis of the trends in text summarization, focusing on its two main kinds: extractive and abstractive summarization. Achieving high-quality summarization faces several obstacles, mainly because the task is subjective. Evaluation metrics such as ROUGE support extractive summarization, and there has been significant progress in this area. However, it is believed that truly high-quality summarization requires progress in abstractive summarization, because abstractive summaries are closer to those written by humans.
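
Extractive methods select existing sentences rather than generating new text. The sketch below illustrates that idea with a classic word-frequency heuristic in the spirit of early Luhn-style summarizers; it is not a method proposed in this paper, and the stop-word list and the `extractive_summary` helper are illustrative assumptions.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (assumption); real systems use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "at", "on"}


def _content_words(text: str) -> List[str]:
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))
    max_freq = max(freq.values(), default=1)

    def score(sentence: str) -> float:
        words = _content_words(sentence)
        if not words:
            return 0.0
        # Mean normalized frequency, so long sentences are not favored by length alone.
        return sum(freq[w] / max_freq for w in words) / len(words)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Because every output sentence is copied verbatim, such a summary scores well on n-gram metrics like ROUGE, yet it cannot paraphrase or fuse information, which is the gap abstractive models aim to close.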

Techniques Applied to Topic Segmentation
Abstract

Topic segmentation is an important initial step in most natural language processing tasks. It aims to determine the boundaries between topic blocks in a text, which aids semantic analysis. The technique improves access to information by detecting the topic segments within the entire text. When the segments are determined, they can be used to create