Rahul Das

Experience

CRO Intern - Data Science & Risk Analytics

Deutsche Bank | Mumbai, Maharashtra, India · On-site | Jan 2025 - Jun 2025

Credit Risk Analysis & Monitoring: Performed credit risk analysis and assessed exposures at industry and counterparty levels, while monitoring key risk indicators within Corporate and Investment Banking.
Natural Language to SQL Automation: Built a secure Natural Language to SQL bot using LLMWare and in-house LLMs (Mistral & Gemini), enabling analysts to query confidential credit-risk data from on-prem SQLite databases by leveraging a Retrieval-Augmented Generation (RAG) layer for large, complex schemas.
PDF Table Extraction & RAG Analytics: Developed a RAG-powered AI pipeline to extract structured tables from investment banks’ PDF pillar files, ingesting them into a vector store for scalable, accurate collation and cross-document metric comparison.
Risk Reporting Automation & Trend Analysis: Created Python-based automations and analytical tools to generate weekly reports, automate daily and weekly breach detection, and observe long-term industry-level risk trends, improving monitoring and remediation efficiency.
Threshold Management System Development: Built an on-prem web app using Flask/Jinja, SQLite, and a Python backend (Pandas, Openpyxl, SQLAlchemy) with a JavaScript, HTML, and CSS UI. The system handles dynamic dashboards, complex data pipelines, automated report generation, and email alerts triggered by threshold breaches, expirations, activations, and risk fluctuations.
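
The breach-detection idea behind the threshold system can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the table and column names (`thresholds`, `exposures`, `limit_value`) and the demo rows are hypothetical and do not reflect any internal schema or data.

```python
import sqlite3


def find_breaches(conn):
    """Return (counterparty, exposure, limit) rows where exposure exceeds its limit."""
    query = """
        SELECT e.counterparty, e.exposure, t.limit_value
        FROM exposures AS e
        JOIN thresholds AS t ON t.counterparty = e.counterparty
        WHERE e.exposure > t.limit_value
        ORDER BY e.counterparty
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up thresholds and exposures."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE thresholds (counterparty TEXT PRIMARY KEY, limit_value REAL);
        CREATE TABLE exposures (counterparty TEXT, exposure REAL);
        """
    )
    conn.executemany("INSERT INTO thresholds VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
    conn.executemany("INSERT INTO exposures VALUES (?, ?)", [("A", 80.0), ("B", 75.0)])
    return conn
```

Calling `find_breaches(build_demo_db())` returns only the rows whose exposure is above the limit; in a real deployment the result would feed the report and email-alert steps.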

Deep Learning Intern

CORE Diagnostics | Remote | Dec 2023 - Feb 2024

Research & Meta-Analysis: Conducted a critical review and meta-analysis of AI-based bladder cancer detection and classification, focusing on deep learning algorithms for whole-slide images (WSIs) of invasive and non-invasive tumors.
Model Development & Transfer Learning: Developed a bladder cancer classification model using DenseNet121 with transfer learning, data augmentation, and a normalized dataset for multi-class classification (T0-T4 stages).
Model Evaluation & Visualization: Employed CrossEntropyLoss, Adam optimizer, and a learning rate scheduler, and evaluated the model using Grad-CAM for visual interpretability and accuracy assessment.
Guidance & Supervision: Worked under the guidance of Dr. Sambit K. Mohanty, an Oncopathologist and researcher in AI for healthcare and bladder cancer detection.

Data Science Intern

CSM Technologies | Bhubaneswar, Odisha, India · Hybrid | Aug 2023 - Nov 2023

Data Preprocessing & EDA: Cleaned and organized large transactional datasets, handled missing values, normalized features, and performed exploratory data analysis (EDA) with Matplotlib and Seaborn to identify spending patterns and unusual activity spikes.
Fraud Detection & Predictive Modeling: Implemented unsupervised models (Isolation Forest, Autoencoders) for fraud anomaly detection and trained ARIMA and Prophet models for spending pattern forecasting, optimizing model performance through hyperparameter tuning.
Feature Engineering & Dimensionality Reduction: Extracted behavioral indicators and applied PCA for dimensionality reduction, improving model accuracy and computational efficiency.
Model Deployment & Integration: Integrated models into a Flask backend with REST APIs, storing predictions and transaction data in an SQLite database, gaining hands-on experience in building scalable data-driven solutions.

ML-CV Intern

Council of Scientific and Industrial Research | Bhubaneswar, Odisha, India · On-site | May 2023 - Jul 2023

Image Processing & Pipeline Development: Enhanced the Eye-On-Pellet system by developing a robust image processing pipeline with Gaussian smoothing, adaptive thresholding, Canny edge detection, and Hough Circle Transform for accurate pellet size detection.
Time Series Analysis & Forecasting: Applied machine learning algorithms (Linear Regression, Random Forest, SVM, Gaussian Process, Prophet) to predict pellet size variations and optimize key process parameters in pelletization.
Impact on Pelletization Industry: Enabled real-time pellet distribution visualization and accurate size forecasts, improving process optimization and product quality in iron ore pellet manufacturing.
Rebar Counting System Development: Developed an image processing-based solution to accurately count rebars in bundles, reusing techniques from the pellet size prediction project and focusing on robustness under varying lighting conditions.

Projects

Image Generation using GAN, WGAN, DCGAN - A comparative analysis

This project involves a comparative study of Basic GAN, Wasserstein GAN (WGAN), and Deep Convolutional GAN (DCGAN) using the MNIST and Fashion MNIST datasets. The study evaluates the performance of these GAN architectures by analyzing key outputs such as generated images and loss plots. Basic GANs, while pioneering in generative modeling, often suffer from challenges like mode collapse, where the generator produces limited diversity in outputs, and vanishing gradients, which hinder the generator's ability to learn effectively during training. These issues are particularly pronounced when training deep networks, as the discriminator becomes too strong, leading to unstable learning dynamics.

In contrast, WGANs address these problems by replacing the traditional loss function with the Wasserstein distance, which provides more stable gradients, helping the generator update more effectively even when the critic (discriminator) is strong. DCGANs, on the other hand, leverage convolutional layers to improve feature extraction, leading to sharper and more realistic images. My analysis shows that both WGAN and DCGAN mitigated common pitfalls of Basic GANs, leading to more stable training, better gradient flow, and improved diversity in generated outputs. This project highlights the practical advantages of WGANs and DCGANs in overcoming the limitations of basic GAN architectures.
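
The difference between the two objectives can be illustrated with a small Python sketch. It is not the project's training code: the function names are hypothetical, and the inputs are plain lists of discriminator probabilities or critic scores rather than network outputs. The original WGAN formulation also constrains the critic (for example by weight clipping), which is omitted here.

```python
import math


def gan_discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss; inputs are probabilities in (0, 1)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def wgan_critic_loss(c_real, c_fake):
    """WGAN critic loss; inputs are unbounded real-valued scores."""
    return sum(c_fake) / len(c_fake) - sum(c_real) / len(c_real)


def wgan_generator_loss(c_fake):
    """WGAN generator loss: push the critic's score on fake samples up."""
    return -sum(c_fake) / len(c_fake)
```

Because the WGAN losses are linear in the critic's scores, they do not saturate the way the log terms do when the discriminator becomes very confident.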

View on GitHub

Room Booking Portal Backend

Developed a Java Spring Boot backend for an online hotel room booking system using a Controller-Service-Repository architecture. The system was built and tested on an H2 database with RESTful APIs for room booking, user authentication, and booking management. Postman was used for API testing to ensure seamless functionality and robust error handling, with a focus on efficient data processing, scalability, and easy future integration.

Implemented REST API controllers to manage key endpoints such as room booking, user login (/login), user registration (/signup), and retrieving booking details. These controllers processed HTTP requests, invoked service methods, and returned appropriate responses based on booking status and user actions.

The service layer handled the business logic for room availability checks, booking creation, user authentication, and error handling. Repository interfaces facilitated database interactions for CRUD operations on bookings, users, and room data. Data Transfer Objects (DTOs) were utilized to structure responses and ensure efficient, consistent data exchange across layers.

View on GitHub

Social Media Backend

Built the backend features of a social media platform in Java Spring Boot, using the Controller-Service-Repository architecture to keep the codebase clean and organized:

Developed REST API controllers to handle various endpoints such as user login (/login), user registration (/signup), retrieving user details (/user), and managing posts and comments. These controllers processed HTTP requests, invoked service methods, and returned appropriate responses based on the request outcomes.

Implemented service classes to manage the business logic for user authentication, post creation, and comment management. The service layer was responsible for processing requests, applying business rules, and handling errors, such as invalid credentials or non-existent posts.

Created repository interfaces to interact with the database for CRUD operations on user, post, and comment data. Data Transfer Objects (DTOs) were used to encapsulate data exchanged between layers and to structure the responses sent to the client. This approach ensured that data was managed efficiently and consistently throughout the application.

View on GitHub

Air Traffic Control System

Designed and implemented an Air Traffic Control System to manage aircraft operations, including takeoffs, landings, and ground movements. Used POSIX facilities on Ubuntu to establish real-time inter-process communication, which kept updates accurate and timely across system components. This preserved data integrity and coordinated operations within a simulated air traffic environment. The design aimed to handle concurrent processes efficiently, mirroring the real-world requirements of air traffic control.

View on GitHub

EnigmaChat

Developed EnigmaChat, a secure communication tool leveraging cryptographic techniques to ensure message confidentiality and integrity. Implemented Fernet encryption, which utilizes AES (Advanced Encryption Standard) with HMAC (Hash-based Message Authentication Code) to safeguard messages against unauthorized access and tampering. This approach combined cryptographic libraries such as cryptography and hashlib to provide robust encryption and authentication mechanisms, ensuring that all communications within the application remained private and secure.
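
The authentication half of this design can be illustrated with the standard library alone. The sketch below is not EnigmaChat's code and does not encrypt anything: Fernet, from the third-party `cryptography` package, handles both encryption and authentication, while this example only shows how an HMAC tag detects tampering. The `sign` and `verify` helpers and the tag-then-message layout are assumptions made for illustration.

```python
import hashlib
import hmac

TAG_LENGTH = 32  # size of an HMAC-SHA256 digest in bytes


def sign(key: bytes, message: bytes) -> bytes:
    """Prefix the message with its HMAC-SHA256 tag."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return tag + message


def verify(key: bytes, token: bytes) -> bytes:
    """Return the message if its tag is valid, otherwise raise ValueError."""
    tag, message = token[:TAG_LENGTH], token[TAG_LENGTH:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return message
```

A receiver that calls `verify` before acting on a message rejects anything modified in transit, which is the integrity guarantee described above.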

Designed and built the server component to manage user sessions and handle commands, ensuring efficient and reliable communication between clients. Developed a client-side interface using Tkinter, featuring real-time chat functionality with an intuitive and user-friendly design. The integration of Tkinter with colorama enhanced the user experience by providing a visually appealing chat environment with dynamic color coding for messages.

View on GitHub

Artificial Intelligence based Bladder Cancer Detection and Stage Classification

Conducted a critical review and meta-analysis of AI-based bladder cancer detection and classification, surveying research papers and medical journals to identify the deep learning algorithms best suited to image segmentation on whole-slide images (WSIs).

Developed a bladder cancer classification model using transfer learning with DenseNet121. Augmented the data with horizontal flipping and rotation, and trained on a normalized dataset using CrossEntropyLoss, the Adam optimizer, and a learning rate scheduler for multi-class classification. Evaluated the model on a validation set and used Grad-CAM visualizations for interpretation. The project demonstrated the efficacy of deep learning, transfer learning, and visualization in predicting bladder cancer stages (T0-T4).
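
For readers unfamiliar with the loss used here, the following pure-Python sketch computes softmax cross-entropy for a single sample over five classes (standing in for T0-T4). It illustrates the formula only; it is not the training code, and the logits in the usage note are made up.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log softmax(logits)[target]."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]
```

With uniform logits such as `[0.0] * 5` the loss equals `log(5)` for any target, and it shrinks as the logit of the correct class grows relative to the others.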

Hotel Management System

Developed a comprehensive Hotel Management System using C on Ubuntu, focusing on efficient handling of room bookings, check-ins, and check-outs. Utilized shared memory to enable multiple processes to access and update booking information concurrently, ensuring data consistency and minimizing conflicts. Semaphores were implemented to synchronize access to critical sections, preventing race conditions and ensuring reliable transaction processing. Pipes were employed for inter-process communication, facilitating smooth data exchange between different components of the system and improving overall system performance and responsiveness.
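
The role of the semaphore can be shown with a short Python sketch. Note the substitutions: the project itself is written in C with processes and shared memory, whereas this example uses Python threads and an in-process counter as a stand-in. The `BookingLedger` class and `run_concurrent_bookings` helper are hypothetical names.

```python
import threading


class BookingLedger:
    """Room counter whose updates are guarded by a binary semaphore."""

    def __init__(self, rooms_available: int):
        self.rooms_available = rooms_available
        self._guard = threading.Semaphore(1)

    def book(self) -> bool:
        # Critical section: check and decrement must happen atomically.
        with self._guard:
            if self.rooms_available > 0:
                self.rooms_available -= 1
                return True
            return False


def run_concurrent_bookings(ledger, attempts):
    """Start one thread per booking attempt and collect the outcomes."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = ledger.book()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

However many attempts run concurrently, the number of successful bookings never exceeds the number of rooms, which is the consistency property the semaphores are there to protect.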

View on GitHub

IoT-based Object Detection Using Deep Learning

Integrated a YOLOv8 deep learning model on a Raspberry Pi 4, chosen for its compact size and its capability to run deep learning models. A live camera interfaced with the Raspberry Pi 4 provided real-time video input to the YOLOv8 model, allowing the system to detect objects or anomalies in its environment. The project also aimed to create a proximity alert system for vehicles: an Arduino Uno connected to an ultrasonic sensor (HC-SR04) measured object proximity, and when the sensor detected an object within a specified threshold distance, a buzzer was triggered to provide an alert.
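
The proximity check reduces to simple arithmetic: an ultrasonic sensor such as the HC-SR04 reports the round-trip time of an echo, and distance is half that time multiplied by the speed of sound. The Python sketch below shows the calculation only; the project's firmware runs on the Arduino, and the 30 cm default threshold is an assumed value, not the one used in the project.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # approximate, in air at about 20 degrees C


def echo_to_distance_cm(echo_duration_us: float) -> float:
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    return echo_duration_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def should_alert(echo_duration_us: float, threshold_cm: float = 30.0) -> bool:
    """Return True when the measured object is within the threshold distance."""
    return echo_to_distance_cm(echo_duration_us) <= threshold_cm
```

A 1000 microsecond echo corresponds to roughly 17 cm, which would trigger the buzzer under the assumed threshold.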

The combination of deep learning for object detection and IoT components for real-time sensing created a versatile system with applications in security and surveillance, facial and gesture recognition, smart cities planning, sports data analysis, and traffic management. The successful implementation showcased the potential of combining ML/DL techniques with IoT for practical solutions in various domains.

The project was initially presented as "IoT-based Road Anomaly Detection using Deep Learning" and was designed to detect anomalies on roads. With minimal modifications, the same system can be repurposed beyond road-related scenarios: the flexibility of the YOLOv8 model, combined with the real-time sensing capability of the IoT components, lets it adapt to different environments and detect objects or anomalies in settings such as security and surveillance, smart city planning, and traffic management.

View Conference Proceeding

Communities and Crime: Machine Learning Predictions and Classification

Compared three machine learning models for predicting community crime rates: Decision Trees, AdaBoost, and Support Vector Machines (SVM). The dataset, which combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR, comprised 128 socio-economic features. I began by preprocessing the dataset, handling missing values, and dropping non-numeric columns, then used exploratory data analysis to understand the dataset's structure. For Decision Trees, I implemented a model with entropy-based splitting and tuned its max depth for better accuracy. AdaBoost, an ensemble method, was then explored with varying max depth values. Finally, I implemented a Multiclass SVM, adjusting parameters such as learning rate and regularization strength, and tracked its accuracy over different epochs. The comparison clarified each model's performance and informed the choice of algorithm for community crime rate prediction. The project was implemented in pure Python, without machine learning libraries such as scikit-learn.
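
Entropy-based splitting, as used for the decision tree, can be sketched in a few lines of library-free Python. This is a minimal illustration rather than the project's implementation: the function names and the single-feature threshold split are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one numeric feature at a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted
```

A tree builder would evaluate `information_gain` for each candidate feature and threshold, pick the best split, and recurse until the max depth is reached.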

View on GitHub

Early Detection of Heart Disease Using Machine Learning

Analyzed a heart disease dataset collected from multiple sites: the Cleveland Clinic Foundation, the Hungarian Institute of Cardiology, University Hospital in Zurich, Switzerland, and the V.A. Medical Center in Long Beach, CA. After preprocessing the data, handling missing values, and normalizing features, I implemented several machine learning models for heart disease prediction, including Naive Bayes, a Perceptron, and an Artificial Neural Network (ANN), and also explored ensemble learning with XGBoost. Model performance was evaluated using accuracy, precision-recall curves, ROC curves, and confusion matrices to compare each algorithm's strengths and limitations. The project was implemented in pure Python, without machine learning libraries such as scikit-learn.
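
As an example of what a library-free model looks like, here is a minimal perceptron in pure Python. It is a sketch under assumptions, not the project's code: labels are encoded as +1 and -1, and the function names, learning rate, and epoch count are illustrative.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train a perceptron with the classic mistake-driven update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):  # y is +1 or -1
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or on the boundary
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors means convergence
            break
    return weights, bias


def predict(weights, bias, x):
    """Classify a sample as +1 or -1."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1
```

On linearly separable data the update rule is guaranteed to converge; on real clinical data it generally is not, which is one reason to compare it against the other models.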

View on GitHub

Diabetes Prediction and Classification: Analysis and Optimization of Machine Learning Models

Performed preprocessing and exploratory data analysis on a diabetes dataset. I implemented Stochastic Gradient Descent (SGD) and Batch Gradient Descent for linear regression and compared their performance in predicting diabetes outcomes. I also explored Lasso and Ridge Regression with polynomial features, tuning hyperparameters for accurate predictions, and implemented Logistic Regression and Least Squares Classification to classify diabetes outcomes. The project aimed to provide insights into the strengths and weaknesses of these algorithms for diabetes prediction and classification tasks. The project was implemented in pure Python, without machine learning libraries such as scikit-learn.
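
The contrast between the two optimizers can be sketched for one-variable linear regression in pure Python. This is an illustration, not the project's code: the function names, learning rate, epoch count, and the toy data in the usage note are all assumptions.

```python
import random


def batch_gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b using the mean-squared-error gradient over the full dataset."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def stochastic_gradient_descent(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Fit y = w*x + b by updating on one shuffled sample at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            w -= lr * 2.0 * error * xs[i]
            b -= lr * 2.0 * error
    return w, b
```

On noise-free toy data such as `xs = [0, 1, 2, 3]` and `ys = [1, 3, 5, 7]`, both routines approach `w = 2` and `b = 1`; on noisy data SGD's per-sample updates make its path more erratic, while batch updates are smoother but cost a full pass per step.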

View on GitHub

Automatic Rebar Counting using Image Processing

The primary objective of our system is to accurately determine the number of individual rebars within a given bundle, even under varying lighting conditions.

Our approach involves a series of image processing steps, such as Gaussian smoothing, adaptive thresholding, Canny edge detection and Circle Hough Transform (CHT) calculation, to preprocess the input images effectively and perform the rebar counting task with high precision.
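
The voting step of the Circle Hough Transform can be sketched in pure Python for a single known radius. This is a simplified illustration, not the project's pipeline: real images would first pass through the smoothing, thresholding, and edge detection steps above, radii would be searched over a range, and counting several rebars would need peak picking with non-maximum suppression. All function names are hypothetical.

```python
import math


def hough_accumulate(edge_points, radius, angle_steps=72):
    """Vote for candidate circle centres at a known radius."""
    votes = {}
    for x, y in edge_points:
        # Each edge pixel votes once for every cell lying `radius` away from it.
        candidates = set()
        for k in range(angle_steps):
            theta = 2.0 * math.pi * k / angle_steps
            candidates.add((round(x - radius * math.cos(theta)),
                            round(y - radius * math.sin(theta))))
        for cell in candidates:
            votes[cell] = votes.get(cell, 0) + 1
    return votes


def strongest_center(votes):
    """Return the accumulator cell with the most votes."""
    return max(votes, key=votes.get)


def synthetic_circle(cx, cy, radius, samples=72):
    """Generate rounded edge pixels of a circle, for demonstration only."""
    return sorted({(round(cx + radius * math.cos(2.0 * math.pi * k / samples)),
                    round(cy + radius * math.sin(2.0 * math.pi * k / samples)))
                   for k in range(samples)})
```

Feeding `synthetic_circle(10, 10, 5)` into `hough_accumulate` with radius 5 yields a vote peak at or next to the true centre, since nearly every edge pixel votes for it.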

The technical approach of this project was inspired by the image processing and pellet detection component of my previous project, Machine Learning based Iron Ore Pellet Size Prediction.

View

Machine Learning based Iron Ore Pellet Size Prediction

During my Research Internship at CSIR-IMMT, under the guidance of Dr. Santosh Kumar Behera, Principal Scientist, I enhanced the Eye-On-Pellet system, a patented image processing-based system used for real-time size analysis of iron ore pellets in the pelletization industries. The project focused on improving image processing and time series analysis.

In image processing, I developed a robust pipeline involving Gaussian smoothing, adaptive thresholding, Canny edge detection, and Hough Circle Transform to accurately detect and categorize pellets based on their sizes.

For time series analysis, I explored machine learning algorithms like Linear Regression, Random Forest Regressor, Support Vector Regressor, Gaussian Process Regression, and Facebook's Prophet model to predict future pellet size variations.

The outcomes are significant for the pelletization industry, enabling real-time pellet distribution visualization and accurate size forecasts. This lets manufacturers optimize critical process parameters such as the feed rate of iron ore and of binders like bentonite (TPI), the RPM and inclination of the pelletizer disc, and the water feed rate, ensuring consistent product quality and efficiency.

In conclusion, the integration of image processing and time series forecasting in the Eye-On-Pellet system significantly impacts the pelletization industry, enhancing product quality and competitiveness.

View on GitHub

About Me

Skills

Programming Languages

Software Development

Data Science

Machine Learning

Deep Learning

Computer Vision

NLP

Education

Birla Institute of Technology and Science, Pilani

Doon International School

D.A.V. Public School, Chandrasekharpur

Contact