Experience

5 minute read

Applied Research Scientist, Thomson Reuters Lab, India

Period: August 2024 - Current

Legal AI Reasoning and Model Enhancement

Pioneered a legal synthetic data generation pipeline for Process Reward Models (PRMs), creating domain-specific training datasets that improved legal reasoning capabilities on LegalBench benchmark tasks.
Evaluated different Test-Time Scaling paradigms for legal reasoning and inference-time compute optimization.
Architected an IRAC (Issue-Rule-Application-Conclusion) Knowledge Graph framework using Thomson Reuters’ Westlaw corpus and court case data, generating high-quality preference datasets that improved legal reasoning alignment in fine-tuned LLMs.

AI-Powered Legal Document Update System

Built a comprehensive end-to-end LLM workflow pipeline to update Word documents with XML parsing and intelligent contextual mapping for paragraph identification and edit, achieving 70%+ success rate while preserving complete document formatting integrity.
Engineered a human-in-the-loop validation interface with reasoning chains and alert-point mapping, resulting in a 60% reduction in manual processing time while maintaining legal accuracy through strategic human oversight and audit trails.

DocEvolver

Created an MVP for a “Cursor for Word” style extension for updating and understanding MS Word files for lawyer-editors.

Search-and-Replace Agentic System

Architected a multi-agent AI system for automated Word editing with a comprehensive validation pipeline (schema enforcement, content integrity, audit logging), achieving 98% accuracy and 65% reduction in manual content revision.
Constructed error-resolving agents with function calling and multi-turn reasoning to fix XML issues, implementing few-shot learning and self-healing mechanisms. This led to adoption across teams, processing hundreds of documents monthly with sub-8 second processing time per section.

Additional Tools

Truth Social Monitor: Created a monitoring system for Trump’s Truth Social posts with sub-3 second latency, generating automated alerts for Reuters journalists.
Page Flipper: Revived the Page Monitor extension for website tracking, eliminating Visual Ping subscriptions for the team.
These tools provided Reuters with a critical competitive advantage over competitors.

Research Intern, Microsoft Research India

Period: January 2024 - July 2024

Programming with Representations (PwR)

Led backend development for Microsoft’s PwR Studio platform, focusing on the Natural Language to Domain Specific Language (NL2DSL) translation system using GPT-3.5 and GPT-4.
Developed a symbolic translation pipeline that generates finite state machines structured as custom DSL, achieving an 85% reduction in hallucinations.
Formulated rubrics and evaluation loops with error correction over DSL, increasing valid DSL generation from 65% to 95%.

Jugalbandi-Studio-Engine

Architected a Python-based platform that converts DSL into scalable finite-state-machine-based chatbot applications, reducing development time by 80%.
The platform was featured in Satya Nadella’s keynote talks, and I represented Microsoft Research in the pilot project, enabling 15+ non-technical organizations to build AI-powered conversational bots.

Jugalbandi (JB) Manager

Established a chatbot management platform supporting WhatsApp, Telegram, and Web channels with multilingual text and voice capabilities.
Integrated Bhashini Speech models with Azure service failover mechanisms, enabling 70% faster deployment of new chatbots.

Open-sourced Work

PwR-NL2DSL: Natural Language to DSL conversion.
PwR-Studio: Studio environment for Programming with Representations.
Jugalbandi Studio: Open-source chatbot framework.
Jugalbandi Manager: Chatbot management platform.
The complete system was picked up by Bhashini to support chatbots across government initiatives.

Mentors: Sriram Rajamani, B. Ashok, Akash Lal, Sameer Segal

Research Intern, AI Institute, University of South Carolina

Period: December 2022 - April 2024

Master thesis on Knowledge Enabled Multimodal Ingredient Substitution. Built a knowledge graph incorporating 27K ingredients and 40K substitution pairs, enabling precise ingredient recommendations using multimodal and constraint-based search.
Developed an LLM-based query module for the ingredient substitution knowledge graph and submitted this work to AAAI-25.
Resources: GitHub Repository Dataset on Kaggle Dataset used in UC Irvine + Stanford Health Hackathon 2024
Formulated cross-modal recipe retrieval and developed cooking action recognition for recipe analysis, achieving 95% recall and leading to Cook-Gen at IEEE SMC 2023.
Mentors: Revathy Venkataramanan, Dr. Amit Sheth

Visiting Researcher, Societal Computing at Saarland University (SIC)

Period: May 2023 - August 2023

Project: Time and multispectral domain satellite image super-resolution.
Worked on satellite image super-resolution using temporal and multispectral information.
Leveraged high temporal frequency low-resolution data for wildlife tracking and improved disaster analysis through GAN and diffusion approaches.
Technologies: GANs, Diffusion Models, Computer Vision, Remote Sensing, PyTorch
Mentors: Ingmar Weber, Ferda Ofli

Research Intern, University of Maryland, Baltimore (US)

Period: October 2022 - April 2023

Project: Personalized AI Assistant, funded under a HealthCareNLP grant.
Developed personalized response generation models using reward scaling over BART and T5.
Work accepted as K-PERM at AAAI Symposium 2024 and improved NUBIA score by 10%.
Focused on knowledge and persona-aware loss scaling for better response generation.
Technologies: NLP, Information Retrieval, Large Language Models, Conversational Models, Question Answering, Generative AI
Mentor: Manas Gaur

AI Intern, EdgeNeural.ai, Pune, India

Period: June 2022 - August 2022

Project: Accelerated inference with model optimization through quantization and CPU/GPU customization.
Developed training and optimization pipelines for automatic model training and hosting.
Technologies: OCR, Object Detection (YOLO, SSD), TensorRT, GPU Optimization, OpenVINO, Docker, AWS, PyTorch, TensorFlow
Collaborators: Sarvesh Devi, Chidhambararajan, Dhanraj

Research Intern, Video Analytics Lab, IISc Bangalore

Period: May 2022 - August 2022

Implemented StyleGAN-based architectures for disentangled video interpretation across domains.
Improved image/video generation and analysis workflows using GAN-based modeling.
Technologies: GANs, Recurrent Neural Networks, PyTorch, TensorFlow, Python
Mentor: Rishubh Parihar

Research Intern, Visual Learning and Intelligence Lab, IIT Hyderabad

Period: November 2021 - April 2022

Researched medical image processing with Prof. Dr. C. Krishna Mohan.
Developed a novel architecture for improved classification of low-quality images and unbalanced datasets.
Published SFFNet for panoramic dental X-ray segmentation at IEEE APSCON 2023.
Technologies: Medical Imaging, Healthcare, Deep Learning, TensorFlow, PyTorch
Mentor: R Sai Chandra Teja
Collaborators: Dhruv Makhwana, Rohit Pawar

Computer Vision Engineer, AI Mage (WETHEKOO)

Period: March 2021 - April 2021

Developed a fashion tagging engine using deep learning.
Optimized and deployed computer vision models on edge devices to improve product/user outcomes.
Technologies: TensorFlow, Computer Vision, Siamese Neural Networks, Tagging Engine, Segmentation

Software Engineer, Rhizicube Technologies

Period: June 2021 - September 2021

Oversaw server and REST API development and database design for a consumer data platform using Golang (Gin).
Built a real-time streaming data pipeline using Apache Kafka.
Built a LinkedIn scraper and a generalized organization website crawler using Selenium and Beautiful Soup.
Technologies: Back-end Web Development, Relational Databases, Kafka, MySQL, Data Scraping, Go, Database Design, Selenium, Python
Collaborators: Udit Sarin, Yash Goyal

Publications and Papers

Transformers Remember First, Forget Last: Dual-Process Interference in LLMs - arXiv 2025
K-PERM: Personalized Response Generation Using Dynamic Knowledge Retrieval and Persona-Adaptive Queries - AAAI Spring Symposium 2024
Multimodal Ingredient Substitution Knowledge Graph (MISKG) - Dataset Release 2024
Cook-Gen: Robust Generative Modeling of Cooking Actions from Recipes - IEEE SMC 2023
Spatial Field Fusion Network (SFFNet) for Panoramic Dental X-ray Segmentation - IEEE APSCON 2023

Kanak Raj