Sonu Dixit

About Me

Hello! I like to apply maths and neural networks to solve real-world problems. I have a strong foundation in AI and hands-on experience across diverse domains. I have worked on LLM fine-tuning, large-scale information retrieval, question-answering, and reinforcement learning. Please refer to the experience section for more details on my projects.

I earned my Master's degree in Artificial Intelligence from the Indian Institute of Science in 2019. During my Master's, I worked on multi-agent reinforcement learning for traffic signal control. In industry, I have worked at [24]7.ai and Disney Star India Pvt Ltd.

Outside of work, I enjoy swimming, running, and spending time in nature through long walks and treks.

Industry Experience

Llama 3 Fine-tuning for Customer Chat Conversations

We fine-tuned the Llama 3.1 8B model on agent-customer conversation data.
After fine-tuning, the model shifted from generating verbose responses to concise, agent-like replies.
It began personalizing conversations by using customer names and adopting an agent identity, which was absent before fine-tuning. This effect might also be achievable through prompt engineering.
Performance significantly improved in initial and final conversation turns.
However, the model struggled to generate relevant responses during middle turns that required external information.
Performance on instruction-following benchmarks declined, likely due to the lack of diversity in training instructions. To address this, I recommend using a mixed dataset of 70% chat data and 30% diverse tasks (e.g., instruction-following, reasoning, and math).
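
As a rough illustration of the recommended mix, the sketch below samples a training set with a fixed share of chat examples. The function name mix_datasets, the tuple-based example format, and the default of 70% are assumptions made for this illustration, not the pipeline used in the project.

```python
import random


def mix_datasets(chat_examples, diverse_examples, total, chat_fraction=0.7, seed=0):
    """Sample `total` examples, about `chat_fraction` of them from the chat pool."""
    rng = random.Random(seed)
    n_chat = round(total * chat_fraction)
    n_diverse = total - n_chat
    # Sample without replacement from each pool, then shuffle the combined set
    # so that both sources are interleaved during training.
    mixed = rng.sample(chat_examples, n_chat) + rng.sample(diverse_examples, n_diverse)
    rng.shuffle(mixed)
    return mixed


# Toy pools: each example is tagged with its (hypothetical) source.
chat_pool = [("chat", i) for i in range(2000)]
task_pool = [("task", i) for i in range(1000)]
training_set = mix_datasets(chat_pool, task_pool, total=1000)
```

The right ratio would need to be tuned against both chat quality and instruction-following benchmarks; 70/30 is only the starting point suggested above.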

Intent Classification

We classify customer queries into predefined intents to enable automated handling by our chatbot. Our trained model outperforms the existing solution and LLMs, including GPT-3.5.
We train multiple Transformer-based networks.
Classes are added and removed over time, so the final solution design must take this into account.
We find manual fine-tuning to be better than data-efficient methods like SetFit.
We compare against LLMs (GPT, Mixtral, and others) using few-shot prompt-tuning and fine-tuning (generation-based classification).
We design and conduct experiments to establish that the approach can understand client-useful words even when they are out-of-vocabulary (OOV) for the model.
We conduct experiments to study its performance on out-of-distribution (OOD) data and how to minimise the risk.
We found bias in the models' behaviour and designed solutions to handle it.

Information Retrieval and Conditional Generation

Trained a DistilBERT model on the MS MARCO dataset to proactively recommend relevant articles, achieving comparable performance on the TREC21 and TREC20 datasets.
We train multiple formulations (Bert_dot, Bert_cat, ColBERT) for the ranking task. Bert_cat outperforms the other formulations.
Bert_cat requires a lot of inference-time compute. To reduce latency, we distil the ensemble scores of Bert_cat and ColBERT into the DistilBERT model (dual distillation).
The method is inspired by "Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling".
We explore multiple first-level rankers (TF-IDF, BM25, LSH, PQ-Encoding).
We fine-tune the trained DistilBERT on our client-specific data.
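
One common way to distil ranking scores from teachers into a smaller student is a margin-based MSE loss, sketched below in plain Python. Averaging the two teachers' scores is an assumption made for this illustration (real cross-encoder and ColBERT scores usually live on different scales and may need normalising first), and the function names are hypothetical rather than taken from the project.

```python
def ensemble(scores_a, scores_b):
    """Average two teachers' scores (a simple ensembling choice, assumed here)."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each list holds one score per (query, positive, negative) triple.
    """
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(errors) / len(errors)


# One toy triple: the student separates positive and negative by 1.0,
# while the averaged teachers separate them by 2.0.
teacher_pos = ensemble([4.0], [2.0])
teacher_neg = ensemble([1.0], [1.0])
loss = margin_mse([2.0], [1.0], teacher_pos, teacher_neg)
```

Matching margins instead of raw scores lets the student keep its own score scale while still learning the teachers' relative ordering of passages.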

Recommendation from a database of question-answer pairs. Upon fine-tuning, we outperform the existing solution.
Conditional response generation for an ongoing conversation. Results on public data team_name-Test_Team_Name

End-to-end training of a RAG system using a posterior-guided retriever. Here we use the trained DistilBERT as the retriever.
We manually evaluate the trained system on client-specific data (web-scraped client website data and questions).
We worked on "Atlas: Few-shot Learning with Retrieval Augmented Language Models".

LLM as Co-Pilot

We utilize LLMs to generate contextually relevant responses during conversations.
We augment the LLM context with relevant historical information, using the retriever trained in the IR task.
We designed an identity task (similar to Needle in a Haystack) to evaluate the LLM's performance.

Compliance Evaluation and Early Warning System - Agent-Customer Chat

We pass the quantifiable metrics to the LLM.
We use prompt engineering and LLM-as-judge for evaluation.

Named Entity Recognition by Question Answering

Different clients need different granular entities (start_date_of_reservation, flight_time, ...). Traditional NER requires training data for the specific entities.
We formulate this as a QA problem, where a question is asked for each client-specific entity. The model needs to generate or extract the answer (the entity). The same underlying QA system can generalise to multiple clients.
This reduces the need for client-specific fine-tuning for entity extraction.
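
The QA formulation above can be sketched as follows. The question template, the function names, and the stub QA model are all hypothetical and only stand in for whichever extractive or generative QA system is actually used.

```python
def entity_to_question(entity_name):
    """Turn an entity identifier into a question (hypothetical template)."""
    return "What is the " + entity_name.replace("_", " ") + "?"


def extract_entities(text, entity_names, qa_model):
    """Ask one question per entity.

    `qa_model(question, context)` is any callable that returns an answer
    string, or None when the context does not contain the answer.
    """
    results = {}
    for name in entity_names:
        answer = qa_model(entity_to_question(name), text)
        if answer:
            results[name] = answer
    return results


def stub_qa_model(question, context):
    """Toy stand-in for a real QA model, used only to make the sketch runnable."""
    if "flight time" in question and "9:30 am" in context:
        return "9:30 am"
    return None


message = "My flight leaves at 9:30 am tomorrow."
entities = extract_entities(
    message, ["flight_time", "start_date_of_reservation"], stub_qa_model
)
```

Adding a new client then mostly means supplying a new list of entity names, and possibly better-worded questions, rather than labelling a new NER dataset.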

Content Analysis and Storyline Extraction

Quantified character attributes using text, image, and audio data, correlating features with viewership metrics.
Episode vector learning, social relationship, face recognition. Website

Entity extraction from Semi-Structured Documents

Used a hidden Markov model (HMM) to extract values corresponding to predefined keys from documents.
Implemented the Viterbi algorithm for decoding and the scaled Baum-Welch algorithm for learning parameters.
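
For reference, a minimal log-space Viterbi decoder is sketched below. The two states ("key", "value"), the two observation types, and all probabilities are made-up toy values for illustration and are not the model used in the project.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `observations`."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


states = ["key", "value"]
start_p = {"key": 0.8, "value": 0.2}
trans_p = {"key": {"key": 0.3, "value": 0.7}, "value": {"key": 0.6, "value": 0.4}}
emit_p = {"key": {"word": 0.9, "number": 0.1}, "value": {"word": 0.3, "number": 0.7}}
tags = viterbi(["word", "number", "word", "number"], states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long documents, which is the same problem that scaling addresses in Baum-Welch.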

Publications

Zero-Shot Generalization using Intrinsically Motivated Compositional Emergent Protocols

Authors: R Hazra*, Sonu Dixit* — NAACL 2021 Workshop (ViGIL)

Latest Preprint: Intrinsically Motivated Compositional Language Emergence

We argue that intrinsic rewards increase compositionality in emergent communication between two agents. The improved compositionality increases zero-shot generalisation in the downstream tasks. website

gComm: Environment for Investigating Generalization in Grounded Language Acquisition

Authors: R Hazra, Sonu Dixit, S Sen — NAACL 2021 Workshop (ViGIL)

This work introduces gComm, an environment designed to evaluate grounded language acquisition. We focus on how agents generalize language understanding in complex, multi-agent scenarios.

M.Tech Thesis: Adaptive Traffic Signal Control using Multi Agent RL

Implemented multi-agent reinforcement learning (MARL) to dynamically adjust traffic signal durations based on congestion.
Algorithm: Proximal Policy Optimization (PPO) with Advantage Actor-Critic. Report
Simulated traffic data using PTV Vissim and demonstrated MARL’s superiority over fixed-time algorithms in terms of average speed, delay, and lane occupancy. Results
Each signal is represented as a neural network. During training, multiple small neural networks are trained at the same time. The networks are connected via the reward that they receive.
Advisor - Prof Shalabh Bhatnagar
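
As background on the PPO algorithm named in the thesis summary above, the sketch below computes the standard clipped surrogate objective for a batch of samples. It is a generic illustration, not the thesis implementation; the function name and the clip range of 0.2 are assumptions.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Average clipped surrogate objective (to be maximised).

    ratios: new-policy probability divided by old-policy probability, per sample.
    advantages: advantage estimates (e.g. from a critic), per sample.
    """
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1 - clip_eps, min(ratio, 1 + clip_eps))
        # Taking the minimum removes the incentive to move the policy
        # further than the clip range in a single update.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


# A large ratio with a positive advantage is capped at 1 + clip_eps.
capped = ppo_clip_objective([1.5], [1.0])
```

In a multi-agent setup like the one described above, each signal's network would optimise its own copy of this objective using its own reward.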

Contact

Email: sonudixit2k@gmail.com

LinkedIn: linkedin.com/in/sonudixit