Vatsal Parsaniya

Data Scientist at Embibe

vatsalparsaniya@gmail.com

Hi! My name is Vatsal Parsaniya. I am a technology enthusiast working to solve real-world problems and bring about change, and I enjoy connecting the dots: be it ideas from different disciplines, people from different teams, or applications from different industries.

I am a Kaggle 3X Expert with a keen focus on building accurate, scalable models that generate value. I am passionate about search-based systems, and I love extracting knowledge and insights from huge, complex data sets. The power of asking questions and extracting answers from varied data sets has always fascinated me.

During my undergraduate studies, I took on various leadership roles, including developing a chess-playing robotic arm, coordinating institute-wide technical events for hundreds of people, and leading the college robotics club.

Interests

Search
Multilingual Information Retrieval
Natural Language Processing
Machine Learning

Education

B.Tech in ICT (Information and Communication Technology), 2021

PDEU (Pandit Deendayal Energy University)

Experience

Data Scientist

Embibe

Sep 2021 – Present Bengaluru

I collaborate closely with the product team within the Discovery Search Science group to transform intricate business challenges into data science problem statements. My role revolves around enhancing the search experience for users by optimizing multilingual search results across a wide range of customer products and internal tools. I also run both offline and online assessments with the engineering team to ensure the feasibility of our solutions, using methods such as NLP and rule-based systems.
Retrieval Augmented Generation (RAG):
- Involved in the development of Retrieval Augmented Generation for search and chatbot applications. This entails retrieving academic content along with ontology information and dynamically selecting prompts to generate contextually relevant responses from a generative model.
Multilingual Semantic Search: Developed semantic-similarity, entity-based search
- Fine-tuned MiniLM (an SBERT sentence-transformer model) for semantic similarity on our academic/Indian domain entities, which helped us improve search retrieval and relevancy for user queries by 8%.
- Implemented results re-ranking utilizing the LambdaMART algorithm, which is a gradient boosting-based ranking algorithm. This significantly improved results ranking on our golden dataset compared to using embedding similarity for ranking.
- Integrated the embedding model into the NVIDIA Triton Inference Server and incorporated a vector database into the hybrid search pipeline.
- Developed search capabilities in 11 Indic languages, using the LaBSE (Language-agnostic BERT Sentence Embedding) model for semantic similarity.
- Established a search feedback pipeline using the Gradio interface for SME validation and for evaluating various search algorithms on the golden dataset.
- Worked on query understanding (QU) and query expansion (QE) modules, enabling users to search in mixed languages (English+Indic).
Entity Extraction (NER): For Search – Developed Solr-based entity detection to facilitate search
- Implemented an end-to-end Solr-based Entity Detection Service, using Solr’s tagger request handler.
- Used various natural language analyzers and filters like stemmer, stopwords, synonyms, lemmatizer, and possessive in our analyzer chain of Solr’s tagger request handler.
- The algorithm serves as a core service in various client products, tasked with highlighting academic entities and retrieving associated academic content.

Machine Learning Engineer (Full-time + Intern)

Intellica.ai

Nov 2020 – Sep 2021 Ahmedabad

Worked on building a real-time telephonic conversational AI system for an effective interview pre-screening round.
Improved the speech-to-text pipeline for Indian English accents with transfer learning on the DeepSpeech STT model.
Developed microservices to evaluate the Montreal Cognitive Assessment (MoCA). The microservices contain a computer vision-based cube and clock drawing evaluation system, as well as a context-based answer evaluation system leveraging NLP concepts.

Advisor

Cretus - The Robotics and Automation Club of PDEU

Jul 2018 – Jul 2021 Gandhinagar

I have held various positions in the Cretus club throughout my engineering career, including committee member, event management head, and advisor.
Managed robotics events using Arduino, Raspberry Pi, and various types of sensors for people interested in robotics.
Worked with the organization’s budget, advising on events and inventory management based on available funds.

Tools & Languages

airflow

docker

elasticsearch

fastapi

git

jenkins

matplotlib

metabase

numpy

opencv

pandas

python

pytorch

raspberry pi

seaborn

sklearn

solr

tensorflow

Accomplishments

Generative AI with Large Language Models

Coursera Sep 2023

The course is a comprehensive exploration of Large Language Models (LLMs) featuring diverse transformer-based architectures. It covers both the full training and fine-tuning processes of LLMs, employing techniques such as PEFT, LoRA, and soft prompts. The curriculum includes a module on model evaluation, using metrics and benchmarks such as BLEU, ROUGE, and HELM.
Additionally, the course covers the RLHF approach for addressing toxicity, walks through the lifecycle of LLMs, and explores various model optimization strategies. It also includes a hands-on tutorial in AWS SageMaker Studio, working with the FLAN-T5 model sourced from Hugging Face.

See certificate

Kaggle 3X Expert

Kaggle Sep 2020

I started devoting more time to Kaggle, and within three months I had earned three Expert badges, in the Notebooks, Datasets, and Competitions categories.

See certificate

Silver Medal in SIIM-ISIC Melanoma Classification Kaggle Competition

SIIM Aug 2020

My 1st Silver Medal 🥈 in Kaggle Competitions
We (team Artificial Doctors) worked on the research-based SIIM-ISIC Melanoma Classification competition (medical image classification) on Kaggle. Our goal was to work on a real-world problem and to gain knowledge of deep learning through project-based learning.
There were 3300+ participants in this competition. Our hard work paid off, and we finished 130th (top 4%) 🥈 on the private leaderboard with a ROC-AUC score of 0.9409.

See certificate

Deep Learning Specialization

Coursera Apr 2020

Neural Networks and Deep Learning
Improving Deep Neural Networks
Structuring Machine Learning Projects
Convolutional Neural Networks
Sequence Models

See certificate

Time Series with Python (SKILL TRACK)

DataCamp Apr 2020

Time Series Analysis in Python
Manipulating Time Series Data in Python
Visualizing Time Series Data in Python
ARIMA Models in Python
Machine Learning for Time Series

See certificate

1st Runner-up & Best pitch in Let’s Hack 2.0 Hackathon

IIC Nov 2019

My four teammates and I were 1st runner-up at the hackathon organised by PDPU-IIC, for Digital Databases and Interface for Healthcare and Smart Card.
The project involved building a centralized Firebase database for storing patients' health-related information, developing web and app interfaces for doctors, students, and pharmacists, and providing data and data-visualization insights to the government as well as hospital authorities.

See certificate

Machine Learning by Stanford University (Andrew Ng)

Coursera Apr 2019