Vatsal Parsaniya

Data Scientist at Embibe

vatsalparsaniya@gmail.com

Hi! My name is Vatsal Parsaniya. I am a technology enthusiast working to solve real-world problems and bring about change, and I enjoy connecting the dots: be it ideas from different disciplines, people from different teams, or applications from different industries.

I am a Kaggle 3X Expert with a keen focus on building accurate, scalable, value-generating models. I am passionate about search-based systems! I love extracting knowledge and insights from huge, complex data sets. The power of asking questions and extracting answers from varied data sets has always fascinated me.

In my undergraduate studies, I’ve taken on various leadership roles, including developing a chess-playing robotic arm, coordinating institute-wide technical events for hundreds of people, and leading the college robotics club.

Interests
  • Search
  • Multilingual Information Retrieval
  • Natural Language Processing
  • Machine Learning
Education
  • B.Tech in ICT (Information and Communication Technology), 2021

    PDEU (Pandit Deendayal Energy University)

Experience

Embibe
Data Scientist
Sep 2021 – Present Bengaluru
  • I collaborate closely with the product team within the Discovery Search Science group to translate intricate business challenges into data science problem statements. My role centers on improving the search experience by optimizing multilingual search results across a wide range of customer products and internal tools. I also run offline and online evaluations with the engineering team to ensure our solutions are feasible, using methods such as NLP and rule-based systems.
  • Retrieval Augmented Generation (RAG):
    • Contributed to the development of Retrieval Augmented Generation for search and chatbot applications. This involves retrieving academic content along with ontology information and dynamically selecting prompts so that a generative model produces contextually relevant responses.
  • Multilingual Semantic Search: Developed semantic-similarity, entity-based search
    • Fine-tuned MiniLM (an SBERT sentence-transformer model) for semantic similarity on our academic/Indian-domain entities, which improved search retrieval and relevancy for user queries by 8%.
    • Implemented result re-ranking with LambdaMART, a gradient-boosting-based ranking algorithm. This significantly improved result ranking on our golden dataset compared with ranking by embedding similarity.
    • Integrated the embedding model into the NVIDIA Triton Inference Server and incorporated a vector database into the hybrid search pipeline.
    • Developed search capabilities in 11 Indic languages, using the LaBSE (Language-agnostic BERT Sentence Embedding) model for semantic similarity.
    • Established a search feedback pipeline using the Gradio interface for SME validation and for evaluating various search algorithms on the golden dataset.
    • Worked on query understanding (QU) and query expansion (QE) modules, enabling users to search in mixed languages (English+Indic).
  • Entity Extraction (NER): For Search – Developed Solr-based entity detection to facilitate search
    • Implemented an end-to-end Solr-based entity detection service using Solr’s tagger request handler.
    • Used various natural language analyzers and filters (stemming, stopword, synonym, lemmatization, and possessive filters) in the analyzer chain of Solr’s tagger request handler.
    • The algorithm serves as a core service in various client products, tasked with highlighting academic entities and retrieving associated academic content.
Intellica.ai
Machine Learning Engineer (Full-time + Intern)
Nov 2020 – Sep 2021 Ahmedabad
  • Worked on building a real-time telephonic conversational AI system for an effective interview pre-screening round.
  • Improved the speech-to-text pipeline for Indian English accents using transfer learning on the DeepSpeech STT model.
  • Developed microservices to evaluate the Montreal Cognitive Assessment (MoCA). These include a computer-vision-based cube and clock drawing evaluation system, as well as a context-based answer evaluation system leveraging NLP concepts.
Cretus- The Robotics and Automation Club of PDEU
Advisor
Jul 2018 – Jul 2021 Gandhinagar
  • I have held various positions in the Cretus club throughout my engineering career, including committee member, event management head, and advisor.
  • Managed robotics events featuring Arduino, Raspberry Pi, and various types of sensors for people interested in robotics.
  • Worked with the organization’s budget, advising the event and inventory management teams on available funds.

Tools & Languages

airflow
docker
elasticsearch
fastapi
git
jenkins
matplotlib
metabase
numpy
opencv
pandas
python
pytorch
raspberry pi
seaborn
sklearn
solr
tensorflow

Accomplishments

Coursera
Generative AI with Large Language Models
  • The course is a comprehensive exploration of Large Language Models (LLMs) across diverse transformer-based architectures. It covers both full training and fine-tuning of LLMs, using techniques such as PEFT, LoRA, and soft prompts. The curriculum includes a module on model evaluation with metrics and benchmarks such as BLEU, ROUGE, and HELM.
  • The course also covers the RLHF approach to reducing toxicity, the LLM lifecycle, and various model optimization strategies. It includes a hands-on tutorial in AWS SageMaker Studio using the FLAN-T5 model from Hugging Face.
See certificate
kaggle
Kaggle 3X Expert
  • I started devoting more time to Kaggle, and within three months I had earned three Expert badges in the Notebooks, Datasets, and Competitions categories.
See certificate
siim
Silver Medal in SIIM-ISIC Melanoma Classification Kaggle Competition
  • My first Silver Medal 🥈 in Kaggle Competitions
  • We (Artificial Doctors) worked on the research-based SIIM-ISIC Melanoma Classification competition (medical image classification) on Kaggle. Our goal was to work on a real-world problem and to build deep learning knowledge through project-based learning.
  • There were 3,300+ participants in this competition. Our hard work paid off and we secured 130th place (top 4%) 🥈 on the private leaderboard with a ROC-AUC score of 0.9409.
See certificate
Coursera
Deep Learning Specialization
  1. Neural Networks and Deep Learning
  2. Improving Deep Neural Networks
  3. Structuring Machine Learning Projects
  4. Convolutional Neural Networks
  5. Sequence Models
See certificate
DataCamp
Time Series with Python (SKILL TRACK)
  1. Time Series Analysis in Python
  2. Manipulating Time Series Data in Python
  3. Visualizing Time Series Data in Python
  4. ARIMA Models in Python
  5. Machine Learning for Time Series
See certificate
iic
1st Runner-up & Best pitch in Let’s Hack 2.0 Hackathon
  • My four teammates and I were 1st Runner-up at the hackathon organised by PDPU-IIC, with our project on Digital Databases and Interface for Healthcare and Smart Card.
  • The project involved building a centralized Firebase database for storing patients’ health-related information, developing web and app interfaces for doctors, students, and pharmacists, and providing data and data-visualization insights to the government and hospital authorities.
See certificate
Coursera
Machine Learning by Stanford University (Andrew Ng)
  1. Logistic Regression
  2. Artificial Neural Network
  3. Machine Learning (ML) Algorithms
  4. Machine Learning
See certificate