Siddhesh Sheth


About

I am Siddhesh Sheth, an aspiring Data Engineer who graduated with an MS in Computer Science from Indiana University Bloomington in May 2024. I currently work as a Data Analyst at Hoosier Community Network. I recently ended my tenure as a Data Analyst at Indiana University Bloomington, where my data analysis, SQL, and decision-making skills contributed to a spike in electronics sales. I also interned at a tax resolution firm last summer, where I extracted data from tax receipts using MySQL and oversaw their webpage development.



In addition, I have a year of experience at Accenture as a Software and Data Engineer, where I mainly worked on data migrations from on-premises to the cloud using Azure DMS and package transfers via SSMS. I am also an AWS Certified Cloud Practitioner.



My primary skill set includes proficiency in a range of programming languages (Python, SQL, Java, etc.), expertise in various databases (MySQL, MongoDB, Snowflake) and cloud platforms (AWS, Azure, Google Cloud), and strong capabilities in data engineering and analytics tools (SSIS, Azure Data Factory, PySpark, Kafka, Hadoop, Power BI, Tableau). I'm also excited about learning new technologies and solving complex problems.


Apart from academics, I am passionate about wildlife and hiking, and I would love to scale unexplored summits.

Data Analyst and Data Engineer

Inquisitive data professional with a year of corporate experience managing client applications and databases carefully, focused on helping businesses grow through diligent work.

  • Age: 25
  • Degree: Masters in CS
  • Email: shethsiddhesh268@gmail.com
  • University: Indiana University, Bloomington


Facts

  • Cricket and Badminton Lover
  • Wildlife and nature explorer
  • Poet/Writer

Major Projects
2017-2024

Mini Projects
2017-2024

Connections

Skills


Python 95%
SQL 95%
AWS/Azure/GCP 85%
Power BI 95%
Tableau 95%
MongoDB 80%
RSA Archer 95%
Kafka / Airflow 90%
Javascript/NodeJS 75%
HTML/CSS/Bootstrap 100%
Django 70%
Scala 75%

Resume


SUMMARY

Siddhesh Sheth

Inquisitive data professional with corporate experience managing client applications and databases carefully, focused on helping businesses grow through diligent work.

  • 700 N Alabama St., Indianapolis, IN
  • (812) 778-4657
  • shethsiddhesh268@gmail.com

Education

Master of Science in Computer Science

May 2022 - May 2024

Indiana University, Bloomington, IN

  • Courses taken
    Applied Algorithms, Advanced Database Technologies, Information Visualization, Data Mining, Elements of AI, Applied Machine Learning, Software Engineering, Computer Networks
  • GPA: 3.97/4

Bachelor of Engineering in Computer Engineering

July 2017 - July 2021

Pimpri Chinchwad College of Engineering, Akurdi, Pune, India

  • Courses taken

    Object Oriented Programming, Computer Organization and Architecture, Theory of Computation, Operating System, Web Technology, Design and Analysis of Algorithm, Data Structures and Algorithms, Advanced Data Structures, Cloud Computing, Data Mining and Warehousing, Computer Networks, Elements of Artificial Intelligence, Machine Learning

  • GPA: 9.2/10
Key Professional Experiences

    Data Analyst

    Mar 2023 - Present

    Hoosier Community Network, Indianapolis, USA

    • Ingested CDC healthcare data into pandas DataFrames using Python and the sodapy library, which provides efficient access to the CDC’s Socrata open data REST API, enabling comprehensive analysis of healthcare metrics and trends.
    • Deployed a PostgreSQL database on Heroku, leveraging SQLAlchemy for ORM (Object-Relational Mapping) to load DataFrames into the database.
    • Created interactive dashboards by connecting the PostgreSQL database to Metabase, enhancing data exploration through SQL queries. This integration reduced reporting time by 30%, enabling swift data-driven decision-making.

    Data Analyst

    Mar 2023 - Present

    Indiana University, Bloomington, USA

    • Led the initial design and indexing of SQL database schema, including transactions, orders, inventory management, and department tables. This design reduced data storage by 20% and improved query performance, facilitating efficient analysis of sales activities.
    • Extracted inventory data from Excel files using openpyxl and transaction data from the SQL database using SQL queries.
    • Implemented advanced transformations, including data type conversion and normalization, while conducting rigorous data wrangling and validation to ensure data integrity and accuracy for downstream reporting.
    • Utilized SQL for extraction and transformation and created interactive dashboards in Power BI, applying DAX for metrics that visualized sales trends. This effort led to a 15% increase in decision-making efficiency and improved strategic planning through real-time insights.

    Graduate Teaching Assistant

    Aug 2023 - Dec 2023

    Indiana University, Bloomington, USA

    • Monitored and graded more than 250 students for Computer Networks on a wide range of networking fundamentals.
    • Covered topics including the Domain Name System (DNS), the Internet Protocol Suite (TCP/IP), User Datagram Protocol (UDP), network security, FTP, Simple Mail Transfer Protocol (SMTP), and related concepts.

    Data Specialist - Intern

    May 2023 - Aug 2023

    Scotfin Tech, Indianapolis, USA

    • Designed and built visually compelling web pages with Tailwind CSS and HTML5 and handled asynchronous client requests using Ajax, leading to a 40% surge in user traffic and a 25% reduction in bounce rates.
    • Conducted a comprehensive analysis of trends within the tax resolution industry with SEMrush and optimized local search visibility using SEO Toolkit. Then, using plain JavaScript, refined the website to ensure close to 100% throughput.
    • Wrote MySQL queries that improved audits. Predefined schemas boosted data extraction from tax receipts by 30% while maintaining ACID properties, and normalization cut processing time by 20%, enabling faster access.

    Database Engineer Associate

    Aug 2021 - Aug 2022

    Accenture, Pune, MH, India

    • Oversaw the migration, updates, and backups of AT&T databases for Financial Billing Operations through SSMS, from SQL Server to Azure SQL Database, guaranteeing no data loss and a 15% reduction in recovery time.
    • Orchestrated the seamless transfer of packages, overseeing 100+ SQL Server databases with Azure DMS. Established streamlined handling protocols, resulting in a 20% reduction in migration time.
    • As the lead RSA Archer Administrator, managed and configured access roles for 100+ clients, handling 50+ weekly queries on permissions and security. Implemented advanced workflows and customizations to meet governance, risk, and compliance (GRC) requirements, reducing GRC-related issues by 75%.
    • Monitored and scaled Azure VMSS based on usage, cutting resource wastage by 15% and maintaining 99.9% uptime, while optimizing costs by 10%. Managed Astra alerts and implemented real-time threat detection, reducing security risks by 20%, with weekly audits and vulnerability patching improving system resilience and cutting GRC incidents by 30%.

    Key Projects

    1. Comprehensive Car Sales Analysis and Visualization Using Power BI

    Duration: Dec 2023 – Jan 2024

    • Used Power Query to extract, transform, and load (ETL) over 250k records, optimizing data models for analysis. Applied advanced DAX functions for dynamic KPI calculations, time intelligence, and complex measures to support business requirements.
    • Developed interactive dashboards with drill-through reports, cross-filtering, and custom visuals, including geospatial analysis using ArcGIS maps, allowing detailed insights into regional sales performance.
    • Integrated Python scripts within Power BI for advanced data manipulation and forecasting models, enabling accurate prediction of sales trends and enhanced analytical capabilities.

    2. Netflix Trend Analysis and Recommendation System

    Duration: Dec 2023 – Jan 2024

    • Implemented a movie recommendation system using TF-IDF vectors and cosine similarity. Developed models for predicting ratings ('Adults', 'Kids', 'Teens') and types ('Movies', 'TV-Show') using Naive Bayes, Logistic Regression, and Decision Tree models.
    • Performed EDA on 50,000+ records with Pandas, visualizing genre trends using Matplotlib and Seaborn. Identified the top 10% of genres, leading to a 30% increase in interactions with top-rated content.
    • Utilized Matplotlib and Tableau for advanced data visualizations, including dynamic dashboards and parameter controls, which facilitated interactive filtering and real-time data exploration.
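The TF-IDF and cosine-similarity idea behind the recommender can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the function names, the smoothed IDF formula, and the whitespace tokenizer are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency (assumed variant).
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(titles, descriptions, query_index, top_n=2):
    """Return the titles whose descriptions are most similar to the query title's."""
    vectors = tfidf_vectors(descriptions)
    scored = [
        (cosine_similarity(vectors[query_index], vec), titles[i])
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]
```

Calling `recommend(titles, descriptions, query_index=0)` on a list of titles and their text descriptions returns the closest matches to the first title; a production version would more likely rely on a library vectorizer and proper tokenization.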

    3. Real-Time Weather Data ETL Automation with Apache Airflow on AWS

    Duration: July 2023 – Aug 2023

    • Orchestrated a scalable ETL pipeline using Apache Airflow to process more than 50,000 weather data points daily, reducing runtime by 40% with optimized DAGs and automated task dependencies using custom Operators.
    • Deployed AWS Glue for processing over 50 GB of data per day using custom PySpark jobs. Optimized data transformations, reducing processing time by 30% through dynamic partitioning and automated workflows.
    • Integrated Amazon Athena for querying partitioned data stored in Amazon S3, allowing sub-second queries on 20 GB of data and improving decision-making speed by 25%. Connected Athena to Amazon QuickSight to visualize weather patterns.

    4. Analyzing Stock Market Data in Real Time Using Kafka-Driven ETL

    Duration: June 2023 – July 2023

    • Developed a Python script using the yfinance library to fetch 1 year of NASDAQ stock data, iterating through ticker symbols and extracting data to flat files. Then pushed the data to S3 and launched an EC2 Linux instance for the Kafka setup.
    • Started a ZooKeeper instance, created a topic in Kafka, and delivered over 1,000 data messages to a consumer and S3 for ingestion into Snowflake using Kafka Connect.
    • Integrated Snowflake with Snowpipe for real-time data ingestion from S3. This enabled enhanced SQL capabilities and materialized views, aiding data analysis efficiency and investment decision-making. Finally, connected Power BI to Snowflake for effective data visualization.

    5. Venue Management System

    Duration: Feb 2023 – May 2023

    • Developed a platform for tracking user sports activities, live venue updates, and facilitating connections through a chat feature.
    • Provided real-time updates on venue capacity, resulting in a 30% increase in venue utilization and a 20% boost in revenue.
    • Built the frontend with JavaScript and the React framework, secured the backend in the cloud using the premium features of Firebase, and hosted the application on Vercel.

    6. Literacy Education Nexus

    Duration: Jan 2023 – April 2023

    • Analyzed SOFI and World Bank data with Python and Tableau to visualize literacy education's link to global development.
    • Utilized linear regression algorithms to forecast key parameters like GDP, employment, and school enrollment trends until 2040.
    • Curated Power BI visualizations to advocate education as a human right by showcasing its positive impact on economic growth.
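The linear-regression forecasting step can be illustrated with a small ordinary-least-squares sketch. It is a simplified, standard-library-only example under stated assumptions: the helper names `fit_line` and `forecast` are hypothetical, and a single year-versus-indicator trend line stands in for whatever model the project actually used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, future_xs):
    """Extrapolate the fitted trend line to future x values (e.g. years)."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in future_xs]
```

For instance, `forecast(years, enrollment, [2030, 2040])` extends a historical series to 2040. A straight-line extrapolation this far ahead is only a rough trend indicator and says nothing about uncertainty.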

    7. Breast Cancer Detection using Machine Learning

    Duration: Dec 2020 – March 2021

    • For the final-year capstone project, obtained and preprocessed datasets (TCGA, histopathology) comprising images and text for classification.
    • Used a CNN in which convolution and pooling layers progressively reduce the size of the feature maps, with the aim of maximizing accuracy.
    • Applied Recursive Feature Elimination (RFE) to keep only the optimal features, improving the accuracy of the model by 7-8%.

    8. Fleet Management System

    Duration: Jun 2019 - Aug 2019

    • Designed an online ledger using PostgreSQL to store details of heavy vehicles and drivers along with a 30-day cost analysis.
    • Spearheaded the analysis for MAX Travel, using Matplotlib and Seaborn visuals that led to a 20% reduction in operational costs.
    • Streamlined their fleet management and contributed to a 30% reduction in vehicle downtime.

    9. AWS Face Detection Application

    Duration: Jun 2019 – July 2019

    • Connected an AWS architecture to a Telegram bot: the user uploads an image to Telegram, and the bot passes the image to S3 for access.
    • An EC2 instance running PHP code supervises the actions and decisions taken.
    • Extended the scope to classifying gender and counting faces and other objects.

    10. Blood Bank Donation System

    Duration: Feb 2018 – May 2019

    • Established and developed two SQL databases and schemas, one for patients’ personal history and the other for blood bank and donor details.
    • Implemented comprehensive CRUD operations, guaranteeing ACID properties.
    • Ran a wide range of test cases using Selenium IDE, WebDriver, and Selenium Grid, resulting in improved system performance and a seamless user experience.

    Contact

    Feel free to reach out!

    Location:

    700 N Alabama St, Indianapolis, IN, 46204

    Call:

    +1 (812) 778-4657