Selected Web Scraping & Data Pipeline Work
Python engineer with hands-on experience building scraping and data-collection systems for dynamic, JS-heavy sites and structured web/public-data sources. The examples below cover browser automation, HTTP extraction, normalization, validation, and reliability work in production and applied systems.
Contact: kanad.rishiraj@gmail.com
NomNomy
- Built Python/Selenium scrapers for Uber Eats, Grubhub, and DoorDash, extracting structured menu data from JS-heavy delivery platforms including items, prices, images, and modifier/customization trees.
- Improved scraper resilience with platform-specific readiness checks, challenge/gating handling, stale-element recovery, repeated-item detection, incremental persistence, completion metadata, and debug artifacts.
- Extended the scrapers into a broader pipeline with CLI/batch orchestration, JSON persistence, and Streamlit-based review/finalization tooling for QA and normalization.
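The repeated-item detection and incremental persistence mentioned above can be sketched roughly as below. This is a minimal illustration, not the production code: the function name, the `item_id` key, and the flat-JSON layout are all assumptions made for the example.

```python
import json
from pathlib import Path


def persist_items_incrementally(items, out_path, key="item_id"):
    """Append newly seen menu items to a JSON file, skipping repeats.

    Hypothetical sketch: re-running a scrape adds only items whose `key`
    has not been persisted yet, so partial runs accumulate safely.
    """
    path = Path(out_path)
    existing = json.loads(path.read_text()) if path.exists() else []
    seen = {item[key] for item in existing}
    fresh = [item for item in items if item[key] not in seen]
    path.write_text(json.dumps(existing + fresh, indent=2))
    return len(fresh)  # number of newly persisted items
```

Keying on a stable item identifier (rather than list position) is what lets a scraper resume mid-menu after a stale-element recovery without duplicating earlier items.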
MovieSaints
- Built Instagram scraping and automation workflows including session reuse via cookies, creator discovery, hashtag/post extraction, and messaging automation across dynamic UI flows.
- Developed structured external-data pipelines for IMDb metadata/credits, FX-rate ingestion with source fallback, and other third-party web data using browser automation plus direct HTTP/HTML parsing.
- Added persistence, retries, source fallback, normalization, and structured JSON/DB outputs so extracted data could be reused in downstream workflows.
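The retry-plus-source-fallback pattern used for FX-rate ingestion looks roughly like this. A hedged sketch only: the function name and the callable-per-source shape are illustrative assumptions, not the project's actual API.

```python
import time


def fetch_with_fallback(sources, retries=2, delay=0.0):
    """Try each rate source in order, retrying transient failures.

    `sources` is a list of zero-argument callables that return a rate or
    raise on failure. Each source gets `retries + 1` attempts before the
    next source is tried; only if every source fails do we raise.
    """
    last_error = None
    for source in sources:
        for attempt in range(retries + 1):
            try:
                return source()
            except Exception as exc:  # narrow to network/parse errors in real code
                last_error = exc
                if attempt < retries:
                    time.sleep(delay)  # backoff between attempts
    raise RuntimeError("all FX-rate sources failed") from last_error
```

Exhausting retries on the primary source before falling back keeps results consistent across runs while still tolerating a provider outage.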
FitJobs
- Built multi-site job extraction logic for LinkedIn, Indeed, Glassdoor, Greenhouse, and Lever using site-specific scraper routing and DOM strategies.
- Implemented resilient extraction for dynamic job pages using hidden-content expansion, retry-based metadata reads, structured-data parsing, and layered selector fallbacks.
- Normalized extracted fields into a consistent payload and passed them directly into downstream analysis workflows for resume-to-job fit scoring.
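Site-specific scraper routing can be sketched as a domain lookup over the job URL. The routing table values and the `generic_scraper` default are placeholders for illustration; the real project dispatches to scraper implementations rather than strings.

```python
from urllib.parse import urlparse

# Illustrative routing table; the project's actual scraper classes differ.
SCRAPERS = {
    "linkedin.com": "linkedin_scraper",
    "indeed.com": "indeed_scraper",
    "glassdoor.com": "glassdoor_scraper",
    "greenhouse.io": "greenhouse_scraper",
    "lever.co": "lever_scraper",
}


def route_scraper(job_url):
    """Pick a site-specific scraper by matching the URL's registered domain."""
    host = urlparse(job_url).netloc.lower()
    for domain, scraper in SCRAPERS.items():
        # Match the bare domain and any subdomain (www., boards., jobs., ...).
        if host == domain or host.endswith("." + domain):
            return scraper
    return "generic_scraper"
```

Matching on the registered domain (with a subdomain suffix check) rather than the full hostname keeps one entry per site while still covering ATS subdomains like `boards.greenhouse.io`.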
TalkToGov
- Built Python scraping/import pipelines using requests.Session and BeautifulSoup to extract structured data from government list and profile pages.
- Designed recurring-scrape reliability features including HTML change detection, cache reuse, strict validation/reconciliation, and safety checks to prevent bad runs from corrupting downstream data.
- Implemented broader import flows with paginated API extraction, 429 backoff handling, normalized CSV/DB outputs, and change-history tracking.
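The HTML change detection above can be illustrated with a content fingerprint. A minimal sketch under stated assumptions: the function name and dict-based cache are hypothetical, and whitespace normalization stands in for whatever canonicalization the real pipeline applies before hashing.

```python
import hashlib


def page_changed(key, html, cache):
    """Return True if a page's content differs from its cached fingerprint.

    `cache` maps a page key to the SHA-256 digest from the previous run.
    Whitespace runs are collapsed first so formatting-only churn in the
    source HTML does not trigger a false "changed" result.
    """
    digest = hashlib.sha256(" ".join(html.split()).encode()).hexdigest()
    changed = cache.get(key) != digest
    cache[key] = digest  # record fingerprint for the next run
    return changed
```

Gating the import on this check is one way to keep an unchanged (or partially loaded) page from triggering a full re-import that could corrupt downstream data.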
Code examples and additional technical detail are available on request.
Relevant GitHub repo: selenium-web-automation-utils