Selected Web Scraping & Data Pipeline Work

Hands-on Python engineer with experience building scraping and data collection systems across dynamic, JS-heavy sites and structured web/public-data sources. The examples below cover browser automation, HTTP extraction, normalization, validation, and reliability engineering across production and applied systems.

Contact: kanad.rishiraj@gmail.com

NomNomy

  • Built Python/Selenium scrapers for Uber Eats, Grubhub, and DoorDash, extracting structured menu data (items, prices, images, and modifier/customization trees) from JS-heavy delivery platforms.
  • Improved scraper resilience with platform-specific readiness checks, challenge/gating handling, stale-element recovery, repeated-item detection, incremental persistence, completion metadata, and debug artifacts.
  • Extended the scrapers into a broader pipeline with CLI/batch orchestration, JSON persistence, and Streamlit-based review/finalization tooling for QA and normalization.
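A minimal sketch of the repeated-item detection and incremental persistence pattern described above. The signature fields (name, price, section) and the helper names are illustrative choices, not the production schema; the atomic write-then-rename keeps the last good snapshot intact if a scrape dies mid-run.

```python
import json
from pathlib import Path


def item_signature(item: dict) -> tuple:
    """Key used to detect items re-encountered across scroll passes.

    The (name, price, section) combination is an illustrative choice;
    a real scraper would pick whatever fields are stable per platform.
    """
    return (item.get("name"), item.get("price"), item.get("section"))


def merge_items(seen: dict, new_items: list) -> int:
    """Fold a freshly scraped batch into `seen`, skipping duplicates.

    Returns the number of genuinely new items, which lets the caller
    detect when repeated scrolling stops yielding anything new.
    """
    added = 0
    for item in new_items:
        sig = item_signature(item)
        if sig not in seen:
            seen[sig] = item
            added += 1
    return added


def persist_incremental(seen: dict, path: Path) -> None:
    """Write the current result set atomically after each batch.

    Writing to a temp file and renaming means a crash mid-write
    never corrupts the previously persisted JSON.
    """
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(list(seen.values()), indent=2))
    tmp.replace(path)
```

Calling `merge_items` after every scroll/expand pass and stopping once a few consecutive passes add zero items is one simple way to turn repeated-item detection into a termination condition.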

MovieSaints

  • Built Instagram scraping and automation workflows including session reuse via cookies, creator discovery, hashtag/post extraction, and messaging automation across dynamic UI flows.
  • Developed structured external-data pipelines for IMDb metadata/credits, FX-rate ingestion with source fallback, and other third-party web data using browser automation plus direct HTTP/HTML parsing.
  • Added persistence, retries, source fallback, normalization, and structured JSON/DB outputs so extracted data could be reused in downstream workflows.
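The retry-plus-source-fallback pattern used for FX-rate ingestion can be sketched as below. The sources are modeled as injected zero-argument callables so the control flow is testable; the actual providers, error types, and backoff parameters in the project differ.

```python
import time


def fetch_with_fallback(sources, retries=2, base_delay=0.0):
    """Try each rate source in order, retrying transient failures.

    `sources` is a list of zero-argument callables, each returning a
    value (e.g. a float FX rate) or raising on failure. Each source
    gets `retries` extra attempts with exponential backoff before the
    next source is tried; only when every source is exhausted does
    the whole fetch fail.
    """
    last_err = None
    for fetch in sources:
        for attempt in range(retries + 1):
            try:
                return fetch()
            except Exception as err:  # narrow to network errors in real code
                last_err = err
                if attempt < retries:
                    time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all sources failed") from last_err
```

Keeping the sources as plain callables also makes it easy to put a cached/stale value at the end of the list as a last-resort fallback.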

FitJobs

  • Built multi-site job extraction logic for LinkedIn, Indeed, Glassdoor, Greenhouse, and Lever using site-specific scraper routing and DOM strategies.
  • Implemented resilient extraction for dynamic job pages using hidden-content expansion, retry-based metadata reads, structured-data parsing, and layered selector fallbacks.
  • Normalized extracted fields into a consistent payload and passed them directly into downstream analysis workflows for resume-to-job fit scoring.
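The site-specific scraper routing mentioned above can be sketched as a hostname-keyed registry. The handler bodies and domain list here are placeholders; the real extractors run the per-site DOM strategies.

```python
from urllib.parse import urlparse

# Registry mapping a site domain to its extractor. In the real system
# each handler applies that site's DOM strategy and selector fallbacks.
SCRAPERS = {}


def register(domain):
    """Decorator that registers a scraper for a given domain."""
    def wrap(fn):
        SCRAPERS[domain] = fn
        return fn
    return wrap


@register("linkedin.com")
def scrape_linkedin(url):
    # Placeholder: real handler drives the LinkedIn job-page DOM.
    return {"site": "linkedin", "url": url}


@register("greenhouse.io")
def scrape_greenhouse(url):
    # Placeholder: real handler parses Greenhouse job boards.
    return {"site": "greenhouse", "url": url}


def route(url):
    """Pick the scraper for a job URL by its hostname (incl. subdomains)."""
    host = urlparse(url).netloc.lower()
    for domain, scraper in SCRAPERS.items():
        if host == domain or host.endswith("." + domain):
            return scraper
    raise ValueError(f"no scraper registered for {host}")
```

Matching on `host.endswith("." + domain)` keeps subdomain boards (e.g. `boards.greenhouse.io`) routed correctly without a separate registry entry per subdomain.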

TalkToGov

  • Built Python scraping/import pipelines using requests.Session and BeautifulSoup to extract structured data from government list and profile pages.
  • Designed recurring-scrape reliability features including HTML change detection, cache reuse, strict validation/reconciliation, and safety checks to prevent bad runs from corrupting downstream data.
  • Implemented broader import flows with paginated API extraction, 429 backoff handling, normalized CSV/DB outputs, and change-history tracking.
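Two of the recurring-scrape safeguards above can be sketched in isolation: change detection via a whitespace-normalized content hash (so cosmetic reformatting doesn't trigger a re-import), and a reconciliation guard that blocks commits when the record count drops suspiciously. The threshold and function names are illustrative.

```python
import hashlib
import re


def content_fingerprint(html: str) -> str:
    """Hash page content after collapsing whitespace, so purely
    cosmetic reformatting does not register as a change."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_reimport(html: str, cached_fingerprint) -> bool:
    """Re-run the import only when the page materially changed
    relative to the cached fingerprint (None means never seen)."""
    return content_fingerprint(html) != cached_fingerprint


def safe_to_commit(old_count: int, new_count: int, max_drop: float = 0.2) -> bool:
    """Reconciliation guard: a sudden large drop in extracted records
    usually means a layout change broke the parser, not real deletions.
    The 20% threshold is an illustrative default."""
    if old_count == 0:
        return True
    return new_count >= old_count * (1 - max_drop)
```

When `safe_to_commit` fails, the run is flagged for review instead of overwriting downstream data, which is the "bad runs can't corrupt good data" property described above.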

Code examples and additional technical detail are available on request.