This article explains how to do simple web scraping in Ruby by processing a CSV file.

In this article, we will create a Ruby on Rails application that scrapes the links listed in an uploaded CSV file and finds the occurrences of each link on a particular page.

In the Ruby on Rails application below, the user provides a CSV file and a list of email addresses to which the parsed CSV will be sent.

The CSV file will contain 2 columns:

  1. referal_link
  2. home_link
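As a rough illustration of this input format, the sketch below parses a two-column CSV with Ruby's standard csv library. Only the column names come from the article; the sample URLs and the helper name read_link_rows are made up for the example.

```ruby
require "csv"

# Hypothetical sample input; only the header names come from the article.
SAMPLE_CSV = <<~TEXT
  referal_link,home_link
  https://example.com/blog/post-1,https://example.org/
  https://example.com/blog/post-2,https://example.org/pricing
TEXT

# Parse CSV text into an array of hashes keyed by column name.
def read_link_rows(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    {
      referal_link: row["referal_link"].to_s.strip,
      home_link: row["home_link"].to_s.strip
    }
  end
end

rows = read_link_rows(SAMPLE_CSV)
rows.each { |r| puts "#{r[:referal_link]} -> #{r[:home_link]}" }
```

Reading by header name rather than by column position keeps the code working if the columns are ever reordered.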

Let’s start by creating a Rails application.

  • Run the command below to create a new Rails application.

$ rails new scrape_csv_data

$ cd scrape_csv_data

  • Then, we will generate an Upload CSV module. Run the below command.

$ rails g scaffold UploadCsv generated_csv:string csv_file:string

This will create the required model, controller, views, and migration for UploadCsv. Run the migration using the command below.

$ rails db:migrate

  • Then we will add the gem to upload the CSV file. Add the below line to your Gemfile.

gem 'carrierwave', '~> 2.0'

  Then run:

$ bundle install

  • Then we will create the CarrierWave uploader using the command below.

$ rails generate uploader Avatar

  • We will attach the uploader in the model app/models/upload_csv.rb. 

class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end

  • Update the routes.

Rails.application.routes.draw do
  resources :upload_csvs
  root 'upload_csvs#index'
end

  • Then, we will start the server and check if the application is working successfully or not. 

$ rails s

  • Then we will create a job that reads the CSV file and scrapes the links in it. The generated file will be stored in the generated_csv column of the record that triggered the job. Run the command below.

$ rails generate job generate_csv

  • Add the gems below to your Gemfile and run bundle install.

gem 'httparty'
gem 'nokogiri'

  • Then we will replace the generated code in the GenerateCsv job with the scraping logic.
  • Then we will run the job from an after_create callback on UploadCsv and add a presence validation for csv_file.
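The job code itself is not reproduced here, so the following is only a sketch of the kind of logic it might contain, not the original implementation. It assumes that each referal_link is the page to fetch and that the job counts how many anchor tags on that page point to home_link; the article does not spell this out. To stay runnable without gems, the sketch takes the page fetcher as a parameter (in the Rails job this role would fall to HTTParty) and matches href attributes with a regular expression (Nokogiri's CSS selectors would be the more robust choice). The names count_link_occurrences and build_result_csv and the occurrences column are hypothetical.

```ruby
require "csv"

# Count <a> tags in +html+ whose href equals +link+ (ignoring a trailing slash).
# A regex stands in for Nokogiri here so the sketch needs no gems.
def count_link_occurrences(html, link)
  normalize = ->(url) { url.to_s.strip.chomp("/") }
  target = normalize.call(link)
  hrefs = html.to_s.scan(/<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["']/i).flatten
  hrefs.count { |href| normalize.call(href) == target }
end

# Build the result CSV text from the uploaded CSV text.
# +fetcher+ is any callable that maps a URL to an HTML string.
def build_result_csv(input_csv, fetcher)
  CSV.generate do |out|
    out << %w[referal_link home_link occurrences]
    CSV.parse(input_csv, headers: true).each do |row|
      page = row["referal_link"]
      link = row["home_link"]
      count =
        begin
          count_link_occurrences(fetcher.call(page), link)
        rescue StandardError
          0 # Treat unreachable pages as "no occurrences" (an assumption).
        end
      out << [page, link, count]
    end
  end
end

# Usage with a stubbed fetcher, so no network access is needed.
fake_pages = {
  "https://example.com/post" =>
    '<p><a href="https://example.org/">Home</a> <a href="https://example.org">Again</a></p>'
}
fetcher = ->(url) { fake_pages.fetch(url) }
input = "referal_link,home_link\nhttps://example.com/post,https://example.org/\n"
puts build_result_csv(input, fetcher)
```

In the Rails job, perform could read the uploaded file, pass a lambda that wraps HTTParty.get(url).body as the fetcher, and write the returned text to the result file. Injecting the fetcher also makes the counting logic easy to test without hitting real sites.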

After uploading a file, check that the scraped output file has been generated. You can find the generated file in /scrape_data/public/result_data.csv

  • Now we will send the generated file by email using the instructions below.

First, we will generate the mailer using the command below.

$ rails generate mailer NotificationMailer

Add this code inside the notification mailer. 

Also, we need to add mail configuration inside config/environments/development.rb or production.rb.
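For reference, an SMTP setup in config/environments/development.rb typically looks something like the fragment below. The host, port, and environment variable names are placeholders rather than values from the original project, so substitute your mail provider's settings.

```ruby
# config/environments/development.rb (fragment; all values are placeholders)
Rails.application.configure do
  config.action_mailer.delivery_method = :smtp
  config.action_mailer.raise_delivery_errors = true
  config.action_mailer.default_url_options = { host: "localhost", port: 3000 }
  config.action_mailer.smtp_settings = {
    address: "smtp.example.com",      # hypothetical SMTP host
    port: 587,
    user_name: ENV["SMTP_USERNAME"],  # hypothetical environment variable names
    password: ENV["SMTP_PASSWORD"],
    authentication: :plain,
    enable_starttls_auto: true
  }
end
```

Reading the credentials from environment variables keeps them out of version control.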

We also need to update the view at app/views/notification_mailer/send_csv.html.erb

Thank you!
