
How to do Web Scraping with Ruby by Processing CSV
This article shows how to do simple web scraping with Ruby by processing a CSV file.
In this article, we will create a Ruby on Rails application that scrapes the links uploaded in a CSV file and finds how often each link occurs on a particular page.
In the Ruby on Rails application below, the user needs to provide a CSV file and a list of email addresses to which the parsed CSV will be sent.
The CSV file will have two columns:
- referal_link
- home_link
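To make the expected input concrete, here is a minimal sketch that parses a sample file with these two columns using Ruby's csv library. The URLs are placeholders and the helper name parse_link_rows is an assumption made for illustration.

```ruby
require "csv"

# Sample input with the two expected columns. The URLs are placeholders.
SAMPLE_CSV = <<~ROWS
  referal_link,home_link
  https://example.com/signup,https://example.org/
  https://example.com/pricing,https://example.org/blog
ROWS

# Hypothetical helper: turns CSV text into an array of hashes keyed by
# column name (headers: true treats the first line as the header row).
def parse_link_rows(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

parse_link_rows(SAMPLE_CSV).each do |row|
  puts "Look for #{row['referal_link']} on #{row['home_link']}"
end
```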
Let’s start creating a Rails Application
- Run the below commands to create a new Rails application and move into its directory.
$ rails new scrape_csv_data
$ cd scrape_csv_data
- Then, we will generate an Upload CSV module. Run the below command.
$ rails g scaffold UploadCsv generated_csv:string csv_file:string
This will create all the required models, controllers, and migrations for csv_file. Run the migration using the below command.
$ rails db:migrate
- Update the app/views/upload_csvs/_form.html.erb with the below code.
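As a rough sketch of what the form could look like, the partial below keeps only a file input for csv_file, on the assumption that generated_csv is filled in later by the application rather than by the user. The labels and accept filter are illustrative choices.

```erb
<%# app/views/upload_csvs/_form.html.erb (illustrative sketch) %>
<%= form_with(model: upload_csv) do |form| %>
  <div class="field">
    <%= form.label :csv_file, "CSV file" %>
    <%= form.file_field :csv_file, accept: ".csv" %>
  </div>

  <div class="actions">
    <%= form.submit "Upload" %>
  </div>
<% end %>
```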
- Then we will add the gem to upload the CSV file. Add the below line to your Gemfile.
gem 'carrierwave', '~> 2.0'
Then run:
$ bundle install
- Then we will create the uploader with CarrierWave using the below command.
$ rails generate uploader Avatar
- We will attach the uploader in the model app/models/upload_csv.rb.
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end
- Update the routes.
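A possible routes file is sketched below. The scaffold normally adds the resources line on its own. Pointing the root path at the index page is an assumption, not something the article specifies.

```ruby
# config/routes.rb (illustrative sketch)
Rails.application.routes.draw do
  # Added by the scaffold generator: the standard CRUD routes.
  resources :upload_csvs

  # Assumption: show the list of uploads on the home page.
  root "upload_csvs#index"
end
```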
- Then, we will start the server and check that the application is working.
$ rails s
- Then we will create a job that reads the CSV file and scrapes the links from it. The generated file will be stored in the generated_csv column of the record that triggered the job. Run the below command.
$ rails generate job generate_csv
- Add the below gems to your Gemfile and run bundle install.
gem 'httparty'
gem 'nokogiri'
- Then we will replace the contents of the GenerateCsv job with the below code.
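The core of the job can be sketched in plain Ruby, kept separate from Rails so it is easy to try out. The module name LinkReport, its methods, the fetcher parameter and the occurrences output column are all assumptions made for illustration. The default fetcher uses HTTParty. For simplicity, counting is a literal substring scan of the page HTML. Nokogiri could be used instead to inspect only anchor tags.

```ruby
require "csv"

# Plain-Ruby sketch of the work the job performs. All names here are
# illustrative assumptions, not the article's original code.
module LinkReport
  # Default fetcher: downloads the page body with HTTParty. The gem is
  # loaded lazily so a different fetcher can be passed in without it.
  DEFAULT_FETCHER = lambda do |url|
    require "httparty"
    HTTParty.get(url).body
  end

  # Counts literal, non-overlapping occurrences of `link` in the HTML.
  def self.count_occurrences(html, link)
    return 0 if link.to_s.empty?

    html.to_s.scan(link).length
  end

  # Reads referal_link/home_link pairs from input_path, fetches each
  # home_link page, and writes one result row per input row.
  def self.generate(input_path, output_path, fetcher: DEFAULT_FETCHER)
    CSV.open(output_path, "w") do |out|
      out << %w[referal_link home_link occurrences]

      CSV.foreach(input_path, headers: true) do |row|
        link = row["referal_link"].to_s.strip
        page = row["home_link"].to_s.strip

        count = begin
          count_occurrences(fetcher.call(page), link)
        rescue StandardError
          0 # Treat pages that cannot be fetched as having no matches.
        end

        out << [link, page, count]
      end
    end

    output_path
  end
end
```

Inside the Rails app, GenerateCsvJob#perform could call LinkReport.generate with upload_csv.csv_file.path as the input and a path under public/ (for example result_data.csv) as the output, then save that path in the generated_csv column.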
- Then we will run the job from an after_create callback of UploadCsv and add a presence validation so that csv_file is required.
- Now update the code of app/models/upload_csv.rb.
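One way the model could look is sketched below. It assumes the job class is named GenerateCsvJob and accepts the record as its argument. The sketch uses after_create_commit, a variant of after_create that runs once the record has been committed, so that a background job can load the saved record. The method name enqueue_csv_generation is hypothetical.

```ruby
# app/models/upload_csv.rb (illustrative sketch)
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader

  # Reject records that have no uploaded file.
  validates :csv_file, presence: true

  # Start the scraping job once the new record is committed.
  after_create_commit :enqueue_csv_generation

  private

  # Hypothetical helper name; hands the record to the job.
  def enqueue_csv_generation
    GenerateCsvJob.perform_later(self)
  end
end
```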
After uploading a file, check that the scraped output file has been generated. You can find the generated file at /scrape_data/public/result_data.csv
- Now we will send the generated file through email by using the below instructions.
First, we will generate the mailer by using the below command.
$ rails generate mailer NotificationMailer
Add this code inside the notification mailer.
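A minimal sketch of such a mailer follows. The action name send_csv matches the view file used in this article, while the recipients and csv_path parameters, the attachment name and the subject line are assumptions.

```ruby
# app/mailers/notification_mailer.rb (illustrative sketch)
class NotificationMailer < ApplicationMailer
  # recipients: array of email addresses (hypothetical parameter)
  # csv_path:   path of the generated CSV file (hypothetical parameter)
  def send_csv(recipients, csv_path)
    # Attach the generated file under a fixed file name.
    attachments["result_data.csv"] = File.read(csv_path)

    mail(to: recipients, subject: "Your scraped CSV results")
  end
end
```

It could then be called as NotificationMailer.send_csv(emails, path).deliver_later, for example at the end of the job.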
Also, we need to add mail configuration inside config/environments/development.rb or production.rb.
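As an example of such configuration, the fragment below sets up SMTP delivery. The host name, port and environment variable names are placeholders to replace with your mail provider's values.

```ruby
# config/environments/development.rb, inside Rails.application.configure
# (illustrative sketch; host and credentials are placeholders)
config.action_mailer.delivery_method = :smtp
config.action_mailer.raise_delivery_errors = true
config.action_mailer.smtp_settings = {
  address: "smtp.example.com",
  port: 587,
  user_name: ENV["SMTP_USERNAME"],
  password: ENV["SMTP_PASSWORD"],
  authentication: :plain,
  enable_starttls_auto: true
}
```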
We also need to update the view app/views/notification_mailer/send_csv.html.erb.

Thank you!

