Data Science: Web Scraping using Python

Dhrumil Dalwadi
2 min read · Oct 4, 2021

What is Web Scraping?

Web scraping is the process of extracting large amounts of data from websites efficiently. It uses automation to collect thousands or even millions of records in far less time than copying them by hand, and it gives you access to structured web data in an automated fashion. Python is one of the most popular programming languages for web scraping because its code is easy to write and understand and it has a large collection of libraries. Well-known libraries used in scraping work include BeautifulSoup and Selenium for extracting page content and Pandas for organizing the results.

In this practical implementation, I scrape the Flipkart website to collect the laptops listed there and their prices.

Implementation

First, we install the required libraries using the ‘pip’ command.

pip install beautifulsoup4 requests pandas
pip install selenium

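Now, we fetch the URL with the requests library, which returns an HTTP response, and then create a soup object from its content.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
response = requests.get(url)  # HTTP response for the search results page
htmlcontent = response.content  # raw HTML of the page
soup = BeautifulSoup(htmlcontent, "html.parser")  # parsed HTML tree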
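A plain requests.get() call has no timeout and does not report HTTP errors on its own, so a blocked or failed request can go unnoticed until the parsing step. Below is a minimal sketch of a more defensive fetch. The helper names fetch_soup and parse_html and the User-Agent value are assumptions made for illustration, not part of the original code, and whether Flipkart accepts this header is not guaranteed.

```python
import requests
from bs4 import BeautifulSoup

# Illustrative header value (an assumption); some sites reject clients that send no User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; scraping-tutorial)"}


def parse_html(html):
    """Build a BeautifulSoup tree from raw HTML using Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and return its parsed tree (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return parse_html(response.content)
```

Calling fetch_soup(url) with the search URL above would stand in for the three lines that build response, htmlcontent, and soup.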

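Now, we create empty lists for the product names and prices, and print the first product name as a quick check.

products = []
prices = []
# Quick check: print the first product name found on the page
product = soup.find('div', attrs={'class': '_4rR01T'})
print(product.text)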

We loop over the anchor tags ‘a’ that wrap each product, read the name and price from the division tags ‘div’ inside them, and store the text in the lists above.

for a in soup.find_all('a', attrs={'class': '_1fQZEK'}):
    # Each matching anchor wraps one product card
    name = a.find('div', attrs={'class': '_4rR01T'})
    price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    products.append(name.text)
    prices.append(price.text)
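The loop above assumes every product card contains both a name and a price. If one is missing, find() returns None and reading .text raises an AttributeError. Here is a minimal sketch of the same extraction with a guard, run against hand-written sample markup so it works offline. The function name extract_listings and the sample HTML are assumptions for illustration. The class names are the ones used above.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption) that mimics the structure the loop expects.
SAMPLE_HTML = """
<a class="_1fQZEK">
  <div class="_4rR01T">Laptop A</div>
  <div class="_30jeq3 _1_WHN1">₹45,990</div>
</a>
<a class="_1fQZEK">
  <div class="_4rR01T">Laptop B</div>
</a>
"""


def extract_listings(soup):
    """Return (name, price) text pairs, skipping cards with a missing field."""
    rows = []
    for card in soup.find_all("a", attrs={"class": "_1fQZEK"}):
        name = card.find("div", attrs={"class": "_4rR01T"})
        price = card.find("div", attrs={"class": "_30jeq3 _1_WHN1"})
        if name is None or price is None:
            continue  # incomplete card: skip it instead of raising AttributeError
        rows.append((name.text.strip(), price.text.strip()))
    return rows


listings = extract_listings(BeautifulSoup(SAMPLE_HTML, "html.parser"))
```

Collecting pairs keeps names and prices aligned: if only one of the two were appended for a card, the two lists would end up with different lengths and could not be combined into one table.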

Now, using the Pandas library (imported as pd), we create a DataFrame that stores the data in a structured way so it can be exported to the desired file format. Here I have exported the data in .csv format.

df = pd.DataFrame({'Product Name': products, 'Prices': prices})
df.to_csv('products.csv', index=False, encoding='utf-8')
data = pd.read_csv('products.csv')  # read the file back to verify the export

Calling data.head() gives the following result.
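The scraped prices are strings that include a currency symbol and thousands separators, which makes sorting or averaging them awkward. Below is a minimal sketch of converting them to integers before building the DataFrame. The helper name parse_price and the sample values are assumptions for illustration, and the approach assumes whole-rupee prices with no decimal part.

```python
import re

import pandas as pd


def parse_price(text):
    """Convert a price string like '₹45,990' to an int by dropping every non-digit.

    Assumes whole-number prices; a decimal point would be dropped along with the commas.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


# Made-up sample values (an assumption), not scraped data.
products = ["Laptop A", "Laptop B"]
prices = ["₹45,990", "₹1,09,990"]

df = pd.DataFrame({
    "Product Name": products,
    "Prices": [parse_price(p) for p in prices],
})
```

With numeric prices, operations such as df.sort_values("Prices") order the rows by value rather than by the characters in the string.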

Data after being scraped

Source code can be found in the following link


Written by Dhrumil Dalwadi

Blockchain and Cyber-security Enthusiast
