Item Pipeline

Creating your own Pipeline

When you create a Scrapy project with scrapy startproject myproject, you'll find a file already available for creating your own pipelines. You don't have to create your pipelines in this file, but doing so is good practice. The following shows how to create a pipeline using that file:

class MyPipeline(object):
    def process_item(self, item, spider):
        # process your `item` here
        return item

To enable it, you need to declare it in your settings. Open your settings file and find (or add) the ITEM_PIPELINES variable. Update it with the path to your pipeline class and its priority relative to other pipelines:

    ITEM_PIPELINES = {
        'myproject.pipelines.MyPipeline': 300,
    }
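
The number attached to each pipeline is an ordering key: Scrapy passes items through the enabled pipelines from the lowest value to the highest, and values are conventionally chosen in the 0-1000 range. The following is a minimal sketch of that ordering, not Scrapy's own code. CleanupPipeline is a hypothetical second pipeline added only for illustration.

```python
# Hypothetical settings fragment: CleanupPipeline is an invented name,
# used only to show how two priorities compare.
ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
    'myproject.pipelines.CleanupPipeline': 100,
}

# Pipelines run from the lowest number to the highest.
# Sorting the paths by their value reproduces that order for inspection.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)

for position, path in enumerate(run_order, start=1):
    print(position, path)
```

Here CleanupPipeline would see each item before MyPipeline, because 100 is lower than 300.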

Now every item that your spider returns will go through this pipeline.
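
To make this more concrete, here is a small sketch of a pipeline that actually changes the item. The class name StripTitlePipeline and the 'title' field are assumptions made for the example. It is called directly with a plain dict, so it can be tried outside of a crawl.

```python
class StripTitlePipeline:
    """Hypothetical pipeline: trims whitespace from a 'title' field."""

    def process_item(self, item, spider):
        # Plain dicts and scrapy Items both support key access and .get().
        title = item.get('title')
        if isinstance(title, str):
            item['title'] = title.strip()
        # Return the item so that later pipelines receive it.
        return item


# Quick check outside of a crawl, using a plain dict as the item.
pipeline = StripTitlePipeline()
cleaned = pipeline.process_item({'title': '  Hello  '}, spider=None)
print(cleaned)
```

A pipeline that wants to discard an item instead of passing it on can raise scrapy.exceptions.DropItem from process_item.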

Creating a dynamic pipeline in Python Scrapy

Enable pipelines in your

    ITEM_PIPELINES = {
        'project_folder.pipelines.MyPipeline': 100,
    }

Then write this code in

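# -*- coding: utf-8 -*-
from scrapy import Item, Field


class DynamicItem(Item):
    """Item that accepts any field name without declaring it in advance."""

    def __setitem__(self, key, value):
        # Store the value directly, skipping Item's check for declared fields.
        self._values[key] = value
        # Register the key so the item's field metadata stays consistent.
        self.fields[key] = Field()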

Then in your `project_folder/spiders/

from collections import OrderedDict

from project_folder.items import DynamicItem

    # inside your spider class:
    def parse(self, response):
        # create an ordered dictionary
        data = OrderedDict()
        data['first'] = ...
        data['second'] = ...
        data['third'] = ...
        # add as many keys as you need,
        # then unpack the dictionary into the item
        yield DynamicItem(**data)

        # the line above is equivalent to:
        # yield DynamicItem(first=data['first'], second=data['second'], third=data['third'])

What are the benefits of this code?

You don't need to define each item field one by one.
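
Because the field names of such items are not fixed, a pipeline that receives them cannot rely on a known list of fields. One possible approach, sketched here under that assumption, is to iterate over whatever keys the item carries. StripAllFieldsPipeline is a hypothetical name, and the example uses a plain dict in place of a DynamicItem so it runs without a crawl.

```python
class StripAllFieldsPipeline:
    """Hypothetical pipeline for items whose fields are not known in advance."""

    def process_item(self, item, spider):
        # dict(item) takes a snapshot of the current keys and values;
        # it works for plain dicts and for scrapy Items alike.
        for key, value in dict(item).items():
            if isinstance(value, str):
                item[key] = value.strip()
        return item


# Quick check with a plain dict standing in for a dynamic item.
pipeline = StripAllFieldsPipeline()
result = pipeline.process_item({'first': ' a ', 'second': 2}, spider=None)
print(result)
```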

