Sync database with external CSV files

Hello,

I’m applying for my first position as a Python/Django developer (my first position in tech, actually), after a career in a different industry.

For the application I was asked to create a small Django app that takes its data from external CSV files. For this I have written a custom management command that is run periodically by a cronjob:

from django.core.management.base import BaseCommand
import requests
from requests.auth import HTTPBasicAuth
import csv
from hotels.models import Cities, Hotels


"""
This command handles the logic for importing the csv files into the database.
It connects to the csv url's by using HHTPBasicAuth abd parses to a csv reader.
It then iterates over the reader and creates a model instance for each row in the csv.
"""

class Command(BaseCommand):
    help = 'Imports data from CSV files fetched from a URL'

    def handle(self, *args, **kwargs):

        csv_city = 'http://xxxxxx/djangocase/city.csv'
        csv_hotel = 'http://xxxxxxl/djangocase/hotel.csv'
        credentials = HTTPBasicAuth('xxxxxx', 'xxxxxx')

        # fetch both CSV files; the status codes are checked below before importing
        responses = [
            requests.get(csv_city, auth=credentials),
            requests.get(csv_hotel, auth=credentials)
        ]

        def create_csvreader_object(response_index):
            data = responses[response_index]
            lines = data.text.splitlines()
            reader = csv.reader(lines, delimiter=';')
            return reader

        if all(response.status_code == 200 for response in responses):

            # clear both tables before importing the CSV data
            Cities.objects.all().delete()
            Hotels.objects.all().delete()

            # create instances for the Cities model in the database
            id_counter = 1
            for row in create_csvreader_object(0):
                Cities.objects.create(
                    id = id_counter,
                    city_abbreviation = row[0],
                    city_name = row[1]
                )
                id_counter += 1
            """
            To create the instances in the Hotel model we need to get all objects in 
            the Cities model so we can match the city_abbreviation in the Cities model to 
            the corresponding field in the Hotel model in order to establish a Foreign relationship.
            """
            id_counter = 1
            all_cities = Cities.objects.all()
            for row in create_csvreader_object(1):
                for city in all_cities:
                    if city.city_abbreviation == row[0]:
                        Hotels.objects.create(
                            id = id_counter,
                            city = city,
                            city_abbreviation = row[0],
                            hotel_id  = row[1],
                            hotel_name = row[2]
                        )
                        id_counter += 1
                
            return 'Data fetched'
        else:
            return 'Unable to establish connection'

I was wondering whether my approach is the right way to go, or whether there are smarter ways to do this. Are there built-in Django tools that can handle this kind of syncing?

Any feedback is welcome.

Thanks,
Artur

If this is something that you’re going to do a lot, then this library may come in handy: django-import-export.
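
Roughly, you define a resource per model and feed it a tablib Dataset. A minimal sketch of what that could look like for your Cities model (I’m reusing the field names from your command and haven’t run this against your data, so treat it as a starting point rather than a drop-in):

from import_export import resources
from hotels.models import Cities


class CityResource(resources.ModelResource):
    class Meta:
        model = Cities
        # rows whose city_abbreviation already exists get updated, the rest get created
        import_id_fields = ('city_abbreviation',)
        fields = ('city_abbreviation', 'city_name')

You would then build a tablib Dataset from the CSV text and call CityResource().import_data(dataset, dry_run=True) to preview the result before doing the real import.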

Oh cool! Thanks for pointing me there!
Do you have experience with that library yourself?

I’ve used it once or twice.
The only thing that bothered me a little is that if you’re importing a big dataset, the admin import confirmation page takes a while to load.

Oh that’s good to know.

My goal is to periodically import data from external CSV files over an HTTP connection. There are basically three kinds of changes that can occur in the CSV sources: rows can be modified, rows can be added, and rows can be deleted. There will be no modifications to the DB on the application side.

I’m wondering what the best approach would be to keep my DB in sync with the CSV files: empty my DB and create all entries from scratch every time I do an import, or have some more sophisticated logic that compares the CSV data with the DB and updates, removes, or adds instances where needed?
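
For the second option I’m imagining something like this (untested sketch; it assumes city_abbreviation is unique in the Cities model):

# untested sketch of the "compare and update" idea, keyed on city_abbreviation
seen = set()
for row in create_csvreader_object(0):
    city, created = Cities.objects.update_or_create(
        city_abbreviation=row[0],
        defaults={'city_name': row[1]},
    )
    seen.add(city.city_abbreviation)

# anything that is no longer present in the CSV gets removed
Cities.objects.exclude(city_abbreviation__in=seen).delete()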

Like most things in programming, there are many approaches to this problem.
It’s up to you to think about what fits your application best.

I know that import-export already handles updating or creating rows. I don’t know whether it handles deleting objects.

This is fine too, I would just assert that the import is successful before committing. So your current code should run inside a database transaction.

That is a great suggestion, I was not familiar with that.
In my case, would that work something like this?

from django.db import transaction

def handle(self, *args, **kwargs):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # delete instances in the db
        Cities.objects.all().delete()

        # create the instances inside the transaction
        id_counter = 1
        for row in create_csvreader_object(0):
            Cities.objects.create(
                id=id_counter,
                city_abbreviation=row[0],
                city_name=row[1]
            )
            id_counter += 1

That’s correct!
In that case, if there’s any exception while reading or processing, then no changes will be committed.
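
For example (just a sketch, reusing the names from your command): if you raise a CommandError when a row looks wrong, the delete and every create done so far get rolled back:

from django.core.management.base import CommandError
from django.db import transaction

with transaction.atomic():
    Cities.objects.all().delete()
    for row in create_csvreader_object(0):
        if len(row) != 2:
            # raising inside the atomic block discards the delete and the creates above
            raise CommandError(f'Malformed row in city.csv: {row!r}')
        Cities.objects.create(city_abbreviation=row[0], city_name=row[1])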

So great that I learned something new, thank you!
