Best practice to insert data automatically from a pipeline

avanbeelen · June 30, 2020, 10:23am

Hi Django community,

(I am new to this forum, I tried my best to formulate my question)

Context: I study bioinformatics, and for my minor ‘web application development’ my project group and I have to build a web application using Django. I am responsible for creating the models and writing the code that reads, parses and inserts data (using MySQL) into these models. The data comes from a Snakemake pipeline and has to be inserted into the models automatically. I wrote this code in views.py, which does the trick using dummy data (see the code below for an example).

import csv

from .models import Location


def insert_data_into_database():
    """Location entity: load rows from the pipeline's TSV output."""
    with open(snakemake.output.a, 'r') as tsv:
        next(tsv)  # skip the header row
        for line in csv.reader(tsv, delimiter='\t'):
            # line[2] holds the file name of the location image
            with open('C:\\Users\\Aron\\PycharmProjects\\bpexi\\bpexi\\lysteria\\%s' % (line[2]), 'rb') as image:
                binary_data = image.read()
            Location.objects.get_or_create(location_id=line[0],
                                           description=line[1],
                                           location_image=binary_data)

My problem is: I have to read, parse and insert a lot of different tab-separated files, so views.py becomes cluttered with my code. This feels counterintuitive.

My question is: What are good practices for automatically inserting data from a pipeline into my models? I tried to create another file in my app directory, but that does not work because I cannot import my models for some reason (see the code and errors below).

code:

from .models import Sample

error:

ModuleNotFoundError: No module named '__main__.models'; '__main__' is not a package

or

code:

from models import Sample

error:

django.core.exceptions.ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
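
Both errors come from running the new file directly with Python. A file run that way is `__main__` rather than part of a package, so the relative import `from .models` has nothing to resolve against. The plain `from models import` then fails because Django has not loaded its settings yet. Below is a minimal sketch of a standalone script that sets things up before importing any model. It assumes the settings module is `bpexi.settings` and the app is called `lysteria` (both are guesses based on the path in the code above), and `point_at_settings` is a hypothetical helper name.

```python
import os
import sys
from pathlib import Path


def point_at_settings(project_root, settings_module):
    """Make the project importable and tell Django which settings to load."""
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    # setdefault keeps any value that is already set in the environment.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return root


def main():
    # Assumption: this file sits next to manage.py and the settings module
    # is bpexi.settings; adjust both to match the real project.
    point_at_settings(Path(__file__).resolve().parent, "bpexi.settings")

    import django  # imported inside main(), as manage.py does

    django.setup()  # loads the settings and populates the app registry

    # Models can only be imported once django.setup() has run.
    from lysteria.models import Sample  # assumed app name

    print(Sample.objects.count())
```

Ending the file with `if __name__ == "__main__": main()` would let it be run as an ordinary script. The absolute import (`lysteria.models` instead of `.models`) is what avoids the first error, and the `django.setup()` call is what avoids the second.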

adamchainz · June 30, 2020, 11:07am

I think you probably want to wrap your code in a custom management command: https://docs.djangoproject.com/en/3.0/howto/custom-management-commands/ .

avanbeelen · June 30, 2020, 11:47am

I made a command that inserts all the data into my models.

Thank you so much!

kkkarthik · September 16, 2020, 3:17pm

Hi
I am in a similar situation. Would you be so kind as to let me know what your final approach was? Did you write a custom script and schedule it in Task Scheduler, so that it sends data to your models whenever it runs?

avanbeelen · September 16, 2020, 3:59pm

Hi, I solved my problem with this tutorial: https://www.youtube.com/watch?v=vOZPuQrt0gU&ab_channel=CodingEntrepreneurs

The tutorial explains how to make a script that stores data in your models.

Good luck!
