Skip saving data if the slug already exists

wallcroft · October 28, 2022, 8:56am

I’m trying to save a list of words to the database, but sometimes a word I want to save already exists in the database, and so does its slug. I want to skip the words that already exist but still save the ones that don’t. Is there any way I could do this?

This is what I’m doing now:
models.py

from django.db import models
from django.utils.text import slugify


class Manado(models.Model):
    kata = models.CharField(max_length=50)
    slug = models.SlugField(max_length=250, unique=True, blank=True)

    class Meta:
        verbose_name_plural = "manado"

    def __str__(self):
        return self.kata

    def save(self, *args, **kwargs):
        # Derive the slug from the word the first time the row is saved.
        if not self.slug:
            self.slug = slugify(self.kata)
        super().save(*args, **kwargs)

utils.py

import re

from bs4 import BeautifulSoup


def compare(translasi):
    """Return the words in `translasi` that are missing from the reference word list."""
    with open("main/idwords.html", "r") as file:
        body = file.read()

    soup = BeautifulSoup(body, "lxml")
    word = soup.select_one(selector=".word").get_text(strip=True)
    known_words = {w.lower() for w in word.split()}

    # Replace every run of non-letters with a space, then split into lowercase words.
    text = re.sub("[^a-zA-Z]+", " ", translasi)
    candidates = {w.lower() for w in text.split()}

    return {"compare": sorted(candidates - known_words)}


def compare_db(kata, kata_db):
    """Return the words in `kata` that are not in `kata_db`, sorted."""
    return {"compare": sorted(set(kata) - set(kata_db))}

views.py

if "close-form" in request.POST:
    if request.user.is_authenticated:
        # `post` and `now` come from earlier in the view (not shown here).
        post = Post.objects.get(id=post.id)
        post.status = False
        post.updated = now
        post.save()

        # Mark the translation with the highest score as the best one.
        pts = post.translasi.aggregate(Max("poin"))["poin__max"]
        record = post.translasi.filter(poin=pts).values("id")
        translasi = Translasi.objects.get(id=record[0]["id"])
        translasi.best = True
        translasi.save()

        # Word comparison algorithm
        hasil = compare(translasi.content)["compare"]
        list_kata = [str(kata) for kata in Manado.objects.all()]
        banding_kata = compare_db(hasil, list_kata)["compare"]
        print(banding_kata)
        # compare_db always returns a list, so no None check is needed.
        for hsl in banding_kata:
            Manado.objects.create(kata=hsl)

czue · October 28, 2022, 12:02pm

There might be more efficient ways, but one thing you could do is first find all the words that are already there and then only create the ones that aren’t.

E.g:

words = get_words_to_add()
existing_words = set(Word.objects.filter(slug__in=[w.slug for w in words]).values_list('slug', flat=True))
words_to_add = [w for w in words if w.slug not in existing_words]
Word.objects.bulk_create(words_to_add)

(Note: I didn’t test/run this code)
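The filtering step can be tried without a database. The following is a minimal sketch, not tested against the poster's project: `simple_slugify` is a hypothetical, ASCII-only stand-in for Django's `slugify`, and `select_new_words` is a made-up helper name. It compares by slug rather than by the raw word, since the slug is the column with the unique constraint, and it also drops duplicates inside the incoming batch.

```python
import re


def simple_slugify(text):
    # Hypothetical, ASCII-only stand-in for django.utils.text.slugify.
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")


def select_new_words(candidates, existing_slugs):
    """Keep the candidates whose slug is neither stored nor already seen in this batch."""
    seen = set(existing_slugs)
    new_words = []
    for word in candidates:
        slug = simple_slugify(word)
        if slug and slug not in seen:
            seen.add(slug)
            new_words.append(word)
    return new_words


# "Kata" and "kata" both slugify to "kata", which is already stored.
new_words = select_new_words(["Kata", "kata", "baru", "so dang"], {"kata"})
```

In a Django view, `existing_slugs` would come from a query such as the `values_list('slug', flat=True)` call above, and the surviving words would then be created.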

wallcroft · October 28, 2022, 12:09pm

Yeah, I managed to do the save with code similar to this, but it doesn’t look very efficient.

KenWhitesell · October 28, 2022, 12:42pm

The best answer to this is going to depend upon the specifics of your current implementation.

How are you getting this “list of word”? Are these submissions through a view, an API, or a management command?

What does your database model look like? Is this “word” the only field in that model?

wallcroft · October 29, 2022, 12:58am

I’ve updated the question with my code, please take a look.

KenWhitesell · October 29, 2022, 1:16am

Based upon what you’ve posted here, I’d take a more direct route.

I’d create a unique constraint on each of the kata and slug fields, and just try to insert each word - making sure to catch the errors that occur on a duplicate entry. (Ignore or log the error as appropriate and continue on to the next word.)

Or, you could use the get_or_create method on each word. (Basically the same thing, you just don’t need to worry about any errors being thrown.)

Depending upon how large these lists are, you could also create a set from each of those lists and then take the difference of the two to find the words in the new list that aren’t in the current model.
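The first suggestion (insert every word and catch the duplicate error) can be sketched outside Django with the standard library's `sqlite3` module, so that it runs on its own. The table layout mirrors the `Manado` model, but the helper name `insert_skipping_duplicates` and the choice to reuse the word as its slug are assumptions made for illustration; the latter only holds for lowercase, letters-only words such as those `compare()` produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE manado (id INTEGER PRIMARY KEY, kata TEXT UNIQUE, slug TEXT UNIQUE)"
)
conn.execute("INSERT INTO manado (kata, slug) VALUES ('kata', 'kata')")
conn.commit()


def insert_skipping_duplicates(conn, words):
    """Try to insert every word; record duplicates instead of failing."""
    inserted, skipped = [], []
    for word in words:
        try:
            # Assumption: the word is already lowercase letters, so it doubles as the slug.
            conn.execute(
                "INSERT INTO manado (kata, slug) VALUES (?, ?)", (word, word)
            )
        except sqlite3.IntegrityError:
            skipped.append(word)  # unique constraint hit: move on to the next word
        else:
            inserted.append(word)
    conn.commit()
    return inserted, skipped


inserted, skipped = insert_skipping_duplicates(conn, ["kata", "baru", "torang"])
```

In Django the same loop would catch `django.db.IntegrityError` around `Manado.objects.create(...)`, and the Django documentation recommends wrapping each attempt in `transaction.atomic()` when a database error is caught and execution continues.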

wallcroft · October 29, 2022, 3:39am

Using sets and taking the difference is what I’m doing now. I tried get_or_create before but kept getting an IntegrityError. Is that because I didn’t handle the error properly?

KenWhitesell · October 29, 2022, 3:45am

I’d have to see the code you tried and the complete specific error received. There shouldn’t be an error with get_or_create - you would be checking the return value to determine whether a new row was inserted.
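Django's `get_or_create` returns a tuple `(object, created)`, and the second element is the return value being referred to. The failing code was never posted, so the cause of the IntegrityError is unknown. One possibility, stated here as a guess, is that the lookup was done on `kata` while the unique constraint sits on `slug`, so a word that differs only in case misses the lookup but collides on insert. The sketch below imitates the get-then-create pattern with `sqlite3`, keyed on the slug; `get_or_create_word` and the lowercase-as-slug rule are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE manado (id INTEGER PRIMARY KEY, kata TEXT, slug TEXT UNIQUE)"
)


def get_or_create_word(conn, kata):
    """Return (row_id, created), looking the row up by its unique slug."""
    slug = kata.lower()  # assumption: simplified stand-in for slugify()
    row = conn.execute("SELECT id FROM manado WHERE slug = ?", (slug,)).fetchone()
    if row is not None:
        return row[0], False  # already stored: nothing inserted
    cur = conn.execute(
        "INSERT INTO manado (kata, slug) VALUES (?, ?)", (kata, slug)
    )
    conn.commit()
    return cur.lastrowid, True


first_id, first_created = get_or_create_word(conn, "Torang")
second_id, second_created = get_or_create_word(conn, "torang")
```

The ORM counterpart would be along the lines of `Manado.objects.get_or_create(slug=slugify(hsl), defaults={"kata": hsl})`, which looks up by the constrained field and only uses `kata` when a new row is created. That call is untested against this project.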
