Advantage of session data?

Hi, I have a situation where I want view A's template, let's call it templateA, to pass information to view B via a GET form. In this case the 'information' is a particular attribute of a particular entry in my database; let's call the attribute attribA. Are there any pros or cons to these two approaches?

(1) Give view B another parameter, e.g. def viewB(request, param), and then send the information in the template: <form action="{% url 'viewB' param %}" method="GET">.
viewB now has the object's id (assuming that is what param is) and can get attribA via a query:
SomeModel.objects.get(pk=param).attribA
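A minimal sketch of what I mean (SomeModel, viewB, and templateB.html are just the placeholder names from above):

```python
# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path("b/<int:param>/", views.viewB, name="viewB"),
]

# views.py
from django.shortcuts import get_object_or_404, render
from .models import SomeModel

def viewB(request, param):
    # Look the object up again using the pk passed in the URL
    obj = get_object_or_404(SomeModel, pk=param)
    return render(request, "templateB.html", {"attribA": obj.attribA})
```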

(2) viewA adds attribA to the session dictionary: request.session['attribA'] = attribA.
viewB can now access it directly rather than having to query the database using the passed object's id.
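And a sketch of option 2:

```python
# views.py
from django.shortcuts import render
from .models import SomeModel

def viewA(request, pk):
    obj = SomeModel.objects.get(pk=pk)
    # Stash the attribute in the session for the next view
    request.session["attribA"] = obj.attribA
    return render(request, "templateA.html", {"object": obj})

def viewB(request):
    # Read it back without querying SomeModel at all
    attribA = request.session.get("attribA")
    return render(request, "templateB.html", {"attribA": attribA})
```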

The second way seems better to me, but looking at the docs, the default way of storing session data IS a database… So when request.session['attribA'] = attribA is called, is it actually faster/more efficient than (1)? I understand that using file-based or cached sessions could improve things, but then what is the point of the database option being the default? Is it because it will generally be a much smaller database?
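For reference, my understanding from the docs is that the backend is chosen via the SESSION_ENGINE setting:

```python
# settings.py
SESSION_ENGINE = "django.contrib.sessions.backends.db"  # database-backed (the default)

# Alternatives listed in the docs:
# SESSION_ENGINE = "django.contrib.sessions.backends.cache"      # cache only
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"  # cache, falling back to db
# SESSION_ENGINE = "django.contrib.sessions.backends.file"       # filesystem
```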

Option 1 has the potential of being interfered with on the client side. Using the browser's developer tools, the individual would have the ability to modify the action URL within the form. (Whether or not that's a concern is a different question. It may or may not be a "problem" that needs to be addressed. This concern always exists for data being retrieved from the browser.)

<opinion>
I think you’re asking the wrong question about option 2. Unless you can demonstrate that you have a performance problem, and that your site is being hindered by queries being performed, you’re wasting time and energy looking into a non-issue.
</opinion>

Beyond that, your question about option 2 is actually impossible to answer with any degree of certainty. A database maintains its own internal cache - not all queries necessarily involve actually retrieving data from disk. If view A has already retrieved attribA via a query, and view B executes the same query, there's a significant probability that the result set is still in the database buffers.

And beyond that, Django has its own cache available. If attribA is always the same value for SomeModel.pk = param, you could store it in the Django cache, assuming that two people retrieving that same instance of SomeModel are supposed to see the same attribA.
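A rough sketch of what that could look like with Django's low-level cache API (the key format and the five-minute timeout are arbitrary choices for the example, not anything prescribed):

```python
from django.core.cache import cache

from .models import SomeModel

def get_attrib_a(param):
    # Key the cached value on the object's pk
    key = f"somemodel:{param}:attribA"
    value = cache.get(key)
    if value is None:
        # Cache miss: fall back to the database and remember the result
        value = SomeModel.objects.get(pk=param).attribA
        cache.set(key, value, timeout=300)
    return value
```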

However, the issue with this is ensuring that the cache remains consistent with the database. What happens if attribA is changed in another session after the cached value is stored but before that user retrieves it? The user is now looking at inaccurate data.

Bottom line: If you want, cache the param in session. But query the database in the view. Don’t try to manage your own data cache unless you know all the potential issues and are prepared to mitigate those problems yourself. (And that seems like a lot of time spent on something that might possibly, potentially, and only theoretically become a problem under what is likely to be very unusual circumstances.)

Thanks for your advice, Ken.

Regarding your first point about users interfering with the URL client-side, I'm guessing this would be solved by making it POST instead of GET?

Yes, you're right, I probably am focusing on something unnecessary; I was just curious because I've tended to use option 2 generally speaking, but I hadn't realised that the session data is by default held in a database.

That's interesting, I didn't know the database maintains its own internal cache; I'm going to read about it in the docs. I assume the nature and size of the cache is dependent on the underlying DB? I'm just using SQLite at the moment for development but was planning to move to PostgreSQL when I launch it.

Would this example you give, of attribA potentially getting changed in another session while another user's session has cached but not yet read it, be an example of a lost-update problem? In my particular case, once attribA is written it cannot be overwritten, so it would be OK I imagine, but in general I see why this can be a major problem.

Ok, I will stick to just using the default DB session data for now, thanks for the tips. I am only making a small website as a personal project to learn, so I know I don't need to worry too much about things like performance and the inconsistencies of many concurrent transactions, but I've been advised before that it's good to treat small projects (especially when their main purpose is to learn) as if they were large ones, to pick up good practices as you go along.

No, POST provides no additional assurances.

You can't assume anything coming from the browser hasn't been modified by the user. You cannot prevent the user from altering data sent to the browser and then returning that altered data. The best you can hope to do is detect that what you are receiving is not valid, based upon what you know about what was originally sent.

The nature and size of the db cache are not only dependent upon the db engine but are usually also configurable to various degrees. There's a lot you can do with PostgreSQL to manage how much memory is used by the cache.

Not really a lost-update issue - in the example I posed, the database had been updated. It's just that the second user is looking at a cached copy of the data from before the update, so they're not seeing the change that was made.

I agree with treating small projects like large ones - using the techniques to build a base of knowledge of good practices.

One of those good practices is recognizing where your effort should be spent - and rarely should it be spent on “micro-optimizations”.

What you want to focus on are good algorithms and patterns that use the framework and environment effectively, not worry about whether statement "X" is faster or slower than statement "Y" - unless you've got profile data demonstrating that the difference will have a material effect on the overall throughput of your system. Saving 100 ms with your choice of code isn't going to matter if it's the difference between a page taking 15 seconds to generate instead of 14.9 seconds.

For example, it’s going to be a lot better to make sure your ListViews don’t create an “n+1” query problem, or that you don’t create a query that builds a Cartesian product between two large tables, or build an HTML select list with 1000 choices, etc, etc. (I’ve done these more times than I could possibly count.)
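To illustrate the first of those (Book and Author being hypothetical models here, where Book has a ForeignKey to Author):

```python
from myapp.models import Book  # hypothetical app and models

# The "n+1" pattern: one query for the books, then one more
# query per book to fetch its author
for book in Book.objects.all():
    print(book.author.name)

# One query with a join instead, using select_related
for book in Book.objects.select_related("author"):
    print(book.author.name)
```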

The Performance and optimization page in the docs has some additional information on this as well.

Ken

Ok, good to know. I was under the naive assumption that since POST data is sent in the request body, it was harder for users to manipulate the request, since I thought they wouldn't be able to alter the URL in a way that made sense. I need to read more about HTTPS.

Yes, at that point it wouldn't be a lost-update issue, but if user 2 sees an older version of attribA than what is stored in the DB and then (assuming attribA is an integer, for example) increments it by 10 and saves this to the database, any increments made by user 1 would get lost/overwritten, no?
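Something like this is what I'm picturing (treating attribA as an integer field, with placeholder names again; from what I've read, an F() expression avoids the race by doing the arithmetic inside the database):

```python
from django.db.models import F

from myapp.models import SomeModel  # hypothetical app and model

param = 1  # example primary key

# Read-modify-write: two users doing this concurrently can
# overwrite each other's increments
obj = SomeModel.objects.get(pk=param)
obj.attribA = obj.attribA + 10  # may be working from a stale value
obj.save()

# Atomic alternative: the database performs the addition
SomeModel.objects.filter(pk=param).update(attribA=F("attribA") + 10)
```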

Thanks for your advice. I have read about the n+1 problem and ways to avoid it, and I will definitely read up on the other two, more major, problems you mention. I thought that when you did a natural join on two tables, internally some Cartesian product might be generated followed by elimination of the rows not matching the query, but I guess that wouldn't make sense. I've only ever worked on small "dummy/example" relations, so I probably haven't noticed the performance difference between good and poor practice yet. I will take your advice and try to focus on these sorts of things rather than "micro-optimizations", as you say.

Mike