Thank you for accepting my proposal and providing me with this opportunity to contribute to Django. @smithdc1 @carltongibson Thank you for guiding me through the proposal process and helping me make the necessary changes to my proposal. Sorry it took me a while to post this.
I would like to clarify a few details and provide an update on my progress so far.
Communication:
What is the preferred mode of communication? I am also setting up a documentation blog where I will be detailing my progress.
Schedule:
I have already completed some of the tasks mentioned in my proposal, so I will update the schedule.
Current:
I am currently working on migrating some of the benchmarks from djangobench into django-asv. After PR #7 is merged, I will add some of the query benchmarks that I have already migrated.
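For anyone following along, a migrated benchmark follows ASV's class-based convention: ASV calls setup() before each run and times every method whose name starts with time_. Here is a minimal sketch of that convention; the template-rendering example and settings below are illustrative assumptions, not code taken from django-asv itself.

```python
# Minimal sketch of the ASV benchmark convention (illustrative, not taken
# from django-asv): ASV runs setup() first and times every time_* method.
import django
from django.conf import settings


class TemplateRenderBenchmark:
    def setup(self):
        # Configure a throwaway Django settings module once per process.
        if not settings.configured:
            settings.configure(
                TEMPLATES=[{"BACKEND": "django.template.backends.django.DjangoTemplates"}]
            )
            django.setup()
        from django.template import Context, Template

        self.template = Template("{% for x in values %}{{ x }}{% endfor %}")
        self.context = Context({"values": range(100)})

    def time_render(self):
        # ASV reports the average wall-clock time of this call.
        self.template.render(self.context)
```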
Congratulations on your proposal being selected for this year's GSoC!
For me communication via this forum or GitHub works best. We can do email too, but we should aim to work in public.
Thank you for the PRs that you have already opened, we’ve already started to make progress on some items which is great.
I think over the next few weeks it would be useful to work on some of the project management aspects so that you can help manage us and keep the project on track. I'm open to ideas on how we can do this, with GitHub Issues or their new “projects” experience being a couple of options. Your proposal had a useful schedule, so that likely forms a great starting point.
This will then allow some discussion on the plan to see if there's any refinement needed at the outset. For example, under your 3rd milestone you mentioned “Write a python script to parse the output of asv compare command” and then storing that in a database. I'd like to understand that further, as I wonder if asv continuous helps in this use case. That's a specific example, but the general point is: let's review the plan now and “measure twice, cut once”.
On a related note, is there any infrastructure that you think the project needs? We can then work on getting that set up for you over the next couple of weeks with the ops team.
Don’t worry about being slow to post: we only found out on Friday, so it was on my list to follow up today.
For communication, I’d say the forum here or GitHub is a good start. If we need to do more we can later.
I'd say it's a good idea to try to get the ball rolling earlier rather than later if possible. You've already started, so… but the time runs away quickly, I find.
The initial plan is to try to use GitHub Actions as our runner, right?
To go with that, I wonder whether a devcontainer.json for Codespaces is worth it, so that it's easy for everyone to fire up ASV and run with it?
Also, there's a nice git scraping technique that we might think about using to write back the stored metrics to the repo periodically?
I’m working on various ASGI related tickets at the moment, so have in mind building a test project for the HTTP benchmarks phase later on.
I checked out the GitHub projects experience and I think it is well suited for managing the different aspects of the project and keeping track of progress. I have created a project and I am currently adding my schedule to it. I have sent the access invite; please suggest any changes that I should make.
I wanted to clarify a few things:
Should I run load testing and benchmarking on the same runner, one after the other, or separately on different runners with different schedules?
Should I store the benchmarking and load testing results in the same database or create a separate database for each of them?
I checked out the output of asv continuous and it is similar to that of asv compare, so either could be parsed; it might be easier to parse asv compare's output.
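To make the parsing idea concrete, here is a rough sketch of what I have in mind, assuming the tabular report asv compare prints (a leading +/- change marker, before/after timings, a ratio, and the benchmark name); the exact columns may vary between ASV versions, so this is only an illustration:

```python
# Rough sketch of parsing `asv compare` output; the column layout is an
# assumption based on the tabular report ASV prints and may need adjusting.
import subprocess


def parse_asv_compare(base="stable/3.2.x", head="main"):
    output = subprocess.run(
        ["asv", "compare", base, head],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Changed benchmarks are prefixed with "+" (slower) or "-" (faster).
        if parts and parts[0] in {"+", "-"}:
            rows.append({
                "benchmark": parts[-1],
                "before": parts[1],
                "after": parts[2],
                "ratio": parts[3],
                "regression": parts[0] == "+",
            })
    return rows


if __name__ == "__main__":
    for row in parse_asv_compare():
        print(row)
```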
Yes, my initial plan was to run the benchmarks using GitHub Actions on a self-hosted runner. If the results are too noisy, I will try other methods.
Since ASV manages the virtual environments and the installation of the required packages, only ASV needs to be installed, so running the benchmarks won't be much of a problem.
OK, let me ask the Ops Team about an option there…
It'll just be a Django project, with some views and different configurations to serve them in different ways.
I have some preliminary bits, and I'm looking at some ASGI related tickets at the moment that touch on this, so focus on the djangobench/ASV part for now, and I'll pull together a beginning at least.
I have set up a workflow here that writes the result of asv compare stable/3.2.x main to a text file, results.txt. Is this what you were looking for? Please tell me what changes I need to make.
I am currently migrating benchmarks from djangobench to django-asv and refactoring the existing benchmarks in django-asv; I will add the workflow this weekend.
I'm missing something here; likely you are a few steps ahead of me.
When ASV runs, it generates a results directory containing a “database” (a JSON file) of the results. What's the benefit of parsing the results and storing them in our own database, over using the provided “database”, which then allows them to be viewed?
I am very sorry; I seem to have misunderstood the task at hand. I had thought of storing the results of asv compare, but publishing the results to a github.io website and keeping the JSON file seems better. I will implement it once I am done with the benchmark migration.
As I mentioned in my proposal, one of my ideas for running the benchmark workflow on a pull request or commit made to Django was to use a workflow_dispatch event to trigger the benchmark workflow. Since it requires a personal access token, I set up two temporary repositories, benchmarks and Main, to demonstrate how the workflows work. I was able to trigger a workflow_dispatch in benchmarks when a commit was pushed to Main, and the results of the run are:
Main - trigger workflow on commit
benchmarks - run benchmarks on workflow dispatch event
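Under the hood the trigger is just a call to the GitHub REST API's workflow dispatch endpoint, authenticated with the personal access token. Roughly like this (the owner, repository, and workflow file names are placeholders for the temporary demo repos):

```python
# Sketch of the cross-repository trigger: a job in "Main" calls the GitHub
# REST API with a personal access token (read from the GH_PAT environment
# variable) to fire a workflow_dispatch event in the "benchmarks" repository.
# OWNER, REPO and WORKFLOW are placeholders for the demo repositories.
import os

import requests

OWNER = "example-user"
REPO = "benchmarks"
WORKFLOW = "benchmark.yml"  # a workflow declaring `on: workflow_dispatch`


def trigger_benchmarks(ref="main"):
    response = requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/actions/workflows/{WORKFLOW}/dispatches",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"token {os.environ['GH_PAT']}",
        },
        json={"ref": ref},
        timeout=30,
    )
    response.raise_for_status()  # GitHub answers 204 No Content on success


if __name__ == "__main__":
    trigger_benchmarks()
```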
No need to be sorry! You are doing a great job, I am sure that we will all be learning lots this summer. Let’s keep discussing and working through the issues as we find them.
On the access token, I’ll defer to the fellows.
Personally I’d be concerned about security issues and think we should be asking if there’s any other options available. Can we run on a schedule for testing against main (is once a day / week enough given the environmental impact)?
For PRs, what happens if we add a label; does that trigger a workflow (and do we still need an access token)? Also, what about Buildbot? We have that for the Selenium and Oracle tests.
I'm not sure about most of this. Just questions really, to help the discussion and to improve our decisions.
Remember, you are likely now the world expert in benchmarking Django.
Do we need to remove the results folder from .gitignore?
One (likely wrong) thought is to save to, e.g., a ci-results folder from the GHA, and then have a script to move that into results before the publish call; there's a rough sketch of that after these questions. (The question here just being: do we need to keep the results folder empty?)
Could we accept PRs for results? So I run locally, with my machine name, but can submit that to the collected pool? (Is this worth it? This is a LATER question, I think.)
Could we run asv publish to output to GitHub Pages? The output in the html folder is entirely static, no?
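Something like this is what I have in mind for the move-then-publish step; a minimal sketch, assuming ASV's default results/ directory and that asv publish writes its static site to html/. The ci-results name is only the example from above.

```python
# Rough sketch of the "copy ci-results into results, then publish" idea.
# Assumes ASV's default results/ layout; "ci-results" is just an example name.
import shutil
import subprocess
from pathlib import Path


def merge_and_publish(ci_dir="ci-results", results_dir="results"):
    src, dst = Path(ci_dir), Path(results_dir)
    # Copy every result JSON across, preserving the per-machine layout.
    for path in src.rglob("*.json"):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    # `asv publish` writes a fully static site into html/, so it could go
    # straight to GitHub Pages.
    subprocess.run(["asv", "publish"], check=True)


if __name__ == "__main__":
    merge_and_publish()
```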
I am sorry to inform you that I won't be able to work till Monday as I have my Internal Assessment examinations. I will try to work for at least an hour and will resume my work as soon as I can.
Added a workflow to run the benchmarks regularly and commit the results to the repository, and added a workflow to publish the results to a website.
Migrated most of the benchmarks from djangobench to django-asv; discussion on adding benchmarks that modify the existing settings is still ongoing.
Work in the upcoming weeks:
Create a pipeline using Azure Pipelines, Jenkins, or Buildbot to run the benchmarks and add the results when a pull request is labeled in the Django repository.
Create test harnesses using Locust and benchmark different ASGI and WSGI servers (a rough locustfile sketch follows below).
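As a starting point for the Locust harness, here is a rough locustfile sketch; the URL paths and task weights are placeholders for whatever views the benchmark Django project ends up exposing:

```python
# locustfile.py -- rough sketch of the load-testing harness. The URL paths
# and task weights are placeholders for the benchmark project's views.
from locust import HttpUser, between, task


class DjangoUser(HttpUser):
    # Each simulated user waits 1-3 seconds between tasks.
    wait_time = between(1, 3)

    @task(3)
    def index(self):
        self.client.get("/")  # plain view

    @task(1)
    def template_view(self):
        self.client.get("/template/")  # placeholder template-rendering view
```

This could then be run against each server configuration with something like locust -f locustfile.py --host http://localhost:8000.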
This is an updated post on my progress, along with the milestones yet to be achieved and the time remaining to complete them.
Currently working on:
Migrating benchmarks from djangobench to django-asv:
Proposed Deadline: June 30
Progress: Most of the benchmarks have been migrated to the django-asv repo; 4 benchmarks that modify the specified settings are yet to be migrated, and the discussion about them is ongoing.
Running the benchmarks regularly:
Proposed Deadline: July 18
Progress: I have used GitHub Actions to set up workflows that run the benchmarks daily and commit the results back to the repository. I have also set up a workflow that publishes these results to a website.
Yet To Do: Implementation of the feature by which the benchmarks can be run when a pull request is labeled in django/django or a comment is made.
Storing the benchmark results:
Proposed Deadline: July 25
I had initially proposed to store the benchmark results in a database, but @smithdc1 pointed out that the benchmark results are already available in a searchable JSON file, so I will be skipping this.
Work in upcoming weeks:
Create a test harness using Locust
Proposed Deadline: July 30
Add a locustfile.py and define the tasks to be performed during load testing
Perform load testing:
Proposed Deadline: August 25
Develop a Django project and perform load testing on it by writing Dockerfiles to containerize the test harness and the Django project with different ASGI and WSGI servers, and run them using docker-compose.
Add test results to a database
Proposed Deadline: August 30
I am looking into ways to store the test results of load testing
Update documentation:
Proposed Deadline: September 4