Hi, I’m wondering if anyone can help me with a problem I’ve been stuck on for weeks.
I’m building a Django API (using the Django Rest Framework) that streams a response from the OpenAI API and I’m having trouble getting it to work in production. I want it to start streaming the second it gets the first chunk.
Right now, everything works as expected on localhost, as well as on an AWS EC2 instance (running Ubuntu) over HTTP. However, when we move the API to HTTPS, it waits until the response is fully generated before returning anything.
Here is the code for the view responsible (assume generate_thing returns a streaming response from OpenAI):
import json

from django.http import StreamingHttpResponse
from rest_framework.decorators import api_view

def stream(response):
    # Yield each chunk's text to the client as soon as it arrives
    for chunk in response:
        text = chunk["choices"][0]["text"]
        yield text

@api_view(['POST'])
def example(request):
    data = json.loads(request.body.decode('utf-8'))
    response = generate_thing(data)
    return StreamingHttpResponse(stream(response), content_type='text/event-stream')
I also explored using Django Channels to create a WebSocket, at the recommendation of someone on Stack Overflow. While that worked, it can only handle one request at a time, which is obviously not ideal for a production-grade app.
Here’s the code from that attempt:
import json

from channels.generic.websocket import WebsocketConsumer

class ExampleConsumer(WebsocketConsumer):
    def connect(self):
        self.accept()

    def stream(self, response):
        # Send each chunk's text over the websocket as it arrives
        for chunk in response:
            text = chunk["choices"][0]["text"]
            self.send(text)

    def receive(self, text_data):
        data = json.loads(text_data)
        response = generate_thing(data)
        self.stream(response)
        self.close()
Keep in mind that I’ve been using Django for only a month so I’m not too familiar with the ins and outs and I’m learning as I’m building. Any insight would be extremely helpful.
In general, that is not an accurate statement; it’s certainly not universally true.
Channels should not prevent multiple people from connecting concurrently and issuing requests. It would help if you provided more details about your environment and the client code you are using for this websocket.
If you’re talking about issuing multiple requests concurrently through the same websocket, then you have different issues to address, like keeping the data segregated between those requests.
Yes, it’s quite possible. However, it’s going to be up to you to multiplex the requests and responses. There’s nothing within a websocket frame itself to identify which request a given piece of data belongs to, so you would need to define and implement an internal protocol within that websocket connection.
For example, in one of the systems I work on, every websocket frame is a JSON object with at least two keys, “app” and “data”. Each app can then define requirements for additional keys, such as “req” for a request number and “seq” for a sequence number.
Our different worker modules in Channels correspond to the “app” key and generate the response to be returned to the browser through the consumer, populating the objects as appropriate.
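Purely as an illustration (the key names follow what I described above; the app name, request number, and data values are made up), one request/response exchange under that kind of protocol might look like:

# One frame the browser sends over the websocket for a request:
request_frame = {
    "app": "openai_stream",      # which worker module should handle this
    "req": 17,                   # request number chosen by the client
    "data": {"prompt": "..."},
}

# One frame the consumer sends back, once per chunk, so the browser can
# route each piece of text to the request that produced it:
response_frame = {
    "app": "openai_stream",
    "req": 17,
    "seq": 3,                    # sequence number of this chunk
    "data": "some text",
}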
On the server side of this, you’d probably want to either implement an async consumer or else off-load the generation of this data stream to a separate async worker process.
Before doing all this, you might also want to verify that these requests are sufficiently “parallelizable” to make it worthwhile to convert them to async requests. If they’re heavily CPU-bound, you’re not likely to see any benefit unless you offload those requests to separate systems. Otherwise, you may be better off just queuing the requests and handling them sequentially.
I’m running into the same issue where I can’t get a streaming response when deploying to a production environment.
Running locally works fine; I get my streamed data as expected. But as soon as I move to a production environment, all the data gets buffered before being sent out. It seems exactly like your issue.
Did you manage to solve the issue or figure out what was happening?
Hi everybody,
I am still trying to stream the response locally. Could you share the code you used to get the response chunk by chunk locally? For me, the whole response is sent only once all the chunks have finished processing.
Here’s a YouTube video on how to build a StreamingHttpResponse.
Also, if you have GZipMiddleware enabled, disable it and use it as a decorator for your other views instead; GZipMiddleware won’t let the response stream.
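Roughly, that means removing it from the MIDDLEWARE list in settings.py and using Django’s gzip_page decorator on the views that should stay compressed (the view name here is just a placeholder):

# settings.py: remove this entry from MIDDLEWARE
#     "django.middleware.gzip.GZipMiddleware",

# views.py: keep compression only on the views that don't stream
from django.views.decorators.gzip import gzip_page

@gzip_page
def some_other_view(request):
    ...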
Hope this helps. If you have any more problems, let me know.
I have not hand-coded it myself, so I am not sure whether it will work for your use case or not. But after reading your answer, I researched using Django Channels for async responses and stumbled upon this article: Learn to use Websockets with Django by building your own ChatGPT. One of the sections of that article is about making the code production-ready. The whole point there is making it asynchronous, whereas your code is totally synchronous. I guess if you try it, it might fix your bug.
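I haven’t tested this against the OpenAI API myself, but based on the code you posted, an async version of your consumer would look roughly like this (generate_thing is your existing helper; sync_to_async from asgiref keeps the blocking calls off the event loop):

import json

from asgiref.sync import sync_to_async
from channels.generic.websocket import AsyncWebsocketConsumer

class ExampleConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        data = json.loads(text_data)
        # generate_thing blocks while talking to OpenAI, so run it in a thread
        response = await sync_to_async(generate_thing)(data)

        # Pull chunks off the blocking stream without stalling the event loop
        iterator = iter(response)
        while (chunk := await sync_to_async(next)(iterator, None)) is not None:
            await self.send(text_data=chunk["choices"][0]["text"])

        await self.close()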
I encountered the same issue that Adam Thometz raised. How do I solve this? I use Azure OpenAI. I’m facing the issue with StreamingHttpResponse over HTTPS. I’m building a Django API that streams a response from the OpenAI API, and I’m having trouble getting it to work in production. Right now, everything works as expected on localhost, but when we move the API to HTTPS, it waits until the response is fully generated to return. Does anyone know about this issue?