I am troubleshooting an anomaly in the way a generator from InMemoryUploadedFile.chunks() decodes files.
I created five variants of a CSV file in Microsoft Excel 2013:
1. sample-mac.csv
2. sample-dos.csv
3. sample-comma.csv
4. sample-tab-delimited.csv (Excel actually saves this one as .txt, but changing the extension doesn't hurt)
5. sample-unicode.csv
Excel provides an option for saving in each of those formats. The last one is encoded as "utf-16" by default; the rest are plain text (ASCII). The plain-text variants differ in their end-of-line character: "\r" for mac, and "\r\n" for the others, since they were created on an MS-Windows system. Plain-text files (including CSV files) created on Linux use only "\n" for EOL.
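To make the three end-of-line conventions concrete, here is a small sketch using hypothetical two-row CSV payloads (the byte strings below are made up for illustration, not taken from the actual sample files). It shows that the bytes differ only in their line terminators, and that Python's `str.splitlines()` treats "\r", "\r\n" and "\n" all as line boundaries.

```python
# Hypothetical two-row CSV payloads, one per end-of-line convention.
samples = {
    "mac (CR)": b"name,age\rAlice,30\r",
    "dos (CRLF)": b"name,age\r\nAlice,30\r\n",
    "linux (LF)": b"name,age\nAlice,30\n",
}

# Decode each payload and split it into rows; splitlines() recognises
# \r, \r\n and \n as line boundaries, so every variant yields the same rows.
parsed = {label: raw.decode("ascii").splitlines() for label, raw in samples.items()}

for label, raw in samples.items():
    print(f"{label}: {raw!r} -> {parsed[label]}")
```

Splitting on a hard-coded "\n" instead would leave the CR-only variant as a single row, which is one way mac-style files can appear to behave differently from the others.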
So I have a file which is an instance of InMemoryUploadedFile, and I get an iterable generator containing all the chunks from that file:
chunks = file.chunks()
# where chunks is a generator
Then I take the first chunk as a sample:
sampler = next(chunks)
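For anyone trying to reproduce this step without a Django project, the chunk-then-sample pattern can be approximated with the standard library alone. In this sketch `iter_chunks` is a hypothetical stand-in written for illustration (it is not Django's implementation), the 64 KiB default chunk size is an assumption, and the CR-only payload is made up. The point it illustrates is that a byte-oriented chunk generator yields the raw bytes as read, with no decoding and no line-ending translation, so for a small file the first chunk is the entire file.

```python
import io
from typing import Iterator


def iter_chunks(stream: io.BufferedIOBase, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical stand-in for a chunks() generator: yields raw, unmodified bytes."""
    stream.seek(0)
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data


raw = b"name,age\rAlice,30\rBob,25\r"  # hypothetical Mac-style (CR-only) CSV
chunks = iter_chunks(io.BytesIO(raw))

# First chunk; it holds the whole payload because the payload is smaller than chunk_size.
sampler = next(chunks)
print(type(sampler).__name__, len(sampler), len(raw))
```

Re-joining every chunk reproduces the input byte for byte, regardless of chunk size.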
The sampler is still binary (a bytes object) at this point, so I decode it and observe …
print(sampler.decode(charset))
# where charset is ascii (None implies utf-8)
OBSERVATION:
The characters in the sampler decode properly (as expected) for every file except sample-mac.csv (the first sample). Somehow, somewhere … chunks() mangles or truncates the string, so that the final output is a miserable (small) version of what was intended. Some characters are lost! Why? It only happens with sample-mac.csv.
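One thing that may be worth ruling out before concluding that characters are lost: the mac variant uses a bare "\r" as its line terminator, and many terminals interpret that as "return the cursor to the start of the line", so each printed row can overwrite the previous one. The sketch below (with a made-up CR-only payload, not the actual sample file) prints `repr()` of the decoded text and compares lengths, which shows whether anything is actually missing or merely hidden on screen.

```python
raw = b"name,age\rAlice,30\rBob,25\r"  # hypothetical CR-only sample
text = raw.decode("ascii")

# print(text) would send the bare \r characters straight to the terminal,
# which may move the cursor back to column 0 so later rows overwrite earlier ones.
# repr() spells out every character, including the \r line terminators.
print(repr(text))

# ASCII decoding maps one byte to one character, so equal lengths mean nothing was dropped.
print(len(raw), len(text), text.count("\r"))
```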
Now here is the bummer:
When I open and read the same file directly in a Python shell, it reads perfectly.
with open("sample-mac.csv", "r") as file:
    read = file.read()
print(read)
The above code prints out everything, with the same encoding and all. So Python reads the same file properly, but something messes with the file when it passes through InMemoryUploadedFile.chunks(). Does anyone have an explanation for this?
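To compare the two paths on equal terms, it may help to look at what each one actually hands back. Python's `open()` in text mode with the default `newline=None` applies universal-newlines translation, turning "\r", "\r\n" and "\n" into "\n" on read; `newline=""` disables that translation, and binary mode (`"rb"`) returns the raw bytes, which is what a byte-oriented chunk generator works with. The sketch below writes a hypothetical CR-only payload to a temporary file (the payload and temp file are illustrative assumptions, not the original sample) and reads it back all three ways.

```python
import os
import tempfile

raw = b"name,age\rAlice,30\rBob,25\r"  # hypothetical CR-only sample

with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Default text mode (newline=None): universal newlines convert each \r to \n.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()
    # newline="" turns the translation off, so the \r characters survive.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()
    # Binary mode returns the exact bytes stored on disk.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(repr(translated))
print(repr(untranslated))
print(as_bytes == raw)
```

If the text-mode read looks fine while the `newline=""` and binary reads contain bare "\r" characters, the difference lies in newline handling on the way to the screen rather than in the bytes themselves.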