Answers Re: Test your knowledge of simulation weather file formats Part 1: the DOE-2 *.BIN/*.BINM format

I got a total of five responses, all of which are attached at the end of this e-mail.
Although I was focusing on the FORMAT of the *.BIN weather file, most of the respondents
focused on the data instead, and, to my chagrin, more closely than I had, especially Julien, who
pointed out an anomaly in the direct normal radiation that made me gasp and go back into
my file and processing procedures. Kudos to Julien for spotting that; I'll explain
what happened there following his e-mail attached below.

The three answers I was seeking are:

(1) *The weather file is for a location above the Arctic Circle*, which would make
DOE-2.1E crash because of a bug in its shading calculation (DOE-2.2 fixed this
bug in 2004). The point here is that this is a problem with DOE-2.1E, NOT with the *.BIN
format, although it might seem that way because the DOE-2 weather packer also crashes:
it uses the same shading routine to generate the weather statistics.
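One plausible reason high latitudes trip up solar routines can be shown with the textbook sunset hour-angle relation cos(w_s) = -tan(lat)·tan(decl). Above the Arctic Circle, |tan(lat)·tan(decl)| exceeds 1 around the solstices (24-hour day or night), so an unguarded arccosine fails. This is an illustration only: it is an assumption that the DOE-2.1E bug is of this kind, and the function names below are hypothetical, not DOE-2 routines.

```python
import math

def sunset_hour_angle_naive(lat_deg, decl_deg):
    """Textbook relation cos(w_s) = -tan(lat) * tan(decl), with no guard."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl_deg))
    return math.degrees(math.acos(x))  # acos() raises ValueError if |x| > 1

def sunset_hour_angle(lat_deg, decl_deg):
    """Guarded version: clamp x to [-1, 1] so that polar day returns
    180 degrees (sun up for 24 h) and polar night returns 0 degrees."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl_deg))
    x = max(-1.0, min(1.0, x))
    return math.degrees(math.acos(x))

BARROW_LAT = 71.28          # approx. latitude of Barrow, AK (degrees N)
JUNE_SOLSTICE_DECL = 23.44  # approx. solar declination at the June solstice

print(sunset_hour_angle(BARROW_LAT, JUNE_SOLSTICE_DECL))   # polar day
print(sunset_hour_angle(BARROW_LAT, -JUNE_SOLSTICE_DECL))  # polar night (December)
try:
    sunset_hour_angle_naive(BARROW_LAT, JUNE_SOLSTICE_DECL)
except ValueError as err:
    print("naive formula fails:", err)
```

The clamp is one common way to keep such a routine alive at any latitude; the real DOE-2.2 fix may differ.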

(2) *The weather file is for a leap year, with the output file showing Feb. 29th*. I've
heard many people say that DOE-2 weather files contain only 365 days, but that's
absolutely not true. The *.BIN format stores data for 12 months of 32 days each, or 384
days! The reason that Feb. 29th never shows up in a DOE-2 run is that the developers
never bothered to reset the February day count to 29 in leap years, even though DOE-2
calculates when that's the case. Now that more people are running DOE-2 with actual
historical years, it's well past time for this little fix to be implemented.
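The arithmetic behind the 384-day layout, and the leap-year fix, can be sketched in Python. The constants and function names below are illustrative assumptions, not DOE-2 identifiers; the sketch only counts how many of the 12 × 32 day slots a given year fills once February is allowed 29 days.

```python
import calendar

SLOTS_PER_MONTH = 32  # each month gets 32 day slots in the *.BIN layout
MONTHS = 12

def days_in_month(year, month):
    # calendar.monthrange returns (weekday of the 1st, number of days in month)
    return calendar.monthrange(year, month)[1]

def used_slots(year):
    """How many of the 384 day slots actually hold data for this year."""
    return sum(days_in_month(year, m) for m in range(1, MONTHS + 1))

total_slots = MONTHS * SLOTS_PER_MONTH
print(total_slots)                                    # 384
print(used_slots(2011), used_slots(2012))             # 365 366
print(calendar.isleap(2012), days_in_month(2012, 2))  # True 29
```

Every month, including a 29-day February, fits within its 32 slots, so the format itself has room for Feb. 29th.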

(3) *The weather file report shows the weather parameters to an additional decimal of
precision*, i.e., temperatures to 0.1F, pressures to 0.01 inches of mercury, solar
radiation to 0.1 Btu/sqft, and wind speeds to 0.1 mph. This required a modest change to
the BIN format that I implemented as the *.BINM (M for Modified) starting in 2011. This
leads to what I think is the most fascinating part of the DOE-2 *.BIN format, which was
developed in the early 1980s when computer memory was very limited. Members of the
original development team (Ender Erdem and maybe Fred Buhl) came up with the strategy of
"packing" the data: converting all data to integers, packing four integers into one big
integer, and then storing them in the file in binary form. As a result, the *.BIN files are
only 146K (70-80KB zipped), whereas other formats can be well over 1MB (200+KB zipped).
DOE-2 also uses the *.BIN format to improve execution speed: instead of reading the data an
hour at a time (8760 ASCII reads), it reads 16-day chunks (24 binary reads for the
entire year), which also explains why the *.BIN format contains 24x16 = 384
days.
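The general idea of packing several small integers into one word can be sketched with mixed-radix arithmetic: scale each value to an integer (×10 keeps one decimal), add an offset so the code is non-negative, then multiply-and-add the four codes into one integer; divmod reverses it. The fields, their order, offsets and radices below are entirely hypothetical and are NOT the DOE-2 *.BIN or *.BINM layout, which is defined in the Fortran packers (fmtwth2.f, wthfmt2.f and their modified versions).

```python
# Hypothetical field layout: name -> (scale, offset, radix).
# scale  : multiply by this to keep one decimal (0.1 F -> x10)
# offset : added so the integer code is never negative
# radix  : number of distinct codes the field may take
FIELDS = {
    "dry_bulb_F":      (10, 1000, 2000),  # -100.0 .. 99.9 F in 0.1 F steps
    "wind_mph":        (10, 0, 500),      # 0.0 .. 49.9 mph in 0.1 mph steps
    "wind_dir_sector": (1, 0, 16),        # 16 compass sectors, 0 .. 15
    "cloud_tenths":    (1, 0, 11),        # cloud cover 0 .. 10
}

def pack(values):
    """Combine four physical values into one non-negative integer."""
    packed = 0
    for name, (scale, offset, radix) in FIELDS.items():
        code = round(values[name] * scale) + offset
        if not 0 <= code < radix:
            raise ValueError(f"{name}={values[name]} does not fit its field")
        packed = packed * radix + code  # shift left by one "digit", add code
    return packed

def unpack(packed):
    """Recover the four values; the last field packed is peeled off first."""
    values = {}
    for name, (scale, offset, radix) in reversed(list(FIELDS.items())):
        packed, code = divmod(packed, radix)
        values[name] = (code - offset) / scale
    return values

largest = 1
for _, _, radix in FIELDS.values():
    largest *= radix
assert largest - 1 < 2**31  # every packed word fits in a signed 32-bit int

hour = {"dry_bulb_F": -12.3, "wind_mph": 14.7, "wind_dir_sector": 9, "cloud_tenths": 7}
word = pack(hour)
print(word, unpack(word))
```

The trade-off is that each field's range and resolution must be fixed in advance, which is presumably why adding one more decimal of precision required a modified format.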

So now let's look at how our contestants did, listed in the order in which I received their
answers:

Parag - 0 out of 3 (he focused almost entirely on the data, not on the format)

Julien - 1 out of 3 (he noticed the leap day, but also focused his attention on the data
and pointed out two problems that I will address following his e-mail below)

Aaron - 2 out of 3 (I was impressed that he knew about the packing and unpacking process,
but did not notice the leap year)

Javed - 0 out of 3 (but had a good question on why DNI (Direct Normal) is larger than GHI
(Global Horizontal), which I will answer following his e-mail below)

Nathan - 1 out of 3 (he also noticed the leap day, and had other questions that I will
answer following his e-mail below).

So, nobody noticed all three answers, but since it's the holiday season, I will make them
all winners and provide a historical year weather file of their choosing. Just e-mail me
if you're interested and tell me which one you want.

This has been an interesting and insightful experience for me. I hope others found it
entertaining and useful as well. I've learned to (1) think twice before putting out
a flawed contest rule, and (2) check whatever I put out on the Web more carefully.

(please be sure to read the contestants' submittals and my responses below)

Joe

Joe Huang
White Box Technologies, Inc.
346 Rheem Blvd., Suite 205A
Moraga CA 94556
yjhuang at whiteboxtechnologies.com
http://weather.whiteboxtechnologies.com for simulation-ready weather data
(o) (925)388-0265
(c) (510)928-2683
"building energy simulations at your fingertips"

Attached e-mails follow in the same order (only final e-mails shown)
-----------------------------------------------------------

Joe Huang
Joined: 2011-09-30

Joe,

This was fun, thanks for putting it together, I'm looking forward to the
next one.
I have to say that I'm impressed by Aaron's knowledge and level of detail,
and it's not the first time.

I've used Python to parse your hourly dump and visualize it, as well as to
compare it with the Barrow EPW (converting your hourly dump to SI units).
I've posted that on GitHub, where you can already look at the code + graphs
since it's in a Jupyter notebook, but you could also download it and run it
on your machine. My hope is that I'll convert one or two members of this
list to coding :)
For the record, the actual parsing of the hourly dump into a usable format
(pandas.DataFrame) took me about 5 minutes and 16 lines of code (it was a
very simple one though).
https://github.com/jmarrec/BINM_Challenge/blob/master/Analyse_BINM.ipynb

Cheers,
Julien

--
Julien Marrec, EBCP, BPI MFBA
Owner at EffiBEM
T: +33 6 95 14 42 13

LinkedIn (en) | (fr)

2017-12-07 10:40 GMT+01:00 Joe Huang :

jmarrec
Joined: 2013-01-09


Julien,
Thank you for the Python code. One question: how do you download just the code without the .png data in the file?

Christopher R. Jones, P.Eng.
Technical Specialist
Sustainability & Energy

T +1 416-644-0252

2300 Yonge Street, Suite 2300
Toronto, ON M4P 1E4 Canada

wsp.com

Please consider the environment before printing...

Jones, Christopher2
Joined: 2017-10-12

Christopher,

This is a Jupyter notebook. See http://jupyter.org/.
You can do "pip install jupyter", then start a server with "jupyter
notebook" in a terminal, navigate to the location of the ipynb file and
open it. Then you can run it there, cell by cell, interactively
(CTRL+ENTER to run a cell).
This is **hugely** helpful in anything that has to do with data analysis,
because it allows for interactive exploration. I strongly suggest you try it
out.
Note that if you have installed a scientific Python distribution such as
Anaconda, you should already have it installed.

You can also download this as a Python file from the notebook, which I've
done for your convenience (see attached).
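Regarding Christopher's question: an .ipynb file is JSON in which each cell has a "cell_type" and a "source", and plots are stored base64-encoded in the "outputs" of code cells. A minimal standard-library sketch, assuming the nbformat 4 layout, that keeps only the code (the function name is hypothetical; `jupyter nbconvert --to script` is the stock tool for the same job):

```python
import json

def notebook_code(nb_json):
    """Return the source of all code cells in an nbformat-4 notebook,
    dropping markdown cells and every stored output (including PNG plots)."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # "source" may be stored either as one string or as a list of lines
        if isinstance(src, list):
            src = "".join(src)
        chunks.append(src)
    return "\n\n".join(chunks) + "\n"

# Usage, with the notebook file from Julien's repository:
# with open("Analyse_BINM.ipynb", encoding="utf-8") as f:
#     print(notebook_code(f.read()))
```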

Best,
Julien


2017-12-08 14:26 GMT+01:00 Jones, Christopher :

jmarrec
Joined: 2013-01-09

Thank you Sir!


Jones, Christopher2
Joined: 2017-10-12

Julien,

When I first read your post, I had the wrong impression that the Python script was
disassembling (unpacking) the binary *.BINM file, which would be quite an achievement
since I haven't yet documented the packing procedure for the extra precision, nor has the
original packing been explained anywhere but in the actual Fortran source code.

After looking at the Python script, I realized it was reading the ASCII dump of the
weather file, which, as I've said, had an embarrassing glitch in my unpacking :-)

The original packing/unpacking can be seen in the Fortran code of the fmtwth2.f and
wthfmt2.f programs that are part of the DOE-2 release package. In DOE-2.2, the same two
programs have been renamed MKAFT and PKAFT. My additions can be seen in the Fortran
code of fmtwth2M.f and wthfmt2mleapyr.f, but please be warned that I still need to track down
the mysterious extra line I found that was throwing off the solar radiation in November
and December. The source code for all 4 programs is attached, in case anyone wants to
port them to Python.

Joe


Joe Huang
Joined: 2011-09-30

Joe,

Thanks for sharing these. The other day I went looking for MKAFT and
PKAFT, but I could only find them as Windows executables, so I gave up on
this (I could have run them on Linux using WineHQ, but I didn't care
enough).
I was able to quickly compile these as Unix executables using gfortran, and I can
at least run them.

Out of curiosity, I tried to see how I could port this to Python, but
this isn't going to happen for me (at least not without someone explaining
very simply what the code does): I generally dislike Fortran syntax *very
much*, so I never got into it, and while I can generally follow scientific
Fortran fine, the I/O stuff is really incomprehensible to me.
So far, after 30 minutes of scratching my head, I have only figured out how to
unpack IWDID and IWYR; for the rest I get numbers, but they don't
match (I'm trying to read 4 bytes as an int for WLAT, WLONG, etc.).
I have also never really dealt with packed binary files (they seem to me
completely unnecessary nowadays that we have gigabytes of RAM and
plenty of disk space). At least I learned some new vocabulary, such as
Hollerith strings (which, as I later found out from the Fortran compiler itself,
are deprecated).

I don't understand this in wthfm2Mleapyr.f:

      DO 100 IM1=1,12
      READ (10) (IWDID(I),I=1,5),IWYR,WLAT,WLONG,IWTZN,LRECX,NUMDAY,
     _ CLN(IM1),GT(IM1),IWSOL
      READ (10) IDUM
  100 CONTINUE

Is the format expected in the binary file 20 chars followed by 6 integers
(IWYR, WLAT, WLONG, IWTZN, LRECX, NUMDAY) + 12 integers (CLN) + 12 integers
(GT) + 1 integer (IWSOL) + 1 integer (IDUM)?

I've tried this in Python:

import struct

with open('AK_BARROW-W-POST-W-ROGERS-AP_700260_12.BINM', 'rb') as f:
    bindata = f.read()
fmt = '20s6i12i12iii'
start_pos = 4  # Apparently I have to skip the first 4 bytes.
end_pos = start_pos + struct.calcsize(fmt)
struct.unpack(fmt, bindata[start_pos:end_pos])

This gives me the following, which doesn't seem to match the WEATHER.FMTM file
generated by the Fortran utility:

(b'BARROW-W-POST700260 ',
2012,
1116639068,
1125960909,
9,
1,
31,
1063675494,
1139180158,
5,
9849418,
328,
200,
98569,
9849418,
583,
200,
98567,
9915211,
648,
327,
100618,
9850960,
648,
326,
106761,
9851217,
9,
326,
106764,
9851475,
649,
454)

Cheers,
Julien


2017-12-09 4:23 GMT+01:00 Joe Huang :

jmarrec
Joined: 2013-01-09

Julien,

I think we should take this offline as it's getting into the weeds.

Joe


Joe Huang
Joined: 2011-09-30

Joe, I agree with you.

So that no one is left out: if you'd like to be kept in the loop, feel free to email me personally.

Best,
Julien


jmarrec
Joined: 2013-01-09

Julien,

I'm getting the digest version of this, so hopefully I'm responding to the
latest version. Feel free to forward to anyone else.

A few years ago I attempted to write a Java tool that could read DOE-2 bin
files and do some other stuff, so I spent a little time trying to reverse-engineer
MKAFT and PKAFT. You can find the GitHub page for this
project here. In the
src/alexander/doe2/DOE2File.java file, you'll see a comment that
summarizes the contents of a DOE-2 bin file. Hopefully, this can be an
extra resource for you as you're working with the Fortran files.

The following might help with your issues.

1. Latitude, longitude, clearness, and ground temperature are 4-byte floats,
not ints. (Old-school Fortran relies on implicit typing: any variable
starting with A-H or O-Z is floating point (REAL), and everything else (I-N) is an integer.)
2. The ID is 20 characters long (20 bytes), but the clearness and the ground
temperature are each given as a single float per header (CLN(IM1) and GT(IM1)),
not as 12-value arrays. A single header should look like

20 chars + 1 int + 2 float + 3 int + 2 float + 1 int
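Both points can be checked against the raw numbers in Julien's output. A minimal sketch, assuming little-endian 4-byte INTEGER and REAL values, IWDID stored as 20 bytes, no padding, and the variable order of the READ statement Julien quoted: it derives each field's type from the implicit-typing rule, builds the resulting struct format (56 bytes), and reinterprets the four suspicious "integers" as floats. The helper names are hypothetical.

```python
import struct

# Header variables after IWDID, in the order of the READ statement quoted above
HEADER_VARS = ["IWYR", "WLAT", "WLONG", "IWTZN", "LRECX", "NUMDAY",
               "CLN", "GT", "IWSOL"]

def implicit_code(name):
    """Fortran implicit typing: names starting with I-N are INTEGER ('i'),
    all others are REAL ('f')."""
    return "i" if "I" <= name[0].upper() <= "N" else "f"

# '<' = little-endian with standard sizes and no alignment padding
HEADER_FMT = "<20s" + "".join(implicit_code(v) for v in HEADER_VARS)

def int_bits_as_float(n):
    """Reinterpret the 4 bytes of a signed int32 as an IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<i", n))[0]

print(HEADER_FMT, struct.calcsize(HEADER_FMT))  # <20siffiiiffi 56

# Values Julien decoded as ints for WLAT, WLONG, CLN and GT:
for name, raw in [("WLAT", 1116639068), ("WLONG", 1125960909),
                  ("CLN", 1063675494), ("GT", 1139180158)]:
    print(name, round(int_bits_as_float(raw), 2))
```

If the assumptions hold, the four values come out near 71.28, 156.8, 0.9 and 461, which look like Barrow's latitude and longitude, a clearness number and a ground temperature, rather than the large integers in Julien's output.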

The next "integer", IDUM, does nothing. Since Fortran writes 4-byte record markers
before and after each record, the READ automatically skips to the next
header. The space after the header is actually an array of integers. The
reading of this (which is the hourly data) happens further below with the
READ....IDAT30 statement. Since you are using Python, you will have to
read through this entire array before you get to the next header. The
array is 6152 bytes long for standard bin files and 7688 bytes long for
binm files. If you just skip this many bytes at this point in your code,
you will find that you arrive at another header (after stripping 4 + 4
bytes for the end and beginning of a Fortran record). Once you get to the
point of trying to get data out of the integer array, I think you'll start
to see what Joe was talking about with the ingenuity of the original
developers.
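Walking past the Fortran record markers can be done generically. A sketch, assuming the common gfortran convention of a 4-byte little-endian length before and after each record (consistent with Julien needing to skip the first 4 bytes); whether every *.BIN/*.BINM file was written this way, and the record sizes to expect, should be checked against the Fortran source. The function name is hypothetical.

```python
import struct

def fortran_records(data):
    """Yield the payload of each record of a Fortran sequential unformatted
    file held in memory as bytes.

    Assumes each record is framed as [int32 n][n bytes][int32 n],
    little-endian, with no subrecord splitting.
    """
    pos = 0
    while pos < len(data):
        if len(data) - pos < 8:
            raise ValueError(f"truncated record marker at byte {pos}")
        (n,) = struct.unpack_from("<i", data, pos)
        end = pos + 4 + n
        if n < 0 or end + 4 > len(data):
            raise ValueError(f"bad record length {n} at byte {pos}")
        (n_end,) = struct.unpack_from("<i", data, end)
        if n_end != n:
            raise ValueError(f"leading/trailing markers differ at byte {pos}")
        yield data[pos + 4:end]
        pos = end + 4  # jump over the trailing marker to the next record

# Hypothetical usage with `bindata` from Julien's snippet:
# sizes = [len(rec) for rec in fortran_records(bindata)]
# A 56-byte header could then be decoded with
# struct.unpack("<20siffiiiffi", rec[:56]).
```

Listing the record sizes first is a cheap way to confirm which records are headers and which hold the packed hourly array.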

On a side note, while binary files are a pain and computers today are much
more powerful, I'm not sure they are completely obsolete. I only say
this because my own code always seems to run much slower when I start
reading/writing a lot of ASCII. I'd be curious to hear others' more
expert take on this, but I'll just give an example here. A while back I
also wrote a utility to decode the BDLKEY.BIN file for use with some of my
tools. You can find this project here. This tool reads the
BDLKEY.BIN file and writes it to an ASCII file with the same information.
An example output can be found in the repo here. This file turns
out to be over 7 MB. I'm guessing this would take a
noticeable amount of time to parse, yet somehow DOE-2 is able to load
this same amount of information in a split second when we run simulations.
If the speed is related to the file being accessed as binary, then I think
this form of I/O should be seriously considered by any simulation engine or
tool as a means of improving efficiency.
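The ASCII-versus-binary question is easy to try on one's own machine. A rough sketch (not a rigorous benchmark: it times decoding from memory only, leaving disk access out, and the numbers depend on the machine and Python version) that decodes 8760 hours × 4 integers from raw bytes and from whitespace-separated text:

```python
import array
import random
import timeit

N_HOURS, N_VARS = 8760, 4
values = array.array("i", (random.randint(0, 10**7) for _ in range(N_HOURS * N_VARS)))

binary_blob = values.tobytes()  # raw machine ints, like a packed binary file
text_blob = "\n".join(" ".join(str(v) for v in values[i:i + N_VARS])
                      for i in range(0, len(values), N_VARS))

def read_binary():
    out = array.array("i")
    out.frombytes(binary_blob)  # one bulk copy, no per-number parsing
    return out

def read_text():
    return array.array("i", (int(tok) for tok in text_blob.split()))

assert read_binary() == read_text() == values  # same data either way

for fn in (read_binary, read_text):
    t = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {t * 1e3:.3f} ms per read")
```

The binary path skips the number-by-number text parsing entirely, which is one plausible reason binary loads feel near-instant.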

Aaron

From: Julien Marrec
To: Joe Huang, Papa Marrec <francois.marrec at gmail.com>
Cc: "Jones, Christopher", EnergyPlus_Support <EnergyPlus_Support at yahoogroups.com>, Javed Iqbal, eQUEST-users at onebuilding.org
Date: Sat, 9 Dec 2017 15:21:07 +0100
Subject: Re: [Equest-users] [Bldg-sim] Answers Re: Test your knowledge of simulation weather file formats Part 1: the DOE-2 *.BIN/*.BINM format

Aaron Powers2
Joined: 2011-09-30