Downloading and parsing Dukascopy tick data with Birt’s PHP scripts
The Dukascopy data is available on the web in its raw form as files that each span only 1 hour, so some tools are needed to download and parse it. Before the data could be obtained via any of the other methods, I wrote a series of scripts, which I still use, for downloading the free tick data available from Dukascopy. I’m a fan of PHP’s simplicity, so I wrote the scripts in PHP. They’re not commercial-quality code, but they work.
You can get the PHP script archive from the tick data downloads page.
You will find 4 scripts inside:
- A script for downloading the Dukascopy data, suggestively named “download_dukascopy_data.php”. As a courtesy to Dukascopy, which is graciously providing free data, the script does not attempt to download files that are already on your hard disk. However, it still requests missing files, so to avoid stressing their server please set the dates in the $currencies array at the beginning of the script to the date of your last download. The dates are standard Unix timestamps (epoch time, in essence the number of seconds since 01.01.1970 00:00 GMT). To convert a regular date to such a Unix timestamp, you can use Epoch Converter, an easy-to-use online tool.
- A script for processing the downloaded data (process_dukascopy_data.php). It assumes that it is located in the same directory as the previous script and that the data was downloaded there. It needs some parameters; run it without any for a description, or check out the next script.
- A small shell script that processes all the downloaded data, provided as process.bat for Windows and process.sh for Linux.
Windows download & convert to CSV how-to
Start by visiting the Windows PHP download section and fetching the latest binary version as a zip file.
Once you’ve done that, unpack it to c:\php\ and unpack the scripts from the script archive you downloaded into the same directory.
Rename c:\php\php.ini-development to c:\php\php.ini. If your folder does not contain a file named php.ini-development, use php.ini-dist or any other php.ini-something file you can find.
Edit c:\php\php.ini, search for
;extension=php_curl.dll
and remove the semicolon in front of the line and add an “ext/” in front of “php_curl.dll” so that it looks like this:
extension=ext/php_curl.dll
Save the file and exit.
If you run into a zip error and your PHP installation has an ext/php_zip.dll, also apply the method above for extension=ext/php_zip.dll.
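To confirm that the php.ini edits took effect, a small check like the following may help. It is only a sketch: the function name missingExtensions is made up for this example, and it assumes PHP 7 or newer. It relies on the standard extension_loaded() and php_ini_loaded_file() functions.

```php
<?php
// Return the names from $required that are NOT loaded in this PHP build.
function missingExtensions(array $required): array
{
    $missing = array();
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// curl is the extension enabled above; zip only matters if you hit the zip error.
$missing = missingExtensions(array('curl', 'zip'));
if (count($missing) === 0) {
    echo "All checked extensions are loaded.\n";
} else {
    echo "Missing extensions: " . implode(', ', $missing) . "\n";
    // Shows which php.ini was actually read, in case the wrong file was edited.
    echo "Loaded php.ini: " . (php_ini_loaded_file() ?: 'none') . "\n";
}
```

Save it in c:\php\ under any name (check_ext.php, for instance) and run it with php from the command window.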
Head to the 7-Zip download page and get the command line version. Unpack it and put 7za.exe in the same directory (c:\php\).
Click Start->Run and type
cmd
then click OK (alternatively, type cmd and hit Enter in the Windows 7/Vista “search programs and files” box in the Start menu).
Type
cd \php
in the command window.
Type
php download_dukascopy_data.php
Have a coffee. Have another coffee. Go sleep. Go to work. Go to the gym. Go to a club. Wait some more. I’m not kidding, it takes ages. You can check the progress by watching the currency pair directories get filled. If you get any strange errors, run the process again when it’s finished – it will only download any files that were missed in the first step due to network errors.
If you only want to download some of the currency pairs available, you can edit download_dukascopy_data.php and change the array at the beginning of the file. You can switch the currency pair download order or completely remove the pairs that you don’t want. The number next to each pair is the unix timestamp at which to start downloading; if you wish to start at a later point in time (the default is the earliest date available) you can use epochconverter.com to get the timestamp for your chosen date.
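As a rough illustration of such an edit, the sketch below builds an array with only EURUSD and computes the starting timestamp in PHP instead of looking it up online. The array shape shown here (pair name as key, timestamp as value) is an assumption and may not match the script exactly, so keep whatever syntax your copy uses; gmtTimestamp is a helper invented for this example.

```php
<?php
// Convert a calendar date, interpreted as GMT, to a Unix timestamp.
function gmtTimestamp(int $year, int $month, int $day): int
{
    // gmmktime() takes hour, minute, second, month, day, year in that order.
    return gmmktime(0, 0, 0, $month, $day, $year);
}

// Hypothetical shape of the array: one entry per pair you want to download.
$currencies = array(
    'EURUSD' => gmtTimestamp(2012, 1, 1), // 1325376000
);

foreach ($currencies as $pair => $start) {
    echo $pair . ' starts at ' . $start . ' (' . gmdate('Y-m-d H:i', $start) . " GMT)\n";
}
```

Removing a pair then amounts to deleting its line from the array, and starting later amounts to replacing the number next to the pair.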
When the download is finished, assuming you wanted to get the EURUSD data up to 01.01.2012, you’d type
php process_dukascopy_data.php EURUSD 200702 201201 EURUSD.csv
and the output will be placed in EURUSD.csv.
Alternatively, you can type
process.bat
which will batch process all the currency data. It’s mostly safe to ignore the error spam at this step. Note: if you use process.bat or process.sh, you might have to update the ending dates in them to get the full data range!
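If you would rather not edit the ending dates by hand, the command lines can be generated. The sketch below prints the command for each pair with the current month as the end date. buildProcessCommand is a made-up helper, the 200702 start is simply copied from the EURUSD example above, and whether the end month is inclusive is not something this sketch knows, so check the script's own usage text.

```php
<?php
// Build the command line for one pair, in the same form as the example above.
function buildProcessCommand(string $pair, string $startYm, string $endYm): string
{
    return sprintf('php process_dukascopy_data.php %s %s %s %s.csv', $pair, $startYm, $endYm, $pair);
}

$endYm = gmdate('Ym'); // current month as YYYYMM
foreach (array('EURUSD') as $pair) {
    echo buildProcessCommand($pair, '200702', $endYm) . "\n";
}
```

The printed lines can be pasted into the command window or into your copy of process.bat or process.sh.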
That should be it. If everything went fine, you should have your CSV files in the same c:\php folder and be ready to proceed with preparing your tick data for Metatrader 4.
Warning: make sure you have enough space on your hard disk. As of 2012, the downloaded tick files take up over 20 GB, and if you add up the size of the resulting CSV files you will be well past the 100 GB mark.
-
-
#4 written by Pya March 20, 2012 (1 year ago)
-
#7 written by Armin March 22, 2012 (1 year ago)
Hey! I tried to get the processing script up and running on my Mac. It always complained about LZMA not being installed, although it is installed and running as it should. The problem is that the script checks for the OS using stripos(PHP_OS,’win’). On a Mac, PHP_OS returns ‘Darwin’, and therefore the check for Windows doesn’t return false. Maybe this helps someone who runs into the same problem. Anyways, thanks a lot for your great work! Keep it up! Best regards
-
#8 written by birt March 22, 2012 (1 year ago)
LOL, that’s quite ironic given the fact that I use an iMac. I never used it for tick data, though; I use my Linux box for that so I didn’t know PHP_OS says Darwin. I fixed it to count Darwin as non-windows. The change is in v0.25 which is now available for download but you probably shouldn’t bother with it if you changed your copy already.
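For anyone curious why the check misfires, here is a minimal sketch. It is not necessarily the fix that went into v0.25; isWindows is a name invented here, and the prefix comparison is just one common way to test for Windows in PHP.

```php
<?php
// Return true only for Windows-style OS names such as "WINNT" or "Windows".
function isWindows(string $osName): bool
{
    // Compare the first three letters, so "Darwin" (which merely contains "win") does not match.
    return strtoupper(substr($osName, 0, 3)) === 'WIN';
}

var_dump(stripos('Darwin', 'win')); // int(3): not false, so a "!== false" test treats Darwin as Windows
var_dump(isWindows('Darwin'));      // bool(false)
var_dump(isWindows('WINNT'));       // bool(true)
var_dump(isWindows(PHP_OS));        // depends on the machine running this
```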
-
#9 written by Ruben April 19, 2012 (1 year ago)
-
#10 written by birt April 19, 2012 (1 year ago)
First get and install MacPorts. Once you have done that, open a terminal and type:
sudo port install lzmautils
When you’re done with that, you will have an /opt/local/bin/lzma that will be “detected” by the php script.
-
-
-
#12 written by Tyler March 26, 2012 (1 year ago)
I noticed there are some “zero sized” .bi5 files in my tick downloads using the PHP script (which works great!). I realize this happens each weekend, but there are other periods of one or more hours where there are zero sized .bi5 files. Can anyone confirm whether this is due to a download issue or if these are real (and permanent?) gaps in Dukascopy’s history? Does anyone know if Dukascopy plans on correcting these, if they are real? I have tried deleting them and re-downloading, and they remain zero sized, which makes me think they are truly not in Dukascopy’s history. If there’s a chance these will be filled in in the future, might it be worth making a PHP script to delete all zero-sized .bi5 files that do not occur on weekends? If not, I can create my own in Python if anyone thinks there’s a chance these gaps will be corrected.
If anyone could comment on this or share their experiences, I’d appreciate it.
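A script along those lines could look like the sketch below. It rests on several assumptions: that the local directories mirror the Dukascopy URL layout (YYYY/MM/DD/HHh_ticks.bi5 with a zero-based month), that the weekend closure runs roughly from Friday 22:00 to Sunday 21:00 GMT, and that PHP 7.1 or newer is used. It also knows nothing about holidays, so hours around Christmas or New Year may be reported as well. All function names are invented for this example, and it only lists files rather than deleting them.

```php
<?php
// Parse ".../YYYY/MM/DD/HHh_ticks.bi5" (zero-based month) into a Unix timestamp,
// or return null if the path does not match that pattern.
function bi5PathToTimestamp(string $path): ?int
{
    $normalized = str_replace('\\', '/', $path);
    if (!preg_match('#(\d{4})/(\d{2})/(\d{2})/(\d{2})h_ticks\.bi5$#', $normalized, $m)) {
        return null;
    }
    return gmmktime((int)$m[4], 0, 0, (int)$m[2] + 1, (int)$m[3], (int)$m[1]);
}

// Rough weekend test (assumption): Friday 22:00 GMT to Sunday 21:00 GMT.
function isWeekendHour(int $timestamp): bool
{
    $day  = (int)gmdate('w', $timestamp); // 0 = Sunday ... 6 = Saturday
    $hour = (int)gmdate('G', $timestamp);
    return $day === 6 || ($day === 5 && $hour >= 22) || ($day === 0 && $hour < 21);
}

// Collect zero-sized .bi5 files below $root that fall outside the weekend window.
function findSuspectEmptyFiles(string $root): array
{
    $suspects = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || $file->getSize() !== 0) {
            continue;
        }
        $timestamp = bi5PathToTimestamp($file->getPathname());
        if ($timestamp !== null && !isWeekendHour($timestamp)) {
            $suspects[] = $file->getPathname();
        }
    }
    sort($suspects);
    return $suspects;
}

$root = isset($argv[1]) ? $argv[1] : 'EURUSD';
if (is_dir($root)) {
    foreach (findSuspectEmptyFiles($root) as $path) {
        echo $path . "\n"; // review the list first, then unlink($path) if desired
    }
}
```

Since the download script skips files that already exist, deleting the listed files would make it request them again on the next run.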
-
#13 written by Armin March 27, 2012 (1 year ago)
-
#14 written by Tyler March 29, 2012 (1 year ago)
Sorry Armin, I haven’t found the problem yet. My guess at this point would be that the files simply do not exist at Dukascopy, but I would be happy if someone else could confirm. If this is indeed the case, and there is a chance these data gaps will be fixed at some point, I’d probably suggest (or write) a script to delete these zero sized files so that Birt’s script will attempt to download them again later.
Any thoughts, Birt?
-
-
-
#16 written by birt March 26, 2012 (1 year ago)
-
#17 written by ole March 27, 2012 (1 year ago)
Hello Birt,
thank you for your fast answer – I just couldn’t find the part (still can not???) that explicitly explains how to limit the download to one currency pair by editing the download_dukascopy_data.php.
And since I’m a beginner in php I couldn’t figure it out myself … learned it today. Thanks again
Ole
-
#19 written by ole March 28, 2012 (1 year ago)
-
#21 written by Pya March 30, 2012 (1 year ago)
Hi Birt
Using now build 418, i cannot convert my csv file using csv2fxt
I have a Label displayed on the H4 screen instead of a % and the fxt file size is 0 .
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Last date in file: 2012.03.20 08:59:59 (file: 2012.03.20 08:59:59.648)
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Your tick data source seems to be Dukascopy, downloaded via PHP scripts or Dukascopier.
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Date format identified: YYYY.MM.DD hh:mm:ss. Elucidating value: 2007.04.02 00:00:00.585
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Bid volume column: 3. Sample: 20700000.00000000
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Ask volume column: 4. Sample: 9500000.00000000
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: We have two volume columns. Arranging them in the same order as the ask/bid prices.
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Bid price column: 1. Sample: 1.337
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Ask price column: 2. Sample: 1.337
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 4 is a numeric field.
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 3 is a numeric field.
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 2 is a numeric field.
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 1 is a numeric field.
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: The date column appears to be 0. Sample: 2007.04.01 21:00:34.312
2012.03.30 16:10:11 CSV2FXT USDJPY,H4: CSV delimiter: comma (,).
2012.03.30 16:10:11 CSV2FXT USDJPY,H4 inputs: CSV2FXT_version_0.33=""; CsvFile=""; CreateHst=false; ValueInfo="All spreads & commissions are in pips regardless of the number of digits."; Spread=2; DateInfo1="Use YYYY.MM.DD as date format for start/end date."; DateInfo2="Leave the fields empty to use the awhole CSV file."; StartDate=""; EndDate=""; UseRealSpread=false; SpreadPadding=0; PipsCommission=0; Leverage=500; GMTOffsetInfo1="Specify the target GMT Offset."; GMTOffsetInfo2="The FXT GMT offset is the GMT offset
-
#22 written by birt April 2, 2012 (1 year ago)
The bug is in MT4 build 418, see http://forum.mql4.com/47037 for more information. It affects a whole ton of EAs and also the CSV2FXT script.
Build 419 has been out since yesterday and fixes the bug (it introduces another bug, though), but it’s not yet available for upgrade for all brokers. If your broker doesn’t offer 419 yet, you can either wait until they do (might take 1-2 days, worst case I’ve seen was 1 week) or you can downgrade to 416.
As a rule of thumb, I recommend not updating MT4 unless you actually need to. The older terminal versions were much more reliable than the new ones.
-
-
#23 written by Thomas March 31, 2012 (1 year ago)
-
#25 written by Dominique April 7, 2012 (1 year ago)
-
#28 written by pipsaw April 24, 2012 (1 year ago)
Hi Birt,
I have been inactive in the world of fx for sometime and was pleasantly surprised to see that you have made so many changes
Thanks!
A quick question for you: once a CSV has been generated through the process php script, is there an easier way to keep it updated? I am under the impression that the current script tends to duplicate data (if you don’t know the exact time of the last entry).
-
#30 written by ben April 26, 2012 (1 year ago)
thanks, i forgot you wrote this script. i took the easy route when i started testing (oct-nov 2011) and used the dukascopier. i hadn’t taken the time to check back since it stopped working at the end of jan. i had a few projects in the works and back-testing on 2 years of data was sufficient even if it didnt include the most recent stuff. but now a few months have gone by and i dont want to get stale! now here i am, getting fresh data again! thanks birt…
-
#31 written by jib May 16, 2012 (1 year ago)
Hello Birt,
I’ve just tried to run the download_dukascopy_data.php from 2009 and I only get this error:
WARNING: did not download http://www.dukascopy.com/datafeed/EURUSD/2009/00/01/11h_ticks.bin (1230807600) – error code was 403
Content was: 403 Forbidden
Request forbidden by administrative rules.
Do you have any idea about dukascopy rejecting this kind of access lately?
Thanks a lot for your help in these matters! (seriously)
-
#33 written by Jack June 25, 2012 (10 months ago)
-
#34 written by birt June 25, 2012 (10 months ago)
It depends on your internet connection, but in any case it takes several hours. When a pair is done it will say “Downloading [other_pair] starting from…”. Anyway, you can simply stop it at any time; starting it once more will resume the download from that point (with a bit of overhead needed to figure out what’s on your hdd already).
-
-
#35 written by yair July 16, 2012 (10 months ago)
Nice script. Thanks.
A few comments, if I may:
1. Add command line parameters for a currency pair and start date (as a human-readable date rather than an epoch timestamp).
2. Line #65
$month = str_pad(gmstrftime('%m',$i) - 1, 2, '0', STR_PAD_LEFT);
has -1 to the month, so instead of July, it asked Dukascopy for June… IMHO it’s a bug.
3. Why stop the script if the files already exist? Why not just overwrite them?
Thanks again
-
#36 written by birt July 16, 2012 (10 months ago)
1. I meant this for personal use which is why it lacks some user-friendly features. I might add it at some point in the future but it’s not high on my priority list.
2. Not a bug as you noticed in the post below. It’s just the Dukascopy storage format.
3. Hmm? Not sure I understood the question properly, but the downloading script does not stop, it simply skips them.
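To make point 2 concrete, here is a small sketch of how an hourly file URL can be derived from a timestamp with the zero-based month. The URL pattern is the one quoted in the warnings in this thread; dukascopyTickUrl is a name invented for this example and is not the script's own code. It uses gmdate() because gmstrftime() is deprecated as of PHP 8.1.

```php
<?php
// Build the hourly tick-file URL for a pair and a Unix timestamp (GMT).
function dukascopyTickUrl(string $pair, int $timestamp): string
{
    $year  = gmdate('Y', $timestamp);
    // Dukascopy numbers months from 00 (January) to 11 (December).
    $month = str_pad((string)((int)gmdate('n', $timestamp) - 1), 2, '0', STR_PAD_LEFT);
    $day   = gmdate('d', $timestamp);
    $hour  = gmdate('H', $timestamp);
    return "http://www.dukascopy.com/datafeed/$pair/$year/$month/$day/{$hour}h_ticks.bi5";
}

echo dukascopyTickUrl('EURUSD', 1332532800) . "\n";
// http://www.dukascopy.com/datafeed/EURUSD/2012/02/23/20h_ticks.bi5
```

The timestamp 1332532800 is 23 March 2012, 20:00 GMT, yet the path contains 02, which is why a request made in July shows 06 in the URL.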
-
-
#38 written by tony July 28, 2012 (9 months ago)
-
#39 written by birt July 31, 2012 (9 months ago)
-
-
#40 written by Jay August 13, 2012 (9 months ago)
-
#42 written by Fabio August 20, 2012 (9 months ago)
-
#44 written by Gustavo August 24, 2012 (8 months ago)
-
#45 written by birt August 24, 2012 (8 months ago)
-
-
#47 written by Uzair September 20, 2012 (7 months ago)
Birt, is any kind of cleaning required after parsing? I used the DukasCopier tool for a while on AUDUSD without ever cleaning. When I switched to using your scripts last week, I found a couple of really weird values in the first set of files I looked at for AUDUSD:
time                     bid     ask     bidsize  asksize
2007.04.03 23:30:01.867  234.33  234.38  4        4
2007.04.03 23:30:02.095  96.295  96.42   1.6      0.9
-
#48 written by birt September 21, 2012 (7 months ago)
If it’s in the first data line, then yes, it needs to be cleaned up. However, if such errors occur in the middle of the file, the CSV2FXT script should be able to filter them out (it has a filter that disregards ticks if the price jumps more than X% from one tick to the next; I can’t remember what X was right now).
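The idea behind such a filter can be sketched as follows. This is not the CSV2FXT code: the function name, the 5% threshold and the sample values are all made up for illustration. It also shows why a bad first line matters, since the first tick is always accepted and becomes the reference for everything after it.

```php
<?php
// Drop ticks whose bid moves more than $maxJumpPercent away from the last accepted bid.
// $ticks is a list of array('time' => string, 'bid' => float) rows with positive bids.
function filterPriceJumps(array $ticks, float $maxJumpPercent): array
{
    $kept = array();
    $lastBid = null;
    foreach ($ticks as $tick) {
        if ($lastBid !== null) {
            $changePercent = abs($tick['bid'] - $lastBid) / $lastBid * 100.0;
            if ($changePercent > $maxJumpPercent) {
                continue; // outlier: skip it and keep comparing against the last good tick
            }
        }
        $kept[] = $tick;
        $lastBid = $tick['bid'];
    }
    return $kept;
}

// Made-up sample values for illustration only.
$ticks = array(
    array('time' => '2007.04.03 23:30:01.500', 'bid' => 0.8105),
    array('time' => '2007.04.03 23:30:01.867', 'bid' => 234.33),
    array('time' => '2007.04.03 23:30:02.400', 'bid' => 0.8106),
);
foreach (filterPriceJumps($ticks, 5.0) as $tick) {
    echo $tick['time'] . ' ' . $tick['bid'] . "\n";
}
```

With the bad value in the middle, only that tick is dropped. If the bad value came first, every later (correct) tick would look like a jump and be discarded, which is why the first data line needs manual cleaning.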
-
-
#50 written by Yaro September 22, 2012 (7 months ago)
-
#54 written by nz88 October 1, 2012 (7 months ago)
“If you only want to download some of the currency pairs available, you can edit download_dukascopy_data.php and change the array at the beginning of the file. You can switch the currency pair download order or completely remove the pairs that you don’t want.”
Can you please type an example of how to change “php download_dukascopy_data.php” to a cmd script for just downloading all of the currency data for EURUSD for example?
Many thanks.
-
#55 written by Jay October 1, 2012 (7 months ago)
-
-
#56 written by nz88 October 1, 2012 (7 months ago)
-
#58 written by neo October 29, 2012 (6 months ago)
After importing the csv into metatrader, I tried to export 1min data from the history center. However, it didn’t work (not the same timespan as in the csv).
Any idea how to get it to work? I ran the patch script prior to trying to export – but it seems like the whole data set is only available in the tester, not in the rest of metatrader.
-
#59 written by birt October 29, 2012 (6 months ago)
The patch script is no longer supported and it has nothing to do with exporting. Furthermore, importing a tick data CSV directly in Metatrader is not supported, so I’m not surprised it didn’t work.
If you only want to export the Dukascopy data as M1 from Metatrader, you can use CSV2FXT to convert it to HST files and once these are in place you can export the data.
-
-
#60 written by andrea November 18, 2012 (6 months ago)
Hi birt I’m receiving 403 forbidden from dukascopy site using latest script 0.26 to download the datafeed, I also tried to manual download from the browser for example http://www.dukascopy.com/datafeed/EURUSD/2012/00/06/19h_ticks.bin and it’s the same 403 error.
-
#61 written by birt November 19, 2012 (5 months ago)
-
-
#62 written by Fabio November 22, 2012 (5 months ago)
-
#63 written by birt November 26, 2012 (5 months ago)
It’s not trivial to change the script to make it do that.
In any case, if you want M1 data I would suggest using the Dukascopy historical data page and selecting M1 data – it has a much larger time span than tick data and it’s also much easier to obtain.
-
-
#64 written by Gogar January 11, 2013 (4 months ago)
-
#65 written by birt January 11, 2013 (4 months ago)
I’m pretty sure that all bi5 files that you download from Dukascopy are compressed (except the 0-sized files for the weekends, of course). To answer your question, the script doesn’t attempt to process files that will not decompress. Anyway, what makes you think this is the case? What error do you get?
Finally, if the script spits out an error (error, not warning) about a bi5 file and stops, it’s probably best to just delete that particular file and download it again by simply running the download script once more; it will only download that particular file.
-
#66 written by Gogar January 11, 2013 (4 months ago)
Thank you for your reply. Downloading the files again helped. The files that didn’t work before are now much smaller when I download them (for example 12KB instead of 54KB).
The error came from the lzma utility (on linux) reporting “Decode Error”.
I’ll have to verify that the problem is completely solved, but it seems to be the case. Thanks again
-
-
-
#68 written by TS April 14, 2013 (1 month ago)
Hi Birt, when I ran the download script, the file 2012/02/23/20h_ticks.bi5 was not downloaded and error code 416 came up. I ran the download script several times but had no luck. Could you advise whether this error is OK for MT4 tick data, or how to fix it? My process is as follows:
c:/php>php download_dukascopy_data.php
WARNING: did not download http://www.dukascopy.com/datafeed/EURUSD/2012/02/23/20h_ticks.bi5 (1332532800 – 03/23/12 20:00 GMT) – error code was 416
Content was:416 – Requested Range Not Satisfiable
416 – Requested Range Not Satisfiable
This error occurs not only for this specific hour but for most hours of 2012/02/23 and for hour 00 of 2012/02/05.
Thank you.
-
Hi Birt,
thanks so much for the update on your PHP scripts.
Does it work if I unpack them in D:\php\ (instead of c:\php\)?
I ask because my C drive has only 256GB and my D drive has 2TB.
Cheers,
K