Downloading and parsing Dukascopy tick data with Birt’s PHP scripts

The Dukascopy data is available on the web in its raw form as files that each span only 1 hour, so some tools are clearly necessary to download and parse it. Before it was possible to get the data via any of the other methods, I wrote a series of scripts that I still use for downloading the free tick data available from Dukascopy. I’m a fan of PHP’s simplicity, so I chose it for the scripts. They’re not commercial-quality code, but they work.

You can get the PHP script archive from the tick data downloads page.

You will find 4 scripts inside:

  • A script for downloading the Dukascopy data, suggestively named “download_dukascopy_data.php”. As a courtesy to Dukascopy, who are graciously providing free data, the script does not attempt to download files that are already on your hard disk. However, it still requests missing files, so to avoid stressing their server please set the dates in the $currencies array at the beginning of the script to the date of your last download. The dates are standard Unix timestamps (epoch time, in essence the number of seconds since 01.01.1970). If you want to easily convert a regular date to a Unix timestamp, you can use Epoch Converter, a very easy to use online tool.
  • A script for processing the downloaded data (process_dukascopy_data.php), which assumes that it is located in the same directory as the previous script and that the data was downloaded there. This one needs some parameters; run it without any for a description, or check out the next script.
  • A small shell script that batch processes all the downloaded data, in two forms: process.bat for Windows and process.sh for Linux.
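
The "skip what is already on disk" idea described for the download script can be sketched as follows. This is a minimal illustration in Python rather than PHP, not the actual code of download_dukascopy_data.php; the directory layout and the `.dat` file name used in `local_path` are hypothetical, chosen only to make the example self-contained.

```python
from datetime import datetime, timezone
from pathlib import Path


def hourly_timestamps(start_ts, end_ts):
    """Yield one Unix timestamp per hour in [start_ts, end_ts)."""
    ts = start_ts - (start_ts % 3600)  # align down to the start of the hour
    while ts < end_ts:
        yield ts
        ts += 3600


def local_path(data_dir, pair, ts):
    """Hypothetical layout: <data_dir>/<PAIR>/<YYYY>/<MM>/<DD>/<HH>h_ticks.dat"""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return (Path(data_dir) / pair / f"{t:%Y}" / f"{t:%m}" / f"{t:%d}"
            / f"{t:%H}h_ticks.dat")


def missing_hours(data_dir, pair, start_ts, end_ts):
    """Return the hourly timestamps that have no file on disk yet."""
    # Any existing file counts as downloaded here, even an empty one.
    return [ts for ts in hourly_timestamps(start_ts, end_ts)
            if not local_path(data_dir, pair, ts).exists()]
```

Only the timestamps returned by `missing_hours` would need a request, which is why moving the start timestamp forward to your last download keeps the number of requests small.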

Windows download & convert to CSV how-to

Start by visiting the Windows PHP download section and fetching the latest binary version as a zip file.
Once you’ve done that, unpack it to c:\php\ and unpack the scripts from the script archive you downloaded into the same directory.
Rename c:\php\php.ini-development to c:\php\php.ini. If your folder does not contain a file named php.ini-development, use php.ini-dist or any other php.ini-something file you can find.
Edit c:\php\php.ini, search for
;extension=php_curl.dll
and remove the semicolon in front of the line and add an “ext/” in front of “php_curl.dll” so that it looks like this:
extension=ext/php_curl.dll
Save the file and exit.
If you run into a zip error and your PHP installation has an ext/php_zip.dll, also apply the method above for extension=ext/php_zip.dll.
Head to the 7-Zip download page and get the command line version. Unpack it and put 7za.exe in the same directory (c:\php\).
Click start->run and type
cmd
then click OK (or alternatively type cmd and hit enter in the Windows 7/Vista “search programs and files” box in the start menu).
Type
cd \php
in the command window.
Type
php download_dukascopy_data.php
Have a coffee. Have another coffee. Go sleep. Go to work. Go to the gym. Go to a club. Wait some more. I’m not kidding, it takes ages. You can check the progress by watching the currency pair directories get filled. If you get any strange errors, run the process again when it’s finished – it will only download any files that were missed in the first step due to network errors.

If you only want to download some of the currency pairs available, you can edit download_dukascopy_data.php and change the array at the beginning of the file. You can switch the currency pair download order or completely remove the pairs that you don’t want. The number next to each pair is the unix timestamp at which to start downloading; if you wish to start at a later point in time (the default is the earliest date available) you can use epochconverter.com to get the timestamp for your chosen date.
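
If you would rather compute the timestamp yourself than use an online converter, a small sketch in Python (not part of the PHP scripts) looks like this. It assumes the date is meant as UTC; the function names are made up for the example.

```python
from datetime import datetime, timezone


def to_unix_timestamp(year, month, day, hour=0):
    """Seconds since 1970-01-01 00:00:00 UTC for the given UTC date and hour."""
    moment = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return int(moment.timestamp())


def from_unix_timestamp(ts):
    """Readable UTC date for a Unix timestamp, useful for double-checking."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(to_unix_timestamp(2012, 1, 1))    # 1325376000
print(from_unix_timestamp(1325376000))  # 2012-01-01 00:00:00
```

The resulting number is what goes next to the currency pair in the array.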

When the download is finished, assuming you wanted to get the EURUSD data up to 01.01.2012, you’d type
php process_dukascopy_data.php EURUSD 200702 201201 EURUSD.csv
and the output will be placed in EURUSD.csv.
Alternatively, you can type
process.bat
which will batch process all the currency data. It’s mostly safe to ignore the error spam at this step. Note: if you use process.bat or process.sh, you might have to update the ending dates in them to get the full data range!
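
For illustration only, batch processing boils down to running the same command once per currency pair. The Python sketch below builds those command lines; it is not the content of process.bat or process.sh, the pair list is just an example, and the 200702 start month is the EURUSD value from above, which may not be right for other pairs.

```python
def build_process_command(pair, start_month, end_month):
    """Build the argument list for one process_dukascopy_data.php run."""
    return ["php", "process_dukascopy_data.php", pair,
            start_month, end_month, f"{pair}.csv"]


PAIRS = ["EURUSD", "USDJPY"]  # example subset, edit to match your downloads

for pair in PAIRS:
    command = build_process_command(pair, "200702", "201201")
    # Printed rather than executed; subprocess.run(command, check=True)
    # would run it from the directory that holds the scripts.
    print(" ".join(command))
```

Updating the ending date then means changing a single value instead of one line per pair.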

This should be it. If everything went fine, you should have your CSV files in the same c:\php folder and you should be ready to proceed with preparing your tick data for Metatrader 4.

Warning: make sure you have enough space on your hard disk. As of 2012, the downloaded tick files take up over 20 GB, and if you add up the size of the resulting CSV files you will be well past the 100 GB mark.
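
A quick way to check the free space before starting, sketched in Python with the standard library. The 100 GB threshold is simply the 2012 figure quoted above, and the helper names are invented for the example.

```python
import shutil


def free_gigabytes(path):
    """Free space on the drive that holds `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def has_room_for(path, required_gb):
    """True if the drive holding `path` has at least `required_gb` GB free."""
    return free_gigabytes(path) >= required_gb


# "." is the current directory; use the folder where the data will live.
print(f"Free space: {free_gigabytes('.'):.1f} GB")
print("Enough for the 2012 data set:", has_room_for(".", 100))
```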

  • #1 written by K March 6, 2012 (2 months ago)

    Hi Birt,

    thanks so much for the update on your PHP scripts.
    does it work if I repack them in D:\php\? (instead of c:\php\)
    because my Cdrive has only 256GB and my Ddrive has 2TB

    Cheers,
    K

    • #2 written by birt March 6, 2012 (2 months ago)

      Certainly. In fact, you can even put it on any drive in any path, such as q:\my\special\path\to\php – all you need to do is make sure you use that path everywhere instead of c:\php.

  • #3 written by K March 6, 2012 (2 months ago)

    thanks for the quick reply, Birt

    Keep up with your excellent work here
    great thanks again

  • #4 written by Pya March 20, 2012 (1 month ago)

    Hi Birt

    No ticks data from Dukascopy after 27/01/12 ?

    Thanks

    • #5 written by birt March 20, 2012 (1 month ago)

      There are plenty. You just need to use the latest version of the PHP scripts (which is 0.24 now).

  • #6 written by Pya March 20, 2012 (1 month ago)

    right
    Thanks a lot

  • #7 written by Armin March 22, 2012 (1 month ago)

    Hey! I tried to get the processing script up and running on my Mac. It always complained that LZMA was not installed, although it is installed and running as it should. The problem is that the script checks for the OS using stripos(PHP_OS, 'win'). On a Mac, PHP_OS returns 'Darwin', which contains “win”, so the check for Windows doesn’t return false. Maybe this helps someone who runs into the same problem. Anyway, thanks a lot for your great work! Keep it up! Best regards

    • #8 written by birt March 22, 2012 (1 month ago)

      LOL, that’s quite ironic given the fact that I use an iMac. I never used it for tick data, though; I use my Linux box for that so I didn’t know PHP_OS says Darwin. I fixed it to count Darwin as non-windows. The change is in v0.25 which is now available for download but you probably shouldn’t bother with it if you changed your copy already.

    • #9 written by Ruben April 19, 2012 (4 weeks ago)

      Hi Armin,

      Would you mind telling me where you got an OSX port for lzma? Can’t seem to find one. Thanks!

      Ruben.

      • #10 written by birt April 19, 2012 (4 weeks ago)

        First get and install MacPorts, and once you’ve done that open a terminal and type:

        sudo port install lzmautils

        When you’re done with that, you will have an /opt/local/bin/lzma that will be “detected” by the php script.

        • #11 written by Ruben April 19, 2012 (3 weeks ago)

          Awesome, thanks Birt! Works like a charm now. P.S. for other Mac users using PeerGuardian: you have to disable PeerGuardian first for rsync to work in MacPorts.

  • #12 written by Tyler March 26, 2012 (1 month ago)

    I noticed there are some “zero sized” .bi5 files in my tick downloads using the PHP script (which works great!). I realize this happens each weekend, but there are other periods of one or more hours where there are zero sized .bi5 files. Can anyone confirm whether this is due to a download issue or if these are real (and permanent?) gaps in Dukascopy’s history? Does anyone know if Dukascopy plans on correcting these, if they are real? I have tried deleting them and re-downloading, and they remain zero sized, which makes me think they are truly not in Dukascopy’s history. If there’s a chance these will be filled in in the future, might it be worth making a PHP script to delete all zero-sized .bi5 files that do not occur on weekends? If not, I can create my own in Python if anyone thinks there’s a chance these gaps will be corrected.

    If anyone could comment on this or share their experiences, I’d appreciate it.

    • #13 written by Armin March 27, 2012 (1 month ago)

      I noticed the same thing while downloading…they are definitely not only on weekends…Did you find out what the problem was?

      • #14 written by Tyler March 29, 2012 (1 month ago)

        Sorry Armin, I haven’t found the problem yet. My guess at this point would be that the files simply do not exist at Dukascopy, but I would be happy if someone else could confirm. If this is indeed the case, and there is a chance these data gaps will be fixed at some point, I’d probably suggest (or write) a script to delete these zero sized files so that Birt’s script will attempt to download them again later.

        Any thoughts, Birt?

  • #15 written by ole March 26, 2012 (1 month ago)

    Hello Birt,

    I would like to ask if there is a way to download only one currency pair with your .php script instead of all of them?

    Thanks

    Ole

  • #16 written by birt March 26, 2012 (1 month ago)

    Sure – as specified above, you have to use a text editor to open download_dukascopy_data.php and simply delete the lines for the currency pairs you don’t need, leaving just those you do. You can also edit the starting date in there; use epochconverter.com to get the timestamp.

  • #17 written by ole March 27, 2012 (1 month ago)

    Hello Birt,

    thank you for your fast answer – I just couldn’t find the part (still can not???) that explicitly explains how to limit the download to one currency pair by editing the download_dukascopy_data.php.
    And since I’m a beginner in php I couldn’t figure it out myself … learned it today.

    Thanks again

    Ole

    • #18 written by birt March 27, 2012 (1 month ago)

      It’s somewhere in a paragraph, quote: “You can change the array at the beginning of the file to switch the currency pair download order or to remove some pairs that you don’t want.”

      I guess it’s not very clear, I’ll rephrase it.

  • #19 written by ole March 28, 2012 (1 month ago)

    Sorry Birt,

    I must have been blind yesterday. I just found that part when I wanted to read about the next step on how to convert the .csv.

    I guess now I belong to those users that I sometimes wonder about, given how determinedly they ignore things that are clearly written.

    Thank you for your patience,

    Ole

    • #20 written by birt March 28, 2012 (1 month ago)

      No problem at all. It was indeed kind of hidden in the middle of a paragraph but it should be much more clear now that I rephrased it.

  • #21 written by Pya March 30, 2012 (1 month ago)

    Hi Birt

    Using now build 418, i cannot convert my csv file using csv2fxt

    I have a Label displayed on the H4 screen instead of a % and the fxt file size is 0 .

    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Last date in file: 2012.03.20 08:59:59 (file: 2012.03.20 08:59:59.648)
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Your tick data source seems to be Dukascopy, downloaded via PHP scripts or Dukascopier.
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Date format identified: YYYY.MM.DD hh:mm:ss. Elucidating value: 2007.04.02 00:00:00.585
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Bid volume column: 3. Sample: 20700000.00000000
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Ask volume column: 4. Sample: 9500000.00000000
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: We have two volume columns. Arranging them in the same order as the ask/bid prices.
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Bid price column: 1. Sample: 1.337
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Ask price column: 2. Sample: 1.337
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 4 is a numeric field.
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 3 is a numeric field.
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 2 is a numeric field.
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: Column 1 is a numeric field.
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: The date column appears to be 0. Sample: 2007.04.01 21:00:34.312
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4: CSV delimiter: comma (,).
    2012.03.30 16:10:11 CSV2FXT USDJPY,H4 inputs: CSV2FXT_version_0.33=""; CsvFile=""; CreateHst=false; ValueInfo="All spreads & commissions are in pips regardless of the number of digits."; Spread=2; DateInfo1="Use YYYY.MM.DD as date format for start/end date."; DateInfo2="Leave the fields empty to use the awhole CSV file."; StartDate=""; EndDate=""; UseRealSpread=false; SpreadPadding=0; PipsCommission=0; Leverage=500; GMTOffsetInfo1="Specify the target GMT Offset."; GMTOffsetInfo2="The FXT GMT offset is the GMT offset

    • #22 written by birt April 2, 2012 (1 month ago)

      The bug is in MT4 build 418, see http://forum.mql4.com/47037 for more information. It affects a whole ton of EAs and also the CSV2FXT script.

      Build 419 has been out since yesterday and fixes the bug (it introduces another bug, though) but it’s not yet available for upgrade for all brokers. If your broker doesn’t offer 419 yet, you can either wait until they do (might take 1-2 days, worst case I’ve seen was 1 week) or you can downgrade to 416.

      As a rule of thumb, I recommend not updating MT4 unless you actually need to. The older terminal versions were much more reliable than the new ones.

  • #23 written by Thomas March 31, 2012 (1 month ago)

    Hello,
    can anybody help me convert M1 candle CSV data (date, time, OHLC, volume) to a higher timeframe? Is there any utility?

    Thanx.

    • #24 written by birt April 2, 2012 (1 month ago)

      Have you tried to import it in MT4 and use the period converter script then export the timeframe you need?

  • #25 written by Dominique April 7, 2012 (1 month ago)

    The newest 7-zip package no longer contains the 7za.exe file. You have to copy 7z.exe and rename it to 7za.exe, or change the code within process_dukascopy_data.php

    • #26 written by birt April 7, 2012 (1 month ago)

      My guess is that you did not download the command line version (as the guide specifies) but something else. I’ve just fetched the latest command line version and 7za.exe is inside.

  • #27 written by Dominique April 7, 2012 (1 month ago)

    Yep, sorry, my fault.
    I have downloaded the first file in the list ” 7-Zip for 32-bit Windows”

  • #28 written by pipsaw April 24, 2012 (3 weeks ago)

    Hi Birt,

    I have been inactive in the world of fx for sometime and was pleasantly surprised to see that you have made so many changes :) Thanks!
    A quick question for you, once having generated a CSV through process php script, is there any easier way to update it constantly? I am under the impression that the current script tends to duplicate data (if you don’t know the exact time of last entry).

    • #29 written by pipsaw April 24, 2012 (3 weeks ago)

      Sorry, don’t bother please. Found it in FAQs, thanks!

  • #30 written by ben April 26, 2012 (2 weeks ago)

    thanks, i forgot you wrote this script. i took the easy route when i started testing (oct-nov 2011) and used the dukascopier. i hadn’t taken the time to check back since it stopped working at the end of jan. i had a few projects in the works and back-testing on 2 years of data was sufficient even if it didnt include the most recent stuff. but now a few months have gone by and i dont want to get stale! now here i am, getting fresh data again! thanks birt…

  • #31 written by jib May 16, 2012 (1 day ago)

    Hello Birt,

    I’ve just tried to run the download_dukascopy_data.php from 2009 and I only get this error:

    WARNING: did not download http://www.dukascopy.com/datafeed/EURUSD/2009/00/01/11h_ticks.bin (1230807600) – error code was 403
    Content was: 403 Forbidden
    Request forbidden by administrative rules.

    Do you have any idea about dukascopy rejecting this kind of access lately?

    Thanks a lot for your help in these matters! (seriously)

    • #32 written by birt May 16, 2012 (22 hours ago)

      It would appear that the bin files are no longer available. You need to use the latest version of the script.
