Python Charting Stocks/Forex for Technical Analysis Part 5 - Finishing price puller

Charting Stocks in Python playlist: http://www.youtube.com/watch?v=u6Xd3kRHhJI&feature=share&list=PLQVvvaa0QuDcR-u9O8LyLR7URiKuW-XZq

This is the fifth video in the series on stock price analysis, showing how to automatically pull stock price data. The purpose of this series is to teach you how to program your own charting and analysis of stocks or Forex. This is useful if you plan to do algorithmic, high-frequency, or any other sort of automated trading.

Sentdex.com Facebook.com/sentdex Twitter.com/sentdex

Comments

I just stumbled onto your tutorials when I was searching for a plotting tutorial. I have been able to follow along so far. However, is there any way I can get hold of the source file?
Appreciate your feedback.

Krishna
Hey, if I want to continue with your playlist, do I need to run the shell? If so, how do I run another program for a Python script?
You have been incredibly helpful!!
Just a tip: I messed up and got an error saying "global name 'splitExisting' is not defined".

It was a typo, I had splitExisiting, instead of splitExisting. Hope that helps anyone.
Hi, I have two quick questions:
1. Why do we initially set time.sleep() to 300sec? And why 10sec afterwards?
2. When I run the code, it prints "Pulled", then waits for 30sec, and only then prints the name of the stock, "Sleeping..." and the time stamp. So the 300sec wait seems to happen before the initial print command has even completed, and before the next two. Any idea why this happens?
I'd suggest that instead of trying to design and debug code in your videos, that you get the code working perfectly offline, then spend the time explaining how it works.
Hi Harrison,

Do you have any wisdom on indentation in Python?

I was following your tutorial and found that I was getting the warning below:

Currently obtaining stock data for: MPL.AX
2015-08-03 22:58:00
Mainloop local variable 'splitSource' referenced before assignment
Currently obtaining stock data for: AAPL
2015-08-03 22:58:00
Mainloop local variable 'splitSource' referenced before assignment
>>>

I figured out that it was caused by too much indentation on this code block:

saveFile = open(saveFileLine,'a')

sourceCode=urllib2.urlopen(urlToVisit).read()
splitSource=sourceCode.split('\n')

After removing the extra indents from the code block, the warning went away. I think what happened is that this code block became part of the try/except block. As it came after the except, the code block did not get executed because no exceptions were being thrown.
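The behaviour Martin describes can be reproduced without any network access. The following is a minimal sketch in Python 3; the function names `over_indented` and `correctly_indented` are made up for illustration and are not from the video. When the assignment is indented into the `except` block, it only runs if the `try` fails, so the later reference raises the "referenced before assignment" error (newer Python versions word this message differently).

```python
def over_indented():
    try:
        last_unix = 0          # succeeds, so the except block is skipped
    except Exception:
        last_unix = 0
        # Accidentally indented: this line is now part of the except block
        split_source = "a\nb".split("\n")
    return split_source        # never assigned, raises UnboundLocalError


def correctly_indented():
    try:
        last_unix = 0
    except Exception:
        last_unix = 0
    # Back at the function's own indentation level: always runs
    split_source = "a\nb".split("\n")
    return split_source


try:
    over_indented()
    message = "no error"
except UnboundLocalError as e:
    message = str(e)

print(message)
print(correctly_indented())
```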

I like the idea that Python forces you to indent properly, which makes code more readable. But it can be tricky.

Martin
hey ...
u r get monay from forex
I thought I should share some improvements I've made to this amazing code!

if 'values' and 'labels' and 'alternate_ranges' not in eachLine: (sometimes I get lines saving with these other strings too) (EDIT: the booleans probably won't work this way, better to have nested 'if' statements)
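The EDIT is right to be suspicious of the chained condition. Here is a small Python 3 sketch of why, using a made-up sample line (the exact contents of a real `labels` line are an assumption here). Non-empty strings are truthy, so only the last `not in` test is actually evaluated.

```python
# Hypothetical line of the kind that should be skipped
line = "labels:1403271299,1403272199"

# Parses as: 'values' and 'labels' and ('alternate_ranges' not in line)
# The first two operands are non-empty strings, which are always truthy.
chained = bool('values' and 'labels' and 'alternate_ranges' not in line)

# Testing each keyword against the line explicitly
keywords = ('values', 'labels', 'alternate_ranges')
explicit = not any(word in line for word in keywords)

print(chained)   # the 'labels' line slips through the chained test
print(explicit)  # the explicit test rejects it
```

The `any(...)` form is one alternative to nested `if` statements; both check every keyword.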

I haven't tested this yet, but to print a list of stocks that didn't save additional data on a pull:
failedStocks = [] (put outside and before the main def)
linesWritten = 0 (inside the def, at the top)
linesWritten = linesWritten + 1 (after writing)
if linesWritten == 0: (after the linesWritten counter)
failedStocks.append(stock)
print failedStocks (after the function call)
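The counter idea above could look roughly like the following Python 3 sketch. It is untested against the real feed: the function `save_new_lines`, the in-memory `saved` list standing in for the output file, and the ticker `XYZ` are all assumptions made for illustration.

```python
failed_stocks = []  # module level, shared across calls


def save_new_lines(stock, lines, last_unix, out):
    """Append rows newer than last_unix to out; record stocks with no new rows."""
    lines_written = 0
    for line in lines:
        fields = line.split(',')
        if len(fields) == 6 and 'values' not in line:
            if int(fields[0]) > last_unix:
                out.append(line)
                lines_written += 1
    if lines_written == 0:
        failed_stocks.append(stock)
    return lines_written


sample = [
    "values:Timestamp,close,high,low,open,volume",
    "1403271299,91.6699,91.9200,91.6100,91.7650,13683200",
    "1403271599,91.7200,91.7700,91.4800,91.6600,829900",
]
saved = []
save_new_lines('AAPL', sample, 0, saved)
save_new_lines('XYZ', [], 0, saved)  # hypothetical ticker that returned nothing
print(failed_stocks)
```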

If you're not pulling American stocks, you have to use a suffix for the market you want stocks from. Here is a modification I made that checks Yahoo to see whether the ticker symbol I want is .TO or .V. These are Canadian markets; alter the code to suit your region. You will have to do further string operations on the printed data to separate what you want from what you don't. If you didn't get a pull, you have to go to Yahoo and manually find the right ticker symbol by searching for the company. I copy/paste the printout to save it; I guess you could save to a file too.

def pullData(stock):
    try:
        timeFrame = '1m'
        toSite = 'http://chartapi.finance.yahoo.com/instrument/1.0/'+stock+'.TO'+'/chartdata;type=quote;range=%s/csv'%timeFrame
        vSite = 'http://chartapi.finance.yahoo.com/instrument/1.0/'+stock+'.V'+'/chartdata;type=quote;range=%s/csv'%timeFrame

        countLine = 0
        toCode = urllib2.urlopen(toSite).read()
        vCode = urllib2.urlopen(vSite).read()
        toSource = toCode.split('\n')
        vSource = vCode.split('\n')

        for eachLine in toSource:
            splitLine = eachLine.split(',')
            if len(splitLine)==6:
                if 'values' not in eachLine:
                    countLine = countLine+1
        if countLine==0:
            for eachLine in vSource:
                splitLine = eachLine.split(',')
                if len(splitLine)==6:
                    if 'values' not in eachLine:
                        countLine = countLine+1
            if countLine==0:
                print "'"+str(stock)+".shit',"
            else:
                print "'"+str(stock)+".V',"
        else:
            print "'"+str(stock)+".TO',"

        #print 'Pulled',stock
        #print 'sleeping...'
        time.sleep(10)

    except Exception,e:
        print 'main loop',str(e)

for eachStock in stocksToPull:
    pullData(eachStock)

Thank you Harrison for your amazing tutorials!
Hi Harrison, any idea why my Python 3 code doesn't work? I printed out every section of the code, and all of it seems to work except when I get to indexing int(splitLine[0]). If I try to print splitLine[0], it doesn't seem to be indexing the list.

http://codepad.org/TYeRyE5w
_______________________

def pullData(stock):
    try:
        print('currently pulling', stock)
        print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
        urlToVisit = 'http://chartapi.finance.yahoo.com/instrument/1.0/' + stock + '/chartdata;type=quote;range=10d/csv'
        saveFileLine = stock+'.txt'

        try:
            readExistingData = open(saveFileLine, 'r').read()
            splitExisting = readExistingData.split('\n')
            mostRecentLine = splitExisting[-2]
            lastUnix = mostRecentLine.split(',')[0]
        except:
            lastUnix = 0

        saveFile = open(saveFileLine, 'a')
        sourceCode = urllib.request.urlopen(urlToVisit).read()
        splitSource = str(sourceCode, encoding='utf8').split('\n')

        for eachLine in splitSource:
            if 'values' not in eachLine:
                splitLine = eachLine.split(',')
                if len(splitLine) == 6:
                    if int(splitLine[0]) > int(lastUnix):
                        lineToWrite = eachLine +'\n'
                        saveFile.write(lineToWrite)
        saveFile.close()

        print('pulled', stock)
        print('sleeping..')
        print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
        time.sleep(300)

    except Exception as e:
        print('main loop', str(e))
In splitSource, the line containing the term 'values' is actually 'values:Timestamp,close,high,low,open,volume'.
So, when you split this by ',' , we get 'values:Timestamp' and not 'values'.
I am not sure how the if statement to eliminate 'values' is working.
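One way to see why the check still works: in the tutorial code the test `'values' not in eachLine` is applied to the whole line before it is split, and `in` on a string is a substring test. A short Python 3 sketch using the header quoted above and one of the data rows posted in this thread:

```python
header = 'values:Timestamp,close,high,low,open,volume'
data_row = '1403271299,91.6699,91.9200,91.6100,91.7650,13683200'

# On a string, `in` looks for a substring anywhere in the unsplit line
in_header = 'values' in header
in_data_row = 'values' in data_row

# On a list, `in` compares whole elements, so 'values:Timestamp' does not match
in_split_header = 'values' in header.split(',')

print(in_header, in_data_row, in_split_header)
```

So the header is filtered out by the string test, even though no element of the split list equals 'values'.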
Just a thought: to keep the file size from getting too big, whenever it adds a new line of data it should maybe delete the oldest line of data at the top.
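A rough Python 3 sketch of that idea, assuming it is acceptable to rewrite the whole file after each pull. The helper name `trim_to_last`, the limit of three lines, and the demo file are hypothetical; the tutorial code does not do this.

```python
import os
import tempfile


def trim_to_last(path, max_lines):
    """Keep only the newest max_lines lines of the file at path."""
    with open(path) as f:
        lines = f.readlines()
    if len(lines) > max_lines:
        with open(path, 'w') as f:
            f.writelines(lines[-max_lines:])


# Demo on a throwaway file holding five rows with timestamps 1 to 5
demo_path = os.path.join(tempfile.mkdtemp(), 'DEMO.txt')
with open(demo_path, 'w') as f:
    f.writelines('%d,1.0,1.0,1.0,1.0,100\n' % ts for ts in range(1, 6))

trim_to_last(demo_path, 3)
with open(demo_path) as f:
    remaining = f.read().splitlines()
print(remaining)
```

Because the oldest rows sit at the top of the file, keeping the last `max_lines` lines drops the oldest data first.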
Just want to let you know, thanks for the quick responses:
I am a total noob. I'm not sure if I did anything wrong, but my timestamp for Apple comes out like this.

1403271299,91.6699,91.9200,91.6100,91.7650,13683200
1403271599,91.7200,91.7700,91.4800,91.6600,829900
1403271840,91.7500,91.8000,91.5600,91.7100,649300
1403272199,91.9650,91.9800,91.7200,91.7200,726300

However, in your video I see the leading timestamp as a date, not Unix time.

The code I wrote is this:

import urllib2
import time
import datetime

stocksToPull = 'AAPL','GOOG','MSFT','CMG','AMZN','EBAY','TSLA'

def pullData(stock):
    try:
       print 'currently pulling',stock
       print str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%S'))
       urlToVisit = 'http://chartapi.finance.yahoo.com/instrument/1.0/'+stock+'/chartdata;type=quote;range=10d/csv'
       saveFileLine = stock+'.txt'

       try:
           readExistingData = open(saveFileLine,'r').read()
           splitExisting = readExistingData.split('\n')
           mostRecentLine = splitExisting[-2]
           lastUnix = mostRecentLine.split(',')[0]
       except Exception,e:
           print str(e)
           time.sleep(1)
           lastUnix = 0

       saveFile = open(saveFileLine,'a')
       sourceCode = urllib2.urlopen(urlToVisit).read()
       splitSource = sourceCode.split('\n')

       for eachLine in splitSource:
           if 'values' not in eachLine:
               splitLine = eachLine.split(',')
               if len(splitLine)==6:
                   if int(splitLine[0]) > int(lastUnix):


                       lineToWrite = eachLine+'\n'
                       saveFile.write(lineToWrite)

       saveFile.close()

       print 'Pullled',stock
       print 'sleeping.....'
       print str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%S'))
       time.sleep(10)




    except Exception,e:
        print ' main loop',str(e)

while True:
    for eachStock in stocksToPull:
        pullData(eachStock)
    time.sleep(18000)
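Regarding the Unix timestamps in the rows quoted in the comment above: whatever the feed returns, a Unix timestamp can be turned into a readable date after the fact. A minimal Python 3 sketch, assuming the first field is seconds since the epoch and that UTC is an acceptable time zone for display:

```python
from datetime import datetime, timezone

row = '1403271299,91.6699,91.9200,91.6100,91.7650,13683200'

# First comma-separated field is the Unix timestamp
stamp = int(row.split(',')[0])

# Convert to an aware datetime in UTC, then format it
as_date = datetime.fromtimestamp(stamp, tz=timezone.utc)
readable = as_date.strftime('%Y-%m-%d %H:%M:%S')
print(readable)
```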
hmmm... now I'm getting this:
currently pulling AAPL
2014-07-05 15:24
main loop invalid literal for int() with base 10: '1404406800,94'
currently pulling GOOG
2014-07-05 15:25
main loop invalid literal for int() with base 10: '1404406800,584'
currently pulling MSFT
2014-07-05 15:25
main loop invalid literal for int() with base 10: '1404406800,41'
currently pulling CMG
2014-07-05 15:26
main loop invalid literal for int() with base 10: '1404406806,603'
currently pulling AMZN
2014-07-05 15:26
main loop invalid literal for int() with base 10: '1404406800,337'
currently pulling EBAY
2014-07-05 15:27
main loop invalid literal for int() with base 10: '1404406800,50'
currently pulling TSLA
2014-07-05 15:28
main loop invalid literal for int() with base 10: '1404406800,229'

from this:

import urllib2
import time
import datetime

stocksToPull = 'AAPL','GOOG','MSFT','CMG','AMZN','EBAY','TSLA'

def pullData(stock):
    try:
        print 'currently pulling',stock
        print str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%S'))
        urlToVisit = 'http://chartapi.finance.yahoo.com/instrument/1.0/'+stock+'/chartdata;type=quote;range=10d/csv'
        saveFileLine = stock+'.txt'

        try:
            readExistingData = open(saveFileLine,'r').read()
            splitExisting = readExistingData.split('\n')
            mostRecentLine = splitExisting[-2]
            lastUnix = mostRecentLine.split('.')[0]
        except Exception,e:
            print str(e)
            time.sleep(1)
            lastUnix = 0

        saveFile = open(saveFileLine,'a')
        sourceCode = urllib2.urlopen(urlToVisit).read()
        splitSource = sourceCode.split('\n')

        for eachLine in splitSource:
            if 'values' not in eachLine:
                splitLine = eachLine.split(',')
                if len(splitLine)==6:
                    if int(splitLine[0]) > int(lastUnix):
                        lineToWrite = eachLine+'\n'
                        saveFile.write(lineToWrite)

        saveFile.close()

        print 'Pullled',stock
        print 'sleeping.....'
        print str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%S'))
        time.sleep(10)

    except Exception,e:
        print ' main loop',str(e)


for eachStock in stocksToPull:
    pullData(eachStock)
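The error text in this comment points at the line `lastUnix = mostRecentLine.split('.')[0]`, which splits on a period where the earlier version of the script split on a comma. A short Python 3 sketch with a made-up saved row (only the timestamp and the leading "94" are taken from the error message; the remaining values are assumptions):

```python
# Hypothetical last line of AAPL.txt
row = '1404406800,94.0300,94.1000,93.9000,94.0000,100000'

by_dot = row.split('.')[0]      # everything up to the first decimal point
by_comma = row.split(',')[0]    # just the timestamp field

try:
    int(by_dot)
    error = None
except ValueError as e:
    error = str(e)

print(by_dot)
print(by_comma)
print(error)
```

Splitting on ',' yields a string that `int()` can parse, which is consistent with the earlier version of the script not hitting this error.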
For some reason, my GOOG.txt is much smaller (3kb vs. 14kb) - only 50 obs or so when I do range of 1y...

Also, is any of your (finished) code available on GitHub, like

https://github.com/Canuckish/codepile/blob/master/stock_an_v4 ?
It seems that Yahoo now presents the last datapoint on a different timeframe than the previous ones. That is, if Yahoo shows us 5-minute data, the last datapoint will not necessarily be a 5-minute one, but the value the stock had at the time the data was pulled (<= 5 minutes). So I guess that, to avoid corrupting the timeframe consistency, the last datapoint should not be added to the file?
I've edited the code so it does now ignore the last datapoint:
for eachLine in splitSource[:-2]:
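Whether `[:-2]` drops exactly one data row depends on how the response ends. A small Python 3 sketch with placeholder rows; the trailing newline in the first case is an assumption about the feed, not something confirmed in this thread:

```python
# Assumed response shape: rows separated by '\n', with a trailing newline
with_trailing = 'row1\nrow2\nrow3\n'.split('\n')
kept = with_trailing[:-2]          # drops the empty string and 'row3'

# Without a trailing newline, the same slice drops two real rows
without_trailing = 'row1\nrow2\nrow3'.split('\n')
kept_short = without_trailing[:-2]

print(with_trailing)
print(kept)
print(kept_short)
```

If the response has no trailing newline, `[:-1]` would be the slice that removes only the final datapoint.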
Why do I get a syntax error with the except: lastUnix = 0 ?

great videos. thanks

Additional Information:

Visibility: 11888

Duration: 21m 3s

Rating: 53