Scikit Learn Machine Learning Tutorial for investing with Python p. 3

In this part of our machine learning tutorial with scikit-learn and Python, we're covering how to acquire, label and organize our data, as well as figure out which machine learning algorithm to use.

Playlist link: https://www.youtube.com/watch?v=URTZ2jKCgBc&list=PLQVvvaa0QuDd0flgGphKCej-9jp-QdzZ3&index=3

Flowchart for figuring out which machine learning algorithm to use: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

To get company data, you can use sec.gov, finance.yahoo.com, or many other locations. To alleviate the need for people to suck up tons of bandwidth, I have compiled and zipped up a sample dataset that is the straight HTML data as if you had parsed Yahoo Finance for over a decade. The location: http://pythonprogramming.net/downloads/intraQuarter.zip

sample code: http://pythonprogramming.net
http://seaofbtc.com
http://sentdex.com
http://hkinsley.com
https://twitter.com/sentdex
Bitcoin donations: 1GV7srgR4NJx4vrk7avCmmVQQrqmv87ty6
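Once the sample archive has been downloaded, a first step toward organizing the data might be to see which HTML files it contains. The following is a minimal sketch using only the standard library. The function name `list_html_files` is made up for illustration, and nothing is assumed about the folder layout inside the archive.

```python
import zipfile


def list_html_files(archive):
    """Return the sorted names of all .html/.htm members of a zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    (Hypothetical helper, not part of the tutorial's own code.)
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.lower().endswith((".html", ".htm"))
        )
```

Assuming the download was saved as intraQuarter.zip in the working directory, `list_html_files("intraQuarter.zip")` would return the member names, which can then be grouped or labeled however the later steps require.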

Comments

we are gonna use Yahoo finance a company that scares me a little less lolololololol
Thanks for these videos they are super useful and fantastic!
It's not Chevron, it's Exxon vs. Apple :)
Can you please explain how you parsed the past data?
At 13:30 you are talking about the P/E ratio and you said that E is equity.
Are you sure about that?
Because usually P/E means price/EPS (EPS = earnings per share).
That also seems more logical, because equity = assets - liabilities, and equity is very often close to 0 or even less than 0, which makes a cap/equity ratio useless.
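To make the distinction in this comment concrete, here is a small sketch of the two quantities being compared. The function names and all the numbers are invented for illustration. They are not taken from the video or from the dataset.

```python
def price_to_earnings(price_per_share, earnings_per_share):
    """P/E ratio: share price divided by earnings per share (EPS)."""
    if earnings_per_share == 0:
        raise ValueError("P/E is undefined when EPS is zero")
    return price_per_share / earnings_per_share


def book_equity(total_assets, total_liabilities):
    """Shareholders' equity: total assets minus total liabilities."""
    return total_assets - total_liabilities


# Hypothetical company: $50.00 share price and $2.50 earnings per share.
pe = price_to_earnings(50.0, 2.5)

# Hypothetical balance sheet where liabilities exceed assets,
# so equity comes out negative.
equity = book_equity(total_assets=900.0, total_liabilities=1000.0)
```

With these made-up inputs the P/E is a positive number, while the equity figure is negative, which is the situation the comment describes as making a cap/equity ratio hard to interpret.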
The SEC website mentions "Fair Access" but doesn't go into much detail. Basically, they're saying "Don't be a dick: write your code so that it doesn't make too many requests per second, or we'll shut you down for a period of time." (paraphrased :) ) At the company I work for we use SEC data a lot, and even though mostly humans do the data collection, sometimes we get blocked and then have to hold off for a few minutes (the same happens with Google searches too).
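One simple way to avoid making too many requests per second is to enforce a minimum gap between consecutive requests. The sketch below is only an illustration of that idea: `fetch_politely` is a made-up name, `fetch` is whatever download function the caller supplies, and the default `min_interval` of one second is an arbitrary placeholder, not a limit published by the SEC.

```python
import time


def fetch_politely(urls, fetch, min_interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each URL, keeping at least `min_interval`
    seconds between the starts of consecutive calls.

    `sleep` and `clock` are injectable so the pacing can be tested
    without actually waiting.
    """
    results = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Time still owed before the next request may start.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

If a site still starts refusing requests, backing off for a few minutes, as described in the comment, is the safer response than retrying immediately.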
Hi there, thanks for the great video. It would be good if you could share how to download the HTML files or how to parse Yahoo Finance directly.
Maaan, I'm gonna lose my mind, you have to do a tutorial on how you got the data from Yahoo :)
How do I parse the data? I mean, how do I download data from Yahoo Finance?
@sentdex: How can we download all the NYSE stock datasets from https://www.nyse.com/index ? Please help.
subtitles bro :P
How did you get the historical key statistics and earnings? I searched Yahoo's site and found the historical stock prices, but only the current key stats and earnings, nothing older than that. I was not able to find any old key stats like the ones you have (which I understand you zipped up and offered for download), but I would like to learn how you gathered them, or at least how to find old key stats on Yahoo's site.
Thanks!
Is it possible to use this in games?
Robots.txt usually covers the rules for spiders and automated downloads. For example: www.sec.gov/robots.txt

User-agent: *
Allow: /Archives/edgar/data
Disallow: /Archives/bin
Disallow: /Archives/etc
Disallow: /Archives/usr
Disallow: /cgi-bin
Disallow: /bin
Disallow: /Archives/edgar/vprr/XXXX
Disallow: /Archives/edgar/vprr/vprr_removal
Disallow: /Archives/edgar/vprr/bin
Disallow: /nb
Disallow: /include
Disallow: /0
Disallow: /video/samples
Disallow: /video/live
Disallow: /video2/samples
Disallow: /video2/live
Disallow: /Archives/edgar/data/1473971/000109181814000042/ex101002.gif
sitemap: http://www.sec.gov/sitemap.xml
It would be really helpful if you showed us the code for parsing, or anything else that might help.
http://www.sec.gov/robots.txt

User-agent: *
Allow: /Archives/edgar/data
Disallow: /Archives/bin
Disallow: /Archives/etc
Disallow: /Archives/usr
Disallow: /cgi-bin
Disallow: /bin
Disallow: /Archives/edgar/vprr/XXXX
Disallow: /Archives/edgar/vprr/vprr_removal
Disallow: /Archives/edgar/vprr/bin
Disallow: /nb
Disallow: /include
Disallow: /0
Disallow: /video/samples
Disallow: /video/live
Disallow: /Archives/edgar/data/1473971/000109181814000042/ex101002.gif
sitemap: http://www.sec.gov/sitemap.xml

No rules listed on how much/often you can scrape.
Looks like you can:
1. use any user-agent,
2. scrape from /Archives/edgar/data only
3. get the sitemap from http://www.sec.gov/sitemap.xml
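Rules like these can also be checked programmatically with Python's standard `urllib.robotparser`. The sketch below parses the rules exactly as quoted in this comment, which may differ from what the site serves today, and the helper name `allowed` is made up for illustration. Note that under the usual robots.txt convention a path that matches no rule is allowed by default, so this check is somewhat more permissive than point 2 above.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the comment above (a snapshot, not the live file).
ROBOTS_TXT = """\
User-agent: *
Allow: /Archives/edgar/data
Disallow: /Archives/bin
Disallow: /Archives/etc
Disallow: /Archives/usr
Disallow: /cgi-bin
Disallow: /bin
Disallow: /Archives/edgar/vprr/XXXX
Disallow: /Archives/edgar/vprr/vprr_removal
Disallow: /Archives/edgar/vprr/bin
Disallow: /nb
Disallow: /include
Disallow: /0
Disallow: /video/samples
Disallow: /video/live
Disallow: /Archives/edgar/data/1473971/000109181814000042/ex101002.gif
sitemap: http://www.sec.gov/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text, no network access


def allowed(path, user_agent="*"):
    """Return True if `user_agent` may fetch `path` under the quoted rules."""
    return parser.can_fetch(user_agent, "http://www.sec.gov" + path)
```

For example, `allowed("/Archives/edgar/data/")` is true and `allowed("/cgi-bin/browse-edgar")` is false under these rules. As the comment says, none of this covers how often requests may be made.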
Thanks, man!!! It is always great watching your videos. Good work. Can you tell us how to play with the Yahoo Finance URLs to get historical data for the companies? Thanks again.
At 9:06 you ask "what are the terms for parsing". This is not a direct answer, but some related info: one of the footer links says "Open Government". That page has a "dataset" section you may find interesting.
Hello +sentdex, I am coming here from your website, and I only noticed the "for investing" part when I got here. I am not very familiar with financial terms, which makes it a little hard to understand, so I was wondering what my path should be if I want to go through the tutorials without dealing with all the financial parts. Thanks for all your good work, you really helped me.

Additional Information:

Visibility: 36671

Duration: 24m 24s

Rating: 221