A small library to access files from SEC's edgar
pip install edgar
To get a company's latest 5 10-Ks, run
```python
from edgar import Company

company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type="10-K")
docs = Company.get_documents(tree, no_of_documents=5)
```

or

```python
from edgar import Company, TXTML

company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)
```
To get all companies and find a specific one, run
```python
from edgar import Edgar

edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")
```

To get XBRL data, run

```python
from edgar import Company, XBRL, XBRLElement

company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict()  # returns a dictionary of name, value, and schemaRef
```
Company(name, cik, timeout=10)
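The usage examples pass the CIK as a 10-character string with leading zeros (for example `"0001341439"`). If your CIK comes from a source that stores it as a number, a small helper can restore the padding. `normalize_cik` below is a hypothetical helper for illustration and is not part of this library.

```python
def normalize_cik(cik) -> str:
    """Return the CIK as a 10-character, zero-padded string.

    Hypothetical helper (not part of edgar); assumes the 10-digit
    format used in the usage examples, e.g. "0001341439".
    """
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


print(normalize_cik(51143))      # 0000051143
print(normalize_cik("1341439"))  # 0001341439
```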
get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str
Returns a URL to fetch filings data.
* filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, all documents are returned.
* prior_to: Cutoff time; only documents from before it are retrieved. If not specified, all documents are returned.
* ownership: Defaults to include. Options are include, exclude, only.
* no_of_entries: Defaults to 100. The number of entries to return. Maximum is 100.
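As a rough illustration of how these parameters could map onto a request, the sketch below builds a query string for EDGAR's public company-browse endpoint using only the standard library. The function name `build_filings_url`, the base URL, and the query-parameter names (`type`, `dateb`, `owner`, `count`) are assumptions for illustration; the URL that `get_filings_url` actually returns may differ.

```python
from urllib.parse import urlencode

# Assumption: EDGAR's company-browse endpoint; not taken from this library.
BASE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_filings_url(cik, filing_type="", prior_to="",
                      ownership="include", no_of_entries=100):
    """Hypothetical stand-in showing one way the arguments could be encoded."""
    if ownership not in ("include", "exclude", "only"):
        raise ValueError("ownership must be include, exclude, or only")
    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": filing_type,
        "dateb": prior_to,
        "owner": ownership,
        "count": min(no_of_entries, 100),  # the documented maximum is 100
    }
    return BASE_URL + "?" + urlencode(params)


print(build_filings_url("0001341439", filing_type="10-K", no_of_entries=5))
```

Leaving `filing_type` and `prior_to` empty keeps those filters open, which matches the "if not specified, all documents" behaviour described above.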
get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement
Returns the HTML in the form of lxml.html.
* filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, all documents are returned.
* prior_to: Cutoff time; only documents from before it are retrieved. If not specified, all documents are returned.
* ownership: Defaults to include. Options are include, exclude, only.
* no_of_entries: Defaults to 100. The number of entries to return. Maximum is 100.
get_10Ks(self, no_of_documents=1, as_documents=False) -> List[lxml.html.HtmlElement]
Returns the HTML in the form of lxml.html of the concatenation of all the documents in the 10-K.
* no_of_documents (default: 1): number of documents to be retrieved
* When `as_documents` is set to `True`, it returns a list of Documents (`-> List[edgar.document.Documents]`)
get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]
Returns the HTML in the form of lxml.html of the document within the 10-K.
* document_type: The type of document you want, e.g. 10-K, EX-3.2
* no_of_documents (default: 1): number of documents to be retrieved
get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]
Returns the HTML in the form of lxml.html of the data file within the 10-K.
* document_type: The type of document you want, e.g. EX-101.INS
* no_of_documents (default: 1): number of documents to be retrieved
* isxml (default: False): by default, the file is parsed with `html` in `lxml`, which is not case sensitive. If this is True, it is parsed with `etree`, which is case sensitive.
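The case-sensitivity difference matters for XBRL files, whose element names are mixed case. The sketch below shows the general effect using standard-library stand-ins (`html.parser` and `xml.etree.ElementTree`) rather than `lxml` itself, so it illustrates the principle only. The sample snippet is made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up snippet with a mixed-case element name.
SNIPPET = '<root><NetIncomeLoss contextRef="FY2019">100</NetIncomeLoss></root>'


class TagCollector(HTMLParser):
    """Collects start-tag names as an HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()

html_tags = collector.tags  # HTML parsing lower-cases tag names
xml_tags = [el.tag for el in ET.fromstring(SNIPPET).iter()]  # XML parsing keeps case

print(html_tags)  # ['root', 'netincomeloss']
print(xml_tags)   # ['root', 'NetIncomeLoss']
```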
get_documents(self, tree: lxml.html.HtmlElement, no_of_documents=1, debug=False, as_documents=False) -> List[lxml.html.HtmlElement]
Returns a list of strings, each containing the body of the specified document from the input.
* When `as_documents` is set to `True`, it returns a list of Documents (`-> List[edgar.document.Documents]`)
Gets all companies from EDGAR
get_cik_by_company_name(company_name: str) -> str: Returns the CIK if given the exact name of the company
get_company_name_by_cik(cik: str) -> str: Returns the company name if given the CIK (with the leading `000`s)
find_company_name(words: str) -> List[str]: Returns a list of company names by exact word matching
match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]: Returns a list of dictionaries with company names, CIKs, and their fuzzy match scores
* `top` (default: 5): the number of top fuzzy matches to return. If set to `None`, it returns the whole list (which is a lot)
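To make the shape of a fuzzy-match result concrete, here is a minimal sketch that ranks candidates by similarity and applies the same `top` cutoff. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer. The function name `match_by_name`, the dictionary keys, and the scoring method are all assumptions; the library's own scores and key names may differ.

```python
from difflib import SequenceMatcher


def match_by_name(name, companies, top=5):
    """Hypothetical sketch: rank (company_name, cik) pairs by similarity to `name`."""
    scored = [
        {
            "company_name": company_name,
            "cik": cik,
            "score": SequenceMatcher(None, name.lower(), company_name.lower()).ratio(),
        }
        for company_name, cik in companies
    ]
    scored.sort(key=lambda row: row["score"], reverse=True)
    # top=None returns every candidate, mirroring the behaviour described above.
    return scored if top is None else scored[:top]


# Names and CIKs taken from the usage examples in this README.
companies = [
    ("Oracle Corp", "0001341439"),
    ("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143"),
]
best = match_by_name("oracle corp", companies, top=1)
print(best[0]["company_name"], best[0]["cik"])  # Oracle Corp 0001341439
```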
Parses data from XBRL
* `relevant_children`
    * gets children that are not `context`
* `relevant_children_parsed`
    * gets children that are not `context`, `unit`, `schemaRef`
    * cleans tags
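The filtering described by these two properties can be sketched with the standard library. This is not the library's implementation: the sample document is made up, the function names simply mirror the property names, and "cleans tags" is interpreted here as stripping the namespace prefix, which is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up miniature instance document, for illustration only.
SAMPLE = """
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:link="http://www.xbrl.org/2003/linkbase"
      xmlns:us-gaap="http://fasb.org/us-gaap/2019-01-31">
  <link:schemaRef/>
  <context id="FY2019"/>
  <unit id="USD"/>
  <us-gaap:Revenues contextRef="FY2019" unitRef="USD">100</us-gaap:Revenues>
</xbrl>
""".strip()


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def relevant_children(root):
    # Everything except <context> elements.
    return [child for child in root if local_name(child) != "context"]


def relevant_children_parsed(root):
    # Drop context, unit, and schemaRef, and report cleaned tag names.
    skipped = {"context", "unit", "schemaRef"}
    return [local_name(child) for child in root if local_name(child) not in skipped]


root = ET.fromstring(SAMPLE)
print([local_name(c) for c in relevant_children(root)])  # ['schemaRef', 'unit', 'Revenues']
print(relevant_children_parsed(root))                    # ['Revenues']
```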
Filing and Documents Details for the SEC EDGAR Form (such as 10-K)
Documents(url, timeout=10)
url: str: URL of the document
content: dict: Dictionary of meta data of the document
content['Filing Date']: str: Document filing date
content['Accepted']: str: Document accepted datetime
content['Period of Report']: str: The date period that the document is for
element: lxml.html.HtmlElement: The HTML element for the Document (from the url) so it can be further parsed
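Because the metadata values are strings, you will usually want to convert them before comparing dates. A minimal sketch follows, assuming the values use `YYYY-MM-DD` and `YYYY-MM-DD HH:MM:SS` formats. Both the formats and the sample values are assumptions for illustration, not data from a real filing.

```python
from datetime import datetime

# Sample values in the shape described above; formats and values are assumptions.
content = {
    "Filing Date": "2019-06-21",
    "Accepted": "2019-06-21 16:05:38",
    "Period of Report": "2019-05-31",
}

filing_date = datetime.strptime(content["Filing Date"], "%Y-%m-%d").date()
accepted = datetime.strptime(content["Accepted"], "%Y-%m-%d %H:%M:%S")
period_end = datetime.strptime(content["Period of Report"], "%Y-%m-%d").date()

# Days between the end of the reporting period and the filing.
print((filing_date - period_end).days)  # 21
```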