XBRL Database
While individual XBRL filings can be processed, validated, and inspected by Arelle one-by-one, the real benefit of data in XBRL is when it can be processed across large collections of filings. Arelle implements a number of database strategies for this and supports three approaches, (1) Abstract Model, (2) XBRL-US, and (3) DPM database architectures.
Database backups of SEC filings (2012 to current) are available for Postgres on the downloads page, this is handy for one to experiment on their own laptop or server, without going through the loading process on their own.
Database bulk loading is provided by the xbrlDB.py plugin. Databases can be loaded in batch by an RSS feed file, by individual XBRL entry points or by scripted command files. Use of the RSS feed file provides additional filing information (such as SEC filing references and SIC code of company).
Currently three general approaches are implemented, the first two evolved from that presented in the KU 2013 XBRL Conference paper:
- XBRL Abstract Model Database. The XBRL Abstract Model represents an effort to step above the issues of syntax, specific models and taxonomies, and software strategy. It tries to bridge the modeling semantics, financial ontology, data points modeling, and other XBRL communities, with a view to fitting into big-iron databases and OLAP/BI systems. The Abstract Model is based on OMG’s CWM, of which OLAP is a subset, but the dimensions model in XBRL calls for a dynamic form of OLAP that hasn’t been implemented by relational databases. Specified is a meta-model, it defines the framework for constructing an instance model that can be implemented in an actual database. In this implementation, we chose to make an instance model closely track the organization of the meta model’s data dictionary and instance features, but did not complete building out an instance model of the dimensional validation or data typing features.(The abstract model implementation has not yet provided the view schema, initially based on a prior version of the Table Linkbase. It will be implemented basing on the models of the current Table Linkbase.) The SQL schema for the abstract model is (for now) denormalized from that of the XBRL-US schema, so that qnames are clark-notation strings (instead of separate tables for namespaces and qnames), but this is an experiement in query convenience and could be changed based on feedback.The abstract model, when stored into the SQL databases, also provides words mining from 10-Q and 10-K documents (e.g., the non-tagged MD&A section, using a document taxonomy) and form type UPLOAD comment letters of filed documents.
- XBRL-US Public Database. This represents an thorough and representation of XBRL, at a syntactical level, for US filings. The XML artifacts of XBRL are captured in a highly normalized set of tables that are sufficient for retention of present SEC filings. It isn’t used to handle non-US prudential filings, corporate action filings, or massively huge filings. Its highly normalized tables provide good access to retrieve and reconstruct single filings but need query support for cross-filing semantics (where data is bound to linkbase graphs of different structures). It’s currently useed by several Arelle users as a basis for mining or ETL of data.
- DPM Database. This database implements version 7 of the DPM database evolved from the EBA project and planned for EIOPA’s tool sets. The architecture is closely aligned to use of the XBRL Table Linkbase (not suited for current SEC Filings).
Database products supported
- Relational Database Interfaces Schema Diagram. The columns of each table are described on this CSV table.
- Postgres Schema DDL
- MySQL (e.g., Cloud SQL) Schema DDL
- SQLite 3.8.3+ Schema DDL
- MS SQL Server (2012+) Schema DDL
- Oracle (11g+) Schema DDL
- Graph Database Interfaces
- Rexster/Titan (interfaces to Cassandra, originally contributed by Facebook, and others) Schema Diagram.
- RDF (interfaces to NanoSparqlServer or RDF files in Turtle and XML)Schema Diagram
- JSON (currently in files, Mongo DB interface planned) Schema Diagram
(2) The XBRL-US Public Database is currently supported by a Postgres interface. Schema Diagram. Schema DDL.
(3) The DPM Database is currently supported by the SQLite interface.
SQL Schema Initialization
- Postgres. One may need to sudo to the postgres user id to manage tables. The XBRL-US and Abstract Model schemas are not compatible and should be in separate databases if both are used.
sudo -u postgres psql -d test_db
(for XBRL-US public database)\i xbrlPublicPostgresDB.ddl
(or alternatively, for abstract model database)\i xbrlSemanticPostgresDB.ddl
\q
- MySQL. One may need to sudo to MySql root id to manage tables.
sudo mysql --user=root --password=xxxxx test_db
source xbrlSemanticMySqlDB.ddl
\q
- SQLite. There are a number of utilities for SQLite. One alternative to load the database tables is to paste the portion of the file above standard industry code insertions (the CREATE TABLEs, etc) into an SQL editor for this product, such as SQLiteStudio.
- Microsoft SQL Server.
Start button -> Programs -> Microsoft SQL Server Management Studio (2012+)
Connect dialog
Expand tree view -> Databases -> select test_db
File menu -> Open -> xbrlSemanticMSSql.sql
Toolbar->Execute button or press F5 - Oracle 11g+
Command prompt window (e.g., C:> dialog box)sqlplus
enter user and password@xbrlSemanticOracle.sql
exit
xbrlDB plug-in setup
For a single taxonomy or entry point: open the DTS or instance. Validate if you wish
Tools -> Store to XBRL DB (parameters same as below for command line bulleted list).
arelleCmdLine -f file -v --plugins xbrlDB --store-to-XBRL-DB "host,port,user,password, database,timeout,technology"
-
file is the RSS feed file (local or web), or instance or DTS entry point file. (Taxonomy package mapping will be honored, if in effect.)
-
host is for Postgres, MySql, Rexster, and RDF server the TCP/IP host (such as foo.com, or localhost), for MS SQL the service (such as herm\SQLEXPRESS), for Oracle the service (such as orcl). For RDF to be stored in a file, use rdfTurtleFile or rdfXmlFile. For JSON to be stored in a file, use jsonFile. (File names go in the database field.) For SQLite leave this field empty.
-
port is for Postgres and MySql the port number if nonstandard (such as 8084, an integer), and omitted for SQLite, MS SQL and Oracle
-
database is the SQL database (all SQL interfaces), the Rexster database, or the Nanodata subdirectory. For SQLite, rdf or json file saving, this field is a full path file name on local file system.
-
timeout (when provided) overrides the default interface timeout (which may be too short or too long).
-
technology code descriptions:
technology description pgSemantic Abstract Model SQL (Postgres) mssqlSemantic Abstract Model SQL (MSSQL) mysqlSemantic Abstract Model SQL (MySQL) sqliteSemantic Abstract Model SQL (SQLite) orclSemantic Abstract Model SQL (Oracle) rexster Abstract Model Rexter (Titan, Cassandra) rdfDB Abstract Model RDF (Turtle, NanoSparqlServer) json Abstract Model JSON (Files, MongoDB) postgres XBRL-US Public SQL (Postgres) sqliteDpmDB DPM SQL (SQLite)
arelleCmdLine -f c:\temp\test.rss -v --disclosureSystem efm-pragmatic-all-years --plugins xbrlDB --store-to-XBRL-DB "myhost.com,8084,pgUserId,pgPassword,test_db,90,postgres"
arelleCmdLine -f c:\temp\test.rss -v --disclosureSystem efm-pragmatic-all-years --plugins xbrlDB --store-to-XBRL-DB "localhost\SQLEXPRESS,,sqlLogin,sqlPassword,,90,mssqlSemantic"
Preloaded Database
-
Categories
-
Articles
- October 2022
- March 2022
- February 2022
- January 2022
- March 2020
- September 2019
- July 2019
- June 2019
- February 2019
- June 2018
- November 2016
- March 2016
- January 2016
- December 2015
- November 2015
- October 2014
- March 2014
- October 2013
- May 2013
- March 2013
- January 2013
- November 2012
- August 2012
- April 2012
- February 2012
- January 2012
- December 2011
- July 2011
- June 2011
- May 2011
- March 2011
- February 2011
- November 2010
-
Meta