XBRL Database

While individual XBRL filings can be processed, validated, and inspected by Arelle one-by-one, the real benefit of data in XBRL is when it can be processed across large collections of filings.  Arelle implements a number of database strategies for this and supports three approaches, (1) Abstract Model, (2) XBRL-US, and (3) DPM database architectures.

Database backups of SEC filings (2012 to current) are available for Postgres on the downloads page, this is handy for one to experiment on their own laptop or server, without going through the loading process on their own.

Database bulk loading is provided by the xbrlDB.py plugin.  Databases can be loaded in batch by an RSS feed file, by individual XBRL entry points or by scripted command files.  Use of the RSS feed file provides additional filing information (such as SEC filing references and SIC code of company).

Currently three general approaches are implemented, the first two evolved from that presented in the KU 2013 XBRL Conference paper:

  • XBRL Abstract Model Database. The XBRL Abstract Model represents an effort to step above the issues of syntax, specific models and taxonomies, and software strategy. It tries to bridge the modeling semantics, financial ontology, data points modeling, and other XBRL communities, with a view to fitting into big-iron databases and OLAP/BI systems. The Abstract Model is based on OMG’s CWM, of which OLAP is a subset, but the dimensions model in XBRL calls for a dynamic form of OLAP that hasn’t been implemented by relational databases.  Specified is a meta-model, it defines the framework for constructing an instance model that can be implemented in an actual database. In this implementation, we chose to make an instance model closely track the organization of the meta model’s data dictionary and instance features, but did not complete building out an instance model of the dimensional validation or data typing features.(The abstract model implementation has not yet provided the view schema, initially based on a prior version of the Table Linkbase.  It will be implemented basing on the models of the current Table Linkbase.)  The SQL schema for the abstract model is (for now) denormalized from that of the XBRL-US schema, so that qnames are clark-notation strings (instead of separate tables for namespaces and qnames), but this is an experiement in query convenience and could be changed based on feedback.The abstract model, when stored into the SQL databases, also provides words mining from 10-Q and 10-K documents (e.g., the non-tagged MD&A section, using a document taxonomy) and form type UPLOAD comment letters of filed documents.
  • XBRL-US Public Database. This represents an thorough and representation of XBRL, at a syntactical level, for US filings. The XML artifacts of XBRL are captured in a highly normalized set of tables that are sufficient for retention of present SEC filings. It isn’t used to handle non-US prudential filings, corporate action filings, or massively huge filings. Its highly normalized tables provide good access to retrieve and reconstruct single filings but need query support for cross-filing semantics (where data is bound to linkbase graphs of different structures). It’s currently useed by several Arelle users as a basis for mining or ETL of data.
  • DPM Database. This database implements version 7 of the DPM database evolved from the EBA project and planned for EIOPA’s tool sets.  The architecture is closely aligned to use of the XBRL Table Linkbase (not suited for current SEC Filings).

Database products supported

(1) The Abstract Model, described in the above-referenced KU paper, is currently supported by these database interfaces:

(2) The XBRL-US Public Database is currently supported by a Postgres interface. Schema DiagramSchema DDL.

(3) The DPM Database is currently supported by the SQLite interface.

SQL Schema Initialization

When xbrlDB SQL interfaces encounter a database with none of the DDL tables, they attempt to upload the corresponding DDL, but this requires proper permissions and can be problematical.  We’ve found it easier to initialize the SQL databases manually through their usual command interfaces.  The DDL files are linked above, and included in the installation under plugin/xbrlDB.  Examples assume the database has been created and named “test_db”.
  • Postgres. One may need to sudo to the postgres user id to manage tables.   The XBRL-US and Abstract Model schemas are not compatible and should be in separate databases if both are used.
    sudo -u postgres psql -d test_db
    (for XBRL-US public database)
    \i xbrlPublicPostgresDB.ddl
    (or alternatively, for abstract model database)
    \i xbrlSemanticPostgresDB.ddl
    \q
  • MySQL.  One may need to sudo to MySql root id to manage tables.
    sudo mysql --user=root --password=xxxxx test_db
    source xbrlSemanticMySqlDB.ddl
    \q
  • SQLite.  There are a number of utilities for SQLite. One alternative to load the database tables is to paste the portion of the file above standard industry code insertions (the CREATE TABLEs, etc) into an SQL editor for this product, such as SQLiteStudio.
  • Microsoft SQL Server.
    Start button -> Programs -> Microsoft SQL Server Management Studio (2012+)
    Connect dialog
    Expand tree view -> Databases -> select test_db
    File menu -> Open -> xbrlSemanticMSSql.sql
    Toolbar->Execute button or press F5
  • Oracle 11g+
    Command prompt window (e.g., C:> dialog box)

    sqlplus
    enter user and password
    @xbrlSemanticOracle.sql
    exit

xbrlDB plug-in setup

Using GUI:
Help menu ->   Manage plug-ins -> Dialog.  Press “Locally” on left under “find plug-in modules” and select directory plugin, directory xbrlDB and click __init__.py.
For a single taxonomy or entry point: open the DTS or instance.  Validate if you wish
Tools -> Store to XBRL DB (parameters same as below for command line bulleted list).
Using command line interface:
For a single taxonomy, entry point, or RSS feed file:
arelleCmdLine -f file -v
  --plugins xbrlDB
  --store-to-XBRL-DB
  "host,port,user,password,
   database,timeout,technology"
where:
  • file is the RSS feed file (local or web), or instance or DTS entry point file.  (Taxonomy package mapping will be honored, if in effect.)
  • host is for Postgres, MySql, Rexster, and RDF server the TCP/IP host (such as foo.com, or localhost), for MS SQL the service (such as herm\SQLEXPRESS), for Oracle the service (such as orcl).  For RDF to be stored in a file, use rdfTurtleFile or rdfXmlFile. For JSON to be stored in a file, use jsonFile. (File names go in the database field.)  For SQLite leave this field empty.
  • port is for Postgres and MySql the port number if nonstandard (such as 8084, an integer), and omitted for SQLite, MS SQL and Oracle
  • database is the SQL database (all SQL interfaces), the Rexster database, or the Nanodata subdirectory.  For SQLite, rdf or json file saving, this field is a full path file name on local file system.
  • timeout (when provided) overrides the default interface timeout (which may be too short or too long).
  • technology code descriptions:
    technology description
    pgSemantic Abstract Model SQL (Postgres)
    mssqlSemantic Abstract Model SQL (MSSQL)
    mysqlSemantic Abstract Model SQL (MySQL)
    sqliteSemantic Abstract Model SQL (SQLite)
    orclSemantic Abstract Model SQL (Oracle)
    rexster Abstract Model Rexter (Titan, Cassandra)
    rdfDB Abstract Model RDF (Turtle, NanoSparqlServer)
    json Abstract Model JSON (Files, MongoDB)
    postgres XBRL-US Public SQL (Postgres)
    sqliteDpmDB DPM SQL (SQLite)
Examples:
Store into postgres database using XBRL-US schema:
arelleCmdLine -f c:\temp\test.rss -v
  --disclosureSystem efm-pragmatic-all-years
  --plugins xbrlDB
  --store-to-XBRL-DB
  "myhost.com,8084,pgUserId,pgPassword,test_db,90,postgres"
Store into local MSSQL database (2012+) using SQLEXPRESS instance:
arelleCmdLine -f c:\temp\test.rss -v
  --disclosureSystem efm-pragmatic-all-years
  --plugins xbrlDB 
  --store-to-XBRL-DB
  "localhost\SQLEXPRESS,,sqlLogin,sqlPassword,,90,mssqlSemantic"

Preloaded Database

Here is a preloaded database of SEC filings, as a Postgres pg_dump gzip’ed file. Install pgAdmin III, so that you may down load this file and begin with a full set of SEC filings, word-mined primary documents, and comment letters.
click for download instructions
(DB size 417GB as of 2014-10, does not include UPLOAD comment correspondence)
 

Comments are closed.