Block_Codes

This repository uses Schedule 13D and Schedule 13G filings from SEC EDGAR to identify all positions above 5% in US stocks between 1994 and 2018.

This GitHub page describes the construction of the data used in the paper “Is Blockholder Diversity Detrimental?” by Miriam Schwartz-Ziv and Ekaterina Volkova (2020).

The most recent version of the paper is available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3621939

Step 1. Download Files.

  • download_forms.R downloads SC 13D/13G filings and their amendments and stores them in a SQL database.
  • The script downloads the list of all forms for each year from the SEC website.
    The only things you need to specify are the range of years in the loop and the working directory.
  • The code is slow and can take several hours to complete. To make sure that I get all possible files,
    I download each file twice: once from the master file entry for the filer and once from the entry for the subject.
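As a rough illustration of the index step, the sketch below builds the URLs of the EDGAR quarterly master index files for a range of years and keeps only the Schedule 13D/13G rows from already-downloaded index lines. It is not taken from download_forms.R: the function names `master_index_urls` and `filter_block_forms` are hypothetical, and the sketch assumes the pipe-delimited layout of `master.idx` (CIK|Company Name|Form Type|Date Filed|Filename).

```r
# Hypothetical helper: URLs of the quarterly EDGAR master index files.
master_index_urls <- function(years) {
  # qtr varies fastest, so the result is ordered year by year, Q1 to Q4
  grid <- expand.grid(qtr = 1:4, year = years)
  sprintf("https://www.sec.gov/Archives/edgar/full-index/%d/QTR%d/master.idx",
          grid$year, grid$qtr)
}

# Hypothetical helper: keep SC 13D/13G forms and their amendments.
# index_lines is a character vector, e.g. the result of readLines() on master.idx.
filter_block_forms <- function(index_lines) {
  rows <- strsplit(index_lines, "|", fixed = TRUE)
  rows <- rows[lengths(rows) == 5]  # drop the free-text header and separator lines
  forms <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(forms) <- c("cik", "company", "form_type", "date_filed", "filename")
  wanted <- c("SC 13D", "SC 13D/A", "SC 13G", "SC 13G/A")
  forms[forms$form_type %in% wanted, ]
}

urls <- master_index_urls(1994:2018)
length(urls)  # four index files per year
```

Because the same filing is typically listed under both the filer's and the subject's CIK, the filtered table can contain two rows per filing, which is consistent with downloading each file twice.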

Step 2. Extract and Convert Main Filings.

  • extract_body_form.R extracts the main filing from the complete submission files and converts .htm to plain text where needed.
  • The output is stored in another SQL database.
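The idea behind this step can be sketched as follows, assuming that a complete submission file wraps each document in `<DOCUMENT>` ... `</DOCUMENT>` tags with the content between `<TEXT>` and `</TEXT>`, and that the first document is the main filing. The functions `extract_main_document` and `html_to_text` are hypothetical names used for illustration, and the tag stripping is deliberately crude. extract_body_form.R may handle this differently.

```r
# Hypothetical helper: return the body of the first <DOCUMENT> block.
# submission is the complete submission file as a single string.
extract_main_document <- function(submission) {
  doc <- regmatches(submission,
                    regexpr("(?s)<DOCUMENT>.*?</DOCUMENT>", submission, perl = TRUE))
  if (length(doc) == 0) return(NA_character_)
  body <- sub("(?s)^.*?<TEXT>", "", doc, perl = TRUE)  # drop everything up to <TEXT>
  sub("(?s)</TEXT>.*$", "", body, perl = TRUE)         # drop </TEXT> and what follows
}

# Hypothetical helper: crude HTML-to-text conversion with regular expressions.
html_to_text <- function(x) {
  x <- gsub("<[^>]+>", " ", x)               # replace tags with a space
  x <- gsub("&nbsp;", " ", x, fixed = TRUE)  # two common entities only
  x <- gsub("&amp;", "&", x, fixed = TRUE)
  x <- gsub("[ \t]+", " ", x)                # collapse runs of spaces and tabs
  trimws(x)
}
```

A dedicated HTML parser would be more robust for messy markup. The regular-expression version is used here only to keep the sketch free of package dependencies.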

Step 3. Parse SEC Header.

Step 4. Extract CUSIP from the filings.

  • extract_CUSIP.R returns the six- and eight-digit CUSIPs from SEC filings.
  • The output of this step is a CIK-CUSIP map, which can be downloaded in .csv format from my website (www.evolkova.info).
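One possible way to pull CUSIPs out of filing text is sketched below. It looks for nine-character candidates (optionally split by spaces or hyphens), keeps those whose ninth character matches the standard CUSIP check digit, and returns the six- and eight-character prefixes. The function names and the regular expression are assumptions made for this example, and the check-digit filter is a technique chosen for the sketch. extract_CUSIP.R may use different rules.

```r
# Standard CUSIP check digit computed from the first eight characters.
cusip_check_digit <- function(cusip8) {
  chars <- strsplit(toupper(cusip8), "")[[1]]
  # digits map to 0-9, letters to 10-35, then *, @, # to 36-38
  values <- match(chars, c(0:9, LETTERS, "*", "@", "#")) - 1
  even <- seq_along(values) %% 2 == 0
  values[even] <- values[even] * 2
  (10 - sum(values %/% 10 + values %% 10) %% 10) %% 10
}

# Hypothetical helper: six- and eight-character CUSIPs found in a text.
extract_cusip <- function(text) {
  pattern <- "\\b[0-9A-Z]{6}[ -]?[0-9A-Z]{2}[ -]?[0-9]\\b"
  hits <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
  hits <- unique(gsub("[ -]", "", hits))  # normalise "037833 10 0" to "037833100"
  valid <- vapply(hits, function(h) {
    cusip_check_digit(substr(h, 1, 8)) == as.integer(substr(h, 9, 9))
  }, logical(1))
  hits <- hits[valid]
  data.frame(cusip6 = substr(hits, 1, 6), cusip8 = substr(hits, 1, 8),
             stringsAsFactors = FALSE)
}

extract_cusip("CUSIP No. 037833 10 0, telephone 123456789")
```

In this example the telephone-like number is rejected because its last digit does not match the computed check digit, which shows why the filter helps with nine-digit strings that are not CUSIPs.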

Step 5. Extract size of the block position.

  • parsing_prc_position.R extracts the aggregate block size from the filing.
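A simplified sketch of this kind of parsing is shown below. It assumes the cover page contains the standard item "Percent of class represented by amount in row (...)" followed shortly by a number with a percent sign. The function name `extract_percent_of_class` and the 60-character search window are assumptions for this example, and filings that omit the percent sign are missed. parsing_prc_position.R may rely on different patterns.

```r
# Hypothetical helper: percentages reported on Schedule 13D/13G cover pages.
# Returns one value per matching cover-page item, or numeric(0) if none is found.
extract_percent_of_class <- function(text) {
  pattern <- paste0(
    "(?is)percent\\s+of\\s+class\\s+represented\\s+by\\s+amount\\s+in\\s+row",
    ".{0,60}?[0-9]+(?:\\.[0-9]+)?\\s*%"
  )
  hits <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
  # keep only the number that directly precedes the percent sign
  pct <- sub("(?s)^.*?([0-9]+(?:\\.[0-9]+)?)\\s*%$", "\\1", hits, perl = TRUE)
  as.numeric(pct)
}

cover <- "(13) PERCENT OF CLASS REPRESENTED BY AMOUNT IN ROW (11)\n      7.3%"
extract_percent_of_class(cover)
```

Requiring the percent sign keeps the row reference, such as the "11" in "ROW (11)", from being mistaken for the position size. When a filing has several reporting persons, the function returns several values, and how to combine them into one aggregate block size is left open here.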

Step 6. Extract identity of blockholders.

Step 7. Aggregate information into blockholder-company-year panel.

Step 8. Download insider ownership transactions.

  • Added in 2022 to improve data accuracy.

Step 9. Add missing insider blocks.

  • Added in 2022 to improve data accuracy.