Block_Codes
This GitHub page describes the construction of the data in the paper “Is Blockholder Diversity Detrimental?” by Miriam Schwartz-Ziv and Ekaterina Volkova (2020).
The most recent version of the paper is available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3621939
Step 1. Download Files.
- download_forms.R downloads SC 13D and SC 13G filings and their amendments and stores them in an SQL database.
- The script first downloads the list of all forms filed in each year from the SEC website; the only things you need to specify are the range of years in the loop and the working directory.
- The code is slow and can take several hours to complete. To make sure that I get all possible files, I download each file twice from the master file: once as listed for the filer and once as listed for the subject company.
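To illustrate where the duplicate listings come from, here is a minimal Python sketch (not the repository's R code) that builds the quarterly EDGAR full-index URLs and filters a master.idx file for 13D/13G forms. It assumes the standard pipe-delimited master.idx layout; the function names are hypothetical, and grouping rows by accession number is only an illustrative way to see that the filer and subject listings point to the same filing.

```python
# Hypothetical Python sketch of the Step 1 index logic (the repository uses R).
from typing import Dict, List

BASE = "https://www.sec.gov/Archives/edgar/full-index"
TARGET_FORMS = {"SC 13D", "SC 13D/A", "SC 13G", "SC 13G/A"}


def master_index_urls(first_year: int, last_year: int) -> List[str]:
    """Quarterly master.idx URLs for an inclusive range of years."""
    return [
        f"{BASE}/{year}/QTR{quarter}/master.idx"
        for year in range(first_year, last_year + 1)
        for quarter in range(1, 5)
    ]


def blockholder_filings(master_idx_text: str) -> Dict[str, List[str]]:
    """Group 13D/13G rows of a master.idx file by accession number.

    Rows look like: CIK|Company Name|Form Type|Date Filed|Filename.
    The same filing is typically listed once under the filer's CIK and
    once under the subject company's CIK, so one accession number can
    map to several CIKs.
    """
    filings: Dict[str, List[str]] = {}
    for line in master_idx_text.splitlines():
        parts = line.split("|")
        # Skips the column header, the dashed separator, and other forms.
        if len(parts) != 5 or parts[2] not in TARGET_FORMS:
            continue
        cik, _name, _form, _date, filename = parts
        accession = filename.rsplit("/", 1)[-1].replace(".txt", "")
        filings.setdefault(accession, []).append(cik)
    return filings


# Made-up rows in the master.idx format.
sample = "\n".join([
    "CIK|Company Name|Form Type|Date Filed|Filename",
    "-" * 40,
    "111|SUBJECT CORP|SC 13G|2020-02-14|edgar/data/111/0000000000-20-000001.txt",
    "222|FUND LLC|SC 13G|2020-02-14|edgar/data/222/0000000000-20-000001.txt",
    "111|SUBJECT CORP|10-K|2020-03-01|edgar/data/111/0000000000-20-000002.txt",
])
print(blockholder_filings(sample))
# → {'0000000000-20-000001': ['111', '222']}
```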
Step 2. Extract and Convert Main Filings.
- extract_body_form.R extracts the main filing from each complete submission file and converts .htm documents to plain text if needed.
- I store the output in a separate SQL database.
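A rough Python sketch of this step, assuming the usual complete-submission layout in which each document sits between <DOCUMENT> and </DOCUMENT> tags with a <TYPE> line and a <TEXT> body. The function names are hypothetical, and the HTML-to-text conversion relies only on the standard-library HTMLParser, which is cruder than a dedicated converter and not necessarily what extract_body_form.R does.

```python
# Hypothetical Python sketch of Step 2 (the repository uses R).
import re
from html.parser import HTMLParser
from typing import Optional

DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.S)
TYPE_RE = re.compile(r"<TYPE>([^\n<]+)")
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.S)


class _TextCollector(HTMLParser):
    """Keeps text nodes, drops tags and the contents of script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
        elif tag in ("br", "p", "div", "tr"):
            self.chunks.append("\n")  # keep a rough line structure

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    text = re.sub(r"[ \t\xa0]+", " ", text)  # &nbsp; arrives as \xa0
    return re.sub(r"\n\s*\n+", "\n\n", text).strip()


def extract_main_filing(submission: str) -> Optional[str]:
    """Return the first SC 13D/13G document as plain text, if any."""
    for doc in DOC_RE.findall(submission):
        doc_type = TYPE_RE.search(doc)
        if doc_type and re.match(r"SC 13[DG]", doc_type.group(1).strip()):
            body = TEXT_RE.search(doc)
            content = body.group(1) if body else doc
            if re.search(r"<html", content, re.I):
                return html_to_text(content)
            return content.strip()
    return None
```

Exhibits and other attached documents are skipped because only the first document whose type starts with SC 13D or SC 13G is returned.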
Step 3. Parse SEC Header.
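No details are given for this step. As an assumption-laden illustration, the Python sketch below reads a few fields from the <SEC-HEADER> block of a complete submission file, including the CIKs listed under SUBJECT COMPANY and FILED BY. The field labels follow the standard EDGAR header; the function names and output keys are hypothetical.

```python
# Hypothetical Python sketch of Step 3; field labels follow the EDGAR header.
import re
from typing import Dict, Optional

ROLE_RE = re.compile(r"^(SUBJECT COMPANY|FILED BY):[ \t]*$", re.M)
ROLES = {"SUBJECT COMPANY": "subject", "FILED BY": "filer"}


def _field(label: str, text: str) -> Optional[str]:
    """Value of the first 'LABEL:<tabs>value' line in text."""
    m = re.search(rf"^[ \t]*{re.escape(label)}:[ \t]*(.+?)[ \t]*$", text, re.M)
    return m.group(1) if m else None


def parse_sec_header(submission: str) -> Dict[str, Optional[str]]:
    block = re.search(r"<SEC-HEADER>(.*?)</SEC-HEADER>", submission, re.S)
    header = block.group(1) if block else submission
    out: Dict[str, Optional[str]] = {
        "form_type": _field("CONFORMED SUBMISSION TYPE", header),
        "filed_date": _field("FILED AS OF DATE", header),
    }
    sections = list(ROLE_RE.finditer(header))
    for i, sec in enumerate(sections):
        # Each role section runs until the next role heading (or the end).
        end = sections[i + 1].start() if i + 1 < len(sections) else len(header)
        body = header[sec.end():end]
        role = ROLES[sec.group(1)]
        # Keep the first section per role if a role repeats.
        out.setdefault(f"{role}_cik", _field("CENTRAL INDEX KEY", body))
        out.setdefault(f"{role}_name", _field("COMPANY CONFORMED NAME", body))
    return out
```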
Step 4. Extract CUSIP from the filings.
- The extract_CUSIP.R script extracts six- and eight-digit CUSIPs from the SEC filings.
- The output of this step is a CIK-CUSIP map, which can be downloaded in .csv format from my website (www.evolkova.info).
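One common way to pull CUSIPs out of filing text is to look for a nine-character code near the word "CUSIP" and keep it only if its check digit is valid. The Python sketch below does that and returns the six- and eight-character prefixes. The 100-character search window and the function names are assumptions, and extract_CUSIP.R may work differently.

```python
# Hypothetical Python sketch of CUSIP extraction with check-digit validation.
import re
from typing import List, Tuple

KEYWORD_RE = re.compile(r"CUSIP", re.I)
# Nine characters, optionally written as "037833 10 0" or "037833-10-0".
# The lookahead lets candidates overlap, so an invalid match cannot hide a valid one.
CANDIDATE_RE = re.compile(r"\b(?=([0-9A-Z]{6}[ -]?[0-9A-Z]{2}[ -]?[0-9]\b))")
SPECIAL = {"*": 36, "@": 37, "#": 38}


def cusip_check_digit(first8: str) -> int:
    """Standard CUSIP check digit: double every second value, sum the digits."""
    total = 0
    for position, ch in enumerate(first8, start=1):
        if ch.isdigit():
            value = int(ch)
        elif ch.isalpha():
            value = ord(ch.upper()) - ord("A") + 10
        else:
            value = SPECIAL[ch]
        if position % 2 == 0:
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10


def extract_cusips(text: str, window: int = 100) -> List[Tuple[str, str]]:
    """Unique (six-character, eight-character) CUSIP prefixes found near 'CUSIP'."""
    found: List[Tuple[str, str]] = []
    for keyword in KEYWORD_RE.finditer(text):
        snippet = text[keyword.end():keyword.end() + window]
        for candidate in CANDIDATE_RE.finditer(snippet):
            cusip9 = re.sub(r"[ -]", "", candidate.group(1))
            if cusip_check_digit(cusip9[:8]) == int(cusip9[8]):
                pair = (cusip9[:6], cusip9[:8])
                if pair not in found:
                    found.append(pair)
    return found


print(extract_cusips("CUSIP No. 037833 10 0"))  # → [('037833', '03783310')]
```

The check digit does most of the filtering here: an arbitrary nine-character string passes it only about one time in ten.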
Step 5. Extract the size of the block position.
- parsing_prc_position.R extracts the aggregate block size from the filing.
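A hedged Python sketch of how the block size could be read from the cover pages: it finds each "Percent of class represented by amount in Row ..." item and takes the first percentage that follows. Treating the maximum across cover pages (one page per reporting person) as the aggregate is an assumption, not necessarily the rule parsing_prc_position.R uses, and the function names are hypothetical.

```python
# Hypothetical Python sketch of reading block size from 13D/13G cover pages.
import re
from typing import List, Optional

PCT_RE = re.compile(
    r"PERCENT\s+OF\s+CLASS\s+REPRESENTED\s+BY\s+AMOUNT\s+IN\s+ROW"
    r"[^%]{0,80}?(\d{1,3}(?:\.\d+)?|\.\d+)\s*%",
    re.I,
)


def cover_page_percents(text: str) -> List[float]:
    """First percentage after each 'Percent of class ... in Row' item."""
    values = []
    for match in PCT_RE.finditer(text):
        value = float(match.group(1))
        if value <= 100:  # discard obvious misreads
            values.append(value)
    return values


def aggregate_block_size(text: str) -> Optional[float]:
    """Assumed aggregation rule: the largest percentage over all cover pages."""
    values = cover_page_percents(text)
    return max(values) if values else None
```

The row number in the item label, such as "(11)" or "9", is not mistaken for the percentage because the captured number must be followed by a percent sign.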
Step 6. Extract the identity of blockholders.
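This step is not described in detail. One plausible approach, sketched below in Python with that caveat, reads the name that follows each "Name(s) of Reporting Person(s)" heading on the cover pages and normalizes it so that the same blockholder can be matched across filings. The skip rules, the suffix list, and the function names are all hypothetical.

```python
# Hypothetical Python sketch of reading reporting-person names from cover pages.
import re
from typing import List

HEADING_RE = re.compile(r"NAMES?\s+OF\s+REPORTING\s+PERSONS?", re.I)
# Lines that belong to the heading rather than to a name.
SKIP_RE = re.compile(r"\bI\.R\.S\.|\bS\.S\.|IDENTIFICATION|ENTITIES\s+ONLY", re.I)
SUFFIXES = {"INC", "CORP", "CORPORATION", "CO", "LLC", "LP", "LTD", "PLC"}


def normalize_name(name: str) -> str:
    """Upper-case, strip punctuation, and drop trailing legal-form suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def reporting_persons(text: str, max_lines: int = 6) -> List[str]:
    """Unique normalized names, one per 'Name of Reporting Person' heading."""
    names: List[str] = []
    for heading in HEADING_RE.finditer(text):
        for line in text[heading.end():].splitlines()[:max_lines]:
            line = line.strip()
            if not re.search(r"[A-Za-z]", line) or SKIP_RE.search(line):
                continue
            name = normalize_name(line)
            if name and name not in names:
                names.append(name)
            break  # only the first usable line after each heading
    return names
```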
Step 7. Aggregate information into a blockholder-company-year panel
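As an illustration only, the Python sketch below collapses parsed filings into a blockholder-company-year panel by keeping the position reported in the latest filing of each year. That rule, the BlockFiling record, and the identifiers used as keys are assumptions rather than the paper's documented procedure.

```python
# Hypothetical Python sketch of building a blockholder-company-year panel.
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple

PanelKey = Tuple[str, str, int]  # (blockholder id, company id, year)


@dataclass(frozen=True)
class BlockFiling:
    blockholder: str  # e.g. the filer CIK (assumed identifier)
    company: str      # e.g. the subject CIK or six-digit CUSIP (assumed)
    filed: date
    percent: float


def to_panel(filings: Iterable[BlockFiling]) -> Dict[PanelKey, float]:
    """Assumed rule: keep the percent from the latest filing in each year."""
    latest: Dict[PanelKey, BlockFiling] = {}
    for filing in filings:
        key = (filing.blockholder, filing.company, filing.filed.year)
        if key not in latest or filing.filed > latest[key].filed:
            latest[key] = filing
    return {key: filing.percent for key, filing in latest.items()}
```

A full panel may also need to carry positions forward into years with no new filing; this sketch does not do that.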
Step 8. Download insider ownership transactions
- Added in 2022 to improve data accuracy
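Assuming the insider transactions come from the XML versions of SEC Forms 3, 4, and 5 (the ownershipDocument schema), a minimal Python sketch of extracting holdings from one such document might look like this. Only the first reporting owner and the non-derivative table are read, and the function name is hypothetical.

```python
# Hypothetical Python sketch of reading holdings from an ownership XML document.
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def _text(node: ET.Element, path: str) -> Optional[str]:
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def parse_ownership_xml(xml_text: str) -> List[Dict[str, object]]:
    """One row per non-derivative transaction; first reporting owner only."""
    root = ET.fromstring(xml_text.strip())  # the <ownershipDocument> element
    issuer_cik = _text(root, "issuer/issuerCik")
    owner_cik = _text(root, "reportingOwner/reportingOwnerId/rptOwnerCik")
    owner_name = _text(root, "reportingOwner/reportingOwnerId/rptOwnerName")
    rows: List[Dict[str, object]] = []
    for tx in root.findall("nonDerivativeTable/nonDerivativeTransaction"):
        shares = _text(tx, "postTransactionAmounts/sharesOwnedFollowingTransaction/value")
        rows.append({
            "issuer_cik": issuer_cik,
            "owner_cik": owner_cik,
            "owner_name": owner_name,
            "date": _text(tx, "transactionDate/value"),
            "shares_after": float(shares) if shares else None,
        })
    return rows
```

In a complete EDGAR submission file the XML sits between <XML> and </XML> tags, so it would need to be cut out before parsing.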
Step 9. Add missing insider blocks
- Added in 2022 to improve data accuracy
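One way to flag insider blocks that are missing from the 13D/13G data, sketched here in Python under stated assumptions: convert each insider's shares into a percentage of shares outstanding and keep positions above the 5% reporting threshold that have no matching panel entry. The shares-outstanding input, the shared owner and company identifiers, and the function name are hypothetical.

```python
# Hypothetical Python sketch of adding insider blocks absent from the panel.
from typing import Dict, Set, Tuple

Key = Tuple[str, str, int]  # (owner id, company id, year), assumed shared ids
THRESHOLD = 5.0  # percent; 13D/13G filings are required above 5% ownership


def missing_insider_blocks(
    insider_shares: Dict[Key, float],
    shares_outstanding: Dict[Tuple[str, int], float],
    panel_keys: Set[Key],
) -> Dict[Key, float]:
    """Insider positions above the threshold that the panel does not contain."""
    added: Dict[Key, float] = {}
    for (owner, company, year), shares in insider_shares.items():
        outstanding = shares_outstanding.get((company, year))
        if not outstanding:
            continue  # cannot compute a percentage without shares outstanding
        pct = 100.0 * shares / outstanding
        key = (owner, company, year)
        if pct > THRESHOLD and key not in panel_keys:
            added[key] = pct
    return added
```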