Document Sourcing Specialist | $20-$66/hr Remote
Overview
This role involves sourcing publicly available documents from government archives, academic repositories, and open datasets to train next-generation AI systems. The specialist verifies license types, logs metadata, and collaborates with data engineering and compliance teams to ensure adherence to open-source licensing requirements. No prior AI experience is required; domain expertise in document sourcing and licensing is key.
What You'll Do
- Source publicly available documents from platforms such as government archives, academic repositories, open datasets, and licensed open-source documentation.
- Verify and document the license type of every sourced document, ensuring it strictly meets the licensing requirements (for example CC0, CC-BY, MIT, or Apache 2.0).
- Log critical metadata for each submission, including source URLs and full license details, in designated tracking tools.
- Flag and annotate any issues related to ownership, unclear licensing, paywalled access, or content with non-commercial usage restrictions.
- Collaborate with data engineering and compliance teams to clarify requirements and resolve sourcing ambiguities.
- Maintain up-to-date knowledge of open data best practices, licensing changes, and repository navigation strategies.
- Communicate findings and unresolved issues clearly in both written and verbal form, supporting documentation integrity and compliance audits.
Requirements
- Exceptional attention to detail and ability to accurately review complex licensing and compliance information.
- Experience sourcing documents from repositories such as SEC EDGAR, arXiv, Kaggle, and GitHub.
- Proficiency in academic research, data collection, and public records searching.
- Strong written and verbal communication skills, with the ability to articulate findings and collaborate remotely.
- Demonstrated ability to distinguish between open and restricted content, and to identify potential sourcing risks.
- Comfort working independently in a fast-paced, remote environment with evolving priorities.
- Highly organized, reliable, and adept at managing and documenting large volumes of information.
Who Should Apply
Ideal candidates have strong domain knowledge in document sourcing, licensing, and compliance, with exceptional attention to detail. They are comfortable working independently in a remote environment and have experience with repositories like SEC EDGAR, arXiv, Kaggle, or GitHub. Prior experience in academic research, information science, or legal review is a plus, and no AI background is needed.
Salary Insight
The role offers $20-$66 per hour, based on experience and location.
Application Tip
Highlight your experience with specific public repositories (e.g., SEC EDGAR, arXiv, Kaggle, GitHub) and your ability to verify licenses (CC0, MIT, Apache 2.0) to demonstrate attention to detail and compliance expertise.