generated from hackforla/.github-hackforla-base-repo-template
Open
Labels
epic, feature: missing, milestone: missing, role: data analysis, size: epic, size: missing, time sensitive
Description
Overview
Project: Open Community Survey
Volunteer Opportunity: Create a scraper to collect information from builtwith.com on the technologies used by neighborhood council (NC) websites, organize the data (create categories for the technologies), and automate the scrape job to run periodically. Additionally, we want to display this information in a dashboard (see the Google Data Studio Dashboard linked below under "Project output" for an example).
Contact: Ryan Swan (data science), Kaylani (Open Community Survey), Bonnie
Action Items
- Create a wiki page
- Build a scraper that we can reuse to get the data on the NC site technologies
- Add the scripts and other code to the data science repo or, if another repo is required, let the leads know.
- Create a spreadsheet from the results of the initial scrape
- Create a set of categories in the spreadsheet
- Rework the script to grab the category as well as the technology (it's available in the API)
- Add a category to each technology so that the data can be grouped and analyzed (this will happen automatically once the prior item is done)
- Assess code for current scraper to determine if it still functions properly
- Perform additional analysis on Widget technology category: Which sites are using calendars? What are the calendars used for (events of the NC or local events)? Which sites use chatbots? Which sites have search functionality? How many sites use translation widgets?
- Finish analysis of the following technology categories: Content Management System (CMS), Mobile, SSL, Payment, Framework, and Copyright
- Fix directory issues with code. Currently, it's in the 311 directory but needs to be moved to the open community survey directory.
- Create a reusable matching table of technology to category
- Create a script that builds a new spreadsheet from the matching table so that the technologies are already categorized (except, of course, the ones that are new)
- Create instructions for updating the matching table and running the scripts
- Make sure the wiki is updated
- Release dependency on: Conduct Analysis of 99 NC Features (open-community-survey#28)
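The matching-table items above could be sketched roughly as follows. This is a minimal illustration, not the existing script: the table contents, the `UNCATEGORIZED` marker, and the function names `categorize` and `to_csv` are all hypothetical, and the real table would presumably live in the spreadsheet rather than in code.

```python
import csv
import io

# Hypothetical matching table (technology name -> category).
# The real entries would come from the project's spreadsheet.
MATCHING_TABLE = {
    "WordPress": "CMS",
    "Google Analytics": "Analytics",
    "Google Translate Widget": "Widgets",
}

# Marker for technologies that are not in the matching table yet (assumption).
UNCATEGORIZED = "UNCATEGORIZED"


def categorize(rows, matching_table):
    """Attach a category to each (site, technology) row.

    Returns the categorized rows plus a sorted list of technologies that
    were not found in the matching table, so they can be added by hand.
    """
    categorized = []
    new_technologies = set()
    for site, technology in rows:
        category = matching_table.get(technology)
        if category is None:
            category = UNCATEGORIZED
            new_technologies.add(technology)
        categorized.append((site, technology, category))
    return categorized, sorted(new_technologies)


def to_csv(categorized_rows):
    """Render categorized rows as CSV text ready to import into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["site", "technology", "category"])
    writer.writerows(categorized_rows)
    return buffer.getvalue()
```

Running `categorize` on a fresh scrape would leave only the new technologies to categorize manually; those can then be added to the matching table before the next run.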
Resources/Instructions
External Tools
- Builtwith
- https://builtwith.com
- builtwith API
- API limitations: Some sites are resistant to being crawled (WordPress sites, for instance, such as https://atwatervillage.org/calendar/), so we need a list of all the sites that can't be put through the sitemap maker. See these notes on crawling WordPress sites: https://community.funnelback.com/knowledge-base/implementation/Gather-And-Index/integration/crawl-wordpress-sites
- Selenium
- Docker
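As a rough sketch of how a scrape run might collect both technology and category while keeping a list of sites that could not be processed: the response field names used here (`Results`, `Result`, `Paths`, `Technologies`, `Name`, `Tag`) are assumptions about the shape of the API's JSON and should be checked against the BuiltWith API documentation. The `fetch` callable is a hypothetical stand-in for whatever actually calls the API or drives Selenium.

```python
def flatten_technologies(domain, response):
    """Turn one API response into (domain, technology, category) rows.

    The nested keys below are an ASSUMED response shape, not a verified one.
    """
    rows = []
    for result in response.get("Results", []):
        for path in result.get("Result", {}).get("Paths", []):
            for tech in path.get("Technologies", []):
                rows.append((domain, tech.get("Name"), tech.get("Tag")))
    return rows


def scrape_all(domains, fetch):
    """Scrape every domain, recording failures instead of stopping.

    `fetch` is a hypothetical callable that takes a domain and returns the
    parsed JSON response, raising an exception when the site cannot be
    processed. Failed domains are returned so they can be listed separately.
    """
    rows = []
    failed = []
    for domain in domains:
        try:
            response = fetch(domain)
        except Exception as exc:  # broad on purpose: any failure goes on the list
            failed.append((domain, str(exc)))
            continue
        rows.extend(flatten_technologies(domain, response))
    return rows, failed
```

Passing `fetch` in as a parameter keeps the parsing logic testable without network access, and the `failed` list doubles as the list of sites that resist crawling.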
Tutorial
Project input (data)
- Target Website List Here - this is one tab on a larger analysis workbook.
Project output
- Data Science wiki, 99 NC project
- Spreadsheet of Rajinder's script results
- Spreadsheet of updated script results
- Example of Google Data Studio Dashboard
Rajinder's code
- Code on the data-science repo with Rajinder + Willa's code - this will need to be moved to another directory. It has nothing to do with 311; it's a project for Open Community Survey.
- Rajinder's personal repo - this seems to be updated more recently than the one on data-science.
Current presentation
OCS: Tech usage insights NCs
Analytics Analysis Workbook
Widgets Analysis Workbook
Related issues from OCS
Past Collaborators:
@akibrhast, @ava li, @Sarah Williams, @wendywilhelm10, @rajindermavi, @ShikaZzz, @JessicaFB, @Poorvi Rao
Metadata
Projects
Status
In progress (actively working)
Status
Filled