Complete Project #1

Open
erolrecep opened this issue Oct 21, 2019 · 2 comments

Comments

@erolrecep
Collaborator

Why don't we have a project that exercises every major TensorFlow feature?

  • TFRecord
  • data augmentation
  • training (and, later on, hyperparameter optimization)
  • monitoring training with TensorBoard
  • running inference/validation on a held-out set while training is in progress
  • visualizing validation-set results in TensorBoard
  • saving the best model weights and putting them into a web application

We can do all these steps with the simplest CNN, LeNet-5, on the MNIST dataset.

@jkmackie
Owner

jkmackie commented Oct 22, 2019 via email

@jkmackie
Owner

jkmackie commented Nov 6, 2019

car_pricing_v2 is uploaded to GitHub. It replaces both v1 and explore_json.

Project Scope: I built regression models on Honda listings for sale by owner in Houston. There are 315 samples, split 70% train and 30% test (stratified by model, e.g. Accord, CR-V, Civic).
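
For reference, a stratified 70/30 split like the one described above can be sketched without any dependencies. This is only an illustration: how the split was actually done is not stated here, the helper name `stratified_split` is hypothetical, and the listings below are made up.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_frac=0.3, seed=0):
    """Split rows into (train, test), taking test_frac of each group."""
    # Group the rows by the stratification key (here, the car model).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Made-up listings, only to show the call; not the real data.
listings = (
    [{"model": "Accord", "price": 9000 + i} for i in range(20)]
    + [{"model": "Civic", "price": 7000 + i} for i in range(10)]
)
train, test = stratified_split(listings, key="model")
```

Splitting inside each model group keeps the Accord/Civic proportions roughly the same in train and test, which matters when some models have only a handful of listings.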

The main modeling issue is that we need more data. Here are some proposed ways to get it:

  1. Include all listings (owner or dealer) rather than just owner. Dealer listings are the most common.
  2. Combine regions. For example, combine Houston with College Station and Galveston?
  3. Pick a more common manufacturer like Ford. But will each of that manufacturer's models be more common?

If we do (1), owner vs. dealer will need to be a feature. This can be parsed from the vehicle URL (e.g. houston.craigslist.org/cto vs. houston.craigslist.org/ctd). Alternatively, we could use dealer listings only, since they are more common than owner listings.
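
A rough sketch of that URL parsing, assuming the `cto`/`ctd` code always shows up as its own path segment. The function name `parse_listing_url` and the `owner`/`dealer`/`unknown` labels are hypothetical, and the full URL in the usage line is a made-up example.

```python
from urllib.parse import urlparse

# Assumed mapping from the path code to a seller label.
# "cto" and "ctd" come from the example URLs above; the labels are ours.
SELLER_CODES = {"cto": "owner", "ctd": "dealer"}


def parse_listing_url(url):
    """Return (region, seller_type) parsed from a Craigslist vehicle URL."""
    # urlparse only fills in netloc when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    parsed = urlparse(url)
    # Region is the subdomain, e.g. "houston" in houston.craigslist.org.
    region = parsed.netloc.split(".")[0]
    # Look for a known seller code among the path segments.
    seller_type = "unknown"
    for segment in parsed.path.split("/"):
        if segment in SELLER_CODES:
            seller_type = SELLER_CODES[segment]
            break
    return region, seller_type


region, seller = parse_listing_url("https://houston.craigslist.org/ctd/d/example/123.html")
```

Returning the region alongside the seller type means the same helper could also supply a region feature if we end up combining regions.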

If we do (2), any regional pricing differences will be commingled.

If we do (3), I'll need to update the data scrubbing pipeline to see if Ford models are more common than Honda models.

There are 3,000 Fords and 1,307 Hondas for sale by owner/dealer in Houston. This is the route I'm inclined to try. We can always drop the owner or dealer listings later.
