Let’s say you’ve gone through all the resources in Free Big Data Education: A Data Science Perspective here on BDR and you’re wondering what’s next. For those of you who want to go further in the field of data science, here is a list of unstructured educational methods you can use to expand your knowledge.
Data science blogs
Monitoring blogs written by experts in the field is an excellent way to learn what the professionals are doing. Here are some of my favorites:
Conference proceedings
Data science conferences are happening all around the world. Although attending these events can be expensive, their proceedings are often made available for free afterward. Take, for example, Salford Systems, which offers a library of conference videos on a variety of data-science-related topics. I’ve watched many of them and they are top-rate.
Keeping up with academia
Keeping a close eye on academia is an important way for you to embrace leading trends in data science. Many researchers at universities and think-tanks produce a broad array of new techniques all the time, so it is a good idea to see what’s coming.
The Association for Computing Machinery (ACM) is the computer industry’s venerable professional society, and it maintains a number of relevant special interest groups (SIGs). One is ACM SIGKDD (Knowledge Discovery and Data Mining), which publishes an excellent journal, “SIGKDD Explorations,” that I greatly enjoy. SIGEVO (Genetic and Evolutionary Computation) is another favorite of mine for its special focus on evolutionary algorithms.
ACM and its SIGs require membership fees, but many of the research papers on machine learning destined for ACM journals first appear on the arXiv.org pre-print server. This is a tremendous resource that I can’t recommend highly enough. You should monitor the recent list every few days to be sure nothing good slips past.
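If you would rather not check the recent list by hand, arXiv offers a public query API that returns an Atom feed. Here is a minimal sketch of how you might pull the newest submissions in one category. The category `cs.LG` (machine learning), the result count, and the function names are my own choices for illustration; adjust them to your interests.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Atom namespace used by the arXiv API feed.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
API_URL = "http://export.arxiv.org/api/query"


def build_query_url(category="cs.LG", max_results=10):
    """Build an arXiv API URL for the newest papers in one category.

    The default category and count are assumptions, not recommendations.
    """
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract title, link, and publication date from an Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        published = entry.findtext("atom:published", default="", namespaces=ATOM_NS)
        papers.append({
            "title": " ".join(title.split()),  # collapse line breaks in titles
            "link": link.strip(),
            "published": published.strip(),
        })
    return papers


def fetch_recent(category="cs.LG", max_results=10):
    """Download and parse the newest submissions (requires network access)."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())
```

Calling `fetch_recent()` returns a list of dictionaries you can print or filter by keyword. Running it every few days, rather than continuously, matches the pace suggested above and keeps the load on arXiv light.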
Microsoft Research is another superb academic resource. The company maintains machine learning groups around the world. They attract the best and brightest talent and offer a broad range of research results.
Also, be sure to check out the Journal of Machine Learning Research, a free web-based publication sponsored by MIT.
Social media
A great way to learn more about data science is to follow select data scientists on Twitter to get leads on the latest techniques and research. I follow a variety of people on Twitter and frequently see links to papers. To find them, just use the Twitter search feature with keywords like “data science,” “big data,” “machine learning,” etc. One of my favorites is @kdnuggets, which I highly recommend. Some data scientists tweet from conferences, so you can monitor their tweets to get the latest breaking information without actually attending.
Data repositories
A good way to continue your data science education is to examine data, all sorts of data, and then develop algorithms for classification, prediction, clustering, etc. Here is a list of free and open data repositories:
Competitions
Finally, a great way to gain real-world experience and learn a lot along the way is to participate in a data science competition. Kaggle is arguably the best-known resource, with a number of competitions running at any given time. You’re given a complete description of the problem to be solved, training data sets, and an online forum to discuss the project with others. Maybe the best thing about the competitions is that many have prize money. For example, the Heritage Health Prize competition has a grand prize of $3 million. You can start with one of the practice competitions for valuable experience, and then try a real one later. Other competition sites are DataKind and TunedIT.
The field of data science is moving at such a frenetic pace that resources like those mentioned in this article will morph over time. Please be sure to share your favorite resources here.
— Daniel D. Gutierrez, Data Scientist, Amulet Analytics