Java/Scala Big Data
221 River Street, 8th Floor Hoboken, NJ 07030
What you will do:
- Play a pivotal design and hands-on implementation role in improving the data infrastructure in a project-oriented work environment.
- Influence cross-functional architecture in sprint planning.
- Gather and process raw data at scale: collect data across all business domains (our functional-first, event-sourced, microservices backend) and expose mechanisms for large-scale parallel processing.
- Design, implement and manage a near real-time ingestion pipeline into a data warehouse and Hadoop data lake.
- Process unstructured data into a form suitable for analysis, then enable state-of-the-art analysis for analysts, scientists, and APIs.
- Build efficient new data models and refactor existing ones. Partner with the business to build the right data models and analytics capabilities.
- Solve complex SQL and Big Data performance challenges.
- Mitigate risks in our data infrastructure by developing best-in-class tools and processes.
- Implement controls, policies, processes, and best practices in the Data Engineering space.
- Evangelize an extremely high standard of code quality, system reliability, and performance.
- Help us improve our database deployment and change management process.
- Provide reliable and efficient data services as part of the database team.
- Work closely with developers on development best practices and standards.
- Be a mentor.
What you will have:
- Degree in Computer Science, Information Technology, Math, or a related technical field.
- Natural inclination for designing well-thought-out data solutions, as well as solid hands-on implementation capability.
- 5+ years' experience engineering data solutions using technologies including Spark (batch/streaming), Scala/Java (building data pipelines vs. frameworks), Hadoop, Hive, HBase, Kafka, Oozie, and YARN.
- Solid hands-on experience building data pipelines, deploying and managing Big Data infrastructure, and establishing deployment and operational excellence for Big Data clusters.
- Experience with relational databases, data warehousing, SQL, ETL, and/or NoSQL databases is a plus.
- Proven experience building or improving large-scale data infrastructure from the ground up. This could include building data warehousing stores, data architecture, data integration, highly performant data movement pipelines, and tools and automation to facilitate data and operational governance, data lineage, automation, and monitoring.
- Proven performance tuning accomplishments, plus advanced tuning and troubleshooting skills in large data environments on the Hadoop or RDBMS stack.
- Advanced knowledge of the internals and best practices of at least one RDBMS (MSSQL preferred).
- DataOps experience, including hands-on skills in a scripting language such as Python, Perl, or Bash.
- Proven experience in troubleshooting, problem determination, and rapid problem resolution.
- Experience and ability to work under high pressure in a complex technical environment.